QCon SF 2024 - Scale Out Batch GPU Inference with Ray
Briefly

"The demand for batch inference is getting higher and higher. This is mainly because we now have multi-modality data sources. You have cameras, mic sensors, and PDF files. And then, by processing these files, you will get different kinds of raw data in different formats, which are either unstructured or structured." - Cody Yu
"By optimizing batch sizes and employing chunk-based batching, the system has been fine-tuned for maximum efficiency. This approach not only improves throughput but also strategically manages computational resources across heterogeneous systems."
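The chunk-based batching mentioned above can be illustrated with a minimal pure-Python sketch: records are grouped into fixed-size chunks so each downstream (e.g. GPU) call sees a full batch. The function and variable names here are illustrative, not from the talk or the Ray API.

```python
from typing import Iterable, Iterator, List

def chunk_batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive batches of at most `batch_size` records."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

batches = list(chunk_batches(range(10), batch_size=4))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Tuning `batch_size` trades off GPU memory use against per-call overhead, which is the optimization the quote refers to.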
"Ray Data offers scalable data processing solutions that maximize GPU utilization and minimize data movement costs through optimized task scheduling and streaming execution."
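Streaming execution means downstream stages start consuming batches as soon as upstream stages produce them, rather than materializing the whole dataset between stages. The following is a pure-Python analogue of that idea using chained generators; it is an illustration only, not the Ray Data API, and the stage functions are hypothetical stand-ins.

```python
from typing import Iterator, List

def read_stage(n: int, batch_size: int) -> Iterator[List[int]]:
    # Stand-in for a data source that emits batches lazily.
    for start in range(0, n, batch_size):
        yield list(range(start, min(start + batch_size, n)))

def preprocess_stage(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    # Stand-in for CPU-side preprocessing of each batch.
    for batch in batches:
        yield [x * 2 for x in batch]

def inference_stage(batches: Iterator[List[int]]) -> Iterator[int]:
    # Stand-in for a GPU inference call per batch.
    for batch in batches:
        yield sum(batch)

# Because every stage is a generator, batches flow through one at a
# time, keeping all stages busy without buffering the full dataset.
results = list(inference_stage(preprocess_stage(read_stage(10, 4))))
# results == [12, 44, 34]
```

In Ray Data proper, the analogous pipeline is built from dataset transformations such as `map_batches`, with the scheduler overlapping CPU and GPU stages across a cluster.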
"A case study on generating embeddings from PDF files... where the process costs less than $1 for processing with ~20 GPUs."
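The case study's pipeline shape (parse PDFs, split text into chunks, embed each chunk) can be sketched as below. All names are hypothetical, and `embed` is a deterministic toy stand-in; a real run would batch chunks to a GPU embedding model and use an actual PDF parser.

```python
from typing import List

def extract_text(pdf_bytes: bytes) -> str:
    # Stand-in for a real PDF text extractor.
    return pdf_bytes.decode("utf-8", errors="ignore")

def split_chunks(text: str, chunk_size: int) -> List[str]:
    # Fixed-width chunking; real pipelines often split on sentences or tokens.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk: str, dim: int = 4) -> List[float]:
    # Toy embedding derived from character codes, standing in for a model call.
    s = sum(ord(c) for c in chunk)
    return [float(s % (i + 7)) for i in range(dim)]

docs = [b"hello world", b"batch inference at scale"]
embeddings = [embed(chunk)
              for doc in docs
              for chunk in split_chunks(extract_text(doc), chunk_size=8)]
```

Scaled out, each stage would run as a distributed transformation so the ~20 GPUs stay saturated while CPU workers handle parsing and chunking.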
Read at InfoQ