RAPIDS provides GPU-accelerated implementations of familiar Python data libraries, so workloads written for pandas, scikit-learn, NetworkX, Spark, and Polars can run on GPUs. Core components include cuDF for DataFrames, cuML for machine learning, cuGraph for graph analytics, cuXfilter for interactive dashboards, and specialized libraries such as cuSpatial, cuSignal, and cuVS for vector search. RAPIDS supports zero-code-change acceleration through its pandas accelerator, which intercepts pandas imports and falls back to the CPU for operations without a GPU implementation. Strategies covered include out-of-core processing for datasets larger than GPU memory, scaling out via Dask or Ray, and Spark integration, together delivering orders-of-magnitude speedups and faster iteration on large workloads.
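To make the zero-code-change idea concrete, here is a minimal sketch: the script below is ordinary pandas, and on a machine with an NVIDIA GPU and cuDF installed it can be accelerated unchanged by launching it as `python -m cudf.pandas etl.py` (or via `%load_ext cudf.pandas` in Jupyter). The file name and the toy data are illustrative, not from the episode.

```python
# etl.py -- plain pandas code, no RAPIDS imports required.
# Run as-is on CPU, or accelerate it unchanged with:
#   python -m cudf.pandas etl.py
# Operations lacking a GPU implementation fall back to CPU automatically.
import pandas as pd

df = pd.DataFrame({
    "store": ["a", "b", "a", "b", "a"],
    "sales": [10, 20, 30, 40, 50],
})

# A typical groupby-aggregate; with the accelerator loaded,
# cuDF executes this on the GPU when it can.
totals = df.groupby("store", sort=True)["sales"].sum()
print(totals.to_dict())  # {'a': 90, 'b': 60}
```

The point is that the DataFrame code itself never changes; only the launch command does, which is what makes the accelerator an easy on-ramp for existing pandas workloads.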
Python's data stack is getting a serious GPU turbo boost. In this episode, Ben Zaitlen from NVIDIA joins us to unpack RAPIDS, the open source toolkit that lets pandas, scikit-learn, Spark, Polars, and even NetworkX execute on GPUs. We trace the project's origin and why NVIDIA built it in the open, then dig into the pieces that matter in practice: cuDF for DataFrames, cuML for ML, cuGraph for graphs, cuXfilter for dashboards, and friends like cuSpatial and cuSignal.
We talk real speedups, how the pandas accelerator works without a rewrite, and what becomes possible when jobs that used to take hours finish in minutes. You'll hear strategies for datasets bigger than GPU memory, scaling out with Dask or Ray, Spark acceleration, and the growing role of vector search with cuVS for AI workloads. If you know the CPU tools, this is your on-ramp to the same APIs at GPU speed.