RAPIDS in one line: RAPIDS executes end-to-end data pipelines entirely on GPUs.
RAPIDS is a set of libraries and APIs that runs on NVIDIA CUDA® GPUs. It offers GPU-accelerated substitutes for popular Python libraries, such as cuDF for pandas, cuML for scikit-learn, and cuGraph for NetworkX. Each library mirrors the API of its CPU counterpart, so familiar methods and idioms carry over largely unchanged.
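As a minimal sketch of that API mirroring, the snippet below uses pandas on the CPU; because cuDF follows the same DataFrame API, swapping the single import line for cuDF is typically all that is needed to run the same code on a GPU (the data and column names here are made up for illustration):

```python
# Illustrative only: the same code runs on CPU with pandas or,
# on a RAPIDS system, on GPU with `import cudf as pd` instead.
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "NYC", "SF"], "sales": [10, 20, 30]})

# Familiar pandas-style groupby/aggregation -- cuDF exposes the same method.
totals = df.groupby("city")["sales"].sum()
print(totals.to_dict())  # {'NYC': 30, 'SF': 30}
```

This drop-in pattern is the core of the RAPIDS developer experience: existing pandas knowledge transfers directly.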
RAPIDS accelerates the entire data pipeline by keeping every step, from ETL and visualization to model training and inference, on GPUs. It’s fast not only because of the GPUs themselves, but also because it builds on Numba for JIT-compiled kernels and Apache Arrow for a columnar, zero-copy in-memory data format.
RAPIDS contributes to and integrates with frequently used Python packages (Dask, Dash, XGBoost, scikit-learn). It also supports cybersecurity, signal processing, and geospatial analytics use cases with the CLX, cuSignal, and cuSpatial libraries.
Tools to support GPU-accelerated machine learning training have been around for a while, but none of them ran the entire machine learning pipeline on GPUs. Instead, they focused on the compute-heavy training and inference stages of a pipeline.
This created a performance problem: moving data between CPU-based steps (say, visualization and feature engineering) and GPU-based steps (say, model training) incurs significant overhead from host-to-device transfers and the serialization and deserialization they require.
RAPIDS handles the ML pipeline as a whole on GPUs, so anyone can benefit from its speed without having to learn the details of CUDA programming or any other new tools.
RAPIDS also supports multi-node, multi-GPU deployments, so you can scale up and out to much larger datasets, which drastically reduces the run-time of each data pipeline iteration. Faster iterations mean more experiments in the same time, which tends to yield more accurate models and more frequent deployments.