Juan Medina, Team Lead in Data Science & ML
More and more disruptive companies are looking to data science to leverage large-scale datasets and generate meaningful insights. While data science is incredibly valuable to businesses, it can also be an arduous, costly process.
Building and training models can take months, and productionizing those models can multiply that time. This can drastically impact budget, speed to production, and time to market.
No matter the industry, startups and the innovation arms of big brands want to train machine learning models faster. Doing so would help them extract insights sooner, keep costs lower, and gain an immediate advantage in the market.
As data scientists and machine learning engineers at Loka, we want to be ready to help our customers make that happen. The exciting news: we are seeing that we can accelerate time to market at lower cost with GPUs.
Our RAPIDS origin story
Back in January, our team leader at Loka introduced us to NVIDIA’s new suite of libraries that lets you execute end-to-end data science and analytics pipelines entirely on their GPUs. NVIDIA calls it RAPIDS, and it blew our minds.
The more we dug in, the more impressed we were with the improvements in computing time we could achieve using RAPIDS compared to CPU solutions. (We’re saving the results for our next post, but we’ll say this much: not even CPU-based Apache Spark could keep up.)
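To give a flavor of what that looks like in code, here is a minimal sketch of a pandas-style aggregation. cuDF, the RAPIDS DataFrame library, mirrors much of the pandas API, so with RAPIDS installed the same code can run on the GPU by swapping the import. The dataset and column names below are made up purely for illustration.

```python
import pandas as pd  # with RAPIDS installed: `import cudf as pd` moves this to the GPU

# Toy dataset -- purely illustrative
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b", "a"],
    "reading": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# groupby/agg code like this carries over unchanged between pandas and cuDF
means = df.groupby("sensor")["reading"].mean()
print(means)
```

On a toy frame like this the GPU brings no benefit (data-transfer overhead dominates); the speedups we’re talking about show up on large datasets.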
Then we thought, wouldn’t it be cool to speed up the processing time on our own internal projects? So we went ahead and applied RAPIDS to our own work, specifically to image augmentation.
Sadly, we found that RAPIDS was more suitable for structured data processing. But this was a turning
point for us; this is when we started to dream big.
At Loka, we deeply value courageous innovation and constant curiosity.
In that spirit, we pondered how we could augment those images using a GPU.
If not RAPIDS, then what?
Our curiosity rabbit hole led us to OpenCV, a standard library for image processing, and we ended up finding a way to use it on the GPU. (Admittedly, it’s a bold move to undertake such a task with so little information available about it. Not to worry, we’ll share our findings with you! :D)
A few Zoom calls later, we decided to start building an ecosystem focused on NVIDIA GPU acceleration for data science and machine learning. The idea was to use RAPIDS for structured data and OpenCV for image processing.
The next step was to build a proper environment in which to start using these libraries on our internal projects.
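For reference, here is a sketch of what such an environment can look like with conda. The channels are the ones RAPIDS publishes to, but treat the package versions as placeholders and check the official RAPIDS install selector for the current, tested combination.

```shell
# Illustrative environment -- versions are examples, not a tested pin set
conda create -n rapids-env \
    -c rapidsai -c conda-forge -c nvidia \
    cudf python=3.10 cuda-version=12.0
conda activate rapids-env
```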
This exploration isn’t just for us—it’s a catalyst for a larger RAPIDS community
And this is the journey we want to share with you. We want to be transparent about our exploration of this tool and our findings: the speeds, the use cases, the insights, and how RAPIDS can make data science, and pulling valuable insights from data, more feasible and accessible.
On this quest toward more efficient computing, you’ll find benchmarks using RAPIDS, insights on applying GPU implementations, and every other cool thing we are working on in this space.
We will share our findings (and failures) with you in near real-time, illustrating to you and everyone else
what our data scientists are capable of doing with NVIDIA’s GPU-acceleration. What is next for AI, ML and
humanity starts today, with you, with us, with our data scientists.
Upcoming posts from our journey through GPU acceleration:
- So how fast is RAPIDS? Benchmarking NVIDIA GPU-accelerated functions for image augmentation
- Using RAPIDS to run an EDA on the Sloan Digital Sky Survey: CPU vs. GPU
- Minimizing costs by building an on-demand CI/CD pipeline for GPU-accelerated libraries
We hope you join us on our journey. Stay tuned.