Juan Medina, Team Lead in Data Science & ML
More and more disruptive companies are looking to data science to leverage large-scale datasets and generate meaningful insights. While data science is incredibly valuable to businesses, it can also be an arduous, costly process.
Building and training models can take months, and productionizing those models can multiply that time. This can drastically impact budget, speed to production, and time to market.
No matter the industry, startups and the innovation arms of big brands want to train machine learning models faster. Doing so would help them extract insights sooner, keep costs lower, and gain an immediate advantage in the market.
As data scientists and machine learning engineers at Loka, we want to be ready to help our customers make that happen. The exciting news: we are seeing that we can accelerate time to market at lower cost with GPUs.
Our RAPIDS origin story
Back in January, our team leader at Loka introduced us to NVIDIA’s new suite of libraries that lets you execute end-to-end data science and analytics pipelines entirely on their GPUs. NVIDIA calls it RAPIDS, and it blew our minds.
The more we dug in, the more impressed we were with the improvements in computing time we could achieve using RAPIDS compared to CPU solutions. (We’re saving the results for our next post, but we’ll say this much: not even CPU-based Apache Spark could keep up.)
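To give a flavor of what that looks like in code, here is a minimal sketch of a pandas-style aggregation. cuDF, the RAPIDS DataFrame library, mirrors much of the pandas API, so with RAPIDS installed the same code can run on the GPU by swapping the import. The dataset and column names below are made up purely for illustration.

```python
import pandas as pd  # with RAPIDS installed: `import cudf as pd` moves this to the GPU

# Toy dataset -- purely illustrative
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b", "a"],
    "reading": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# groupby/agg code like this carries over unchanged between pandas and cuDF
means = df.groupby("sensor")["reading"].mean()
print(means)
```

On a toy frame like this the GPU brings no benefit (data-transfer overhead dominates); the speedups we’re talking about show up on large datasets.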
Then we thought, wouldn’t it be cool to speed up the processing time on our own internal projects? So we went ahead and applied RAPIDS to our own work, specifically to image augmentation.
Sadly, we found that RAPIDS was more suitable for structured data processing. But this was a turning
point for us; this is when we started to dream big.
At Loka, we deeply value courageous innovation and constant curiosity.
In that spirit, we pondered how we could augment those images using a GPU.
If not RAPIDS, then what?
Our curiosity rabbit hole led us to OpenCV, a standard library for image processing, and we ended up finding a way to use it on the GPU. (Admittedly, it’s a bold move to undertake such a task with so little information available about it. Not to worry, we’ll share our findings with you! :D)
A few Zoom calls later, we decided to start building an ecosystem focused on NVIDIA GPU acceleration for data science and machine learning. The idea was to use RAPIDS for structured data and OpenCV for image processing.
The next step was to build a proper environment in which to start using these libraries on our internal projects.
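For reference, here is a sketch of what such an environment can look like with conda. The channels are the ones RAPIDS publishes to, but treat the package versions as placeholders and check the official RAPIDS install selector for the current, tested combination.

```shell
# Illustrative environment -- versions are examples, not a tested pin set
conda create -n rapids-env \
    -c rapidsai -c conda-forge -c nvidia \
    cudf python=3.10 cuda-version=12.0
conda activate rapids-env
```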
This exploration isn’t just for us—it’s a catalyst for a larger RAPIDS community
And this is the journey we want to share with you. We want to be transparent about our exploration of this tool and our findings: the speeds, the use cases, the insights, and how RAPIDS can make data science, and pulling valuable insights from data, more feasible and accessible.
On this quest toward more efficient computing, you’ll find benchmarks using RAPIDS, insights on applying GPU implementations, and every other cool thing we are working on in this space.
We will share our findings (and failures) with you in near real-time, illustrating to you and everyone else
what our data scientists are capable of doing with NVIDIA’s GPU-acceleration. What is next for AI, ML and
humanity starts today, with you, with us, with our data scientists.
Upcoming posts from our journey through GPU acceleration:
- So how fast is RAPIDS? Benchmarking NVIDIA GPU-accelerated functions for image augmentation
- Using RAPIDS to run an EDA on the Sloan Digital Sky Survey: CPU vs. GPU
- Minimizing costs by building an on-demand CI/CD pipeline for GPU-accelerated libraries
We hope you join us on our journey. Stay tuned.