We know that training powerful machine learning models takes
a lot of time and massive computing power—and these things
inherently cost a lot of money and take a
toll on our planet.
For computer science students who want to learn to build on these models and push innovation forward, the time and capital required are demoralizing roadblocks. When companies like Facebook and Google publish their pretrained models, the path for students to develop their own deep learning work becomes much clearer.
Daniel Larremore, an assistant professor in the Department of Computer Science at the University of Colorado Boulder, says his students get excited about the machine learning models they read about in papers and want to train their own. But doing so requires a lot of time on massive computers or cloud computing platforms.
“What I find is that a lot of students run out of credits before their models actually finish training,” Larremore said. “One thing that we’re coming into contact with more and more in my classes is pretrained models.”
Using a pretrained model means no one has to train it from scratch, which is where the bulk of the computing power and data goes. Larremore’s students can instead build on these models, saving time and resources they might not have. Larremore cites NVIDIA as an example of a tech company that shares its manuscripts, papers, code, and trained models online.
“So you and I can download some image processing neural net and just use it without
ever having to train it, which is kind of cool,” Larremore says. “The prepacking
of trained models means the cost of training only has to happen once.”
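In practice, that looks like a few lines of code. The sketch below is a minimal example of the idea, assuming PyTorch, torchvision, and Pillow are installed; the file name "example.jpg" is a placeholder for any local image. It downloads an ImageNet-trained classifier and runs it without any training step:

```python
# Minimal sketch: inference with a pretrained image classifier, no training needed.
# Assumes PyTorch, torchvision, and Pillow; "example.jpg" is a placeholder image.
import torch
from torchvision import models, transforms
from PIL import Image

# Download a ResNet-18 whose weights were already trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()  # inference only; we never run an optimizer

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg")       # any local image
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(batch)
print(logits.argmax(dim=1).item())      # index into the 1,000 ImageNet classes
```

The expensive part of the process, training the weights, happened once on someone else's hardware.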
Open-sourcing trained models benefits student researchers who don’t have the time or computing power to start from square one. These students are the next generation of machine learning experts and data scientists. Why not give them the tools they need to arrive at a tech company already intimately familiar with deep learning models?
As Dipam Vasani, a self-taught deep learning practitioner, wrote, it’s not always necessary to train a model on the basics, like learning how to identify a straight or slanted line. A pretrained model has already learned those basics; what remains is the more intricate, project-specific learning you build on top of it. Allowing people to develop from existing work expedites innovation.
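Vasani’s point maps onto a common transfer-learning pattern: freeze the pretrained layers that already recognize the basics and train only a small, task-specific head. The sketch below assumes PyTorch and torchvision; the 10-class head and the training batches are hypothetical stand-ins for a student’s own project:

```python
# Rough sketch: build on a pretrained model instead of training from scratch.
# Assumes PyTorch/torchvision; the 10-class task and its data are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# Reuse low-level features (edges, lines, textures) learned on ImageNet.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in backbone.parameters():
    param.requires_grad = False  # keep the pretrained weights frozen

# Swap in a new final layer for our own task.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head is trained, which is far cheaper than full training.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(images, labels):
    """One optimization step on a batch from the (hypothetical) task dataset."""
    optimizer.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```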
Katharina Kann, an assistant professor of computer science at the University of Colorado Boulder, described a hypothetical in which a researcher wants to know whether certain low-level information can be integrated into a deep learning model. Kann cites part-of-speech tags, sentence structure, and morphology as examples of such low-level information. Using a smaller model as a proxy might not work, because the results wouldn’t show what would happen if the information were integrated into a larger model. “So basically, without models being available, students might not be able to investigate the original research question in a meaningful way,” Kann said, referring to open-source trained models at a larger scale. And that’s just one example of the advantages of transparency and access to these models.
Loka, Inc. spoke with Kann in more detail about why companies with abundant resources should post their trained machine learning models online; saving time, money, and electricity are just a few of the reasons. Kann described how this access benefits researchers not affiliated with big tech companies, pleases the researchers within those companies, and can even strengthen their brands. Below is a condensed version of our conversation.
Loka: Why should tech giants such as Facebook, Google, and Amazon post their trained
machine learning models online?
Kann: The publication of models makes it possible to independently verify claims made about the models by companies
without having to invest a lot of resources to reproduce the models first.
The publication of trained machine learning models makes research easier—in some cases even just possible!—for researchers working with limited computing resources.
For the companies, there are additional benefits:
Free publicity, which makes recruitment easier and in general increases the company's reputation.
Increased happiness of their (researcher) employees, since researchers generally like to publish, even when they decide to work for a company.
To some extent, it increases the trustworthiness of companies, since their results are reproducible.
The last three points are especially important whenever scandals, such as the recent discussions around the firing of researcher Timnit Gebru from Google, make companies unattractive to potential applicants, e.g., graduate students.
"For the companies there's
additional benefits: Free
publicity... increased happiness
of their (research)
employees... trustworthiness."
Now there could be reasons for companies to not publish their models. For instance:
Privacy concerns. It has been shown that it's possible to reconstruct the training data of machine learning models to some extent as soon as one has access to the models. This could lead to serious problems for companies.
Companies could keep a monopoly on their models if that gives them advantages, e.g., for their products or in terms of research that they can publish but nobody else can afford to do.
However, I don't think these are very good arguments.
Loka: How does this open-source access to information benefit student researchers?
Kann: I would say that there is a general tendency for this to be more helpful for researchers with fewer computing resources.
At the very least, this enables researchers to easily compare to state-of-the-art models, such as the models which currently obtain the best results on given datasets.
Publishing of pretrained models makes it possible for researchers to develop and evaluate their proposed methods in realistic settings.
Loka: Have you or your students benefited from an open-source deep learning model?
Kann: I have published
multiple papers
about research projects that would have been
extremely difficult or impossible without access to open-source deep learning models.
For example, what we do a lot in NLP is to take a large, publicly available model
which has been pretrained on raw text data—for example, from Wikipedia—and then train
it on task-specific data. The goal of one such project was to find out if we can improve
performance if we additionally train a pretrained model on labeled data not belonging
to our task of interest, before doing the last training step on task-specific data.
I am fairly confident that we would not have conducted this study if the initial
pretrained model hadn't been available to us.
Loka: Are you seeing any trends when it comes to access to deep learning models online?
Kann: At least in NLP, companies generally make their most recent models available.
The most recent famous example where this wasn’t the case was GPT-3; the last time I checked, one needed to pay for access. However, something else that’s becoming relevant is
that models are starting to be so big that even with them being available, many groups
are not able to easily run them. For instance, students in my group were unable
to work with a model for machine translation, called mBART, since it was just too huge to be fine-tuned on our GPUs. We have solved this problem now by getting better GPUs, but I expect this to become more and more of a serious problem for many groups.
Loka: Thank you for your time, Katharina.
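The workflow Kann describes, starting from a publicly available pretrained model, optionally training it further on labeled data from a related task, and then fine-tuning on the task of interest, can be sketched roughly as follows. The example assumes the Hugging Face transformers and datasets libraries, and the model and datasets are illustrative stand-ins rather than the ones used in her papers:

```python
# Rough sketch of two-stage fine-tuning from a publicly available pretrained model.
# Assumes Hugging Face transformers/datasets; model and dataset choices are illustrative.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "bert-base-uncased"  # a publicly available pretrained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# Step 1 (optional): further train on labeled data from a *related* task.
related = load_dataset("imdb", split="train[:2000]").map(tokenize, batched=True)
Trainer(model=model,
        args=TrainingArguments(output_dir="intermediate", num_train_epochs=1),
        train_dataset=related).train()

# Step 2: fine-tune the same model on the task-specific data we actually care about.
task = load_dataset("rotten_tomatoes", split="train").map(tokenize, batched=True)
Trainer(model=model,
        args=TrainingArguments(output_dir="final", num_train_epochs=3),
        train_dataset=task).train()
```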
What’s clear is that access to pretrained models benefits students, researchers, and even the massive companies that publish them. The advantages for the little guys alone should motivate those with goodwill and a vested interest in a faster pace of progress in machine learning. And if that’s not enough, the brand equity earned by practicing such transparency is typically good for a tech company’s bottom line. One can always find an argument against doing something, but the arguments here don’t outweigh the gains.
These companies are also being tasked with—and taking on the mantle of—leading environmental
initiatives to offset their carbon footprints. If these are more than merely lip service,
then progressive practices like sharing trained models should be on the table for companies big and small.