Meta is giving away some of the family jewels: That's the gist of an announcement from the company formerly known as Facebook this week. In a blog post on the Meta AI site, the company's researchers announced that they've created a massive and powerful language AI system and are making it available free to all researchers in the artificial-intelligence community. Meta describes the move as an effort to democratize access to a powerful kind of AI, but some argue that not very many researchers will actually benefit from this largesse. And even as these models become more accessible to researchers, many questions remain about the path to commercial use.
Large language models are one of the hottest things in AI right now. Models like OpenAI's GPT-3 can generate remarkably fluid and coherent text in just about any format or style: They can write convincing news articles, legal summaries, poems, and advertising copy, or hold up their end of a conversation as customer-service chatbots or video-game characters. GPT-3, which broke the mold with its 175 billion parameters, is available to academic and commercial entities only via OpenAI's application and vetting process.
Meta's Open Pretrained Transformer (known as OPT-175B) matches GPT-3 with 175 billion parameters of its own. Meta is offering the research community not only the model itself but also its codebase and extensive notes and logbooks about the training process. The model was trained on 800 gigabytes of data from five publicly available data sets, which are described in the "data card" that accompanies a technical paper posted by the Meta researchers to the arXiv online preprint server.
Joelle Pineau, director of Meta AI Research Labs, tells IEEE Spectrum that she expects researchers to make use of this treasure trove in several ways. "The first thing I expect [researchers] to do is to use it to build other types of language-based systems, whether it's machine translation, a chatbot, something that completes text; all of these require this kind of state-of-the-art language model," she says. Rather than training their own language models from scratch, Pineau says, they can build applications and run them "on a relatively modest compute budget."
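For a sense of what building on a released model looks like in practice, here is a minimal sketch of the kind of text-completion application Pineau describes. It assumes the smaller OPT checkpoints are distributed through the open-source Hugging Face transformers library under names like facebook/opt-1.3b; the full OPT-175B weights are available only on request, so the example uses a smaller sibling.

```python
# A minimal sketch of a text-completion application built on a smaller,
# publicly released OPT checkpoint (the "modest compute budget" route).
# Assumes the checkpoints are distributed via Hugging Face transformers.
from transformers import pipeline

# "facebook/opt-1.3b" is one of the smaller siblings of OPT-175B.
generator = pipeline("text-generation", model="facebook/opt-1.3b")

prompt = "A customer-service chatbot should always"
# Sampling settings here are illustrative, not tuned for quality.
result = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)
print(result[0]["generated_text"])
```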
The second thing she expects researchers to do, Pineau says, is "pull it apart" to examine its flaws and limitations. Large language models like GPT-3 are famously capable of generating toxic language full of stereotypes and harmful bias; that troubling tendency is a result of training data that includes hateful language found in Reddit forums and the like. In their technical paper, Meta's researchers describe how they evaluated the model on benchmarks related to hate speech, stereotypes, and toxic-content generation, but Pineau says "there's so much more to be done." She adds that the scrutiny should be done "by community researchers, not inside closed research labs."
The paper states that "we still believe this technology is premature for commercial deployment," and says that by releasing the model with a noncommercial license, Meta hopes to facilitate the development of guidelines for responsible use of large language models "before broader commercial deployment occurs."
Within Meta, Pineau acknowledges that there's a lot of interest in using OPT-175B commercially. "We have a lot of groups that deal with text," she notes, and those groups might want to build specialized applications on top of the language model. It's easy to imagine product teams salivating over the technology: It could power content-moderation tools or text translation, could help suggest relevant content, or could generate text for the creatures of the metaverse, should it truly come to pass.
There have been other efforts to make an open-source language model, most notably from EleutherAI, a collective that released a 20-billion-parameter model in February. Connor Leahy, one of the founders of EleutherAI and founder of an AI startup called Conjecture, calls Meta's move a good step for open science. "Especially the release of their logbook is unprecedented (to my knowledge) and very welcome," he tells Spectrum in an email. But he notes that Meta's conditional release, making the model available only on request and with a noncommercial license, "falls short of truly open." EleutherAI isn't commenting on its plans, but Leahy says the group will continue working on its own language AI, and adds that OPT-175B will be helpful for some of its research. "Open research is synergistic in that way," he says.
EleutherAI is something of an outlier in AI research in that it's a self-organizing group of volunteers. Much of today's cutting-edge AI work is done within the R&D departments of deep-pocketed companies like Meta, Google, OpenAI, Microsoft, and Nvidia. That's because it takes an enormous amount of energy and compute infrastructure to train big AI systems.
Meta claims that training OPT-175B required one-seventh the carbon footprint of training GPT-3, yet as Meta's paper notes, that's still a significant energy expenditure. The paper says that OPT-175B was trained on 992 of Nvidia's 80-gigabyte A100 GPUs, with a carbon-emissions footprint of 75 tons, compared with an estimated 500 tons for GPT-3 (a figure that OpenAI has not confirmed); 75 is indeed roughly one-seventh of 500.
Meta's hope is that by offering up this "foundation model" for other entities to build on top of, it will at least reduce the need to train huge models from scratch. Deploying the model, Meta says in its blog post, requires only 16 of Nvidia's 32-gigabyte V100 GPUs. The company is also releasing smaller-scale versions of OPT-175B that can be used by researchers who don't need the full-scale model, or by those who are investigating the behavior of language models at different scales.
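As an illustration of that last use case, here is a hedged sketch of how a researcher might probe scale effects by running the same prompt through two differently sized checkpoints. The facebook/opt-125m and facebook/opt-1.3b names assume the same Hugging Face distribution as above; the prompt and decoding settings are purely illustrative.

```python
# An illustrative sketch of studying model behavior at different scales:
# run the same prompt through two differently sized OPT checkpoints.
# The "facebook/opt-*" names assume the Hugging Face distribution.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "Large language models sometimes produce"

for name in ["facebook/opt-125m", "facebook/opt-1.3b"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding keeps the comparison deterministic across sizes.
    outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(name, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```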
Maarten Sap, a researcher at the Allen Institute for Artificial Intelligence (AI2) and an incoming assistant professor at Carnegie Mellon University's Language Technologies Institute, studies large language models and has worked on methods to detoxify them. In other words, he's exactly the kind of researcher Meta is hoping to attract. Sap says that he'd "love to use OPT-175B," but "the biggest issue is that few research labs actually have the infrastructure to run this model." If it were easier to run, he says, he'd use it to study toxic-language risks and social intelligence within language models.
While Sap applauds Meta for opening up the model to the community, he thinks the company could go a step further. "Ideally, having a demo of the system and an API with much more control/access than [OpenAI's API for GPT-3] would be great for actual accessibility," he says. However, he notes that Meta's release of smaller versions is a good "second-best option."
Whether models like OPT-175B will ever become as safe and accessible as other kinds of enterprise software is still an open question, and there are different ideas about the path forward. EleutherAI's Leahy says that preventing broad commercial use of these models won't solve the problems with them. "Security through obscurity is not security, as the saying in the computer-security world goes," says Leahy, "and studying these models and finding ways to integrate their existence into our world is the only feasible path forward."
Meanwhile, Sap argues that AI regulation is needed to "prevent researchers, people, or companies from using AI to impersonate people, generate propaganda or fake news, or other harms." But he notes that "it's pretty clear that Meta is against regulation in many ways."
Sameer Singh, an associate professor at the University of California, Irvine, and a research fellow at AI2 who works on language models, praises Meta for releasing the training notes and logbooks, saying that such process information may end up being more useful to researchers than the model itself. Singh hopes that this kind of openness will become the norm. He also supports providing commercial access to at least the smaller models, since such access can be useful for understanding models' practical limitations.
"Disallowing commercial access completely or putting it behind a paywall may be the only way to justify, from a business perspective, why these companies should build and release LLMs in the first place," Singh says. "I suspect these restrictions have less to do with potential damage than claimed."
Eliza Strickland is a senior editor at IEEE Spectrum, where she covers AI, biomedical engineering, and other topics. She holds a master's degree in journalism from Columbia University.