EleutherAI Says New NLP Model Approaches GPT-3 Level Performance



Language systems powered by AI have transformative potential, particularly in the enterprise. They are already being used to drive chatbots, translate natural language into structured query language (SQL), create application layouts and spreadsheets, and improve the accuracy of web search products. The best-known AI text generator, OpenAI’s GPT-3, is reportedly used in more than 300 different applications by tens of thousands of developers and produces 4.5 billion words per day.

As business interest in AI grows, consulting firm Mordor Intelligence predicts that the natural language processing (NLP) market will more than triple its revenue by 2025. But noncommercial, open source efforts are also gaining ground, as evidenced by the progress made by EleutherAI. A grassroots collection of AI researchers, EleutherAI this week released GPT-J-6B (GPT-J), a model that the group says performs nearly on par with an equivalent-sized GPT-3 model on various tasks.

“We think it’s probably fair to say that this is currently the best open source autoregressive language model you can get by a fairly wide margin,” Connor Leahy, one of the founding members of EleutherAI, told VentureBeat.
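“Autoregressive” means the model produces text one token at a time, appending each newly generated token to the input it uses to predict the next one. The sketch below illustrates only that feedback loop. The `NEXT_TOKEN` table and `generate` function are hypothetical stand-ins: a real model such as GPT-J scores every token in its vocabulary using the full preceding context rather than looking up the last word.

```python
from typing import Dict, List

# Hypothetical next-token table standing in for a trained model.
NEXT_TOKEN: Dict[str, str] = {
    "the": "model",
    "model": "predicts",
    "predicts": "the",
}


def generate(prompt: List[str], steps: int) -> List[str]:
    """Extend `prompt` one token at a time, feeding each output back in."""
    tokens = list(prompt)
    for _ in range(steps):
        # A real model would condition on all of `tokens`; this toy
        # looks only at the most recent one.
        next_token = NEXT_TOKEN.get(tokens[-1], "<eos>")
        tokens.append(next_token)
        if next_token == "<eos>":  # stop at end-of-sequence
            break
    return tokens


print(generate(["the"], 4))  # ['the', 'model', 'predicts', 'the', 'model']
```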

GPT-J is what is called a Transformer model, which means that it weighs the influence of different parts of the input data rather than treating all input data the same. Transformers do not need to process the beginning of a sentence before the end. Instead, they identify the context that gives meaning to a word in the sentence, which allows them to process the input data in parallel.
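The weighting described above is usually implemented as scaled dot-product attention. What follows is a minimal pure-Python sketch, assuming identity projections (real Transformers learn separate query, key, and value matrices and use many attention heads); the function names are illustrative and are not taken from GPT-J’s code. Because each output position is computed independently of the others, the loop over positions could run in parallel. The optional `causal` flag restricts each position to itself and earlier positions, which is the kind of masking autoregressive models in the GPT family apply.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(x: List[Vector], causal: bool = False) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections.

    Simplifying assumption: queries, keys, and values are all the raw
    input vectors. Each output is a weighted average of the visible
    inputs, weighted by their similarity to the current position.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs: List[Vector] = []
    for i, q in enumerate(x):  # positions are independent: parallelizable
        # With causal=True, position i may only look at positions 0..i.
        visible = x[: i + 1] if causal else x
        weights = softmax([dot(q, k) / scale for k in visible])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)]
        )
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(x):
    print(row)
```

Each call returns one mixed vector per input position. With `causal=True`, the first position can only attend to itself, so its output equals its input.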

The Transformer architecture forms the backbone of language models including GPT-3 and Google’s BERT, but EleutherAI says GPT-J took less time to train than comparable large-scale models. The researchers attribute this to their use of JAX, a Python library from Google designed for machine learning research, as well as to training on Google’s tensor processing units (TPUs), application-specific integrated circuits (ASICs) developed specifically to accelerate AI.

GPT-J training

EleutherAI reports that GPT-J contains roughly 6 billion parameters, the parts of the machine learning model learned from historical training data. It was trained over the course of five weeks on 400 billion tokens from a dataset created by EleutherAI called The Pile, an 835GB collection of 22 smaller datasets, including academic sources (e.g., arXiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more. Tokens are a way of splitting text into smaller natural language units; they can be words, characters, or parts of words.
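To make the word, character, and subword distinction concrete, here is a toy sketch. The `TOY_VOCAB` set and the greedy longest-match splitter are hypothetical illustrations, not GPT-J’s tokenizer; production tokenizers, such as the byte-pair encoding (BPE) scheme used by GPT-2-style models, learn their subword vocabularies from data.

```python
from typing import List, Set

# Hypothetical subword vocabulary, for illustration only.
TOY_VOCAB: Set[str] = {"token", "ization", "language", "model", "s"}


def word_tokens(text: str) -> List[str]:
    """Split on whitespace: one token per word."""
    return text.split()


def char_tokens(text: str) -> List[str]:
    """One token per character (spaces included)."""
    return list(text)


def subword_tokens(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match split of a single word into vocabulary pieces.

    Characters not covered by any vocabulary entry fall back to
    single-character tokens.
    """
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


print(word_tokens("language models"))
print(char_tokens("GPT"))
print([subword_tokens(w, TOY_VOCAB) for w in ["tokenization", "models"]])
```

The subword split is the middle ground: common pieces like “token” stay whole, while rarer words are built from smaller reusable units.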

Above: GPT-J can solve basic math problems.

Image Credit: EleutherAI

For compute, EleutherAI was able to take advantage of the TPU Research Cloud, a Google Cloud initiative that supports projects in the hope that research results will be shared through code and models. The GPT-J code and trained model are open source under the MIT license and can be used for free via Hugging Face’s Transformers platform or the EleutherAI website.

GPT-J outperforms the two models EleutherAI released previously, GPT-Neo 1.3B and GPT-Neo 2.7B. For example, it can perform addition and subtraction and prove simple mathematical theorems, such as “Every cyclic group is abelian.” It can also answer quantitative reasoning questions from a popular test dataset (BoolQ) and generate pseudocode.


Above: GPT-J proving a theorem.

Image Credit: EleutherAI

“[OpenAI’s] GPT-2 had around 1.5 billion parameters and doesn’t have the best performance because it’s a bit old. GPT-Neo had around 2.7 billion parameters but somewhat underperforms equal-sized GPT-3 models. GPT-J, the new one, is now 6B, similar in size to OpenAI’s Curie model, we believe,” Leahy said.

Looking ahead

EleutherAI plans to eventually deliver the code and weights needed to run a model similar, though not identical, to the full “DaVinci” GPT-3. (Weights are parameters within a neural network that transform input data.) By comparison with GPT-J, the full GPT-3 contains 175 billion parameters and was trained on 499 billion tokens from a 45TB dataset.

Language models like GPT-3 often amplify biases encoded in their training data, part of which is typically sourced from communities with pervasive gender, race, and religious biases. OpenAI notes that this can lead to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.” Other studies, like one published in April by researchers at Intel, MIT, and the Canadian Institute for Advanced Research (CIFAR), have found high levels of stereotypical bias in some of the most popular models.


Above: GPT-J responding to a word problem.

Image Credit: EleutherAI

But EleutherAI says it performed “extensive bias analysis” on The Pile and made “difficult editorial decisions” to exclude datasets that it said were “unacceptably negatively biased” toward certain groups or viewpoints.

While EleutherAI’s model may not be cutting edge in terms of capabilities, it could go a long way toward solving a common technological problem: the disconnect between research and engineering teams. As Hugging Face CEO Clément Delangue told VentureBeat in a recent interview, tech giants provide black-box NLP APIs while releasing open source repositories that can be difficult to use or aren’t well maintained. EleutherAI’s efforts could help companies realize the business value of NLP without having to do much of the legwork themselves.

