Prakhar Mishra

Text Generation using GPT-Neo

Inferencing and Fine-tuning GPT-Neo using Happy Transformers

Introduction

Last year, OpenAI released GPT-3, and as of today it is the second-largest language model in existence (Google Brain’s 1.6-trillion-parameter language model is the largest). It is a transformer-based neural network trained on the simple objective of predicting the next word in a given sequence of words. The model came with pretty good generalized few-shot learning abilities: researchers achieved brilliant results by applying GPT-3 directly to tasks such as answering math questions and generating SQL code, without explicitly training the model on task-specific data.

While everyone waited for it to be released as open source, just like its predecessors GPT-1 and GPT-2, OpenAI decided otherwise. Considering this mammoth’s generalization, its performance, and the possibilities of misuse if released in the open, it licensed the model exclusively to Microsoft and commercialized it.

EleutherAI, which began working on GPT-Neo in mid-2020, released its pre-trained models in March 2021, and they took the community by storm. Like GPT-3, GPT-Neo was trained in an autoregressive fashion. Also, read this awesome blog (GPT-Neo Vs GPT-3) for a task-level comparison between GPT-Neo and GPT-3.

Happy Transformers

Happy Transformer is a package built on top of Hugging Face’s Transformers library that makes it easy to use state-of-the-art NLP models for both inference and training on a wide variety of tasks, such as text generation, text classification, question answering, and word prediction. In this blog post, we will only walk through text generation and see how to run inference with, and fine-tune, the GPT-Neo model in a Google Colab notebook.

You can install the package with the following command:

> pip install happytransformer

Pre-trained GPT-Neo

Since we are using a free Google Colab instance, we choose “EleutherAI/gpt-neo-125M” as the model for playing around with pre-trained GPT-Neo. You can also choose other versions from this list. GPT-Neo 125M is a transformer model designed using EleutherAI’s replication of the GPT-3 architecture.

We first load the model and create an instance of it.
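
The original snippet is not reproduced here, so the following is a minimal sketch of the loading step, assuming the happytransformer package is installed. HappyGeneration and the "GPT-NEO" model type come from Happy Transformer; the helper name load_generator and the two constants are hypothetical names introduced for illustration.

```python
# Assumes `pip install happytransformer` has already been run.
MODEL_TYPE = "GPT-NEO"                  # model family name passed to Happy Transformer
MODEL_NAME = "EleutherAI/gpt-neo-125M"  # checkpoint name on the Hugging Face model hub


def load_generator(model_type=MODEL_TYPE, model_name=MODEL_NAME):
    """Download the weights (on first call) and return a HappyGeneration instance.

    `load_generator` is a hypothetical helper, not part of the library.
    """
    # Imported lazily so that defining this helper does not need the package.
    from happytransformer import HappyGeneration

    return HappyGeneration(model_type, model_name)
```

Calling `happy_gen = load_generator()` in a Colab cell would then give an object ready for inference. The first call downloads the model weights, so it can take a little while.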

Next, we perform inference with this instance by passing the necessary tunable parameters, such as top_k and do_sample, through the GENSettings class. I have provided a short description below for anyone who is not aware of what these parameters mean. You could also play around with the other parameters mentioned here. Also, the paper “The Curious Case of Neural Text Degeneration” discusses various decoding techniques for generating natural language text.

  • do_sample: when True, the next word is sampled according to its conditional probability instead of always taking the most likely word.
  • top_k: how many of the most likely candidate words are considered when sampling.
  • max_length: maximum number of generated tokens.
  • min_length: minimum number of generated tokens.
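
To make do_sample and top_k a bit more concrete, here is a toy sketch in plain Python that picks one next word from a made-up probability table. It is not how Happy Transformer or Hugging Face implement decoding internally (real libraries work on scores over the whole vocabulary); the function name pick_next_token and the example probabilities are hypothetical.

```python
import random


def pick_next_token(probs, do_sample=True, top_k=50, rng=None):
    """Pick one token from a {token: probability} table (toy illustration)."""
    # Rank candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if not do_sample:
        # Greedy decoding: always take the single most likely token.
        return ranked[0][0]
    # Top-k filtering: keep only the k most likely candidates.
    candidates = ranked[:top_k]
    tokens = [token for token, _ in candidates]
    weights = [weight for _, weight in candidates]
    # random.choices treats the weights as relative, so they need not sum to 1.
    rng = rng or random
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-word distribution after some prompt.
next_token_probs = {"is": 0.5, "has": 0.3, "was": 0.15, "banana": 0.05}

greedy = pick_next_token(next_token_probs, do_sample=False)
sampled = pick_next_token(next_token_probs, top_k=2, rng=random.Random(0))
print(greedy)   # the most likely token
print(sampled)  # one of the two most likely tokens
```

With top_k=2 the unlikely words can never be chosen, which is why a small top_k tends to give safer but less varied text.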

For testing, we give the model a short starter context, i.e. “Iphone ”, and print the words that follow it according to the model’s prediction.
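
A minimal sketch of this inference step, assuming happytransformer is installed, could look like the following. GENSettings and generate_text are Happy Transformer names; the helper generate_continuation and the specific option values are illustrative assumptions, not the settings used for the original run.

```python
# Decoding options: the values are illustrative assumptions.
GENERATION_OPTIONS = {
    "do_sample": True,  # sample instead of always taking the most likely token
    "top_k": 50,        # sample only among the 50 most likely tokens
    "min_length": 10,   # minimum number of generated tokens
    "max_length": 50,   # maximum number of generated tokens
}


def generate_continuation(prompt, options=None,
                          model_name="EleutherAI/gpt-neo-125M"):
    """Return the model's generated text for `prompt` (hypothetical helper)."""
    # Imported lazily: requires `pip install happytransformer`.
    from happytransformer import HappyGeneration, GENSettings

    happy_gen = HappyGeneration("GPT-NEO", model_name)
    settings = GENSettings(**(options or GENERATION_OPTIONS))
    result = happy_gen.generate_text(prompt, args=settings)
    return result.text  # the generated text as a string
```

For example, `print(generate_continuation("Iphone "))` would print one sampled continuation. Because do_sample is True, each run can give a different result.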

Clearly, the model doesn’t output anything very specific to the iPhone here :D, but the generation still makes some sense at least. You can play around with a larger model and see how it goes.

Next, we discuss the fine-tuning procedure for GPT-Neo.

Fine-tuning GPT-Neo

Happy Transformer gives us train() and eval() methods for training and evaluating our model. The train() method takes the training parameters and the path to a text file that contains the training text.
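
The training call can be sketched as below, assuming happytransformer is installed and you already have a loaded HappyGeneration instance. GENTrainArgs and train() are Happy Transformer names; the helper fine_tune, the file name train.txt, and the hyperparameter values are hypothetical and only meant as a starting point.

```python
# Training hyperparameters: illustrative values, not tuned recommendations.
TRAIN_OPTIONS = {
    "num_train_epochs": 1,   # passes over the training text
    "learning_rate": 1e-5,   # step size for the optimizer
}


def fine_tune(happy_gen, train_path, options=None):
    """Fine-tune an already loaded HappyGeneration model on a plain-text file.

    `fine_tune` is a hypothetical helper; `happy_gen` is assumed to be a
    HappyGeneration instance and `train_path` a path such as "train.txt".
    """
    # Imported lazily: requires `pip install happytransformer`.
    from happytransformer import GENTrainArgs

    args = GENTrainArgs(**(options or TRAIN_OPTIONS))
    happy_gen.train(train_path, args=args)
    return happy_gen  # the same instance, now with updated weights
```

After `fine_tune(happy_gen, "train.txt")`, the same instance can be used for generation again, so you can compare its output before and after training.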

Please refer to this document for other parameters that you can use for training. To evaluate the trained model, pass a test file (containing text) to eval(); the cross-entropy loss it returns can help you judge how well the model has learned. The lower the loss, the better the model.
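
As a rough sketch of the evaluation step, the helper below reads the loss from eval(), and a second helper turns it into perplexity, which some readers find easier to interpret. The names evaluate_loss and perplexity and the file name test.txt are hypothetical, and the perplexity conversion assumes the returned loss is an average per-token cross-entropy in natural-log units.

```python
import math


def evaluate_loss(happy_gen, test_path):
    """Return the loss reported by eval() on a held-out text file.

    `happy_gen` is assumed to be a loaded HappyGeneration instance.
    """
    result = happy_gen.eval(test_path)
    return result.loss


def perplexity(cross_entropy_loss):
    """Convert a cross-entropy loss into perplexity (lower is better).

    Assumes the loss is a per-token average measured in nats.
    """
    return math.exp(cross_entropy_loss)
```

One way to use this is to call `evaluate_loss(happy_gen, "test.txt")` once before and once after fine-tuning; if the second number is lower, the model fits the test text better than it did before.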

So, yeah, that’s it for this blog. You can also check out my blog on Text Generation using GPT-J (a more advanced text generation model).

If you like reading research papers, then you might want to check out some of the research paper summaries that I have written:

Graph-based Text Similarity Method

Pegasus for Abstractive Text Summarization

Grammar Correction System for Mobile Devices

Beyond Accuracy: Behavioral Testing of NLP Models using CheckList

BERT for Extractive Text Summarization

Understanding T5 Model: Text-to-Text Transfer Transformer

Also, do check out the official Happy Transformer GitHub repository.

Also, in case you enjoyed reading this article and if you want, you can buy me a “chai” on https://www.buymeacoffee.com/TechvizCoffee — because I don’t actually drink coffee :) Thank you very much! It’s totally optional and voluntary :)

I hope the read was worth your time. Thank You!
