Fine Tuning a T5 transformer for any Summarization Task
Doing cool things with data!

Introduction
I am amazed by the power of the T5 transformer model! T5, which stands for Text-to-Text Transfer Transformer, makes it easy to fine tune a transformer model on any text-to-text task. Any NLP task, even a classification task, can be framed as an input text to output text problem.
In this blog, I show how you can tune this model on any data set you have. In particular, I demo how this can be done on Summarization data sets. I have personally tested this on CNN-Daily Mail and the WikiHow data sets. The code is publicly available on my Github here.
T5-small trained on Wikihow writes amazing summaries. See snippet below of actual text, actual summary and predicted summary. This model is also available on HuggingFace Transformers model hub here. The link provides a convenient way to test the model on input texts as well as a JSON endpoint.
WikiHow Text: Make sure you've got all the cables disconnected from the back of your console,
especially the power cord., You'll need the straight end to be about 2-3 inches long.You will need a
large size paper clip for this method because it will need to go in about 1 and a half inches to
push the disc out., It's located on the left side of the console, right behind the vents.The eject
hole on an Xbox One S is located at the second hole on the left from the right corner and the third
hole up from the bottom. It can be more difficult to spot, so it's best to have a good amount of
light available. Doing so will cause the disc to pop out a little bit., Carefully pull the disc the
rest of the way out with your fingers. It might be a good idea to use a cloth or soft fabric to
protect the disc from fingerprints and scratching.
Actual Summary: Unplug all cables from your Xbox One.Bend a paper clip into a straight line.Locate the orange circle.Insert the paper clip into the eject hole.Use your fingers to pull the disc out.
Predicted Summary: Gather the cables.Place the disc on your console.Section the eject hole on the left side of the console.Pull out the disc.Remove from the back of the console.
I run a machine learning consultancy, Deep Learning Analytics. At Deep Learning Analytics, we are very passionate about using data science and machine learning to solve real world problems. Please reach out to us if you are looking for NLP expertise for your business projects. Original full story published on our website here.
T5 Transformer Model
The T5 model, released by Google Research, adds the following to existing research:
- It creates a clean version of the massive Common Crawl data set called the Colossal Clean Crawled Corpus (C4). This data set is two orders of magnitude larger than Wikipedia.
- It pretrains T5 on this cleaned Common Crawl data (C4)
- It proposes reframing of all NLP tasks as an input text to output text formulation
- It shows that fine tuning on different tasks (summarization, QnA, reading comprehension) using the pretrained T5 and the text-to-text formulation gives state of the art results
- The T5 team also did a systematic study to understand best practices for pre training and fine tuning. Their paper details what parameters matter most for getting good results.
The figure below from the T5 paper explains this input text to output text problem formulation.

This blog from Google also explains the paper well. Let's dive into the code now!
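As a first taste, here is a minimal sketch of what the text-to-text formulation looks like in plain Python. The prefixes follow the style of the examples in the T5 paper; the `TASK_PREFIXES` table and the `to_text_to_text` helper are hypothetical names I made up for illustration, not part of any library.

```python
from typing import Dict

# Task prefixes in the style of the T5 paper. The exact strings are a free
# choice for fine tuning, as long as training and inference use the same ones.
TASK_PREFIXES: Dict[str, str] = {
    "summarization": "summarize: ",
    "translation_en_de": "translate English to German: ",
    "acceptability": "cola sentence: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Dict[str, str]:
    """Turn one labelled example into an (input text, target text) pair."""
    prefix = TASK_PREFIXES[task]
    return {"input_text": prefix + source.strip(), "target_text": target.strip()}


examples = [
    to_text_to_text(
        "summarization",
        "Make sure you've got all the cables disconnected from the back of your console.",
        "Unplug all cables from your Xbox One.",
    ),
    to_text_to_text("translation_en_de", "That is good.", "Das ist gut."),
    # Even classification produces text: the label is emitted as words.
    to_text_to_text("acceptability", "The course is jumping well.", "not acceptable"),
]

for example in examples:
    print(example["input_text"], "->", example["target_text"])
```

The point is that every task ends up with the same shape, a string in and a string out, so one model and one training loop can cover all of them.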
T5 Fine Tuning Pipeline
We will use the HuggingFace Transformers implementation of the T5 model for this task. A big thanks to this awesome work from Suraj that I used as a starting point for my code.
Getting the data
To make it simple to extend this pipeline to any NLP task, I have used the HuggingFace NLP library to get the data set. This makes it easy to load many supported data sets. The HuggingFace NLP library also has support for many metrics. I have used its ROUGE score implementation for my model.
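To give a feel for what ROUGE measures, here is a simplified ROUGE-1 F1 score based on unigram overlap. This is a hand-rolled sketch for intuition only, not the library's implementation (which also covers variants such as ROUGE-2 and ROUGE-L); the function name `rouge1_f1` and the whitespace tokenization are my own assumptions.

```python
from collections import Counter


def rouge1_f1(reference: str, prediction: str) -> float:
    """Unigram-overlap F1 between a reference and a predicted summary."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())

    # Clipped overlap: a word counts at most as often as it appears in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "Use your fingers to pull the disc out."
prediction = "Pull out the disc."
print(f"ROUGE-1 F1: {rouge1_f1(reference, prediction):.3f}")
```

A score of 1.0 means the two summaries use exactly the same words with the same counts, and 0.0 means they share none.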
The full code is available on my Github. For this demo, I will show how to process the WikiHow data set. The code though is flexible to be extended to any summarization task.
The main steps involved are:
- Load the WikiHow data. Please note that for this dataset, two files need to be downloaded to a local data folder
- The dataset object created by the NLP library can be used to look at sample examples
- We want to look at the average length of the text to decide if input can be tokenized to a max length of 512
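These steps can be sketched roughly as follows. Treat this as a sketch under assumptions rather than the exact code from my repo: the `'all'` configuration name and the `data/` folder are assumptions, `load_wikihow` and `length_stats` are hypothetical helper names, and the default whitespace split is only a rough proxy for the T5 tokenizer, which usually produces more tokens than there are words.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List


def load_wikihow(data_dir: str = "data/"):
    """Load WikiHow with the HuggingFace NLP library.

    Assumes the two manually downloaded WikiHow files already sit in
    data_dir. The import is done lazily so the rest of this sketch runs
    without the library installed.
    """
    import nlp  # note: this library was later renamed to `datasets`

    return nlp.load_dataset("wikihow", "all", data_dir=data_dir)


def length_stats(
    texts: Iterable[str],
    tokenize: Callable[[str], List] = str.split,
    max_len: int = 512,
) -> Dict[str, float]:
    """Summarize token counts to judge whether max_len is a sensible cutoff."""
    lengths = [len(tokenize(text)) for text in texts]
    return {
        "mean_len": mean(lengths),
        "max_len_seen": max(lengths),
        # Share of examples that would be truncated at max_len tokens.
        "share_truncated": sum(n > max_len for n in lengths) / len(lengths),
    }


# Toy stand-in for the article texts so the sketch runs without the data.
sample_texts = ["word " * 100, "word " * 600]
stats = length_stats(sample_texts)
print(stats)
```

For real counts, pass the `tokenize` method of the T5 tokenizer instead of `str.split`. If only a small share of examples exceeds 512 tokens, truncating at that length loses little.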







