The web content describes the use of Simple Transformers, a library built on top of Hugging Face's Transformers, for implementing question answering (QA) systems using BERT, XLNet, XLM, and DistilBERT models, with training and evaluation using the SQuAD 2.0 dataset.
Abstract
The article discusses the application of Transformer models for question answering tasks, emphasizing the simplicity of using the Simple Transformers library. It outlines the process of setting up the environment, preparing the SQuAD 2.0 dataset, and training a QA model. The library leverages pre-trained models like BERT and XLNet, allowing for efficient fine-tuning on the QA task. The article also provides guidance on data preparation, model training, and evaluation, including how to format data and submit predictions to the SQuAD leaderboard for performance assessment. The results section presents the model's performance metrics, acknowledging the challenge of the SQuAD 2.0 benchmark and suggesting potential improvements through hyperparameter tuning and using larger models.
Opinions
The author endorses the Simple Transformers library for its ease of use in implementing question answering systems.
The author suggests that transfer learning with pre-trained Transformer models is essential for state-of-the-art performance in NLP tasks, including question answering.
The article implies that the SQuAD 2.0 dataset is a reputable benchmark for evaluating the performance of QA models.
The author recommends using larger models, such as the 'large' variant, to achieve better results on the SQuAD 2.0 dataset.
Question Answering with BERT, XLNET, XLM, and DistilBERT using Simple Transformers
Question: How to use Transformers for Question Answering? Answer: Simple Transformers, duh! (See what I did there?)
Context: Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.
Human: What is a Question Answering system?
System: systems that automatically answer questions posed by humans in a natural language
QA has applications in a vast array of tasks including information retrieval, entity extraction, chatbots, and dialogue systems to name but a few. While question answering can be done in various ways, perhaps the most common flavour of QA is selecting the answer from a given context. In other words, the system will pick a span of text from the context that correctly answers the question. If a correct answer cannot be found from the context, the system will merely return an empty string.
Transfer learning with pre-trained Transformer models has become ubiquitous in NLP problems and question answering is no exception. With that in mind, we are going to use BERT to tackle the task of question answering!
We’ll be using the Simple Transformers library to easily work with Transformer models.
We will be using the Stanford Question Answering Dataset (SQuAD 2.0) for training and evaluating our model. SQuAD is a reading comprehension dataset and a standard benchmark for QA models. The dataset is publicly available on the SQuAD website.
Download the dataset and place the files (train-v2.0.json, dev-v2.0.json) in the data/ directory.
Data Preparation
In order to perform QA in Simple Transformers, the data has to be in JSON files or in a Python list of dicts in the correct format.
If using JSON files, the files should contain a single list of dictionaries. A dictionary represents a single context and its associated questions.
Each such dictionary contains two attributes, the "context" and "qas".
context: The paragraph or text from which the question is asked.
qas: A list of questions and answers.
Questions and answers are represented as dictionaries. Each dictionary in qas has the following format.
id: (string) A unique ID for the question. Should be unique across the entire dataset.
question: (string) A question.
is_impossible: (bool) True if the question cannot be answered from the context, False otherwise.
answers: (list) The list of correct answers to the question.
A single answer is represented by a dictionary with the following attributes.
text: (string) The answer to the question. Must be a substring of the context.
answer_start: (int) Starting index of the answer in the context.
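To make the format described above concrete, here is a minimal sketch of a one-context dataset as a Python list of dicts. The context, questions, and ids are made up for illustration, and the answer string is stored under the key text, as it is in the SQuAD files (worth checking against your installed version of the library). The helper check_answer_spans is hypothetical, not part of Simple Transformers; it only verifies that each answer really sits at its answer_start.

```python
# Hypothetical example record; the context, questions and ids are made up.
context = "Simple Transformers is built on top of the Hugging Face Transformers library."
answer_text = "Hugging Face"

train_data = [
    {
        "context": context,
        "qas": [
            {
                "id": "00001",
                "question": "Which company's library is Simple Transformers built on?",
                "is_impossible": False,
                "answers": [
                    {
                        "text": answer_text,
                        # Character offset of the answer inside the context.
                        "answer_start": context.index(answer_text),
                    }
                ],
            },
            {
                # An unanswerable question: is_impossible is True and there are no answers.
                "id": "00002",
                "question": "Who maintains the SQuAD leaderboard?",
                "is_impossible": True,
                "answers": [],
            },
        ],
    }
]


def check_answer_spans(data):
    """Return the ids of questions whose answer text does not match the context at answer_start."""
    bad = []
    for entry in data:
        for qa in entry["qas"]:
            for ans in qa["answers"]:
                start = ans["answer_start"]
                if entry["context"][start:start + len(ans["text"])] != ans["text"]:
                    bad.append(qa["id"])
    return bad
```

Running check_answer_spans on your own data before training is a cheap way to catch off-by-one offsets.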
We can convert the SQuAD data into this format quite easily.
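One way to do the conversion is sketched below. In the SQuAD JSON files, a top-level "data" list holds articles and each article holds a "paragraphs" list whose items already carry "context" and "qas", so flattening the paragraphs is enough. The function names flatten_squad and load_squad are my own, not library functions.

```python
import json


def flatten_squad(squad):
    """Collect every paragraph dict ({"context": ..., "qas": [...]}) from a SQuAD-style dict."""
    return [paragraph for article in squad["data"] for paragraph in article["paragraphs"]]


def load_squad(path):
    """Read a SQuAD JSON file and return the flattened list of paragraph dicts."""
    with open(path, "r", encoding="utf-8") as f:
        return flatten_squad(json.load(f))


# Usage, assuming the files were placed in data/ as described earlier:
# train_data = load_squad("data/train-v2.0.json")
```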
Question Answering Model
Simple Transformers has a class that can be used for each supported NLP task. An object of this class is used to perform training, evaluation (when ground truth is known), and prediction (when ground truth is unknown).
Here, we are creating a QuestionAnsweringModel object and setting the hyperparameters for fine-tuning the model. The first parameter is the model_type and the second is the model_name.
The args parameter takes an optional Python dictionary of hyperparameter values and configuration options. I highly recommend checking out all the options in the Simple Transformers documentation.
The default values are shown below.
To load a previously saved model instead of a default model, you can change the model_name to the path to a directory which contains a saved model.
model = QuestionAnsweringModel('bert', 'path_to_model/')
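Putting this section together, a minimal sketch of creating a model with custom options might look like the following. The values in make_train_args are illustrative choices of mine, not the library defaults, and the helper names make_train_args and build_model are hypothetical. The model name 'bert-base-cased' is likewise an assumption; any pre-trained model name compatible with the chosen model_type should work.

```python
def make_train_args(output_dir="outputs/"):
    """Illustrative hyperparameters; these are NOT the library defaults."""
    return {
        "learning_rate": 3e-5,
        "num_train_epochs": 2,
        "max_seq_length": 384,   # maximum tokens per input window
        "doc_stride": 128,       # step between windows when a context is split
        "train_batch_size": 8,
        "overwrite_output_dir": True,
        "output_dir": output_dir,
    }


def build_model(model_type="bert", model_name="bert-base-cased", args=None):
    """Create a QuestionAnsweringModel; model_type comes first, then model_name."""
    # Imported lazily so the args helper can be used without the library installed.
    from simpletransformers.question_answering import QuestionAnsweringModel

    return QuestionAnsweringModel(model_type, model_name, args=args)


# Usage (downloads the pre-trained weights on first run):
# model = build_model(args=make_train_args())
```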
Training
Training the model is a one-liner! Just pass train_data to the train_model method.
You can also change the hyperparameters by passing a dict containing the relevant attributes to the train_model method. Note that these modifications will persist even after training is completed.
The train_model method will create a checkpoint (save) of the model at every nth step where n is self.args['save_steps']. Upon completion of training, the final model will be saved to self.args['output_dir'].
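As a rough sketch of the two ways of calling train_model described above, the hypothetical wrapper below either trains with the model's current settings or passes a dict of overrides through the args keyword. I am assuming train_model accepts that keyword in your installed version, so check the signature if the call fails.

```python
def fine_tune(model, train_data, overrides=None):
    """Train the model, optionally overriding hyperparameters for this run.

    Overrides passed this way stay on the model after training finishes.
    """
    if overrides:
        model.train_model(train_data, args=overrides)
    else:
        model.train_model(train_data)
    return model


# Usage, with a model and train_data prepared as in the earlier sections:
# fine_tune(model, train_data)
# fine_tune(model, train_data, overrides={"num_train_epochs": 1})
```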
Evaluation
The correct answers for the dev data are not provided in the SQuAD dataset but we can upload our predictions to the SQuAD website for evaluation. Alternatively, you could split the train data into training and validation datasets and use the model.eval_model() method to validate the model locally.
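If you prefer the local route, a minimal sketch of the split could look like this. The helper split_contexts is hypothetical, and the 10% hold-out and the seed are arbitrary choices. Splitting at the context level keeps all questions about one paragraph on the same side of the split.

```python
import random


def split_contexts(data, eval_fraction=0.1, seed=42):
    """Shuffle a copy of the context list and split it into (train, eval) parts."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Usage, assuming eval_model returns a result dict plus the compared texts
# (check this against your installed version):
# train_part, eval_part = split_contexts(train_data)
# model.train_model(train_part)
# result, texts = model.eval_model(eval_part)
```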
For this guide, I’ll simply be uploading the predictions to SQuAD.
Breaking down this code, we are reading in the dev data, converting it into the correct format, getting the model predictions, and finally writing to a JSON file in the required submission format.
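The last step, writing the submission file, can be sketched as follows. The SQuAD 2.0 evaluation expects a single JSON object mapping each question id to an answer string, with an empty string for unanswerable questions. I am assuming the predictions arrive as a list of dicts with "id" and "answer" keys; the exact return value of model.predict has varied between library versions, so adapt the unpacking to yours. The helper names and the path results/submission.json are hypothetical.

```python
import json
import os


def to_submission(predictions):
    """Map question id -> answer string, the shape the SQuAD 2.0 evaluation expects."""
    submission = {}
    for pred in predictions:
        answer = pred["answer"]
        # Some library versions return a list of candidate answers; keep the top one.
        if isinstance(answer, list):
            answer = answer[0] if answer else ""
        submission[pred["id"]] = answer
    return submission


def write_submission(predictions, path="results/submission.json"):
    """Write the id -> answer mapping to a JSON file, creating the directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_submission(predictions), f)


# Usage, with dev_data converted to the same format as the training data:
# predictions = model.predict(dev_data)  # may be (answers, probabilities) in newer versions
# write_submission(predictions)
```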
Results
The results obtained with these hyperparameters are given below.
SQuAD 2.0 is a challenging benchmark and this is reflected in these results. Some hyperparameter tuning should bump up these scores. Also, using a large model rather than a base model should significantly boost the results as well.