Summary

The article provides a comprehensive guide for beginners on using BERT for natural language processing tasks, including installation, pipeline usage, fine-tuning, custom dataset application, and hyperparameter optimization.

Abstract

The article "A Beginner’s Guide to Using BERT for the First Time" serves as an introductory tutorial for those looking to implement BERT (Bidirectional Encoder Representations from Transformers) in their NLP projects. It covers the installation of the necessary Huggingface Transformers library, which simplifies the use of BERT and other transformer models. The guide explains how to utilize the pipeline function for quick sentiment analysis, the process of fine-tuning BERT on a specific dataset, and how to work with custom datasets, such as the Amazon review dataset for sentiment analysis. Additionally, it touches on the importance of hyperparameter search to improve model performance, suggesting tools like Optuna or Ray Tune for this purpose. The article emphasizes the ease of adapting BERT for various NLP tasks and the potential for achieving high accuracy with proper dataset preparation and model tuning.

Opinions

The author believes that BERT has set a new standard in NLP, demonstrating state-of-the-art results across multiple tasks with minimal data requirements.
The Huggingface Transformers library is highly recommended for its comprehensive documentation and ease of use, allowing for seamless integration of BERT and other transformer models into NLP applications.
The sentiment analysis task using BERT is presented as a straightforward starting point for those new to BERT, with the pipeline function being particularly user-friendly for quick predictions.
The article suggests that fine-tuning BERT with a custom dataset is relatively straightforward, and it encourages experimentation with different transformer models provided by the Huggingface model hub.
The author points out that the Trainer class from the Transformers library is a powerful tool for training and evaluating models, offering flexibility and a range of training options.
There is an emphasis on the importance of proper dataset preparation, noting that the quality of the data significantly impacts model performance.
The author advocates for the use of hyperparameter search tools to optimize model performance, but also acknowledges that this process can be time-consuming and resource-intensive.
The article concludes optimistically, suggesting that with minimal modifications, users can create their own powerful NLP models using BERT and the Transformers library.

A Beginner’s Guide to Using BERT for the First Time

From predicting a single sentence, to fine-tuning on a custom dataset, to finding the best hyperparameter configuration.

BERT has become a new standard for Natural Language Processing (NLP). It achieved new state-of-the-art results on eleven NLP tasks, including text classification, sequence labeling, question answering, and many more. Even better, it can also give incredible results using only a small amount of data. BERT was first released in 2018 by Google along with its paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

Now we can easily apply BERT to our own models by using the Huggingface (🤗) Transformers library. The library also provides complete documentation for other transformer models. You can check it here. In this post, I will try to summarize some important points that we will likely use frequently. We will take a look at how to use and train models using BERT from 🤗 Transformers. Later, you can also use other transformer models (such as XLM, RoBERTa, XLM RoBERTa (my favorite!), BART, and many others) by simply changing a single line of code.

Text classification is a pretty good place to start getting to know BERT. There are many kinds of text classification tasks, but we will choose sentiment analysis in this case. Here are the five main points we will cover in this post:

Installation
Pipeline
Fine-tune
Using custom dataset
Hyperparameter search

Installation

As stated on their website, to run 🤗 Transformers you will need the following:

Python 3.6+
PyTorch 1.10+ or TensorFlow 2.0

They also encourage installing everything inside a virtual environment, so don’t forget to activate it first.

The installation is quite easy. Once TensorFlow or PyTorch has been installed, you just need to type:

pip install transformers

In this post, we are going to use PyTorch. It should be easy to translate the code to TensorFlow, though: just add ‘TF’ at the beginning of each model class name (for example, TFAutoModelForSequenceClassification instead of AutoModelForSequenceClassification).

Pipeline

When you just want to test a model or simply use it to predict some sentences, you can use pipeline(). Besides text classification, it supports many other tasks such as text generation, question answering, summarization, and so on. To run a sentiment analysis task, simply type:

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('We are very happy to show you the 🤗 Transformers library.')

It uses a model named “distilbert-base-uncased-finetuned-sst-2-english” by default. We can also change to other models that we can find in the model hub. For example, if we want to use nlptown/bert-base-multilingual-uncased-sentiment, then simply do the following:

classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")

Fine-tune

First things first, we need a dataset. At this point, we are going to use a dataset provided by 🤗 Datasets. It offers datasets for a wide range of tasks, from text classification and token classification to language modeling and many more. To install it, simply execute the following line:

pip install datasets

Load data

We are going to use the sst2 dataset from the GLUE task and the pretrained bert-base-uncased model. By running load_dataset and load_metric, we download the dataset as well as the metric. load_metric automatically loads the metric associated with the chosen task.
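As a rough sketch, the loading step could look like the code below. This is an assumption about how the step might be written, not the exact code from this post: the helper name load_sst2 is made up for illustration, and because load_metric has been removed from recent releases of 🤗 Datasets, the sketch fetches the same GLUE metric through the separate evaluate package (pip install evaluate).

```python
GLUE_TASK = "sst2"
CHECKPOINT = "bert-base-uncased"


def load_sst2():
    """Download SST-2 and its matching metric (needs network access)."""
    # Imported here so the sketch can be loaded without the libraries installed.
    from datasets import load_dataset
    import evaluate

    # DatasetDict with "train", "validation" and "test" splits.
    dataset = load_dataset("glue", GLUE_TASK)
    # For SST-2, the GLUE metric reports accuracy.
    metric = evaluate.load("glue", GLUE_TASK)
    return dataset, metric
```

Calling dataset, metric = load_sst2() and then printing dataset["train"][0] is a quick way to check which columns the dataset has.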

Preprocessing

To preprocess, we need to instantiate our tokenizer using AutoTokenizer (or another tokenizer class associated with the model, e.g. BertTokenizer). By calling from_pretrained(), we download the vocab used when pretraining the given model (in this case, bert-base-uncased). Using the same vocab ensures that the tokenization results match what the model expects.
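A minimal sketch of the preprocessing step might look like this. The helper names make_preprocess_fn and tokenize_dataset are hypothetical, and the sketch assumes the text lives in a column called "sentence", which is the case for SST-2.

```python
def make_preprocess_fn(tokenizer, text_column="sentence"):
    """Build a function that tokenizes one batch of examples."""
    def preprocess(examples):
        # truncation=True cuts inputs that are longer than the model allows.
        return tokenizer(examples[text_column], truncation=True)
    return preprocess


def tokenize_dataset(dataset, checkpoint="bert-base-uncased"):
    """Download the tokenizer for `checkpoint` and apply it to every split."""
    # Imported here so the sketch can be loaded without the library installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # batched=True passes many rows at once, which is faster with fast tokenizers.
    encoded = dataset.map(make_preprocess_fn(tokenizer), batched=True)
    return encoded, tokenizer
```

Keeping the batch function separate from the download makes it easy to swap in another checkpoint or another text column later.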

Fine-tuning

Fortunately, they also provide a simple interface called Trainer() which makes the training and evaluation process much easier without losing its flexibility to modify a wide range of training options.

First, instantiate and download the model with from_pretrained(). Since our task is sequence classification, we can use AutoModelForSequenceClassification (or another model class associated with the pretrained model, e.g. BertForSequenceClassification).

We need to define our own compute_metrics function if we want to have other metrics in addition to the loss. This function can be passed to the trainer.
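Putting these pieces together, a sketch of the fine-tuning setup could look like the following. Everything specific here is an assumption for illustration: the helper name build_trainer, the output directory, and the hyperparameter values are not taken from this post. compute_metrics is written in plain Python so the sketch needs no extra dependency; np.argmax(logits, axis=-1) is the more common way to get the predictions.

```python
def compute_metrics(eval_pred):
    """Turn (logits, labels) into an accuracy score."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


def build_trainer(encoded_dataset, tokenizer, checkpoint="bert-base-uncased"):
    """Create a Trainer for binary sequence classification."""
    # Imported here so the sketch can be loaded without the library installed.
    from transformers import (
        AutoModelForSequenceClassification,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    args = TrainingArguments(
        output_dir="bert-sst2",            # hypothetical folder name
        learning_rate=2e-5,                # illustrative values, not tuned
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=3,
        weight_decay=0.01,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=encoded_dataset["train"],
        eval_dataset=encoded_dataset["validation"],
        # Pads each batch to the length of its longest example.
        data_collator=DataCollatorWithPadding(tokenizer),
        compute_metrics=compute_metrics,
    )
```

With a trainer built this way, trainer.train() runs the fine-tuning and trainer.evaluate() reports the loss together with the accuracy from compute_metrics.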

Using Custom Dataset

Now we just need to convert our dataset into the right format so that the model can work properly. We will use a small subset of the Amazon review dataset in the fashion category. You can find the dataset here. The labels are still in the form of ratings, so we need to convert them into positive or negative. Reviews with 3 or more stars will be classified as positive, and the rest as negative. This is just an example, so feel free to change it the way you like.
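The labeling rule above can be sketched in a few lines. The function name rating_to_label and the sample reviews are made up for illustration; only the threshold of 3 stars comes from the text.

```python
def rating_to_label(rating, threshold=3):
    """Map a star rating to 1 (positive) or 0 (negative)."""
    return 1 if rating >= threshold else 0


# Hypothetical (text, rating) pairs standing in for the real reviews.
reviews = [("Love this dress", 5), ("Fits okay", 3), ("Fell apart fast", 1)]
texts = [text for text, _ in reviews]
labels = [rating_to_label(rating) for _, rating in reviews]
```

Changing the threshold argument is enough to try a stricter definition of a positive review.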

After that, we split the data into train, validation, and test sets and tokenize them using AutoTokenizer. We also need to convert our data into a dataset object by subclassing torch.utils.data.Dataset and implementing __len__ and __getitem__. Take a look at the AmazonDataset class below. For training, just repeat the steps in the previous section. But this time, we use DistilBERT instead of BERT. It is a smaller version of BERT. Faster and lighter!
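A dataset class along these lines might look like the sketch below. It follows a common pattern for wrapping tokenizer output and is not necessarily identical to the class used for the results in this post. It assumes encodings is what the tokenizer returns for a list of texts with truncation=True and padding=True, and the token ids in the small demo are arbitrary numbers.

```python
import torch


class AmazonDataset(torch.utils.data.Dataset):
    """Wraps tokenized reviews and their labels for use with Trainer."""

    def __init__(self, encodings, labels):
        self.encodings = encodings   # dict-like: "input_ids", "attention_mask", ...
        self.labels = labels         # list of 0/1 integers

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One example: every encoding field at position idx, plus its label.
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Tiny hand-written stand-in for real tokenizer output.
demo_encodings = {
    "input_ids": [[5, 17, 9], [5, 23, 9]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
demo_dataset = AmazonDataset(demo_encodings, [1, 0])
```

The key "labels" matters: it is the argument name that the sequence classification models use to compute the loss.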

As you can see, the evaluation is quite good (almost 100% accuracy!). Apparently, this is because the dataset contains a lot of duplicated data: some reviews appear more than three times. So, make sure that your data is clean and good enough to represent the real world.

Hyperparameter Search

Even better, the library also supports hyperparameter search using Optuna or Ray Tune (you can choose one). The search runs the training process several times, so the model needs to be defined via a function (so it can be reinitialized at each new run). See the model_init function below.
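A sketch of such a search, assuming the Optuna backend (pip install optuna), might look like this. The helper name run_search, the search ranges in hp_space, and the number of trials are illustrative assumptions, not values from this post.

```python
def model_init():
    """Return a fresh model so every trial starts from the same pretrained weights."""
    # Imported here so the sketch can be loaded without the library installed.
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )


def hp_space(trial):
    """Search space for one Optuna trial (ranges are illustrative)."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 4),
    }


def run_search(args, train_dataset, eval_dataset, compute_metrics,
               data_collator=None, n_trials=10):
    """Run the search and return the BestRun object."""
    from transformers import Trainer

    trainer = Trainer(
        model_init=model_init,       # a function, not a model instance
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
    # "maximize" assumes compute_metrics reports a score where higher is better.
    return trainer.hyperparameter_search(
        hp_space=hp_space, direction="maximize", backend="optuna", n_trials=n_trials
    )
```

Passing model_init instead of model is what lets the Trainer rebuild the network before each trial.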

Besides that, it will also take a very long time to run. To save time and resources, you can run the hyperparameter search on only a portion of the training data. After finding the best configuration, we can rerun the training on the full data with that configuration. Just do something like this:

train_dataset = encoded_dataset["train"].shard(index=1, num_shards=10)

This process will return a BestRun object containing information about the hyperparameters used for the best run. To use this configuration, just set those hyperparameters in TrainingArguments.
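One way to do that is to copy each value from the hyperparameters dictionary of the BestRun onto the existing training arguments, as sketched below. The helper name apply_best_run is hypothetical, and the demo uses simple stand-in objects with placeholder values so it can run without a finished search.

```python
from types import SimpleNamespace


def apply_best_run(training_args, best_run):
    """Copy the winning hyperparameters onto a TrainingArguments-like object."""
    for name, value in best_run.hyperparameters.items():
        setattr(training_args, name, value)
    return training_args


# Stand-ins with placeholder values; in practice pass trainer.args and the
# BestRun returned by trainer.hyperparameter_search().
fake_args = SimpleNamespace(learning_rate=5e-5, num_train_epochs=3)
fake_best_run = SimpleNamespace(
    run_id="0",
    objective=0.0,
    hyperparameters={"learning_rate": 2e-5, "num_train_epochs": 4},
)
apply_best_run(fake_args, fake_best_run)
print(fake_args.learning_rate, fake_args.num_train_epochs)
```

After updating the arguments, calling trainer.train() again on the full training set uses the new configuration.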

That’s it! If you want to try another task or another pretrained model, or even use your own dataset, you can easily customize it to your needs by modifying a couple of lines, and BOOM! You now have your own transformers-powered NLP model!

References

[1] Huggingface Transformers

[2] Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
