Sentiment Analysis Without Modeling: TextBlob vs. VADER vs. Flair

Sentiment Prediction Accuracy Comparison


Sentiment analysis can be done with or without building a machine learning model. This article goes over the Python implementation of TextBlob, VADER, and Flair, three tools that provide sentiment analysis without training a model of your own.

After reading the article, you will learn

What is TextBlob?
What is VADER?
What is Flair?
How to use the Python library TextBlob for sentiment analysis?
How to use the Python library VADER for sentiment analysis?
How to use Flair’s pre-trained model in Python for sentiment analysis?
Comparison between TextBlob vs. VADER vs. Flair.


Let’s get started!

Step 1: Install and Import Python Libraries

The first step is to install and import Python libraries. We need to install the vaderSentiment package for VADER and the flair package for Flair.

# Install vaderSentiment package for VADER
pip install vaderSentiment

# Install flair
pip install flair

After installing the packages, let’s import the Python libraries. We need to import pandas and numpy for data processing.

For the sentiment analysis, we need to import TextBlob, SentimentIntensityAnalyzer from vaderSentiment, and TextClassifier from flair. We also need to load the pre-trained English sentiment model with TextClassifier.load('en-sentiment') and import Sentence, which wraps the input text for the flair pre-trained model.

To check the sentiment prediction accuracy, we need to import accuracy_score from sklearn.

Last but not least, we set the pandas data frame column width to be 1000, which will allow us to see more content from the review.

# Data processing
import pandas as pd
import numpy as np

# Import TextBlob
from textblob import TextBlob

# Import VADER sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Import flair pre-trained sentiment model
from flair.models import TextClassifier
classifier = TextClassifier.load('en-sentiment')

# Import flair Sentence to process input text
from flair.data import Sentence

# Import accuracy_score to check performance
from sklearn.metrics import accuracy_score

# Set a wider colwidth
pd.set_option('display.max_colwidth', 1000)
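
Before running the imports above, it can help to confirm that the packages are installed. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is hypothetical, and the list of module names is taken from the import statements in this step.

```python
# Sketch: report which of the required packages are not importable.
import importlib.util

def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level module names used by the imports in this step
required = ['pandas', 'numpy', 'textblob', 'vaderSentiment', 'flair', 'sklearn']
print('Missing packages:', missing_packages(required))
```

Any name that shows up in the list needs a pip install before the imports will work. For example, if textblob is reported as missing, it can be installed with pip install textblob.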


Step 2: Download and Read in Data

The second step is to download and read the dataset.

The UCI Machine Learning Repository has the review data from three websites: imdb.com, amazon.com, and yelp.com. We will use the review data from amazon.com for this tutorial. Please follow these steps to download the data.

Go to: https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
Click “Data Folder”
Download “sentiment labelled sentences.zip”
Unzip “sentiment labelled sentences.zip”
Copy the unzipped folder “sentiment labelled sentences”, which contains the file “amazon_cells_labelled.txt”, to your project folder

Those who are using Google Colab for this analysis need to mount Google Drive to read the dataset. You can ignore the code below if you are not using Google Colab.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Change directory
import os
os.chdir("drive/My Drive/contents/sentiment_analysis")

# Print out the current directory
!pwd

Output:

Mounted at /content/drive
/content/drive/My Drive/contents/sentiment_analysis

Now let’s read the data into a pandas data frame and see what the dataset looks like.

# Read in data
amz_review = pd.read_csv('sentiment labelled sentences/amazon_cells_labelled.txt', sep='\t', names=['review', 'label'])

# Take a look at the data
amz_review.head()
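
The read_csv call above treats each line of the file as a sentence, a tab character, and a 0/1 label. For readers who want to see that format without pandas, here is a small sketch that parses lines of the same shape with plain Python. The two sample lines are made up for illustration and are not taken from the dataset, and parse_labelled_line is a hypothetical helper.

```python
def parse_labelled_line(line):
    """Split one 'sentence<TAB>label' line into (sentence, int label)."""
    # Split on the last tab so that any tab inside the sentence is kept
    sentence, label = line.rstrip('\n').rsplit('\t', 1)
    return sentence.strip(), int(label)

sample_lines = [
    'The battery lasts all day.\t1\n',     # made-up example
    'Stopped working after a week.\t0\n',  # made-up example
]
rows = [parse_labelled_line(line) for line in sample_lines]
print(rows)
```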

The .info() method gives us summary information about the dataset.

# Get the dataset information
amz_review.info()

From the output, we can see that this data set has two columns, 1000 records, and no missing data. The ‘review’ column is object type, and the ‘label’ column is int64 type.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
#   Column  Non-Null Count  Dtype
---  ------  --------------  -----
0   review  1000 non-null   object
1   label   1000 non-null   int64
dtypes: int64(1), object(1)
memory usage: 15.8+ KB

Next, let’s check the distribution of the label.

# Check the label distribution
amz_review['label'].value_counts()

There are 500 positive and 500 negative reviews in the dataset, so we have a balanced dataset. For a balanced dataset, we can use accuracy as the performance metric.

1    500
0    500
Name: label, dtype: int64
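
To see why a balanced dataset makes accuracy a reasonable metric, consider a baseline that always predicts the same class. The sketch below uses only plain Python. The accuracy helper and the 95/5 imbalanced split are illustrative assumptions and are not part of the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Balanced labels, like the 500/500 split in this dataset
balanced = [1] * 500 + [0] * 500
# Hypothetical imbalanced labels for comparison
imbalanced = [1] * 950 + [0] * 50

# A "model" that always predicts the positive class
always_positive = [1] * 1000

print(accuracy(balanced, always_positive))    # 0.5
print(accuracy(imbalanced, always_positive))  # 0.95
```

On the balanced labels, the trivial baseline scores only 0.5, so any accuracy well above 0.5 reflects real predictive value. On the imbalanced labels, the same baseline reaches 0.95 without learning anything.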

If you are interested in imbalanced dataset classification, please check out my imbalanced classification and anomaly detection articles.

Step 3: What is TextBlob?

TextBlob is a Python library for Natural Language Processing (NLP). Sentiment analysis is one of many NLP tasks that TextBlob supports.

The sentiment property in TextBlob returns a polarity score and a subjectivity score for the input text.

The polarity score ranges from -1 to 1, where -1 means extremely negative, and 1 means highly positive. A score near 0 means neutral sentiment.
The subjectivity score ranges from 0 to 1, where 0 means extremely objective and 1 means highly subjective.

# Example text
text = 'GrabNGoInfo.com is a great machine learning tutorial website.'

# TextBlob sentiment
textBlob_sentiment = TextBlob(text).sentiment
textBlob_sentiment

In this example, the sample text ‘GrabNGoInfo.com is a great machine learning tutorial website.’ has a polarity score of 0.8 and a subjectivity score of 0.75, which means TextBlob interprets the sentence as positive and fairly subjective.

Sentiment(polarity=0.8, subjectivity=0.75)

We can use .polarity to extract the sentiment score.

# Extract sentiment score
textBlob_sentiment.polarity

Output:

0.8
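
TextBlob's default analyzer is lexicon-based. Roughly speaking, words found in a sentiment lexicon contribute polarity and subjectivity values, and the scores for a text are an average over those words, with extra rules for things like negation and intensifiers. The sketch below is a toy illustration of that averaging idea and is not TextBlob's actual implementation. The two-word lexicon is hypothetical, with the entry for 'great' chosen to match the output above.

```python
# Toy lexicon: word -> (polarity, subjectivity). Hypothetical values,
# except that 'great' is set to match the TextBlob output shown above.
TOY_LEXICON = {
    'great': (0.8, 0.75),
    'terrible': (-1.0, 1.0),
}

def toy_sentiment(text):
    """Average polarity and subjectivity over the words found in the lexicon."""
    words = [w.strip('.,!?').lower() for w in text.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0  # no known words: treat as neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

print(toy_sentiment('GrabNGoInfo.com is a great machine learning tutorial website.'))
```

Only 'great' is found in the toy lexicon, so the sentence inherits that word's scores. Words outside the lexicon do not move the result.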

Step 4: What is VADER?

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool, available as a Python library, that focuses on social media text. It has built-in rules that adjust sentiment intensity based on punctuation, slang, emojis, and acronyms.

The output of VADER includes four scores: compound score, negative score, neutral score, and positive score.

The pos, neu, and neg scores represent the proportions of the text that fall into each category, so they add up to 1 (100%), apart from rounding.
The compound score is a single score that measures the overall sentiment of the text. Similar to TextBlob, it ranges from -1 (extremely negative) to 1 (extremely positive). Scores near 0 represent neutral sentiment.
The compound score is not a simple aggregation of the pos, neu, and neg scores. Instead, it incorporates rule-based enhancements such as punctuation amplifiers.

# Example text
text = 'GrabNGoInfo.com is a great machine learning tutorial website.'

# VADER Sentiment
vader = SentimentIntensityAnalyzer()
vader_sentiment = vader.polarity_scores(text)
vader_sentiment

VADER gave the sample text ‘GrabNGoInfo.com is a great machine learning tutorial website.’ a compound score of 0.6249. There is no negative word in the sentence, so the neg score is 0. The neu score is 0.631 and the pos score is 0.369, meaning that about 63.1% of the text is rated neutral and 36.9% positive.

{'compound': 0.6249, 'neg': 0.0, 'neu': 0.631, 'pos': 0.369}

The output of VADER is saved as a dictionary. We can extract the compound sentiment score by the key ‘compound’.

# Extract sentiment score
vader_sentiment['compound']

Output:

0.6249
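
To make the four scores less of a black box, the sketch below reproduces the output above by hand. It is a simplified reading of the vaderSentiment scoring logic and rests on several assumptions: that 'great' is the only lexicon word in the sentence, that its lexicon valence is 3.1, that the sentence is scored as eight whitespace-separated words, and that the normalization constant alpha is 15. It ignores the negation, intensifier, and punctuation rules.

```python
import math

def normalize(valence_sum, alpha=15):
    """Squash a summed valence into the range -1 to 1."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# Assumed word valences for the sample sentence: only 'great' (3.1) is
# in the lexicon, and the other seven words are neutral (0).
valences = [0, 0, 0, 3.1, 0, 0, 0, 0]

compound = round(normalize(sum(valences)), 4)

# Each sentiment-bearing word counts as |valence| + 1; each neutral word counts as 1
pos_sum = sum(v + 1 for v in valences if v > 0)
neg_sum = sum(-v + 1 for v in valences if v < 0)
neu_count = sum(1 for v in valences if v == 0)
total = pos_sum + neg_sum + neu_count

scores = {
    'neg': round(neg_sum / total, 3),
    'neu': round(neu_count / total, 3),
    'pos': round(pos_sum / total, 3),
    'compound': compound,
}
print(scores)  # {'neg': 0.0, 'neu': 0.631, 'pos': 0.369, 'compound': 0.6249}
```

Under these assumptions, the pos score is 0.369 rather than 1/8 because the ratios are weighted by valence and are not simple word counts.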

Step 5: What is Flair?

Flair is a state-of-the-art NLP framework built on PyTorch. It incorporates recent research and provides an easy way to combine different embeddings for various NLP tasks. The pre-trained sentiment model offers a tool for sentiment analysis without training a customized model.

Unlike TextBlob and VADER, which output a sentiment score between -1 and 1, the flair sentiment model outputs the predicted label with a confidence score. The confidence score ranges from 0 to 1, with 1 being very confident and 0 being very unconfident.

Before predicting the sentiment of a text, the input text needs to be tokenized by Sentence(). After that, we can see the sample text was split into nine tokens.

# Example text
text = 'GrabNGoInfo.com is a great machine learning tutorial website.'

# Flair tokenization
sentence = Sentence(text)
sentence

Output:

Sentence: "GrabNGoInfo.com is a great machine learning tutorial website ."   [− Tokens: 9]

Next, we use .predict to predict the sentiment.

# Flair sentiment prediction
classifier.predict(sentence)
sentence

The sample text ‘GrabNGoInfo.com is a great machine learning tutorial website.’ has the predicted sentiment of POSITIVE and a confidence score of 0.9895. Since 0.9895 is close to 1, flair is very confident about the predictions.

Sentence: "GrabNGoInfo.com is a great machine learning tutorial website ."   [− Tokens: 9  − Sentence-Labels: {'label': [POSITIVE (0.9895)]}]

The predicted label is saved as a value and the prediction confidence is saved as a score. We can use sentence.labels[0].value and sentence.labels[0].score, respectively, to extract them.

# Extract sentiment prediction score
print(f'Flair classified the review as {sentence.labels[0].value} with the score of {sentence.labels[0].score:.2f}')

Output:

Flair classified the review as POSITIVE with the score of 0.99
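
Because Flair returns a label plus a confidence rather than a signed score, its output is not directly comparable with the -1 to 1 scores from TextBlob and VADER. One simple convention, shown here as a sketch, is to negate the confidence for NEGATIVE predictions. The helper signed_flair_score is hypothetical and is not part of flair. It only assumes the label strings 'POSITIVE' and 'NEGATIVE' seen in the outputs of this tutorial.

```python
def signed_flair_score(label, confidence):
    """Map a (label, confidence) pair to a score between -1 and 1."""
    if label == 'POSITIVE':
        return confidence
    if label == 'NEGATIVE':
        return -confidence
    raise ValueError(f'Unexpected label: {label!r}')

print(signed_flair_score('POSITIVE', 0.9895))  # 0.9895
print(signed_flair_score('NEGATIVE', 0.9895))  # -0.9895
```

With a predicted sentence, the call would be signed_flair_score(sentence.labels[0].value, sentence.labels[0].score). The result has the same range as the other two scores, but it measures model confidence and not sentiment intensity, so the magnitudes should not be compared too literally.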

Step 6: How to Use TextBlob for Sentiment Analysis

In step 6, we will apply TextBlob to the Amazon review dataset and see how it performs.

We first get the sentiment polarity for each review and save the values into a column called ‘scores_TextBlob’. Then we convert the polarity score into a predicted label. If the score is greater than or equal to zero, the predicted sentiment for the review is positive (labeled as 1). Otherwise, the predicted sentiment for the review is negative (labeled as 0).

# Get sentiment score for each review
amz_review['scores_TextBlob'] = amz_review['review'].apply(lambda s: TextBlob(s).sentiment.polarity)

# Predict sentiment label for each review
amz_review['pred_TextBlob'] = amz_review['scores_TextBlob'].apply(lambda x: 1 if x >=0 else 0)

amz_review.head()

After getting predictions from TextBlob, let’s check the prediction accuracy.

# Compare Actual and Predicted
accuracy_score(amz_review['label'],amz_review['pred_TextBlob'])

Comparing the actual label with the TextBlob prediction, we get an accuracy score of 0.688, which means that TextBlob predicted the review sentiment correctly 68.8% of the time.
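
Note that the >= 0 rule assigns every review with a polarity of exactly 0 to the positive class, including reviews in which TextBlob found no sentiment words at all. The sketch below isolates that thresholding step so the cutoff can be varied. The names label_from_score and threshold are hypothetical, and the scores are made up for illustration.

```python
def label_from_score(score, threshold=0.0):
    """Return 1 (positive) if score >= threshold, else 0 (negative)."""
    return 1 if score >= threshold else 0

example_scores = [-0.6, 0.0, 0.05, 0.8]  # made-up polarity scores

# Same rule as in the tutorial: a score of 0 counts as positive
print([label_from_score(s) for s in example_scores])                 # [0, 1, 1, 1]
# A stricter cutoff sends near-neutral scores to the negative class
print([label_from_score(s, threshold=0.1) for s in example_scores])  # [0, 0, 0, 1]
```

The helper can be passed straight to a score column, for example amz_review['scores_TextBlob'].apply(label_from_score), which makes it easy to see how sensitive the accuracy is to the choice of cutoff.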

Step 7: How to Use VADER for Sentiment Analysis

In step 7, we will apply VADER to the Amazon review dataset and see how it performs.

We first get the sentiment compound score for each review and save the values into a column called ‘scores_VADER’. Then we convert the compound score into a predicted label. If the score is greater than or equal to zero, the predicted sentiment for the review is positive (labeled as 1). Otherwise, the predicted sentiment for the review is negative (labeled as 0).

# Get sentiment score for each review
vader_sentiment = SentimentIntensityAnalyzer()
amz_review['scores_VADER'] = amz_review['review'].apply(lambda s: vader_sentiment.polarity_scores(s)['compound'])

# Predict sentiment label for each review
amz_review['pred_VADER'] = amz_review['scores_VADER'].apply(lambda x: 1 if x >=0 else 0)

amz_review.head()

After getting predictions from VADER, let’s check the prediction accuracy.

# Compare Actual and Predicted
accuracy_score(amz_review['label'],amz_review['pred_VADER'])

Output:

0.768

Comparing the actual label with the VADER prediction, we get an accuracy score of 0.768, which means that VADER predicted the review sentiment correctly 76.8% of the time.

TextBlob has a prediction accuracy of 68.8% for the same dataset, so VADER improves on TextBlob by 8 percentage points.
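
The gap between the two accuracies can be expressed in two ways, and it is easy to mix them up. This short calculation uses the accuracy values reported above. The variable names are only for illustration.

```python
textblob_acc = 0.688
vader_acc = 0.768

# Absolute difference, in percentage points
point_gain = (vader_acc - textblob_acc) * 100
# Relative improvement over the TextBlob accuracy, in percent
relative_gain = (vader_acc - textblob_acc) / textblob_acc * 100

print(f'{point_gain:.1f} percentage points')  # 8.0 percentage points
print(f'{relative_gain:.1f}% relative')       # 11.6% relative
```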

Step 8: How to Use Flair for Sentiment Analysis

In step 8, we will apply the flair pre-trained sentiment model to predict the sentiment of the Amazon reviews.

First, let’s define a function that takes a review as input and returns the prediction confidence and the predicted label.

# Define a function to get Flair sentiment prediction score
def score_flair(text):
  sentence = Sentence(text)
  classifier.predict(sentence)
  score = sentence.labels[0].score
  value = sentence.labels[0].value
  return score, value

Then apply the function to each review in the dataset.

# Run flair once per review; each result is a (score, label) tuple
flair_results = amz_review['review'].apply(score_flair)

# Get sentiment score for each review
amz_review['scores_flair'] = flair_results.apply(lambda r: r[0])

# Predict sentiment label for each review
amz_review['pred_flair'] = flair_results.apply(lambda r: r[1])

# Check the distribution of the score
amz_review['scores_flair'].describe()

The summary statistics show that most flair scores are close to 1, which means that the flair sentiment model is overall very confident about its predictions.

count    1000.000000
mean        0.988019
std         0.046841
min         0.533640
25%         0.996153
50%         0.999167
75%         0.999887
max         0.999999
Name: scores_flair, dtype: float64

The distribution of the predicted labels is very balanced.

# Check the counts of labels
amz_review['pred_flair'].value_counts()

Output:

POSITIVE    504
NEGATIVE    496
Name: pred_flair, dtype: int64

Before checking the prediction accuracy, we need to map the ‘NEGATIVE’ value to 0 and the ‘POSITIVE’ value to 1 because the Amazon review dataset has true labels of 0 and 1.

# Change the label of flair prediction to 0 if negative and 1 if positive
mapping = {'NEGATIVE': 0, 'POSITIVE': 1}
amz_review['pred_flair'] = amz_review['pred_flair'].map(mapping)

After comparing the flair prediction with the actual sentiment label, we can see that flair predicts the sentiment correctly 94.8% of the time! The prediction accuracy is 18 percentage points higher than VADER and 26 percentage points higher than TextBlob!

# Compare Actual and Predicted
accuracy_score(amz_review['label'],amz_review['pred_flair'])

Output:

0.948
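
Another way to read the three accuracy scores is as the number of misclassified reviews out of the 1000 in the dataset. The sketch below derives those counts from the accuracies reported in steps 6 to 8. Nothing here is re-measured.

```python
n_reviews = 1000
accuracies = {'TextBlob': 0.688, 'VADER': 0.768, 'Flair': 0.948}

# Number of wrong predictions implied by each accuracy score
errors = {name: round((1 - acc) * n_reviews) for name, acc in accuracies.items()}
print(errors)  # {'TextBlob': 312, 'VADER': 232, 'Flair': 52}
```

Seen this way, flair makes fewer than a quarter as many mistakes as VADER on this dataset.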

Summary: TextBlob vs. VADER vs. Flair

In this article, we compared TextBlob vs. VADER vs. Flair for sentiment analysis.

Methodology: VADER and TextBlob are lexicon and rule-based. Flair is model-based.
Meaning of score: VADER and TextBlob have sentiment scores ranging from -1 to 1, where -1 means very negative, 1 means very positive, and 0 means neutral. Flair outputs a predicted label with a confidence score ranging from 0 to 1, where 0 means very unconfident about the prediction and 1 means very confident about the prediction.
Speed performance: Flair is slower than TextBlob and VADER because it runs a neural network model for each prediction instead of looking up words in a lexicon.
Accuracy performance: Based on the results of this project, Flair has the best prediction accuracy (94.8%), VADER is second (76.8%), and TextBlob is third (68.8%).
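
The speed comparison can be checked on your own machine with a small timing helper. The sketch below uses only the standard library. The helper time_scorer is hypothetical, and dummy_scorer is a stand-in so that the example runs without any of the three libraries installed.

```python
import time

def time_scorer(scorer, texts):
    """Return the average number of seconds scorer takes per text."""
    start = time.perf_counter()
    for text in texts:
        scorer(text)
    elapsed = time.perf_counter() - start
    return elapsed / len(texts)

def dummy_scorer(text):
    """Stand-in scorer: replace with a TextBlob, VADER, or Flair call."""
    return len(text)

texts = ['Great phone.', 'Terrible battery life.'] * 50  # made-up examples
avg_seconds = time_scorer(dummy_scorer, texts)
print(f'{avg_seconds:.6f} seconds per text')
```

To compare the three tools, pass a function such as lambda s: TextBlob(s).sentiment.polarity, or the score_flair function from step 8, in place of dummy_scorer, and use the same list of reviews for each run.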

After reading the article, you have learned

What is TextBlob?
What is VADER?
What is Flair?
How to use the Python library TextBlob for sentiment analysis?
How to use the Python library VADER for sentiment analysis?
How to use Flair’s pre-trained model in Python for sentiment analysis?
Comparison between TextBlob vs. VADER vs. Flair.

If you are interested in the nuanced differences between TextBlob and VADER, please refer to my article TextBlob VS VADER For Sentiment Analysis Using Python.

Visit the GrabNGoInfo website or the GrabNGoInfo YouTube channel for the video version of this tutorial and for more machine learning tutorials.
