Are Tweet sentiments reflective of the results in the 2020 Singapore General Elections?

Using VADER and BERT, I analyse the sentiments of Tweets pertaining to Singapore’s ruling party in the run-up to the 2020 General Elections.

A little over a month ago, on 10 July, Singapore held a general election to choose the members of its 14th Parliament. What do you do when you’re a really excited first-time voter with a lot of spare time on your hands? You conduct a quick study to analyse the sentiments of Tweets and see whether they reflect the actual election results. Okay, I might be the only one who thinks this way (nerd alert), but anyway, let’s dive straight into it!

Downloading Tweets

Using Tweepy (a Python library for the Twitter API), and with the help of the code used by Griffin in this article, I downloaded tweets using ‘PAP #GE2020’ as the search term.
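Griffin’s code isn’t reproduced here. As a rough guide, the download step might look like the sketch below, which rests on three assumptions:

- It uses Tweepy 4.x, where the standard search endpoint is `API.search_tweets`. The 3.x releases current in 2020 called it `API.search`.
- An already-authenticated `api` object is passed in.
- The function names `fetch_statuses` and `full_text` are hypothetical.

One detail is worth handling. In extended mode, a retweet’s `full_text` starts with `RT @user:` and can be truncated, so the complete text has to be read from the embedded `retweeted_status`.

```python
def full_text(status: dict) -> str:
    """Return the complete text of a tweet from its v1.1 JSON representation.

    With tweet_mode="extended" the text is in "full_text". For a retweet that
    field starts with "RT @user:" and may be cut short, so the original tweet's
    text is read from the embedded "retweeted_status" object instead.
    """
    original = status.get("retweeted_status", status)
    return original.get("full_text") or original.get("text", "")


def fetch_statuses(api, query: str = "PAP #GE2020", limit: int = 50000) -> list:
    """Page through standard search results and return the raw JSON dicts.

    Assumes Tweepy 4.x and an authenticated tweepy.API instance passed as `api`.
    """
    import tweepy  # imported lazily so full_text() can be used without Tweepy

    cursor = tweepy.Cursor(
        api.search_tweets, q=query, count=100, tweet_mode="extended"
    )
    return [status._json for status in cursor.items(limit)]
```

Bear in mind that the standard search endpoint only covers roughly the past seven days. A download like this has to run while the campaign tweets are still recent.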

PAP stands for the People’s Action Party, Singapore’s ruling party, and #GE2020 was the hashtag used by most people tweeting about the 2020 Singapore General Elections.

I deliberated quite a bit over the appropriate search term. Simply using #GE2020 wouldn’t be quite right, as the tweets collected would also include those reflecting public sentiment towards the opposition parties. My search term does miss tweets that were about the ruling party but lacked either the word ‘PAP’ or the #GE2020 hashtag. Still, I felt it was the closest I could get to isolating the tweets that reflect sentiment towards the ruling party.

I chose to include retweets as well, as I figured that Twitter users tend to retweet tweets that resonate with them. My dataset included tweets and retweets posted between 6 and 8 July. With polling day (10 July) coming up, online political discourse was likely to be at its most active then. In case you’re wondering why 9 July was excluded: it was cooling-off day, when campaigning is prohibited so that voters can take a step back and reflect on the issues before heading to the polls the following day.
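Twitter timestamps are in UTC, while the 6 to 8 July window presumably refers to Singapore time (UTC+8). The article doesn’t say how dates were handled, so the following is a hypothetical filter. It assumes the window is applied to Singapore dates and that each status carries the v1.1 `created_at` string. `in_campaign_window` is an illustrative name.

```python
from datetime import date, datetime, timedelta, timezone

# Singapore Standard Time is a fixed UTC+8 offset with no daylight saving.
SGT = timezone(timedelta(hours=8))

# Days kept in the dataset: 6 to 8 July 2020 (9 July, cooling-off day, excluded).
WINDOW_START = date(2020, 7, 6)
WINDOW_END = date(2020, 7, 8)


def in_campaign_window(created_at: str) -> bool:
    """Check whether a v1.1 'created_at' timestamp falls on 6-8 July, Singapore time."""
    # v1.1 timestamps look like 'Mon Jul 06 15:30:00 +0000 2020' and are in UTC.
    posted = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    local_day = posted.astimezone(SGT).date()
    return WINDOW_START <= local_day <= WINDOW_END
```

Converting to Singapore time before comparing dates matters at the edges. A tweet posted at 16:30 UTC on 5 July was actually sent shortly after midnight on 6 July in Singapore.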

I originally hoped to collect 50,000 tweets and retweets but ended up with a lot of duplicated data. This was probably because not many tweets matched my search term over such a short span of 3 days (I also forgot that Singapore’s citizen population of around 3.5 million isn’t that large to begin with). My final dataset consisted of 2,504 tweets and retweets, and 406 unique tweets.
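The article doesn’t spell out how duplicates were removed or how the unique tweets were counted. One plausible approach, sketched below as an assumption, has two steps:

1. Drop statuses whose `id` has already been seen, since these are repeated downloads of the same status.
2. Keep only statuses without a `retweeted_status` field as the tweets-only dataset.

`split_dataset` is a hypothetical helper.

```python
def split_dataset(statuses: list) -> tuple:
    """Split raw v1.1 statuses into (all_posts, tweets_only).

    all_posts   - tweets and retweets, with repeated downloads of the same
                  status (same id) dropped, in first-seen order
    tweets_only - the subset of all_posts that are not retweets
    """
    seen_ids = set()
    all_posts = []
    for status in statuses:
        if status["id"] in seen_ids:
            continue  # the search returned this exact status more than once
        seen_ids.add(status["id"])
        all_posts.append(status)
    tweets_only = [s for s in all_posts if "retweeted_status" not in s]
    return all_posts, tweets_only
```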

Now, you may be curious about what some of the commonly used words in these tweets are. Let’s create word clouds to find out!

In the word cloud that includes both tweets and retweets, terms of interest include ‘WP’, which stands for The Workers’ Party. They also include ‘Hougang’ and ‘Aljunied’, two constituencies currently held by The Workers’ Party. The frequent mention of these terms could suggest two things. First, The Workers’ Party is the opposition party gaining the most attention. Second, Hougang and Aljunied are likely to be among the most closely watched constituencies in the upcoming election. Two other opposition parties, PSP (Progress Singapore Party) and SDP (Singapore Democratic Party), are also mentioned in the tweets quite often.
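Word clouds are driven by word frequencies. The sketch below shows that counting step. It lower-cases each tweet, strips URLs and @mentions, and applies a small illustrative stopword list. The list also drops the search terms themselves, since every tweet contains them and they would dominate the cloud. These stopword choices are my assumption, not the article’s. `word_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter

# Deliberately small stopword list for illustration; it also drops the search
# terms ("pap", "ge2020") and the retweet marker ("rt").
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on",
             "rt", "pap", "ge2020"}


def word_frequencies(texts, stopwords=STOPWORDS) -> Counter:
    """Count lower-cased words across tweets, ignoring URLs, @mentions and stopwords."""
    counts = Counter()
    for text in texts:
        # Replace links and @mentions with spaces before tokenising.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        words = re.findall(r"[a-z0-9']+", text)
        counts.update(w for w in words if w not in stopwords)
    return counts
```

The resulting `Counter` is a `dict`, so it can be passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package to draw the image.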

What about the word cloud for tweets only?

Word cloud for tweets only

We observe a much larger variety of words commonly used in the tweets, but The Workers’ Party remains a prominent subject in many of the tweets.

While word clouds let us pick out commonly used words across the tweets in our dataset (which haven’t been too surprising so far), they don’t tell us what sentiments those tweets carry. That brings us to the next part, where I perform sentiment analysis using VADER and BERT.

Sentiment analysis using VADER

I’m sure most people are quite familiar with VADER (Valence Aware Dictionary and sEntiment Reasoner), so I won’t bore everyone with a lengthy explanation of it in this article. Instead, I’ll jump right into my findings, presenting the breakdown of tweets classified as carrying positive, negative or neutral sentiment.
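For readers who haven’t used it, VADER returns a normalised `compound` score between -1 and 1 for each text. The article doesn’t state which cut-offs were used to turn that score into a label. The sketch below assumes the thresholds suggested in the vaderSentiment README:

- positive: 0.05 or above
- negative: -0.05 or below
- neutral: anything in between

`label_from_compound` and `vader_label` are hypothetical helper names.

```python
from functools import lru_cache


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map VADER's compound score (-1 to 1) to positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


@lru_cache(maxsize=1)
def _analyzer():
    """Build the VADER analyzer once (requires the vaderSentiment package)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def vader_label(text: str) -> str:
    """Score one tweet with VADER and convert the compound score to a label."""
    return label_from_compound(_analyzer().polarity_scores(text)["compound"])
```

Keeping the thresholding in its own function makes it easy to test, and to try other cut-offs without re-scoring the tweets.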

Breakdown of tweet sentiments for tweets and retweets

For the dataset that included retweets, VADER classified most of the tweets as positive, a quarter as negative and slightly less than 10% of them as neutral.

Breakdown of tweet sentiments for tweets only

After excluding retweets, VADER classified a smaller proportion of tweets as positive, though positive tweets still make up slightly more than 50%. The rest are split fairly evenly between negative and neutral.
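Each breakdown is simply each label’s share of the dataset. It is computed once for tweets and retweets, and once for tweets only. A minimal sketch, with `sentiment_breakdown` as a hypothetical helper:

```python
from collections import Counter


def sentiment_breakdown(labels) -> dict:
    """Return each sentiment label's share of all labels, in percent (1 d.p.)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on an empty dataset
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Calling it once on the labels for the full dataset, and once on the labels for the tweets-only subset, gives the two breakdowns shown above.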

Evidently, VADER doesn’t handle sarcasm very well

I took a quick look at the classified tweets and realised that VADER tends to miss the correct sentiment when sarcasm is used, which isn’t too surprising. In this example, the tweet carries a negative sentiment towards the ruling party, but VADER identified a positive sentiment instead. Wow, awkward, indeed (sorry, I couldn’t resist).

Sentiment analysis using BERT

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art NLP model developed by Google, and Rani’s article does a pretty good job of explaining what it is about.

For the purpose of this study, I’ll be using the BERT model fine-tuned by Preston using a Kaggle airline Twitter dataset, as detailed in his article here.
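Preston’s article covers how the model was fine-tuned, so that isn’t repeated here. The sketch below shows one way the classification step could work. It assumes the fine-tuned checkpoint has been saved in Hugging Face format and can be loaded with `transformers.pipeline`. Two further details are assumptions:

- The `model_dir` path is a placeholder, not the location of Preston’s model.
- The mapping of `LABEL_0/1/2` to negative, neutral and positive (the three classes in the airline dataset) should be checked against the model’s `config.id2label`.

`to_sentiment` and `bert_labels` are hypothetical names.

```python
# Assumed mapping from generic output labels to sentiments. The real order
# depends on how the checkpoint was fine-tuned; check config.id2label.
ID2SENTIMENT = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def to_sentiment(prediction: dict, id2sentiment=ID2SENTIMENT) -> str:
    """Convert one pipeline prediction, e.g. {'label': 'LABEL_0', 'score': 0.93}."""
    label = prediction["label"]
    # Fall back to the model's own label name if it is already descriptive.
    return id2sentiment.get(label, label.lower())


def bert_labels(texts, model_dir="path/to/fine-tuned-bert"):
    """Classify tweets with a fine-tuned checkpoint saved in Hugging Face format.

    model_dir is a hypothetical placeholder path.
    """
    from transformers import pipeline  # requires the transformers package

    classifier = pipeline("text-classification", model=model_dir)
    return [to_sentiment(p) for p in classifier(list(texts))]
```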

How did BERT classify the tweets?

Breakdown of tweet sentiments for tweets and retweets

When retweets were included, BERT classified most of the tweets as negative, which is vastly different from VADER’s classification.

Breakdown of tweet sentiments for tweets only

When only tweets were considered, BERT classified a smaller proportion of tweets as negative, but negative remains the majority sentiment.
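VADER and BERT labelled the same tweets. One way to quantify how different their classifications are is to cross-tabulate the two sets of labels and compute the share of tweets on which they agree. The article doesn’t report this figure. The sketch below, with `compare_labels` as a hypothetical helper, only shows how it could be computed.

```python
from collections import Counter


def compare_labels(vader, bert):
    """Cross-tabulate two label lists for the same tweets and report agreement.

    Returns (table, agreement), where table maps (vader_label, bert_label)
    pairs to counts and agreement is the fraction of tweets labelled alike.
    """
    if len(vader) != len(bert):
        raise ValueError("both models must label the same tweets")
    table = Counter(zip(vader, bert))
    agreed = sum(n for (v, b), n in table.items() if v == b)
    agreement = agreed / len(vader) if vader else 0.0
    return table, agreement
```

The off-diagonal cells, such as (`positive`, `negative`), show where the two models disagree. These are the tweets worth reading by hand, like the sarcastic example.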

Remember that sarcastic tweet that VADER misclassified? Let’s see if BERT manages to identify the correct sentiment.

Here, BERT does a good job of handling sarcastic comments

BERT successfully detects the negative sentiment in this tweet!

But BERT isn’t perfect either

I took a quick look at some of the tweets classified by the model and realised that BERT sometimes struggles to detect the right sentiment. Here, we see a tweet with a negative sentiment that BERT misclassifies as positive. Then again, even I, as a human, sometimes find it difficult to identify sentiments correctly, so how can we expect machines to be perfect at the job?

What was the actual election outcome like?

To summarise the actual results of the 2020 Singapore General Elections: while the PAP kept its supermajority in Parliament, it lost an additional constituency (Sengkang) to The Workers’ Party. The PAP’s overall vote share declined to 61.24%, its lowest since the 2011 elections.
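For a sense of scale, the PAP won 69.86% of the popular vote in 2015, so the swing away from it in 2020 works out to:

```latex
\Delta_{\text{PAP}} = 61.24\% - 69.86\% = -8.62 \text{ percentage points}
```

This is the yardstick against which the largely negative Twitter sentiment can be judged.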

BERT might be the better model at classifying sentiment correctly, even in the presence of sarcasm. However, given the largely negative sentiment it detected on Twitter, I would have expected an even more drastic vote swing away from the ruling party. Perhaps there is a degree of self-selection in who turns to social media to express political opinions. Those who are more vocal on Twitter may tend to be critics of the ruling party, while its supporters are more likely to stay silent on social media platforms.

Another possible explanation lies in the demographics of Twitter users. Twitter users tend to be younger. So even without any self-selection among those who express political sentiments, tweet sentiments would only represent a subset of the Singapore population.

Wrapping up

One interesting extension to this study would be to repeat the same process for Singapore’s more prominent opposition parties, such as WP, PSP and SDP, and then compare their sentiment breakdowns with the PAP’s.

All in all, I had quite a lot of fun with this frivolous pet project, and all relevant code and files can be found on GitHub here!

Disclaimer: The opinions in this article are entirely my own and in no way reflect the views of the organisations that I am part of.