Nuances in the usage of Word Embeddings: Semantic and Syntactic Relationships

Note: Super short post ahead. Just food for thought I guess? :)

Introduction

Over the past few weeks, I’ve been writing about word embeddings: how I created word embeddings from scratch for a colloquial language such as Singlish, and how I augmented them with translation vectors to handle misspellings and out-of-vocabulary words.

In the latter article, I tested the effects of my experiment on downstream text classification accuracy.

However, further reading and research have made me realise the nuances involved in using word embeddings, and what using them truly entails.

Allow me to elaborate.

You see, using word embeddings for Natural Language Processing (NLP) is one thing; anyone can do it.

But…

Understanding their implications for downstream tasks is another.

And to understand those implications, you first need to know which semantic and syntactic relationships the word embeddings you are using have learned.

What are Semantic and Syntactic relationships?

I think the “what” will be made clear when I talk about the “why”.

So why would this even matter?

It matters because it affects the downstream accuracy of your language models in more ways than you think.

Take an NLP task like sentiment analysis for example.

Would your sentiment model perform better if its word embeddings captured more of the semantic relationships between words (their meaning) or more of the syntactic relationships (their grammatical structure in English)?

Readily available word embeddings such as Global Vectors (GloVe) by Stanford seem to perform better on semantically related tasks as shown in their research paper.

Figure 1 — Example of a Semantic Relationship Task

Word2Vec by Google, on the other hand, does worse than GloVe on most NLP tasks but seems to perform better on syntactic relationship tasks on its own (source).

Figure 2 — Example of Syntactic Relationship Task
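To make the two kinds of task concrete, here is a minimal sketch of a word analogy evaluation ("a is to b as c is to ?"), the kind of benchmark the GloVe and Word2Vec papers use, with questions split into semantic and syntactic categories. Each question is answered with the vector-offset method: pick the word whose vector has the highest cosine similarity to b − a + c, excluding a, b and c themselves. The six-dimensional vectors and the two questions are hypothetical and hand-built so the offsets work out exactly; they are not real GloVe or Word2Vec vectors, so the scores say nothing about either model.

```python
import math

# Hypothetical, hand-built 6-d vectors (NOT real GloVe/Word2Vec vectors).
# Dims: Greece, Norway, "is a country", "-ing form", dance, fly.
TOY_VECTORS = {
    "athens":  [1, 0, 0, 0, 0, 0],
    "greece":  [1, 0, 1, 0, 0, 0],
    "oslo":    [0, 1, 0, 0, 0, 0],
    "norway":  [0, 1, 1, 0, 0, 0],
    "dance":   [0, 0, 0, 0, 1, 0],
    "dancing": [0, 0, 0, 1, 1, 0],
    "fly":     [0, 0, 0, 0, 0, 1],
    "flying":  [0, 0, 0, 1, 0, 1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def accuracy_by_category(vectors, questions):
    """questions: list of (category, a, b, c, expected_d) tuples."""
    hits, totals = {}, {}
    for category, a, b, c, expected in questions:
        totals[category] = totals.get(category, 0) + 1
        if solve_analogy(vectors, a, b, c) == expected:
            hits[category] = hits.get(category, 0) + 1
    # Accuracy is reported separately for each category.
    return {cat: hits.get(cat, 0) / n for cat, n in totals.items()}


QUESTIONS = [
    ("semantic", "athens", "greece", "oslo", "norway"),
    ("syntactic", "dance", "dancing", "fly", "flying"),
]

print(accuracy_by_category(TOY_VECTORS, QUESTIONS))  # {'semantic': 1.0, 'syntactic': 1.0}
```

Scoring each category separately is what lets you see whether a set of embeddings is stronger on semantic or syntactic relationships. With real embeddings, you would load the pre-trained vectors and a full analogy question set instead of these toy dictionaries.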

Having said that, both semantic and syntactic relationships are equally important for good performance in language models.

Alas, nothing is perfect in this world.

There aren’t any pre-trained word embeddings that are superb at both. That said, on a personal note, I think GloVe does an amazing job for most NLP tasks.

Besides, the field seems to have already moved away from static pre-trained word embeddings.

Attention models are the “in” thing now.

Take a look at Google’s recent article from 25th October, Understanding searches better than ever before.

Google has recently implemented its famed model “BERT”, which is an attention model. Attention models can give the same word different embeddings depending on the context it is used in.

For example:

“I went to the bank to deposit my paycheck. It was located by a river bank.”

As humans, we know that the first “bank” refers to the financial institution, while the second refers to the land alongside a river.

Attention models give the word “bank” different embeddings for the financial sense and for the river sense.
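To see why the two “bank”s can end up with different vectors, here is a deliberately tiny sketch of scaled dot-product self-attention, the building block that BERT stacks many times. It is an assumption-heavy simplification: the three-dimensional vectors are made up, and the learned query, key and value projections (as well as the multiple heads and layers) of a real model are replaced with the identity. Each token’s output is a softmax-weighted average of every token’s vector in the same sentence, so the single static “bank” vector gets pulled toward whatever surrounds it.

```python
import math

# Hypothetical 3-d static embeddings, dims ~ (finance, nature, bank-ness).
# Made up for illustration; not taken from GloVe, Word2Vec or BERT.
STATIC = {
    "deposit":  [1.0, 0.0, 0.0],
    "paycheck": [1.0, 0.0, 0.0],
    "river":    [0.0, 1.0, 0.0],
    "bank":     [0.5, 0.5, 1.0],  # one vector, whatever the sentence
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (a simplification)."""
    xs = [STATIC[t] for t in tokens]
    d = len(xs[0])
    outputs = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        # Output = weighted average of every token's vector in this sentence.
        outputs.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return outputs


finance_sentence = ["deposit", "paycheck", "bank"]
river_sentence = ["river", "bank"]
bank_finance = self_attention(finance_sentence)[finance_sentence.index("bank")]
bank_river = self_attention(river_sentence)[river_sentence.index("bank")]

print([round(x, 2) for x in bank_finance])
print([round(x, 2) for x in bank_river])
```

Running it, the “bank” from the deposit sentence leans toward the first (finance) dimension, while the “bank” from the river sentence leans toward the second (nature) dimension, even though both started from the identical static vector.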

Pretty cool stuff eh?

Just to be fair though, I would like to point out ERNIE by Baidu as well.

At the time of writing, ERNIE is the latest and greatest attention model in the NLP world. So far, it is reported to outperform BERT on ALL the standard baseline NLP tasks, and it even works on Mandarin. I’m not sure why people aren’t talking about it more, though.

Sorry, I got distracted talking about attention models.

Let’s get back to the initial topic!

Ending Notes

I think what I wanted to convey in this short article was this:

There are no magical pre-trained word embeddings for ALL your NLP tasks.

You have to keep in mind the NLP task you are trying to solve and train the type of word embedding that is best for it.

Take my Singlish article, for example: I definitely could not use GloVe or Word2Vec for it.
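One rough check before reaching for pre-trained embeddings is to measure how much of your corpus their vocabulary actually covers. The sketch below assumes an already tokenised corpus and a set of words taken from the embedding file; `oov_report`, `PRETRAINED_VOCAB` and the sample sentences are hypothetical stand-ins, not GloVe’s or Word2Vec’s real vocabularies.

```python
from collections import Counter


def oov_report(tokens, embedding_vocab, top_n=5):
    """Return (out-of-vocabulary rate, most common OOV tokens) for a token list."""
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    # Keep only the tokens the embedding vocabulary has no vector for.
    oov = Counter({tok: n for tok, n in counts.items() if tok not in embedding_vocab})
    rate = sum(oov.values()) / total if total else 0.0
    return rate, oov.most_common(top_n)


# Hypothetical stand-in for the words in a pre-trained embedding file.
PRETRAINED_VOCAB = {"the", "food", "here", "is", "very", "good", "damn"}

english_tokens = "the food here is very good".split()
singlish_tokens = "the food here damn shiok lah".split()

print(oov_report(english_tokens, PRETRAINED_VOCAB))   # (0.0, [])
print(oov_report(singlish_tokens, PRETRAINED_VOCAB))  # (0.3333333333333333, [('shiok', 1), ('lah', 1)])
```

A high out-of-vocabulary rate is a strong hint that you need embeddings trained on your own data, although coverage alone does not tell you how well the covered vectors capture the relationships your task depends on.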

Remember, word embeddings will ultimately affect your downstream accuracy.

Rubbish in, rubbish out.

Hope this short article on the nuances involved when working with word embeddings gives you some food for thought!

Till next time, bye!

LinkedIn Profile: Timothy Tan