Sawan Saxena

Understanding Embedding Layer in Keras

In deep learning, the embedding layer can seem like an enigma until you get the hang of it. Since it is an essential part of many neural networks, it is important to understand how it works. In this article, I will try to explain what an embedding layer is, why we need it and how it works, along with some coding examples. So let's get started.

What is an Embedding Layer

The embedding layer is one of the layers available in Keras. It is mainly used in Natural Language Processing applications such as language modeling, but it can also be used in other tasks that involve neural networks. While dealing with NLP problems, we can use pre-trained word embeddings such as GloVe. Alternatively, we can train our own embeddings using the Keras embedding layer.

Need for Embeddings

Word embeddings can be thought of as an alternative to one-hot encoding that also reduces dimensionality.

As we know, textual data must be converted into numbers before it is fed into any machine learning model, including neural networks. For simplicity, words can be compared to categorical variables. We use one-hot encoding to convert categorical features into numbers: we create a dummy feature for each category and populate it with 0's and 1's.

Similarly, if we use one-hot encoding on the words in textual data, we get a dummy feature for each word, which means 10,000 features for a vocabulary of 10,000 words. This is not a feasible approach, as it demands large storage space for the word vectors and reduces model efficiency.

The embedding layer enables us to convert each word into a fixed-length vector of a defined size. The resulting vector is dense, with real values instead of just 0's and 1's. The fixed length of the word vectors lets us represent words more effectively with far fewer dimensions.

In this way, the embedding layer works like a lookup table. The words are the keys in this table, while the dense word vectors are the values. To understand it better, let's look at the implementation of the Keras Embedding layer.
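To make the lookup-table idea concrete, here is a minimal NumPy sketch (not Keras itself). The table values, the sizes and the chosen word index are all made up for illustration. It shows that multiplying a one-hot vector by the table gives the same result as simply reading one row of the table.

```python
import numpy as np

vocab_size = 10      # hypothetical vocabulary size
embedding_dim = 4    # hypothetical length of each dense vector

# Stand-in lookup table: one row of real values per word index.
rng = np.random.default_rng(0)
table = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))

word_index = 3       # an arbitrary word, chosen for illustration

# One-hot representation: a vector as long as the vocabulary.
one_hot_vector = np.zeros(vocab_size)
one_hot_vector[word_index] = 1.0

# Multiplying the one-hot vector by the table selects a single row,
# which is the same as indexing the table directly.
via_matmul = one_hot_vector @ table
via_lookup = table[word_index]

print(one_hot_vector.shape)                  # (10,)
print(via_lookup.shape)                      # (4,)
print(np.allclose(via_matmul, via_lookup))   # True
```

The direct lookup avoids building the long one-hot vector at all, which is why an embedding layer only needs integer indices as input.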

Implementation in Keras

Let’s start by importing the required libraries.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding
import numpy as np

We can create a simple Keras model by just adding an embedding layer.

model = Sequential()
embedding_layer = Embedding(input_dim=10,output_dim=4,input_length=2)
model.add(embedding_layer)
model.compile('adam','mse')

There are three parameters to the embedding layer:

  • input_dim : Size of the vocabulary
  • output_dim : Length of the vector for each word
  • input_length : Maximum length of a sequence

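In the above example, we set the vocabulary size to 10, as we will be encoding the numbers 0 to 9. We want each word vector to have length 4, hence output_dim is set to 4. The length of each input sequence to the embedding layer will be 2. Note that newer Keras releases (Keras 3) deprecate the input_length argument and warn that it should be removed, so you may need to drop it depending on your version.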

Now, let's pass a sample input to our model and see the results.

input_data = np.array([[1,2]])
pred = model.predict(input_data)
print(input_data.shape)
print(pred)

The output of the above code is as follows.

(1, 2)
[[[ 0.04502351  0.00151128  0.01764284 -0.0089057 ]
  [-0.04007018  0.02874336  0.02772436  0.00842067]]]

As you can see, each word (1 and 2) is represented by a vector of length 4. If we print the weights of the embedding layer, we get the result below.

[array([[-0.04333381, -0.02326865, -0.00812379,  0.02167496],
        [ 0.04502351,  0.00151128,  0.01764284, -0.0089057 ],
        [-0.04007018,  0.02874336,  0.02772436,  0.00842067],
        [ 0.00512743,  0.03695237, -0.02774147, -0.03748262],
        [ 0.02066498, -0.01512628, -0.03989452,  0.00809463],
        [-0.02207369,  0.02889762, -0.01229819, -0.03157005],
        [ 0.02565557,  0.02931032, -0.01611946, -0.00105535],
        [ 0.03920721,  0.04009463, -0.04943105,  0.04145898],
        [ 0.04208959, -0.00412361, -0.04585704,  0.03489918],
        [-0.04016889,  0.03448426,  0.00623332,  0.02844917]],
       dtype=float32)]

These weights are the vector representations of the words in the vocabulary. As discussed earlier, this is a lookup table of size 10 x 4, for words 0 to 9. The first word (0) is represented by the first row of this table, which is

[-0.04333381, -0.02326865, -0.00812379,  0.02167496]
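As a quick check of the lookup behaviour, the sketch below copies the first three rows of the weight matrix and the prediction printed above into NumPy arrays and confirms that indexing the table with the input [[1, 2]] reproduces the prediction. It uses only the numbers shown in this article and does not call Keras.

```python
import numpy as np

# First three rows of the weight matrix printed above (indices 0, 1 and 2).
weights = np.array([
    [-0.04333381, -0.02326865, -0.00812379,  0.02167496],
    [ 0.04502351,  0.00151128,  0.01764284, -0.0089057 ],
    [-0.04007018,  0.02874336,  0.02772436,  0.00842067],
], dtype=np.float32)

# Output of model.predict for the input [[1, 2]], as printed above.
pred = np.array([[[ 0.04502351,  0.00151128,  0.01764284, -0.0089057 ],
                  [-0.04007018,  0.02874336,  0.02772436,  0.00842067]]],
                dtype=np.float32)

input_data = np.array([[1, 2]])

# Indexing the table with the input picks rows 1 and 2 for the single sequence.
looked_up = weights[input_data]

print(looked_up.shape)                   # (1, 2, 4)
print(np.array_equal(looked_up, pred))   # True
```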

Note: In this example we have not trained the embedding layer. The weights assigned to the word vectors are initialized randomly.

This was a nice example to start with. But when working with actual text data, we need to train the embedding layer to get useful word embeddings. Let's see how to do it using restaurant review data.

Restaurant Review Classification

We will perform the following steps to solve this problem.

  1. Tokenize the sentences into words.
  2. Encode each word as an integer index using the Keras one_hot function.
  3. Use padding to ensure all sequences are of the same length.
  4. Pass the padded sequences as input to the embedding layer.
  5. Flatten and apply a Dense layer to predict the label.

We start by importing the required libraries.

from numpy import array
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten,Embedding,Dense

To keep it simple, we will use a total of 10 reviews. Half of them are positive, represented by 0, and the other half are negative, represented by 1.

# Define 10 restaurant reviews
reviews = [
    'Never coming back!',
    'horrible service',
    'rude waitress',
    'cold food',
    'horrible food!',
    'awesome',
    'awesome services!',
    'rocks',
    'poor work',
    "couldn't have done better",
]
# Define labels: 1 = negative, 0 = positive
labels = array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

We will take a vocabulary size of 50 and encode the words using the one_hot function from Keras.

Vocab_size = 50
encoded_reviews = [one_hot(d,Vocab_size) for d in reviews]
print(f'encoded reviews: {encoded_reviews}')

We get the following encoded reviews.

encoded reviews: [[18, 39, 17], [27, 27], [5, 19], [41, 29], [27, 29], [2], [2, 1], [49], [26, 9], [6, 9, 11, 21]]

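Here you can see that the length of each encoded review equals the number of words in that review. Despite its name, Keras one_hot does not return one-hot vectors: it hashes each word to an integer index between 1 and Vocab_size - 1. Because it relies on hashing, two different words can collide on the same index (above, 'horrible' and 'service' both map to 27), and the exact indices may differ between runs. Now we need to apply padding so that all the encoded reviews have the same length. Let's define 4 as the maximum length and pad the encoded vectors with 0's at the end.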
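The encoding step can be pictured with a small pure-Python sketch. This is an approximation for illustration, not the Keras implementation: the helper name encode is hypothetical, the punctuation handling is simplified, and md5 is used as the hash (an assumption) so that the result is the same on every run. The indices it prints will therefore not match the ones shown above.

```python
import hashlib
import string

def encode(text, n):
    """Rough stand-in for one_hot: map each word to an index in [1, n - 1]."""
    # Lowercase, then turn punctuation (except the apostrophe) into spaces.
    drop = string.punctuation.replace("'", "")
    cleaned = text.lower().translate(str.maketrans({c: " " for c in drop}))
    indices = []
    for word in cleaned.split():
        # md5 gives the same value on every run, unlike Python's built-in hash().
        digest = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        # Index 0 is never produced, so it stays free for padding.
        indices.append(digest % (n - 1) + 1)
    return indices

sample_reviews = ['horrible service', 'horrible food!', "couldn't have done better"]
encoded = [encode(r, 50) for r in sample_reviews]
print(encoded)
```

The same word always gets the same index ('horrible' in the first two reviews), while distinct words may still share an index when the vocabulary size is small.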

max_length = 4
padded_reviews = pad_sequences(encoded_reviews,maxlen=max_length,padding='post')
print(padded_reviews)

The padded and encoded reviews look like this.

[[18 39 17  0]
 [27 27  0  0]
 [ 5 19  0  0]
 [41 29  0  0]
 [27 29  0  0]
 [ 2  0  0  0]
 [ 2  1  0  0]
 [49  0  0  0]
 [26  9  0  0]
 [ 6  9 11 21]]
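For intuition, post-padding can be sketched in a few lines of plain Python. The helper pad_post is hypothetical and only mimics the behaviour used in this article (padding='post', with sequences longer than maxlen cut from the front, which is the Keras default); it assumes maxlen is positive.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end with `value` up to `maxlen` items."""
    padded = []
    for seq in sequences:
        # Keep only the last `maxlen` items if the sequence is too long.
        seq = list(seq)[-maxlen:]
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

# Encoded reviews as printed earlier in the article.
encoded_reviews = [[18, 39, 17], [27, 27], [5, 19], [41, 29], [27, 29],
                   [2], [2, 1], [49], [26, 9], [6, 9, 11, 21]]

result = pad_post(encoded_reviews, maxlen=4)
for row in result:
    print(row)
```

Running it on the encoded reviews gives the same rows as the padded matrix above.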

After creating the padded, encoded representation of the reviews, we are ready to pass it as input to the embedding layer. In the following code snippet, we create a simple Keras model. We fix the length of the embedded vector for each word at 8, and the input length is the maximum length, which we have already defined as 4.

model = Sequential()
embedding_layer = Embedding(input_dim=Vocab_size,output_dim=8,input_length=max_length)
model.add(embedding_layer)
model.add(Flatten())
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['acc'])
print(model.summary())

The model summary is shown below.

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 4, 8)              400       
_________________________________________________________________
flatten (Flatten)            (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
=================================================================
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
None
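The shapes and parameter counts in the summary can be reproduced with a small NumPy sketch of the forward pass. The weights below are random stand-ins with the same shapes as the layers (they are not the model's actual weights), and the two input rows are taken from the padded reviews above.

```python
import numpy as np

vocab_size, embedding_dim, max_length = 50, 8, 4
rng = np.random.default_rng(0)

# Stand-in parameters with the same shapes as the layers in the summary.
embedding = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))  # 50 * 8 = 400
dense_w = rng.normal(size=(max_length * embedding_dim, 1))              # 32 * 1 = 32
dense_b = np.zeros(1)                                                   # 1 bias

batch = np.array([[18, 39, 17, 0], [27, 27, 0, 0]])   # two padded reviews

embedded = embedding[batch]                  # (2, 4, 8): one vector per token
flat = embedded.reshape(len(batch), -1)      # (2, 32): Flatten
logits = flat @ dense_w + dense_b            # (2, 1): Dense
probs = 1.0 / (1.0 + np.exp(-logits))        # sigmoid, values between 0 and 1

total_params = embedding.size + dense_w.size + dense_b.size
print(embedded.shape, flat.shape, probs.shape)
print(total_params)                          # 433
```

The embedding table accounts for 400 parameters, and the Dense layer for 32 weights plus 1 bias, which matches the 433 reported in the summary.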

Next, we will train the model for 100 epochs.

model.fit(padded_reviews,labels,epochs=100,verbose=0)

Once training is complete, the embedding layer has learnt the weights, which are nothing but the vector representations of each word. Let's check the shape of the weight matrix.

print(embedding_layer.get_weights()[0].shape)

This embedding matrix is essentially a lookup table of 50 rows and 8 columns, as the output shows.

(50, 8)

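If we check the first row of the embedding matrix (index 0), we get the following vector. Note that one_hot never produces index 0, so in this model that row is the vector learned for the padding value rather than for an actual word.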

[ 0.056933    0.0951985   0.07193055  0.13863552 -0.13165753  0.07380469  0.10305451 -0.10652688]

So this is how we train an embedding layer on our text corpus and get the embedded vectors for each word. These vectors are then used to represent words in a sentence.
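Once the matrix is available, the vector for any word is the row at that word's index, and two words can be compared with cosine similarity. The sketch below assumes a random stand-in for the trained matrix (in practice it would be embedding_layer.get_weights()[0]) and uses the indices from the encoded reviews printed earlier; with random values the similarity itself is not meaningful.

```python
import numpy as np

# Stand-in for embedding_layer.get_weights()[0]; a trained matrix would go here.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(50, 8))

# Indices taken from the encoded reviews printed earlier in the article.
word_to_index = {'horrible': 27, 'food': 29, 'awesome': 2}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, between -1 and 1."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

horrible = weights[word_to_index['horrible']]
awesome = weights[word_to_index['awesome']]

print(horrible.shape)                           # (8,)
print(cosine_similarity(horrible, awesome))     # depends on the weights
print(cosine_similarity(horrible, horrible))    # 1.0 up to rounding
```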

Conclusion

Embeddings are a great way to deal with NLP problems for two reasons. First, they reduce dimensionality compared with one-hot encoding, since we control the number of features. Second, because they are trained, they can capture how words are used, so that similar words end up with similar embeddings. This article (https://readmedium.com/deep-nlp-word-vectors-with-word2vec-d62cb29b40b3) is a great explanation of how word embeddings work in detail.

Please let me know in the comments if you find this article useful. I am a data science enthusiast and blogger. You can reach out to me on my LinkedIn profile (https://www.linkedin.com/in/sawan-saxena-640a4475/).

Thanks for reading.
