</div>
</div>
</figure></iframe></div></div></figure><h2 id="11ab">Attention decoder</h2><p id="809a">Next, we create a decoder network with a GRU and an attention mechanism. Attention allows the decoder to “focus” on a different part of the encoder’s outputs at every step of its own output. First, we calculate a set of attention weights. These are multiplied by the encoder output vectors to create a weighted combination.</p><figure id="2626"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*xjT_acsYTGragyU78YaWjQ.png"><figcaption></figcaption></figure><p id="3a70">The result (called attn_applied in the code) should contain information about that specific part of the input sequence, and thus help the decoder choose the right output words. At every step, the decoder can attend to a different part of the input sentence, based on its previous hidden state. Here we compute the attention weights by concatenating the encoder output with the decoder hidden state from the previous step, scoring the result and normalizing the scores with a softmax. Several strategies exist for combining the two (concatenation, dot product and sum), and the best one has to be found experimentally.</p><figure id="d9ee"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*SvPSxU5MMsHegEYgdVZoHA.png"><figcaption></figcaption></figure>
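<p>To make the weight-and-combine step concrete, here is a plain-Python sketch. It is not the author's gist, which uses PyTorch. It scores each encoder output against the previous decoder hidden state, normalizes the scores with a softmax, and forms the weighted combination the text calls attn_applied. All function names and the weight vector <i>w</i> are hypothetical: <i>w</i> stands in for a learned linear layer, and only the concatenation and dot-product strategies are shown.</p>

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_score(hidden, enc_out):
    # Dot-product strategy: similarity of the decoder state and one encoder output
    return sum(h * e for h, e in zip(hidden, enc_out))

def concat_score(hidden, enc_out, w):
    # Concatenation strategy: a weight vector w (hypothetical stand-in for a
    # learned linear layer) applied to the concatenated [hidden; enc_out]
    return sum(wi * xi for wi, xi in zip(w, list(hidden) + list(enc_out)))

def attend(hidden, encoder_outputs, score_fn):
    # 1. Score every encoder output against the previous decoder hidden state
    scores = [score_fn(hidden, e) for e in encoder_outputs]
    # 2. Turn the scores into attention weights that sum to 1
    weights = softmax(scores)
    # 3. Weighted combination of the encoder outputs ("attn_applied")
    size = len(encoder_outputs[0])
    attn_applied = [sum(wt * e[i] for wt, e in zip(weights, encoder_outputs))
                    for i in range(size)]
    return weights, attn_applied

encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.0, 2.0]
weights, attn_applied = attend(hidden, encoder_outputs, dot_score)

w = [0.5, 0.5, 1.0, -1.0]  # hypothetical learned weights for the concat strategy
weights_cat, _ = attend(hidden, encoder_outputs, lambda h, e: concat_score(h, e, w))
```

<p>Swapping <code>dot_score</code> for the concatenation scorer changes only how the scores are produced; the softmax and the weighted sum stay the same, which is why these strategies are easy to compare experimentally.</p>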
<figure id="5e4c">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/56daf3f4ec158987291b7571fbab3aa8.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><p id="82a3">The following picture, which I obtained from a MOOC, shows how attention weights are applied in a decoder. Note that it uses an LSTM instead of a GRU. The attention mechanism is especially useful in translation, where the second word of a sentence in one language might appear at the end of the equivalent sentence in another. Attention helps the decoder point to the right part of the input sequence for each output word.</p><figure id="121c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*3Fq5y3buGaezWCk5uM-Pyg.png"><figcaption></figcaption></figure><p id="343c">Attention mechanism: normally, attention is a linear layer that takes input from both the embedded output from the encoder</p><h1 id="12b1">Preparing Data and Training</h1><p id="fdd1">Now we create functions to prepare the data and train the model, including:</p><ol><li>Creating input & output tensors from the list and creating basic functions to track time & plot loss graphs while training</li><li>Train-iteration function which calls the optimizers for encoder & decoder and the loss function</li><li>Train function with teacher forcing to run encoder training, get the output from encoder to decoder and train the decoder, backward propagation</li><li>Evaluation function to evaluate actual output string and predicted output string</li></ol><h2 id="65f5">1. Creating input & output tensors from the list and creating basic functions to track time & plot loss graphs while training</h2>
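<p>Step 1 boils down to mapping each word to an index and terminating every sequence with an end-of-sentence token. The sketch below shows that idea in plain Python, plus a small time-formatting helper of the kind used for progress messages. The token ids, vocabulary builder and helper names are hypothetical and not taken from the author's gist. In PyTorch, each index list would then typically be wrapped with <code>torch.tensor(indexes, dtype=torch.long).view(-1, 1)</code>.</p>

```python
SOS_token = 0  # hypothetical ids reserved for the start/end-of-sentence tokens
EOS_token = 1

def build_vocab(sentences):
    # Give each unseen word the next free index after the special tokens
    word2index = {}
    for sentence in sentences:
        for word in sentence.split(' '):
            if word not in word2index:
                word2index[word] = len(word2index) + 2
    return word2index

def indexes_from_sentence(word2index, sentence):
    # Look up every word and mark the end of the sequence with EOS
    return [word2index[word] for word in sentence.split(' ')] + [EOS_token]

def pair_to_indexes(in_vocab, out_vocab, pair):
    # Convert an (input, target) sentence pair into two index sequences
    return (indexes_from_sentence(in_vocab, pair[0]),
            indexes_from_sentence(out_vocab, pair[1]))

def as_minutes(seconds):
    # Format elapsed time as "Xm Ys" for progress messages during training
    m = int(seconds // 60)
    return '%dm %ds' % (m, int(seconds - m * 60))

en_vocab = build_vocab(["i am cold", "i am tired"])
sv_vocab = build_vocab(["jag fryser"])
print(pair_to_indexes(en_vocab, sv_vocab, ("i am cold", "jag fryser")))
```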
<figure id="81ff">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/69aaaa06810de097720e361cf4b1e2d0.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><h2 id="2b54">2. Train-iteration function which calls the optimizers for encoder & decoder and loss function</h2><p id="dd0a">In the following code, we use the SGD optimizer. We could also use <a href="https://pytorch.org/docs/stable/optim.html">other optimizers</a> such as Adam, ASGD, LBFGS or RMSprop.</p>
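<p>The bookkeeping in a train-iteration function can be sketched without any framework: sample a pair, run one optimization step, and keep running loss totals for printing and plotting. In this sketch <code>train_step</code> is a hypothetical callable that returns one pair's loss. In PyTorch it would wrap the encoder and decoder optimizers, for example <code>torch.optim.SGD(encoder.parameters(), lr=learning_rate)</code>, so switching to Adam would only change that construction.</p>

```python
import random

def train_iters(pairs, train_step, n_iters, print_every=1000, plot_every=100, seed=0):
    # train_step is a hypothetical callable that runs one optimization step
    # on an (input, target) pair and returns that pair's loss as a float.
    rng = random.Random(seed)
    print_loss_total = 0.0
    plot_loss_total = 0.0
    plot_losses = []
    for it in range(1, n_iters + 1):
        pair = rng.choice(pairs)  # sample one training pair per iteration
        loss = train_step(pair)
        print_loss_total += loss
        plot_loss_total += loss
        if it % print_every == 0:
            print('iter %d  avg loss %.4f' % (it, print_loss_total / print_every))
            print_loss_total = 0.0
        if it % plot_every == 0:
            # Store the average over the window so the loss curve can be plotted
            plot_losses.append(plot_loss_total / plot_every)
            plot_loss_total = 0.0
    return plot_losses
```

<p>Averaging over windows of <code>plot_every</code> iterations smooths the noisy per-pair losses, which makes the resulting loss graph easier to read.</p>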
<figure id="7fad">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/c8af795614ab61f49242eb1231a3f4a7.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><h2 id="b970">3. Train function with teacher forcing to run encoder training, get the output from encoder to decoder and train the decoder, backward propagation</h2><figure id="99fc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*uep_r-DEhKIbWVQvE5ua1w.png"><figcaption></figcaption></figure>
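<p>Teacher forcing only changes what the decoder receives as its next input. With teacher forcing, it gets the ground-truth target token; without it, it gets its own previous prediction. The plain-Python sketch below, with a hypothetical <code>decoder_step</code> function standing in for the GRU decoder, makes that switch explicit. As an assumption of this sketch, the choice is made once per training pair with probability <code>teacher_forcing_ratio</code>.</p>

```python
import random

def decode_with_teacher_forcing(decoder_step, target, sos_token,
                                teacher_forcing_ratio, rng):
    # decoder_step is hypothetical: (input_token, hidden) -> (predicted_token, hidden)
    # Decide once for this pair whether to use teacher forcing
    use_teacher_forcing = rng.random() < teacher_forcing_ratio
    decoder_input = sos_token
    hidden = None
    fed_inputs, predictions = [], []
    for target_token in target:
        fed_inputs.append(decoder_input)
        predicted, hidden = decoder_step(decoder_input, hidden)
        predictions.append(predicted)
        # Teacher forcing: the next input is the ground-truth token;
        # otherwise the decoder consumes its own prediction.
        decoder_input = target_token if use_teacher_forcing else predicted
    return fed_inputs, predictions
```

<p>In a real training loop, the loss at each step is still computed between the decoder output and <code>target_token</code> in both modes; teacher forcing affects only the input to the next step.</p>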
<figure id="3cbf">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/a1c5eaee73fff274f3bf5a39ee20a1fc.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><h2 id="400a">4. Evaluation function to evaluate actual output string and predicted output string</h2>
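<p>At evaluation time there is no target sentence, so the decoder has to feed its own predictions back in until it emits the end-of-sentence token or hits a length limit. A minimal greedy-decoding sketch in plain Python is shown below. It also collects the per-step attention weights, which is the kind of data an attention plot is drawn from. The <code>decoder_step</code> function and its return shape are hypothetical.</p>

```python
def greedy_decode(decoder_step, sos_token, eos_token, max_length):
    # decoder_step is hypothetical:
    # (input_token, hidden) -> (scores_over_vocab, hidden, attention_weights)
    decoder_input, hidden = sos_token, None
    decoded, attentions = [], []
    for _ in range(max_length):
        scores, hidden, attn = decoder_step(decoder_input, hidden)
        attentions.append(attn)  # keep the weights for an attention plot
        # Greedy choice: take the highest-scoring token
        best = max(range(len(scores)), key=lambda i: scores[i])
        decoded.append(best)
        if best == eos_token:
            break  # stop once the end-of-sentence token is produced
        # No target at evaluation time, so the prediction is the next input
        decoder_input = best
    return decoded, attentions
```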
<figure id="1b30">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/c1c007c1cb78675287541ffdf73550cc.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><h1 id="746d">Finally…</h1><p id="8a42">We call the functions to train the models.</p>
<figure id="e450">
<div>
<div>
<iframe class="gist-iframe" src="/gist/viveksasikumar/9b3db13c1b5a5f6658cf2bc234c82292.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
</div>
</div>
</figure></iframe></div></div></figure><p id="79f3">After training, we can pass a new string to the function evaluateAndShowAttention(“Trial_String…”) to get the output.</p><figure id="825c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*CunhgMg1T97bTkSzhBgR2g.png"><figcaption>English to Swedish translation</figcaption></figure><figure id="36ae"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*zOQnvrY6sbuXpFqwLmDfXg.png"><figcaption>Attention mechanism view</figcaption></figure><p id="d501">With a single GRU layer in both the encoder and the decoder, teacher forcing and a hidden size of 256, we get a model that is only marginally good.</p><p id="f10d">Future experiments:</p><ol><li>Attention based on the <i>dot product</i> and <i>sum</i> of the encoder output and the GRU hidden state</li><li>More GRU layers</li><li>A larger hidden size</li><li>Spatial 1-D dropout between layers</li><li>Hyper-parameter tuning</li><li>Using the same architecture to build question-answering tools and chatbots with different datasets</li><li>Experimenting with Google's pretrained BERT model</li></ol><p id="b893">Let me know what you think! I drew on many PyTorch tutorials, GitHub repos, MOOCs and blogs to put together this article. 
Please feel free to comment and advise me on better ways to run these models.</p><p id="c715">GitHub link: <a href="https://github.com/viveksasikumar/Deep-Learning/blob/master/Final%20Project%20-%20Seq2Seq%20Attention%20%26%20Teacher%20Forcing%20v1.ipynb">https://github.com/viveksasikumar/Deep-Learning/blob/master/Final%20Project%20-%20Seq2Seq%20Attention%20%26%20Teacher%20Forcing%20v1.ipynb</a></p><h1 id="e27d">References</h1><ol><li><a href="https://pytorch.org/tutorials/beginner/chatbot_tutorial.html">https://pytorch.org/tutorials/beginner/chatbot_tutorial.html</a></li><li><a href="https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html">https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html</a></li><li><a href="https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/">https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/</a></li><li><a href="https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be">https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be</a></li><li><a href="https://www.udemy.com/applied-deep-learning-build-a-chatbot-theory-application/">https://www.udemy.com/applied-deep-learning-build-a-chatbot-theory-application/</a></li></ol></article></body>