Sachinsoni


Unlocking the Power of Long Short-Term Memory (LSTM) Networks

In today’s world, where data tells stories through sequences, Long Short-Term Memory (LSTM) networks have emerged as powerful tools. In this introductory guide, we’ll unpack how LSTM networks work, focusing on their architecture and the role of each of their gates. Whether you’re new to neural networks or a seasoned data enthusiast, this guide aims to simplify the complexities of LSTMs so you can use them with confidence in your own projects.

The Core Idea Behind LSTM

Let me use a short story to explain the core idea behind LSTM:

Once upon a time, King Vikram fought bravely against King XYZ and won. But he passed away. Then his son, Vikram Junior, took over. He was even braver than his dad, but sadly, he also died in a battle with King XYZ. Then Vikram Junior’s son, Vikram Super Junior, became king. He wasn’t as strong as his dad and granddad, but he fought King XYZ. Even though it looked like he might lose, he used his smarts to beat King XYZ and get revenge for his family.

While we read the story, or any other sequential data, our minds process the information word by word, initially focusing on the short-term context. For instance, as the story opens with King Vikram’s battle, our attention is on the events unfolding right now. As the narrative progresses, however, we naturally build and maintain a long-term context. When we read that King Vikram has died, we update that long-term context. When Vikram Junior and later Vikram Super Junior are introduced, we integrate them into it. Each time a character’s role in the story ends, we revise it again. LSTM (Long Short-Term Memory) networks work in a similar way: they dynamically adjust their memory of past events as new information is processed.

In a simple RNN, a single hidden state has to carry both the short-term and the long-term context. Preserving both at once is mathematically difficult. During training, the gradient that links a late time step to an early one is a product of many per-step factors, and over long sequences it tends to shrink toward zero (the vanishing gradient problem). As a result, the short-term context tends to overshadow the long-term one, much as we often remember the latest episode of a Netflix series more vividly than the earlier ones. To address this limitation, researchers proposed giving the network two pathways: one for short-term memory (the hidden state) and one for long-term memory (the cell state). This lets the model keep important information in long-term memory while gradually discarding less relevant details.
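The vanishing-gradient point can be illustrated with a few lines of arithmetic. The sketch below assumes a scalar RNN h_t = tanh(w·h_{t-1} + x_t) with a hypothetical weight w = 0.5 and zero inputs. It compares that RNN with a path that is simply scaled by a constant gate value of 0.95 at every step. That second path is a rough stand-in for the LSTM's cell-state route when the forget gate stays mostly open. This is an illustration under those assumptions, not a full gradient computation for either model.

```python
import math

def rnn_gradient(w: float, steps: int, h0: float = 0.5) -> float:
    """d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + x_t), with every x_t = 0."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1} = w * (1 - h_t**2)
    return grad

def cell_path_gradient(forget: float, steps: int) -> float:
    """d c_T / d c_0 along the direct path c_t = f * c_{t-1} + (new info),
    assuming a constant forget-gate value f and ignoring paths through the gates."""
    return forget ** steps

for T in (5, 20, 50):
    print(f"T={T:2d}  RNN: {rnn_gradient(0.5, T):.2e}  cell path: {cell_path_gradient(0.95, T):.2e}")
```

With these assumed values, the RNN factor shrinks at least as fast as 0.5^T, while the gated path only shrinks like 0.95^T. This is the intuition behind giving long-term memory its own pathway that is merely scaled by a gate at each step.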

The LSTM architecture is more complicated than that of an RNN because it has to manage both short-term and long-term context. It also needs mechanisms (the gates) that control how information flows between these two kinds of memory, and these add complexity to the model.

LSTM Architecture :

The architecture of LSTM includes three key components: the forget gate, which decides what information to discard from the long-term memory; the input gate, which determines what new information to store in the long-term memory; and the output gate, which determines what information from the long-term memory is used to produce the final output of the LSTM cell at a particular time step. The output gate thus regulates the flow of information from the long-term memory to the current cell output, ensuring that only relevant information is considered in generating the output.

RNN vs LSTM
Components of LSTM

LSTM Working :

In the LSTM model, we can think of three main stages: input, processing, and output. At the input stage, there are three inputs: the input at the current time step (x_t), the previous cell state (c_{t-1}), and the previous hidden state (h_{t-1}). During processing, the model updates the cell state (c_{t-1} -> c_t) and calculates the new hidden state (h_{t-1} -> h_t).

Finally, at the output stage, there are two outputs: the current cell state (c_t) and the current hidden state (h_t). Both are passed on to the next time step, and h_t can also be fed to further layers of the model for processing or decision-making.
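To make the "three inputs, two outputs" view concrete, here is a minimal NumPy sketch of one LSTM time step. It assumes the standard formulation in which every gate reads the concatenation [h_{t-1}, x_t]. The names `lstm_step` and `init_params`, the small random initialisation, and the sizes used below are hypothetical choices for illustration, not the author's code or any library's API.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_size: int, hidden_size: int, seed: int = 0) -> dict:
    """Hypothetical random initialisation: one weight matrix and bias per layer."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):  # forget gate, input gate, candidate, output gate
        params["W_" + name] = rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size))
        params["b_" + name] = np.zeros(hidden_size)
    return params

def lstm_step(x_t: np.ndarray, h_prev: np.ndarray, c_prev: np.ndarray, p: dict):
    """Three inputs (x_t, h_{t-1}, c_{t-1}) -> two outputs (h_t, c_t)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])     # forget gate, values in (0, 1)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])     # input gate, values in (0, 1)
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate cell state, values in (-1, 1)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])     # output gate, values in (0, 1)
    c_t = f_t * c_prev + i_t * c_hat           # update the long-term memory
    h_t = o_t * np.tanh(c_t)                   # new short-term memory / cell output
    return h_t, c_t

# Usage: carry (h, c) across a short random sequence of 5 time steps.
input_size, hidden_size = 3, 4
params = init_params(input_size, hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
xs = np.random.default_rng(1).normal(size=(5, input_size))
for x_t in xs:
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Carrying `h` and `c` out of one call and into the next is all that "remembering" means here: the two outputs of step t become two of the three inputs of step t + 1.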

Understanding the gates in LSTM :

Let’s now look at each gate in more detail. The forget gate decides what information to discard from the long-term memory (the cell state). The input gate determines what new information to store in the long-term memory. The output gate decides what information from the long-term memory is used to produce the output (the hidden state).
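Written out, the three gates correspond to the standard LSTM equations below. This is the common textbook formulation, and the original figures may label some quantities slightly differently. Here [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input, σ is the sigmoid function, ⊙ is element-wise multiplication, and c̃_t is the candidate cell state (drawn as a primed c in some diagrams):

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new cell state (long-term memory)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{new hidden state (short-term memory)}
\end{aligned}
```

The forget and input gates act only in the c_t update, which is why they are described as controlling the long-term memory. The output gate acts only in the h_t line.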

What are h_t, c_t, x_t, f_t, i_t, c'_t and o_t ?

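Here x_t is the input at time step t, h_t is the hidden state (short-term memory), c_t is the cell state (long-term memory), f_t, i_t and o_t are the outputs of the forget, input and output gates, and c'_t is the candidate cell state. The yellow boxes represent neural network layers, each with a specified number of nodes; this number is a hyperparameter chosen by the user. In an LSTM cell, each gate (forget gate, input gate, and output gate) and the candidate cell state computation involve a neural network layer with the same number of nodes. The three gates use the sigmoid activation function, which outputs values between 0 and 1, while the candidate cell state uses tanh, which outputs values between -1 and 1.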
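Because all four layers (three gates plus the candidate) share the same number of nodes, the size of an LSTM layer follows from two numbers: the input size and the chosen hidden size. Each layer has a hidden_size × (hidden_size + input_size) weight matrix plus hidden_size biases. The sketch below assumes one bias vector per layer. Some libraries keep two bias vectors per layer (PyTorch's `nn.LSTM` has `bias_ih` and `bias_hh`), so their totals come out slightly larger.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer, assuming a single bias vector per gate layer."""
    per_layer = hidden_size * (hidden_size + input_size) + hidden_size  # weights + biases
    return 4 * per_layer  # forget gate, input gate, candidate, output gate

print(lstm_param_count(input_size=10, hidden_size=20))  # → 2480
```

The hidden_size² term dominates for large layers, so the number of nodes is the main knob that controls model size.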

Pointwise Operations :
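Pointwise operations act entry by entry on vectors of the same length. × multiplies matching entries, + adds them, and tanh is applied to each entry separately. Multiplying by a sigmoid output in [0, 1] is what lets a gate act like a per-entry valve. Here is a small NumPy sketch with made-up numbers (all values are hypothetical):

```python
import numpy as np

memory = np.array([2.0, -1.0, 0.5])    # hypothetical cell-state entries
gate = np.array([1.0, 0.0, 0.5])       # hypothetical gate values in [0, 1]
new_info = np.array([0.1, 0.3, -0.2])  # hypothetical information to add

gated = gate * memory          # pointwise multiplication: keep, erase, halve
combined = gated + new_info    # pointwise addition
squashed = np.tanh(combined)   # pointwise tanh keeps each entry in (-1, 1)
print(gated, combined, squashed)
```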

Forget Gate workflow :
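In the standard formulation, the forget gate takes the previous hidden state and the current input, passes their concatenation through a sigmoid layer, and outputs one value between 0 and 1 per cell-state entry. The previous cell state is then multiplied by these values. Below is a minimal NumPy sketch in which the weights, sizes and inputs are hypothetical random values and the bias is simply set to zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_f = rng.normal(0.0, 0.5, (hidden_size, hidden_size + input_size))  # hypothetical weights
b_f = np.zeros(hidden_size)                                          # hypothetical bias

h_prev = rng.normal(size=hidden_size)  # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)      # current input x_t
c_prev = rng.normal(size=hidden_size)  # previous cell state c_{t-1}

f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)  # one value in (0, 1) per entry
c_after_forget = f_t * c_prev  # entries with f_t near 0 are mostly forgotten
print(f_t, c_after_forget)
```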

How the forget gate controls the long-term context ?

Let’s take an example to illustrate how the forget gate controls the long-term context and why it’s called the “forget” gate.
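As a toy illustration in the spirit of the King Vikram story, imagine a cell state in which each entry tracks one fact. Entry 0 tracks "Vikram rules", entry 1 tracks "Vikram Junior rules", and entry 2 tracks "at war with King XYZ". The gate values below are hand-picked rather than computed, and real cell-state entries are rarely this interpretable. When Vikram dies, a forget value near 0 wipes out entry 0 while the war is kept. The input gate then writes the new ruler into entry 1.

```python
import numpy as np

# Hypothetical, hand-picked values (not learned):
# entry 0 = "Vikram rules", entry 1 = "Vikram Junior rules", entry 2 = "war with King XYZ"
c_prev = np.array([0.9, 0.0, 0.8])
f_t = np.array([0.05, 1.0, 0.95])  # "he passed away": forget entry 0, keep the rest
i_t = np.array([0.0, 0.9, 0.0])    # write new information only into entry 1
c_hat = np.array([0.0, 0.9, 0.0])  # candidate: "Vikram Junior rules"

c_t = f_t * c_prev + i_t * c_hat
print(c_t)
```

Entry 0 drops to almost nothing, which is exactly the "forgetting" the gate is named after.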

Input Gate workflow :
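In the standard formulation, the input gate works alongside a tanh layer. The tanh layer proposes candidate values, the sigmoid input gate decides how much of each candidate to write, and the filtered result is added to the cell state that survived the forget gate. A minimal NumPy sketch follows. The weights and sizes are hypothetical, and `c_after_forget` is a random stand-in for f_t ⊙ c_{t-1}.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_i = rng.normal(0.0, 0.5, (hidden_size, hidden_size + input_size))  # hypothetical weights
W_c = rng.normal(0.0, 0.5, (hidden_size, hidden_size + input_size))  # hypothetical weights
b_i = np.zeros(hidden_size)
b_c = np.zeros(hidden_size)

h_prev = rng.normal(size=hidden_size)          # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)              # current input x_t
c_after_forget = rng.normal(size=hidden_size)  # stand-in for f_t * c_{t-1}

z = np.concatenate([h_prev, x_t])
i_t = sigmoid(W_i @ z + b_i)        # how much of each candidate entry to write, in (0, 1)
c_hat = np.tanh(W_c @ z + b_c)      # candidate values, in (-1, 1)
c_t = c_after_forget + i_t * c_hat  # add the filtered new information
print(c_t)
```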

Output Gate workflow :
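In the standard formulation, the output gate decides which parts of the updated long-term memory to expose. The cell state is squashed with tanh and then multiplied by the sigmoid output gate, and the result is the new hidden state. A minimal NumPy sketch follows. The weights and sizes are hypothetical, and `c_t` is a random stand-in for the freshly updated cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_o = rng.normal(0.0, 0.5, (hidden_size, hidden_size + input_size))  # hypothetical weights
b_o = np.zeros(hidden_size)

h_prev = rng.normal(size=hidden_size)  # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)      # current input x_t
c_t = rng.normal(size=hidden_size)     # stand-in for the updated cell state

o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)  # which entries to expose
h_t = o_t * np.tanh(c_t)  # squash the long-term memory to (-1, 1), then filter it
print(h_t)
```

The output gate shapes only h_t. The cell state c_t itself is passed to the next time step unchanged.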

Now watch the following animation to see how an LSTM works step by step:

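References :

Christopher Olah, “Understanding LSTM Networks”, colah.github.io, posted August 27, 2015. https://colah.github.io/posts/2015-08-Understanding-LSTMs/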

I hope this blog has strengthened your understanding of the fundamentals of LSTM. If you found this content valuable, consider following me for more posts like this. Thank you for taking the time to read this article!
