Visual Guides to understand the basics of Large Language Models

A compilation of tools and articles that intuitively break down complicated AI concepts

Image by the author using free illustrations by unDraw.co

Today, the world is abuzz with LLMs, short for Large Language Models. Not a day passes without the announcement of a new language model, fueling the fear of missing out in the AI space. Yet many still struggle with the basic concepts behind LLMs, which makes it hard to keep pace with the advancements. This article is aimed at those who would like to dive into the inner workings of these models and gain a solid grasp of the subject. With this in mind, I present a few tools and articles that break down the concepts behind LLMs so they can be easily understood.

· 1. The Illustrated Transformer by Jay Alammar · 2. The Illustrated GPT-2 by Jay Alammar · 3. LLM Visualization by Brendan Bycroft · 4. Generative AI exists because of the transformer by the Financial Times · 5. Tokenizer tool by OpenAI · 6. Understanding GPT Tokenizers by Simon Willison · 7. Chunkviz by Greg Kamradt · 8. Do Machine Learning Models Memorize or Generalize? (an explorable by PAIR)

1. The Illustrated Transformer by Jay Alammar

GIF created by Author, based on **The Illustrated Transformer** by Jay Alammar | This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

I’m sure many of you are already familiar with this iconic article. Jay was one of the earliest pioneers of technical articles built around powerful visualizations; a quick look through his blog will show you what I mean. Over the years, he has inspired many writers to follow suit, and tutorials have shifted from plain text and code to immersive visualizations. Anyway, back to The Illustrated Transformer. The Transformer architecture is the fundamental building block of modern large language models (LLMs), so it is essential to understand its basics, which is exactly what Jay explains so well. The blog covers crucial concepts like:

A High-Level Look at The Transformer Model
Exploring The Transformer’s Encoding and Decoding Components
Self-Attention
Matrix Calculation of Self-Attention
The Concept of Multi-Headed Attention
Positional Encoding
The Residuals in The Transformer Architecture
The Final Linear and Softmax Layer of The Decoder
The Loss Function in Model Training
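
To make the self-attention and matrix-calculation ideas a little more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is my own illustration rather than code from Jay’s post: the toy embeddings and the identity projection matrices are assumptions chosen to keep the numbers easy to follow, whereas a real model learns its query, key and value projections during training.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m) gives (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x holds one embedding vector per token (n x d_model);
    w_q, w_k, w_v are projection matrices (d_model x d_k).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Score every query against every key, scaled by sqrt(d_k)
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Softmax turns each row into attention weights that sum to 1
    weights = [softmax(row) for row in scaled]
    # Each output is the attention-weighted sum of the value vectors
    return matmul(weights, v), weights


# Three toy token embeddings of size 2 and identity projections (assumptions)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` tells you how strongly one token attends to every token in the sequence. Multi-headed attention runs several of these heads in parallel with different projections and concatenates their outputs before a final projection.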

He has also created a “Narrated Transformer” video, which offers a gentler approach to the topic. Once you are done with this blog post, the Attention Is All You Need paper and the official Transformer blog post would be great add-ons.

Link: https://jalammar.github.io/illustrated-transformer/

2. The Illustrated GPT-2 by Jay Alammar

GIF created by Author, based on **The Illustrated GPT-2** by Jay Alammar | This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Another great article from Jay Alammar is The Illustrated GPT-2. It supplements The Illustrated Transformer, using even more visuals to explain the inner workings of transformers and how they have evolved since the original paper. It also has a dedicated section on applications of transformers beyond language modeling.
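
One idea covered in The Illustrated GPT-2 is masked self-attention: GPT-2 is a decoder-only model, so each position may only attend to itself and to earlier positions. A minimal sketch of that mask, assuming the raw attention scores have already been computed (the all-zero scores below are a made-up example), could look like this:

```python
import math


def softmax(xs):
    # exp(-inf) evaluates to 0.0, so masked positions receive zero weight
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(scores):
    """Apply a causal (look-ahead) mask to an n x n score matrix, then softmax.

    Position i may attend to positions 0..i only; later positions are set
    to -inf so they get zero weight after the softmax.
    """
    n = len(scores)
    masked = [[scores[i][j] if j <= i else float("-inf") for j in range(n)]
              for i in range(n)]
    return [softmax(row) for row in masked]


# Made-up raw scores: all zeros, so the visible positions share weight equally
weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
for row in weights:
    print([round(w, 3) for w in row])
```

The first token can only look at itself, the second splits its attention between two tokens, and so on, which is what lets the model be trained to predict each next token without peeking ahead.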

🔗: https://jalammar.github.io/illustrated-gpt2/

3. LLM Visualization by Brendan Bycroft

GIF created by Author, based on LLM Visualization by Brendan Bycroft | This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The LLM Visualization project provides a walkthrough of the LLM algorithm backing OpenAI’s ChatGPT. It is a great resource for exploring every step required to run a single-token inference and seeing the whole process in action.

The project is a web page that renders small LLMs, akin to what powers ChatGPT, in stunning 3D, and it adds interactive elements for a hands-on experience. As of this writing, visualizations are available for the following architectures:

GPT-2 (small)
Nano GPT
GPT-2 (XL)
GPT-3
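
A single-token inference ends with the final layer producing one score (logit) per vocabulary entry, and those scores are turned into probabilities for the next token. Here is a rough sketch of only that last step, assuming a hypothetical four-word vocabulary and made-up logits; real models have vocabularies of tens of thousands of tokens, and they often sample from the distribution instead of always taking the most likely token (greedy decoding, shown here).

```python
import math


def softmax(logits):
    # Convert raw logits into a probability distribution over the vocabulary
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_greedy(logits, vocab):
    """Pick the single most likely next token (greedy decoding)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Hypothetical vocabulary and made-up logits for one position
vocab = ["cat", "dog", "the", "sat"]
logits = [1.2, 0.3, -0.5, 2.7]
token, p = next_token_greedy(logits, vocab)
print(token, round(p, 3))
```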

🔗 : https://bbycroft.net/llm

4. Generative AI exists because of the transformer by the Financial Times

GIF created by the Author, based on Generative AI exists because of the transformer by the Financial Times (FT) | This work is being distributed under FT’s sharing policy.

Great job by the Visual Storytelling Team and Madhumita Murgia at the Financial Times in using visuals to explain how LLMs work, with a special emphasis on the self-attention mechanism and the Transformer architecture.

🔗 https://ig.ft.com/generative-ai/

5. Tokenizer tool by OpenAI

Screenshot by Author | Source: OpenAI’s Tokenizer tool documentation available for sharing under the MIT License.

Large language models process text as tokens, which are common sequences of characters, each represented by an integer ID. A tokenizer converts text into these tokens. OpenAI’s tokenizer tool offers a handy way to test specific strings and see how they are split into tokens, so you can understand how a piece of text might be tokenized by a language model and how many tokens it contains.
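
GPT models use byte-pair encoding (BPE) tokenizers. The sketch below is a deliberately tiny, character-level version of BPE, not OpenAI’s implementation: the corpus, the number of merges and the helper names (`train_bpe`, `apply_merge`, `encode`) are my own assumptions. It repeatedly merges the most frequent adjacent pair of symbols, then replays those merges to split new text, assign IDs and count tokens.

```python
from collections import Counter


def apply_merge(tokens, pair):
    # Replace each occurrence of `pair`, scanning left to right, with one merged token
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a corpus (toy character-level BPE)."""
    tokens, merges = list(corpus), []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge would not compress anything
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges


def encode(text, merges):
    # Start from single characters and replay the merges in the order they were learned
    tokens = list(text)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


corpus = "low lower lowest low low"
merges = train_bpe(corpus, num_merges=10)
# Vocabulary: every base character plus every merged token, each given an integer ID
vocab = {}
for tok in sorted(set(corpus)) + ["".join(p) for p in merges]:
    vocab.setdefault(tok, len(vocab))
tokens = encode("lowest low", merges)
ids = [vocab[t] for t in tokens]
print(tokens, ids, f"{len(tokens)} tokens")
```

Production tokenizers work on bytes rather than characters, split text with a pre-tokenization step first and learn tens of thousands of merges, so the splits and counts you see in OpenAI’s tool will differ from this toy.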

Link: https://platform.openai.com/tokenizer

6. Understanding GPT tokenizers by Simon Willison

GIF created by Author, based on Understanding GPT tokenizers by Simon Willison | This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

While OpenAI offers a Tokenizer tool for exploring how tokens work, Simon Willison has built his own, slightly more interesting tokenizer tool, available as an Observable notebook. The notebook converts text to tokens and tokens to text, and runs searches against the full token table.

Some of the key insights from Simon’s analysis are:

• Most common English words are assigned a single token.
• Many tokens include a leading space, which makes encoding full sentences more efficient.
• Non-English languages may be tokenized less efficiently.
• Glitch tokens can lead to unexpected behavior.
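
The point about non-English text is easier to see once you remember that byte-level tokenizers such as GPT-2’s start from UTF-8 bytes. Characters outside ASCII take two to four bytes each, so unless the learned merges happen to cover them, they tend to cost more tokens. This small sketch only counts characters and bytes for some arbitrary sample strings; it does not run a real tokenizer.

```python
def utf8_profile(text):
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# Sample strings are arbitrary examples
samples = {"English": "hello world", "German": "Grüße", "Japanese": "こんにちは"}
for language, text in samples.items():
    chars, nbytes = utf8_profile(text)
    print(f"{language:<9} chars={chars:>2} bytes={nbytes:>2} bytes/char={nbytes / chars:.2f}")
```

Before any merges apply, a byte-level tokenizer needs one token per byte, so the Japanese greeting starts from three times as many units as it has characters.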

🔗 https://lnkd.in/eXTcia8Z

7. Chunkviz by Greg Kamradt

GIF by the Author based on the Chunkviz app available for sharing under the MIT License.

Chunking is a strategy for breaking large pieces of text into smaller segments when building LLM applications. It matters because your document has to fit into the model’s context window, which is the maximum length of text a language model can handle at once. There are various chunking strategies, and this is where this tool shines: you can choose from several of them and see how each one splits your text. Currently, it visualizes text splitting and chunking strategies from four different LangChain splitters.
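
One of the simplest strategies is fixed-size splitting with some overlap between neighboring chunks, so that a sentence cut at a boundary still appears whole in at least one chunk. Below is a minimal sketch of that idea; `chunk_text` is a hypothetical helper, not LangChain’s API, and it measures size in characters, whereas many applications measure chunks in tokens so they line up with the context window.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk's start moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


doc = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(doc, chunk_size=10, overlap=3):
    print(chunk)
```

Each chunk repeats the last three characters of the previous one; larger overlaps preserve more context across boundaries at the cost of more (and more redundant) chunks.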

🔗 https://chunkviz.up.railway.app/

8. Do Machine Learning Models Memorize or Generalize? (an explorable by PAIR)

GIF by the Author based on the Do Machine Learning Models Memorize or Generalize? explorable, available for sharing under the MIT License.

Explorables are interactive essays by Google’s PAIR team that try to simplify complex AI-related topics through interactive media. This particular explorable delves deep into generalization and memorization, exploring a vital question: do large language models (LLMs) truly understand the world, or are they just recalling information from their extensive training data?

In this interactive article, the authors take an investigative journey through the training dynamics of a tiny model and reverse engineer the solution it finds, providing a brilliant illustration of the exciting emerging field of mechanistic interpretability.
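
To see the difference between memorizing and generalizing in the smallest possible setting, consider a toy modular-addition task. This is not the explorable’s model: it simply contrasts a lookup table of training examples with the underlying rule, and the modulus `P = 7` and the roughly 50/50 train/test split are arbitrary assumptions.

```python
import random

P = 7  # arbitrary modulus for a toy modular-addition task

# All (a, b) pairs labeled with (a + b) mod P, shuffled and split into train and test
pairs = [(a, b) for a in range(P) for b in range(P)]
rng = random.Random(0)
rng.shuffle(pairs)
split = len(pairs) // 2
train, test = pairs[:split], pairs[split:]

# "Memorizer": a lookup table of the training examples; returns None for unseen inputs
table = {(a, b): (a + b) % P for a, b in train}


def memorizer(a, b):
    return table.get((a, b))


# "Generalizer": the underlying rule, which works for any input
def generalizer(a, b):
    return (a + b) % P


def accuracy(model, data):
    return sum(model(a, b) == (a + b) % P for a, b in data) / len(data)


print("memorizer   train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("generalizer train/test:", accuracy(generalizer, train), accuracy(generalizer, test))
```

The lookup table is perfect on the examples it has seen and useless on anything else, while the rule is perfect on both. Grokking, the phenomenon behind the explorable, describes a network that first behaves like the former and, after much more training, shifts toward the latter.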

🔗 https://pair.withgoogle.com/explorables/grokking/

Conclusion

We looked at a few invaluable tools and articles that break down complex technical jargon into easily understandable forms. I am a big proponent of writing and presenting technical concepts in interactive, visual formats. This reminds me of a previous article of mine that focused on tools offering intuitive explanations of standard machine learning concepts.

Learn Machine Learning Concepts Interactively: Five freely available tools that intuitively break down the complicated machine learning concepts (towardsdatascience.com)

The articles and tools highlighted in this article aim to lower the barrier to entry for beginners and enthusiasts alike, making learning more engaging and accessible. I plan to continually update this article with more such resources as I discover them. Additionally, I welcome and look forward to incorporating suggestions from readers.