Visual Guides to understand the basics of Large Language Models
A compilation of tools and articles that intuitively break down complicated AI concepts

Today, the world is abuzz with LLMs, short for Large Language Models. Hardly a day passes without the announcement of a new language model, fueling a fear of missing out in the AI space. Yet many people still struggle with the basic concepts behind LLMs, which makes it hard to keep pace with the advancements. This article is for those who want to dive into the inner workings of these models and build a solid grasp of the subject. With this in mind, I present a few tools and articles that break down the core concepts of LLMs so they can be easily understood.
Table of Contents
· 1. The Illustrated Transformer by Jay Alammar · 2. The Illustrated GPT-2 by Jay Alammar · 3. LLM Visualization by Brendan Bycroft · 4. Generative AI exists because of the transformer by the Financial Times · 5. Tokenizer tool by OpenAI · 6. Understanding GPT Tokenizers by Simon Willison · 7. ChunkViz by Greg Kamradt · 8. Do Machine Learning Models Memorize or Generalize? An explorable by PAIR · Conclusion
1. The Illustrated Transformer by Jay Alammar

I’m sure many of you are already familiar with this iconic article. Jay was one of the earliest writers to pair technical explanations with powerful visualizations; a quick look through his blog shows what I mean. Over the years, he has inspired many writers to follow suit, and tutorials have shifted from plain text and code toward immersive visualizations. Back to The Illustrated Transformer: the Transformer architecture is the fundamental building block of most modern Large Language Models (LLMs), so it is essential to understand its basics, which is exactly what Jay explains so well. The blog covers crucial concepts like:
- A High-Level Look at The Transformer Model
- Exploring The Transformer’s Encoding and Decoding Components
- Self-Attention
- Matrix Calculation of Self-Attention
- The Concept of Multi-Headed Attention
- Positional Encoding
- The Residuals in The Transformer Architecture
- The Final Linear and Softmax Layer of The Decoder
- The Loss Function in Model Training
He has also created a “Narrated Transformer” video, which offers a gentler introduction to the topic. Once you are done with this blog post, the Attention Is All You Need paper and Google’s official Transformer blog post make great follow-ups.
Link: https://jalammar.github.io/illustrated-transformer/
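To make the “Matrix Calculation of Self-Attention” item a little more concrete, here is a minimal sketch of scaled dot-product self-attention, softmax(QKᵀ/√d_k)·V, written with plain Python lists. It assumes a single attention head, no masking, and tiny made-up matrices; the function names are hypothetical and this is not code from Jay’s post.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(xs):
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no mask)."""
    # Project every token embedding into a query, key, and value vector.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Score each query against every key, then scale by sqrt(d_k).
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each row of weights says how strongly one token attends to every token.
    weights = [softmax(row) for row in scaled]
    # Each output row is a weighted sum of the value vectors.
    return matmul(weights, v), weights

# Toy example: 3 tokens with 2-dimensional embeddings (made-up numbers),
# and identity projection matrices so Q = K = V = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

In a real Transformer the projection matrices are learned, and multi-headed attention runs several of these in parallel with smaller d_k before concatenating the results.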
2. The Illustrated GPT-2 by Jay Alammar

Another great article from Jay Alammar is The Illustrated GPT-2. It supplements The Illustrated Transformer with more visuals that explain the inner workings of transformers and how they have evolved since the original paper. It also has a dedicated section on applications of transformers beyond language modeling.
🔗: https://jalammar.github.io/illustrated-gpt2/
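GPT-2 is a decoder-only Transformer, so its self-attention is masked: each token may attend only to itself and the tokens before it, never to future tokens. A minimal sketch of that masking step, assuming the raw attention scores have already been computed (the function name and the scores are hypothetical):

```python
import math

def causal_softmax(scores):
    """Row-wise softmax that hides future positions (j > i) from token i."""
    weights = []
    for i, row in enumerate(scores):
        # Future positions get -inf, so exp() turns them into exactly 0.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        m = max(masked)  # finite, because position i itself is never masked
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Made-up attention scores for a 3-token sequence.
scores = [[0.5, 2.0, 1.0],
          [1.0, 1.0, 3.0],
          [0.2, 0.4, 0.6]]
for row in causal_softmax(scores):
    print([round(w, 3) for w in row])
```

Note that the first token can only attend to itself, so its row is all weight on position 0, no matter how large the scores for later positions are.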
3. LLM Visualization by Brendan Bycroft

The LLM Visualization project provides a walkthrough of the algorithm behind LLMs like OpenAI’s ChatGPT. It is a great resource for exploring every step required to run inference for a single token and seeing the whole process in action.
The project features a web page with interactive 3D visualizations of a small LLM, similar in design to the models that power ChatGPT. It guides you step by step through a single-token inference, and its interactive elements make for a hands-on experience. As of today, visualizations for the following architectures are available:
- GPT-2 (small)
- Nano GPT
- GPT-2 (XL)
- GPT-3
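Every single-token inference ends the same way: the final layer produces one score (logit) per vocabulary entry, a softmax turns those scores into probabilities, and one token is picked. Here is a minimal sketch of that last step with a made-up five-word vocabulary and made-up logits. It uses greedy selection (always take the most likely token), which is only one of several decoding strategies; the `temperature` parameter is included just to show how it reshapes the distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(logits, vocab, temperature=1.0):
    """Pick the single most probable token (greedy decoding)."""
    probs = softmax(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best], probs[best]

# Hypothetical vocabulary and logits for the prompt "The cat sat on the".
vocab = ["mat", "dog", "roof", "moon", "sofa"]
logits = [3.2, 0.1, 1.5, -1.0, 2.4]

print(greedy_next_token(logits, vocab))
print(greedy_next_token(logits, vocab, temperature=0.5))
```

Greedy picking always returns the same token for the same logits; chat models commonly sample from the probabilities instead, which is where temperature has a visible effect on the output.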
4. Generative AI exists because of the transformer by the Financial Times

The Visual Storytelling Team and Madhumita Murgia at the Financial Times did a great job of using visuals to explain how LLMs work, with a special emphasis on the self-attention mechanism and the Transformer architecture.
🔗 https://ig.ft.com/generative-ai/
5. Tokenizer tool by OpenAI


Large language models don’t process raw text; they process tokens, which are chunks of text (whole words, parts of words, or punctuation), each mapped to an integer ID. A tokenizer converts text into these tokens. OpenAI’s tokenizer tool provides a helpful way to test specific strings and see how they are split into tokens. You can use it to understand how a language model might tokenize a piece of text and how many tokens that text contains in total.
Link: https://platform.openai.com/tokenizer
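OpenAI’s GPT tokenizers are based on byte pair encoding (BPE): starting from small units, the most frequent adjacent pair of symbols is repeatedly merged into a new token. The sketch below is a simplified, character-level version trained on a tiny made-up corpus. Real GPT tokenizers work on bytes, pre-split the text first, and use a large learned merge list, so the merges and IDs here are illustrative only; all function names are hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def train_bpe(corpus, num_merges):
    """Learn an ordered list of merges from a whitespace-split corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent; ties go to the first seen
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges

def encode(word, merges):
    """Apply the learned merges, in order, to a word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

corpus = "low low low lower lowest"
merges = train_bpe(corpus, num_merges=3)

# Assign each token an integer ID (an arbitrary, illustrative numbering).
vocab = sorted(set(corpus.replace(" ", ""))) + ["".join(p) for p in merges]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

tokens = encode("lower", merges)
print(merges)
print(tokens, [token_to_id[t] for t in tokens])
```

Frequent words end up as a single token after enough merges, while rarer words are split into several pieces, which is why the tool’s token count is often higher than the word count.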
6. Understanding GPT tokenizers by Simon Willison

OpenAI offers its own Tokenizer tool for exploring how tokens work, but Simon Willison has built a tokenizer tool with a few extra capabilities, available as an Observable notebook. The notebook converts text to tokens and tokens back to text, and it can run searches against the full token table.
Some of the key insights from Simon’s analysis are:
- Most common English words are assigned a single token.
- Many words also have a token that includes a leading space, enabling more efficient encoding of full sentences.
- Non-English languages may be tokenized less efficiently.
- Glitch tokens can lead to unexpected behavior.
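The leading-space insight comes from how GPT-style tokenizers pre-split text before applying their merges: a space stays attached to the word that follows it, so " hello" and "Hello" end up as different pieces. Below is a simplified sketch of that pre-splitting step using Python’s `re` module. The pattern is an assumption that only approximates the one GPT-2 uses (the real one also handles contractions, digits, and Unicode letter classes).

```python
import re

# Simplified pre-tokenization: an optional leading space stays glued to the
# following run of word characters or punctuation; any other whitespace run
# becomes its own piece.
PATTERN = re.compile(r" ?\w+| ?[^\w\s]+|\s+")

def pretokenize(text):
    """Split text into pieces, keeping leading spaces on words."""
    return PATTERN.findall(text)

print(pretokenize("Hello world, hello again!"))
```

Because every character is a word character, whitespace, or something else, the pieces always concatenate back to the original text, so no information is lost before the merges are applied.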
7. ChunkViz by Greg Kamradt

Chunking is a strategy for breaking large pieces of text into smaller segments when building LLM applications. It matters because your document has to fit within the model’s context window, the maximum length of text a language model can handle at once. There are various chunking strategies, and this is where the tool shines: you can choose a strategy and see how it splits your text. Currently, it visualizes the text splitting and chunking strategies of four different LangChain text splitters.
🔗 https://chunkviz.up.railway.app/
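The simplest strategy of the kind the tool lets you compare is fixed-size character splitting with overlap. Here is a minimal, hypothetical sketch of it (the function and parameter names are my own, not LangChain’s API): each chunk holds at most `chunk_size` characters, and consecutive chunks share `overlap` characters so that context near a boundary appears in both neighbors.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks

text = "Chunking splits long documents into pieces that fit a context window."
for chunk in chunk_text(text, chunk_size=24, overlap=6):
    print(repr(chunk))
```

This cut ignores word and sentence boundaries, which is exactly what the visualization makes obvious; splitters that try natural separators first (for example, LangChain’s RecursiveCharacterTextSplitter, which tries blank lines and newlines before smaller units) usually produce more readable chunks.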
8. Do Machine Learning Models Memorize or Generalize? An explorable by PAIR

Explorables are interactive essays by Google’s PAIR team that simplify complex AI-related topics through interactive media. This particular explorable dives into generalization and memorization, exploring a vital question: do large language models (LLMs) truly understand the world, or are they just recalling information from their extensive training data?
In this interactive article, the authors take an investigative journey through the training dynamics of a tiny model. They then reverse-engineer the solution the model finds, providing a brilliant illustration of the exciting emerging field of mechanistic interpretability.
🔗 https://pair.withgoogle.com/explorables/grokking/
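The memorize-versus-generalize question can be made concrete with modular addition, a standard toy task in grokking research. The sketch below uses a hypothetical modulus of 11 and a fixed train/test split, and compares two hand-written “models”: one that memorizes the training pairs in a lookup table and one that applies the general rule. Nothing here is trained, so it illustrates the distinction only, not the training dynamics the explorable visualizes.

```python
P = 11  # hypothetical modulus for the toy task (a + b) mod P

pairs = [(a, b) for a in range(P) for b in range(P)]
# Deterministic split: roughly 70% of pairs for training, the rest held out.
train = [p for i, p in enumerate(pairs) if i % 10 < 7]
test = [p for i, p in enumerate(pairs) if i % 10 >= 7]

def label(a, b):
    return (a + b) % P

# "Memorizer": a lookup table of training answers, guessing 0 for unseen pairs.
table = {(a, b): label(a, b) for a, b in train}

def memorizer(a, b):
    return table.get((a, b), 0)

# "Generalizer": applies the underlying rule, so it also works on unseen pairs.
def generalizer(a, b):
    return (a + b) % P

def accuracy(model, data):
    return sum(model(a, b) == label(a, b) for a, b in data) / len(data)

for name, model in [("memorizer", memorizer), ("generalizer", generalizer)]:
    print(name, "train:", accuracy(model, train), "test:", round(accuracy(model, test), 3))
```

Both models are perfect on the training pairs, so training accuracy alone cannot tell them apart; only the held-out pairs reveal which one learned the rule.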
Conclusion
We looked at a few invaluable tools and articles that break down complex technical concepts into easily understandable forms. I am a big proponent of presenting technical concepts in interactive, visual formats. This reminds me of a previous article of mine that focused on tools offering intuitive explanations of standard machine learning concepts.
The articles and tools highlighted in this article aim to lower the barrier to entry for beginners and enthusiasts alike, making learning more engaging and accessible. I plan to continually update this article with more such resources as I discover them. Additionally, I welcome and look forward to incorporating suggestions from readers.





