Prompt Formatting is the Unsung Hero
Your free and user-friendly blueprint to unlock the power of any AI

We often marvel at the impressive feats of large language models (LLMs) — generating creative text formats, translating languages flawlessly, and answering our questions in informative ways. But have you ever wondered what makes these marvels tick?
The answer lies in a seemingly mundane, yet crucial element: prompt formatting. Yes, that’s right! Before you delve into the intricacies of prompt engineering, ensuring your instructions are formatted correctly sets the stage for success.
A new model just came out and you want to test it. You download it, run the pipeline, and the result is… SO BAD 🤬😤
Why?
Let’s learn together how to deal properly with any kind of LLM and become masters of prompt formatting.
Ok, but what is prompt formatting?
If you search Google for this topic, you will only find results about prompt engineering. But prompt engineering is the art of crafting your words so that the instruction is easy for the language model to process.
Even before crafting good wording, we need to understand that after the initial pre-training, every model is fine-tuned to follow instructions. Depending on HOW the instruction dataset is designed, the model learns to receive instructions ONLY in that specific FORMAT.

If your prompt is simply “what is the meaning of Science?”
you will get a gibberish output: this is not because your question was garbage, but because the model did not receive the few special tokens it needs to identify what the instruction is.
Here is an example tested on my miniPC with the quantized version of Llama-2-7b-Chat:
User: What is the meaning of Science?
Llama2: What are some examples of scientific discoveries?
What are three types of science?
The model is not dumb… it is simply not able to understand the instructions.
Let’s take an example from the model card of Llama2–13b-Chat on Hugging Face:

The model card does not really explain the matter, but if you go to the official Meta Getting Started page you can see in the code that:
prompt = "Who wrote the book innovator's dilemma?"
pipe = pipeline(task="text-generation", model=model,
tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
The Llama2-Chat fine-tuned model expects a prompt following the above-mentioned f-string: f"<s>[INST] {prompt} [/INST]".
Scrolling a little further down the page, a format is finally clearly declared:

If we run the same question “What is the meaning of Science?” again, this time applying the format mentioned above, we get a good reply.
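In Python, applying the Llama-2 chat format is just string formatting. Here is a minimal sketch, reusing the pipe object from the Meta snippet above (the system prompt text is my own example):

system_prompt = "You are a helpful assistant."
user_prompt = "What is the meaning of Science?"

# Llama-2-Chat format: the system prompt is wrapped in <<SYS>> tags inside [INST]
formatted_prompt = (
    "<s>[INST] <<SYS>>\n"
    f"{system_prompt}\n"
    "<</SYS>>\n\n"
    f"{user_prompt} [/INST]"
)

result = pipe(formatted_prompt)
print(result[0]["generated_text"])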

Special Tokens
A Large Language Model is a neural network that works with numbers. So to give instructions to the LLM (and also receive answers…) we need a tool to transform words into numbers. I present to you: the Tokenizer!
Tokenization is the process of breaking down a piece of text into small units called tokens. A token may be a word, part of a word or just characters like punctuation.
The tokenizer uses a special vocabulary (different for every model) that transforms our instruction (the prompt) into something meaningful for the LLM. Here is an example…

The sentence is looked up in the vocabulary and we get an index number for each token, like coordinates. Note that many words already include a leading space, and that “The” and “the” are different tokens (tokenization is case sensitive). There is also an end token, 198.
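If you want to see the token IDs for yourself, the tokenizer of any model on the Hugging Face Hub can show them. A minimal sketch, using gpt2 purely as an example (the vocabulary, and therefore the IDs, will differ for your model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example only; use your model's tokenizer

text = "The cat sat on the mat."
ids = tokenizer.encode(text)
print(ids)                                   # the index numbers, our "coordinates"
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding token strings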
What is so special about special tokens?
Among them we also have Special Tokens: tokens representing things like the start and end of a text, or unknown words (in the example above, token 198…). They are also assigned unique IDs.
Every model is fine-tuned with its own particular special tokens. They usually are:

bos_token: Beginning Of Sequence, used to specify that we are starting a sequence in the prompt.
eos_token: End Of Sequence, used to specify that we have completed a sequence in the prompt.
pad_token: Padding, a token used to extend a sequence up to a given length by repeating a dummy token. We don’t manually add “[PAD]” tokens to the sequences: the pad token is usually a special token defined inside the tokenizer and automatically added, if necessary, along with the other special tokens to the sequence.
unk_token: Unknown token, which identifies words or symbols that have no match in the model vocabulary.
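You don’t need to memorize these: the tokenizer object itself knows which special tokens its model was trained with. A quick sketch (the TinyLlama repo is just an example; some fields, like pad_token, may be None for a given model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Each model defines its own special tokens in its tokenizer configuration
print("bos_token:", tokenizer.bos_token)
print("eos_token:", tokenizer.eos_token)
print("pad_token:", tokenizer.pad_token)
print("unk_token:", tokenizer.unk_token)
print("all special tokens:", tokenizer.all_special_tokens)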

A Full IDEAL prompt
Not all LLMs are equal, but ideally a prompt should have the following components:
System Message: a system_prompt is text that is prepended to the prompt. It’s used in a chat context to help guide or constrain model behavior.
Let’s say you wanted to write a chatbot that talks like a pirate. One way to do this would be to prepend “you are a pirate” to every prompt.
Instead, we can set a system_prompt of “You are a pirate,” and the model will understand your request without having to be told in every prompt (see source below).
You can also use system prompts to make your LLM behave in a more… professional way. Try system prompts like “Act as if you’re responding to documentation questions” or “You are responding to highly technical customers.” A good LLM is quite good at respecting system prompts.
User Message: this is your instruction, what you need the LLM to do for you. This is the part where Prompt Engineering techniques come into play.
Assistant Message: this is the indicator to the LLM that, from this point on, it should start its reply.
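In code, these three components are usually handled as a simple list of role/content pairs before being rendered into the model’s own format. A sketch of that structure (the content strings are my own examples):

messages = [
    {"role": "system", "content": "You are a pirate."},              # System Message
    {"role": "user", "content": "What is the meaning of Science?"},  # User Message
    # the Assistant Message is left open: the model fills it in with its reply
]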

An example
Here is Dolphin-Llama2-7B as an example of an instruction fine-tuned LLM family that uses a full prompt format:
SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
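In Python this template is just an f-string. A minimal sketch (the system message text is my own example):

system_message = "You are a helpful assistant."
prompt = "What is the meaning of Science?"

full_prompt = f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"
print(full_prompt)  # feed this string to your text-generation pipeline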
If we think about degree of attention/priority, the model uses a hierarchy:
- Assistant Message: Responses from the model. Informs future responses, but is least authoritative.
- User Message: input from the user. Medium authoritative: it can steer the model contrary to Assistant Messages, unless it asks for an answer prohibited by model guardrails or contrary to the System Message.
- System Message: most authoritative. The model is most attentive to it and tries hardest to obey it.

The Formatting Blueprint
Because every model (or model family) follows its own formatting style, the only rule is… to know where to find the prompt format just right for your specific model.
How to do it?
1 — The Hugging Face model card
The Hugging Face model card usually gives you a code snippet. You can use that example to infer the correct format.

As you can see, this model card is really well done. From it we can understand that the prompt format for TinyLlama-1.1B-Chat is:
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
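Recent versions of the transformers library can even render this string for you, using the chat template stored with the model. A sketch (this assumes your tokenizer ships a chat template, which the TinyLlama chat models do):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate."},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

# apply_chat_template renders the messages with the model's own prompt format
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)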
2 — The quantized version model card on TheBloke
TheBloke is the master repository on the Hugging Face Hub for quantized versions of LLMs. The model cards are really informative and you can easily understand the expected prompt format for every model.

In the example above we get the prompt format and also its name. This is super useful because you will see that many different models share the same prompt formatting template. The reason is mainly that the fine-tuning dataset is already curated with those special tokens, so they automatically become part of the model’s prompt format.
Prompt Format:
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
Prompt Format template: ChatML
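If you prefer to build ChatML by hand, it is once again just string formatting. A minimal helper sketch:

def chatml_prompt(system_message: str, prompt: str) -> str:
    # Render a prompt string in the ChatML template
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "What is the meaning of Science?"))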
3 — Use the Hardware-Corner LLM database
This is a treasure! This website can provide you with all the details about any LLM on Hugging Face. But the interesting thing is that you can easily use the search bar to filter the models and also retrieve the prompt format.

Let’s take the Qwen-1_8B-Chat model. The Hugging Face model card may be complete… but unless you understand Chinese you cannot do much with it.
From the Hardware-Corner database you can type qwen in the model family filter and look for our specific model, Qwen-1.8B-Chat.

The website will send you to the dedicated page, giving you details on hardware requirements, etc…

What can we do, since Qwen-1.8B-Chat is not in the list?
No worries here! As long as a smaller or bigger parameter model with the same fine-tuning (Chat or Instruct) is there, we can inherit the prompt format from it. Let’s use Qwen-7B-Chat (click on the >_ icon to expand it).

You can now copy/paste the Prompt Format template and use it in your Python code.
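For example, with a quantized Qwen chat GGUF file and llama-cpp-python (the file name below is only a placeholder for whatever you downloaded), the copied ChatML template slots in like this. A sketch:

from llama_cpp import Llama

llm = Llama(model_path="./qwen-1_8b-chat.Q4_K_M.gguf")  # placeholder path

system_message = "You are a helpful assistant."
prompt = "What is the meaning of Science?"

full_prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

output = llm(full_prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])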

Cracking the Code: The Last Piece of the Puzzle
Remember, even the coolest AI tools need clear instructions to work their magic. Just like you wouldn’t ask your friend to bake a cake without giving them a recipe, LLMs need prompts formatted in a specific way to understand what you want them to do.
The good news is, figuring out these formatting rules is easy… if you have a blueprint! We went through it together, with three tools you can use.
So, don’t underestimate the power of proper formatting. It’s the secret sauce that turns your ideas into amazing AI creations. With the right format in hand, you can unlock the true potential of these tools and have a blast exploring the incredible world of AI.
Now go forth and create something awesome!
Hope you enjoyed the article. If this story provided value and you wish to show a little support, you could:
- Clap a lot of times for this story
- Highlight the parts most relevant to remember (it will be easier for you to find them later, and for me to write better articles)
- Learn how to start building Your Own AI: download This Free eBook
- Sign up for a Medium membership using my link — ($5/month to read unlimited Medium stories)
- Follow me on Medium
- Read my latest articles https://medium.com/@fabio.matricardi
Here are a few more resources: