Fabio Matricardi

Summary

Effective prompt formatting is crucial for maximizing the performance of large language models (LLMs) by ensuring instructions are structured in a way that aligns with the model's fine-tuning.

Abstract

The article emphasizes the importance of prompt formatting as a foundational element in leveraging the capabilities of large language models (LLMs). It explains that even with advanced models, improperly formatted prompts can lead to subpar outputs. The process of prompt engineering involves crafting instructions that are compatible with the model's training, which includes the use of special tokens that signal the beginning and end of sequences, among other indicators. The article also highlights the significance of tokenization and the role of tokenizers in converting text into numerical representations that LLMs can process. Furthermore, it provides a blueprint for constructing effective prompts, which includes a system message to guide the model's behavior, a user message conveying the specific task, and an assistant message indicating the model's response. The article concludes by offering resources and strategies for identifying the appropriate prompt format for different LLMs, stressing that clear and well-structured instructions are essential for unlocking the full potential of AI tools.

Opinions

  • The author believes that prompt formatting is an underrated aspect of working with LLMs and is essential for their effective operation.
  • There is an opinion that even sophisticated AI models like Llama-2 can produce poor results if the prompts are not formatted correctly.
  • The article suggests that understanding the model's fine-tuning process and the associated instruction database is key to successful prompt engineering.
  • The use of special tokens and adherence to a model's specific prompt format is seen as critical for the model to process and respond to instructions accurately.
  • The author advocates for the use of system prompts to steer the model's behavior in a consistent and controlled manner, such as making a chatbot respond in a specific style or level of formality.
  • It is implied that the correct prompt format can be inferred from model cards and other resources, and that this information is readily available to those who know where to look.
  • The author encourages readers to engage with AI tools actively, using the provided blueprint to format prompts effectively and thereby create more impressive AI-generated outputs.

Prompt Formatting is the Unsung Hero

Your free and user-friendly Blueprint to unlock the power of any AI

Image by the author and Lexica.art

We often marvel at the impressive feats of large language models (LLMs) — generating creative text formats, translating languages flawlessly, and answering our questions in informative ways. But have you ever wondered what makes these marvels tick?

The answer lies in a seemingly mundane, yet crucial element: prompt formatting. Yes, that’s right! Before you delve into the intricacies of prompt engineering, ensuring your instructions are formatted correctly sets the stage for success.

A new model just came out and you want to test it. You download it, run the pipeline, and the result is… soo BAD 🤬😤

Why?

Let’s learn together how to deal properly with any kind of LLM and become masters of prompt formatting.

Ok, but what is prompt formatting?

If you search Google for this topic, you will only find results about prompt engineering. But prompt engineering is the art of crafting your words so that the instruction is easy for the language model to process.

Even before we get to that craft, we need to understand that after the initial pre-training, every model is fine-tuned to follow instructions. Depending on HOW the instruction database is designed, the model learns to receive instructions ONLY in that specific FORMAT.

Image by Christopher Kuszajewski from Pixabay

If your prompt is simply “what is the meaning of Science?” you will get a gibberish output: this is not because your question was garbage, but because the model did not receive the few special tokens it needs to identify what the instruction is.

Here is an example tested on my miniPC with the quantized version of Llama-2–7b-Chat:

User: What is the meaning of Science?

Llama2: What are some examples of scientific discoveries?
What are three types of science?

The model is not dumb… it is simply not able to understand the instructions.

Let’s take an example from the model card of Llama2–13b-Chat on Hugging Face:

Intended use section for meta-llama/Llama-2–13b-chat-hf

The model card does not really explain the matter, but if you go to the official Meta Getting Started page you can see in the code that:

prompt = "Who wrote the book innovator's dilemma?"
pipe = pipeline(task="text-generation", model=model, 
                tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")

The Llama2-Chat fine-tuned model expects a prompt that follows the f-string mentioned above: f"<s>[INST] {prompt} [/INST]".

Scrolling a little further down the page, a format is finally clearly declared:

Prompt format, single and multi-turn variations.

If we run the same question “what is the meaning of Science?” again, applying the format mentioned above, this time we get a good reply.

Llama2–7b-chat rocks!
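
For reference, here is a minimal sketch of that second run, reusing the pipe from the Meta snippet above (the question string is the only thing I changed):

question = "What is the meaning of Science?"
# Wrap the question in the Llama-2 instruction format before generation:
formatted_prompt = f"<s>[INST] {question} [/INST]"
result = pipe(formatted_prompt)
print(result[0]["generated_text"])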

Special Tokens

A Large Language Model is a neural network that works with numbers. So to give instructions to the LLM (and also receive answers…) we need a tool to transform words into numbers. I present to you: the Tokenizer!

Tokenization is the process of breaking down a piece of text into small units called tokens. A token may be a word, part of a word or just characters like punctuation.

The tokenizer uses a special vocabulary (different for every model) that transforms our instruction (the prompt) into something meaningful for the LLM. Here is an example…

The sentence is looked up in the vocabulary and we get an index number for each token, like coordinates. Note that many tokens already include a leading space, and that The and the are different (so tokenization is case-sensitive). There is also an end-of-line token, 198.
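
You can reproduce a similar breakdown yourself. Here is a minimal sketch with the GPT-2 tokenizer from Hugging Face (the sentence is my own; every model ships its own vocabulary, so the IDs differ from model to model):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("The cat sat on the mat.\n")
print(ids)  # the trailing newline maps to token 198, as mentioned above
print(tokenizer.convert_ids_to_tokens(ids))  # leading spaces appear as the Ġ marker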

What is so special about special tokens?

Among these IDs we also have Special Tokens. Special tokens (like those representing the start and end of a text, or unknown words; from the example above, token 198 is one of them…) are assigned unique IDs as well.

Every model is fine-tuned with its own peculiar special tokens. They usually are:

screenshot from https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/discussions/2/files

bos_token: Beginning Of Sequence is used to specify that in the prompt we are starting a sequence.

eos_token: End Of Sequence is used to specify that in the prompt we completed a sequence.

pad_token: Padding: a token used to extend a sequence up to a given length by repeating a dummy token. We don’t manually add “[PAD]” tokens to the sequences: the pad token is usually a special token defined inside the tokenizer and automatically added, if necessary, along with the other special tokens.

unk_token: Unknown token; it identifies a word or symbol that has no match in the model vocabulary.
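
You never have to guess these values: every tokenizer exposes its special tokens directly. A minimal sketch, using the Zephyr model from the screenshot above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
print(tokenizer.bos_token)           # beginning-of-sequence token
print(tokenizer.eos_token)           # end-of-sequence token
print(tokenizer.special_tokens_map)  # the full map, pad/unk included when defined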

Image by Alexa from Pixabay

A Full IDEAL prompt

Not all LLMs are equal, but ideally a prompt should have the following components:

System Message: A system_prompt is text that is prepended to the prompt. It’s used in a chat context to help guide or constrain model behavior.

Let’s say you wanted to write a chatbot that talks like a pirate. One way to do this would be to prepend “you are a pirate” to every prompt.

Instead, we can set the system_prompt to “You are a pirate,” and the model will understand your request without having to be told in every prompt (see source below).

You can also use system prompts to make your LLM behave in a more… professional way. Try system prompts like “Act as if you’re responding to documentation questions” or “You are responding to highly technical customers.” A good LLM is quite good at respecting system prompts.

User Message: it is your instruction, what you need the LLM to do for you. This part is the one relevant to Prompt Engineering techniques.

Assistant Message: this is the indicator to the LLM that from this point on the Language Model should start its reply.

source: https://replicate.com/blog/how-to-prompt-llama
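
Putting the three components together for Llama-2, here is a minimal sketch following the <<SYS>> markers described in the Replicate guide linked above (the message strings are my own):

system_prompt = "You are a pirate."
user_message = "What is the meaning of Science?"

# System prompt goes inside <<SYS>> ... <</SYS>>, the user message follows,
# and the model's reply (the Assistant Message) starts after [/INST].
full_prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"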

An example

Here is Dolphin-Llama2–7B as an example of an instruction fine-tuned LLM family that uses a full prompt format:

SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
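
A minimal sketch of filling that template in Python (the helper name and messages are my own):

def build_prompt(system_message: str, prompt: str) -> str:
    # Dolphin-Llama2-7B expects the SYSTEM/USER/ASSISTANT layout shown above
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

print(build_prompt("You are a helpful assistant.",
                   "What is the meaning of Science?"))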

If we think about the degree of attention/priority, the model uses a hierarchy:

  • Assistant Message: responses from the model. Informs future responses, but is the least authoritative.
  • User Message: input from the user. Medium authority; it can steer the model contrary to Assistant Messages, unless it asks for an answer prohibited by model guardrails or contrary to the System Message.
  • System Message: most authoritative. The model is most attentive to it and tries hardest to obey it.

Image by tookapic from Pixabay

The Formatting Blueprint

Because every model (or model family) follows its own formatting style, the only rule is… to know where to find the prompt format just right for your specific model.

How to do it?

1 — The Hugging Face model card

The Hugging Face model card usually gives you a code snippet. You can use that example to infer the correct format.

Model Card of https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

As you can see, this model card is really well done. We can understand that the prompt format for TinyLlama-1.1B-Chat is:

# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>

2 — The quantized version model card on TheBloke

TheBloke is the master repository on the Hugging Face Hub for quantized versions of LLMs. The model cards are really informative and you can easily understand the expected prompt format for every model.

example from https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF

In the example above we get the prompt format and also its name. This is super useful because you will see that many different models share the same prompt formatting template. The reason is mainly that the fine-tuning dataset is already curated with those famous special tokens, so these tokens automatically become part of the model’s prompt format.

Prompt Format

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Prompt Format template: ChatML

3 — Use the Hardware-Corner LLM database

This is a treasure! This website can provide you with all the details about any LLM on Hugging Face. The interesting thing is that you can easily use the search bar to filter the models and also retrieve the prompt format.

model card for https://huggingface.co/Qwen/Qwen-1_8B-Chat

Let’s take the Qwen-1_8B-Chat model. The Hugging Face model card may be complete… but unless you understand Chinese you cannot do much with it.

From the Hardware-Corner database you can type qwen into the model family filter and look for our specific model, Qwen-1.8B-Chat.

The website will send you to the dedicated page, giving you hardware requirement details and more…

search result

But what to do, since Qwen-1.8B-Chat is not in the list?

No worries here! As long as a smaller or bigger model with the same fine-tuning (Chat or Instruct) is there, we can inherit the prompt format from it. Let’s use Qwen-7B-Chat (click on the >_ icon to expand it).

expanded 7B-Chat model card

You can now copy/paste the Prompt Format template and use it in your Python code.
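
For example, the inherited ChatML template translates directly into an f-string (the helper name and messages are my own):

def chatml_prompt(system_message: str, prompt: str) -> str:
    # ChatML, the same template the Qwen chat models are fine-tuned on
    return (f"<|im_start|>system\n{system_message}<|im_end|>\n"
            f"<|im_start|>user\n{prompt}<|im_end|>\n"
            f"<|im_start|>assistant\n")

full_prompt = chatml_prompt("You are a helpful assistant.",
                            "What is the meaning of Science?")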

Image by maxlenke from Pixabay

Cracking the Code: The Last Piece of the Puzzle

Remember, even the coolest AI tools need clear instructions to work their magic. Just like you wouldn’t ask your friend to bake a cake without giving them a recipe, LLMs need prompts formatted in a specific way to understand what you want them to do.

The good news is, figuring out these formatting rules is easy… if you have a blueprint! We went through it together, with three strategies and tools you can use.

So, don’t underestimate the power of proper formatting. It’s the secret sauce that turns your ideas into amazing AI creations. With the right format in hand, you can unlock the true potential of these tools and have a blast exploring the incredible world of AI.

Now go forth and create something awesome!

Hope you enjoyed the article. If this story provided value and you wish to show a little support, you could:

  1. Clap a lot of times for this story
  2. Highlight the parts you want to remember (it will be easier for you to find them later, and for me to write better articles)
  3. Learn how to start to Build Your Own AI, download This Free eBook
  4. Sign up for a Medium membership using my link — ($5/month to read unlimited Medium stories)
  5. Follow me on Medium
  6. Read my latest articles https://medium.com/@fabio.matricardi

Here are a few more resources:

WRITER at MLearning.ai / Hacking GPTs Store / 20K+ Art Prompts
