The Art of Prompt Engineering

Large language models (LLMs) like GPT-3 and Claude have demonstrated impressive capabilities in generating natural language. However, their performance depends heavily on how users prompt them. The art of carefully crafting prompts to get the desired output from an LLM is called prompt engineering.

In this guide, we dive into practical techniques and best practices for getting the most out of LLMs.

Specifically, we cover these lessons:

Solution Verification: Solutions proposed by LLMs need double-checking to ensure reliability. Even advanced models can make mistakes, so validation is essential.
Sample Top-k Candidates: Generating several candidate solutions and picking the best one is more reliable than trusting a single response, which is prone to errors.
Minimal Feedback with Iterative Prompting: Simple iterative guidance like “try again” is often as effective as detailed corrections.
External Critics: Separate evaluators like humans or rules-based systems should judge LLM outputs, not the LLM itself.
Focus on Search over Deep Logic: Leverage LLM strength in searching data rather than complex logical reasoning.
Diverse Prompting Techniques: Varying prompt formulations unlocks better performance.
Solution Recognition Over Generation: Validating solutions is easier than creating them for LLMs.
Hybrid System Design: Blend neural networks like LLMs with classical AI for robustness.
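
The first few lessons (sample several candidates, then let an external critic decide) can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: `first_verified`, `is_sorted_csv`, and the canned `sampled_answers` are hypothetical names, and the canned list stands in for k responses sampled from a real model.

```python
from typing import Callable, Iterable, Optional


def first_verified(candidates: Iterable[str],
                   verifier: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the external verifier accepts, else None."""
    for candidate in candidates:
        if verifier(candidate):
            return candidate
    return None


def is_sorted_csv(answer: str) -> bool:
    """Toy rule-based critic: accept comma-separated integers in ascending order."""
    try:
        numbers = [int(part) for part in answer.split(",")]
    except ValueError:
        return False
    return numbers == sorted(numbers)


# Stand-in for k sampled LLM responses to "sort 3, 1, 2" (hypothetical data).
sampled_answers = ["3,1,2", "one, two, three", "1,2,3"]
best = first_verified(sampled_answers, is_sorted_csv)
print(best)
```

The verifier is ordinary code, not the model judging itself, which is the point of the external-critic lesson. A real critic would also check that the answer uses the same numbers as the question.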

By covering both foundational strategies and hands-on techniques, this guide aims to equip readers with the principles and skills to tap into the vast potential of large language models.

Related reading: GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems (arxiv.org)

1. Set the Right Objective

The first step is clearly defining what you want the LLM to do. Do you want it to summarize text, answer questions, generate code, or something else? Having a precise objective guides prompt formulation. Avoid ambiguous or subjective goals.

For example, “explain this concept simply” is vague compared to “summarize this text in three bullet points for a high school student.”

2. Try a Diverse Set of Prompts

LLMs can be very particular about prompt wording. Small tweaks can vastly impact the response. Develop a library of diverse prompts targeting the same objective.

For summarization, prompts could range from “TL;DR:” to “Summarize the key ideas from this text in 140 characters” to “Explain this to a first-grader.” Evaluate which phrasing works best.
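
One way to evaluate which phrasing works best is to run every variant through the same scoring function. The sketch below assumes a hypothetical `run_model` callable and a toy brevity score; `fake_model` returns canned outputs so the example runs without an API, and a real evaluation would use a task-specific metric over many inputs.

```python
from typing import Callable, List, Tuple


def rank_prompts(prompts: List[str],
                 run_model: Callable[[str], str],
                 score: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Run every prompt variant and return (score, prompt) pairs, best first."""
    results = [(score(run_model(prompt)), prompt) for prompt in prompts]
    return sorted(results, key=lambda pair: pair[0], reverse=True)


# Canned outputs standing in for real model responses (hypothetical data).
canned = {
    "TL;DR:": "LLM output depends heavily on prompt wording.",
    "Summarize the key ideas from this text in 140 characters": "Prompts matter.",
    "Explain this to a first-grader.": (
        "Computers that write need very clear instructions, "
        "a bit like asking a friend for help with homework."
    ),
}


def fake_model(prompt: str) -> str:
    return canned[prompt]


def brevity_score(output: str) -> float:
    """Toy metric: shorter outputs score higher."""
    return -float(len(output))


ranking = rank_prompts(list(canned), fake_model, brevity_score)
for points, prompt in ranking:
    print(points, prompt)
```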

3. Leverage Examples

Providing examples of desired responses, also called “few-shot learning,” can greatly improve LLM performance.

For a summarization task, giving 2–3 examples of good summaries with different text allows the LLM to infer the ideal format.

Examples also work for goals like translating text, answering questions, or even generating code.

4. Apply Constraints

Constraints like word limits, bullet points, and output formats (e.g. JSON) can shape LLM responses to better suit your needs.

For a 3-bullet point summary, the prompt could be:

“Summarize this text in 3 bullet points, with each bullet point containing no more than 2 sentences:”

Bullet 1:
Bullet 2:
Bullet 3:

The more constraints you set, the more control you have over the response structure.
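
Constraints are most useful when the response is checked against them afterwards. The following sketch validates the 3-bullet format from the prompt above; `check_bullet_summary` is a hypothetical helper, and its sentence counter is deliberately rough (abbreviations such as "e.g." would be miscounted).

```python
import re
from typing import List


def check_bullet_summary(response: str,
                         bullets: int = 3,
                         max_sentences: int = 2) -> List[str]:
    """Return a list of constraint violations; an empty list means it passes."""
    problems = []
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if len(lines) != bullets:
        problems.append(f"expected {bullets} bullets, got {len(lines)}")
    for number, line in enumerate(lines, start=1):
        # Rough count: sentence-ending punctuation followed by space or end of line.
        sentences = len(re.findall(r"[.!?]+(?:\s|$)", line))
        if sentences > max_sentences:
            problems.append(f"bullet {number} has {sentences} sentences")
    return problems


good = "Bullet 1: A. B.\nBullet 2: C.\nBullet 3: D."
bad = "Bullet 1: A. B. C.\nBullet 2: D."
print(check_bullet_summary(good))
print(check_bullet_summary(bad))
```

A non-empty result can be used to trigger a retry of the prompt.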

5. Iterative Prompting

Often the first LLM response is not perfect. Iteratively re-prompting with minimal feedback such as “Try again” or “Be more concise” lets you quickly home in on the ideal output.

Think of it as guiding a student through multiple drafts of an essay — broad pointers get better results than comprehensive corrections.
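
The loop described above can be sketched as follows. Everything here is illustrative: `iterate_until_accepted`, `ask_model`, and `accept` are hypothetical names, and `fake_model` replays canned drafts in place of a real chat API.

```python
from typing import Callable, List, Optional


def iterate_until_accepted(task: str,
                           ask_model: Callable[[List[str]], str],
                           accept: Callable[[str], bool],
                           feedback: str = "Try again.",
                           max_rounds: int = 3) -> Optional[str]:
    """Re-prompt with the same short feedback until `accept` passes or rounds run out."""
    messages = [task]
    for _ in range(max_rounds):
        answer = ask_model(messages)
        if accept(answer):
            return answer
        # Keep the history and append only minimal feedback.
        messages += [answer, feedback]
    return None


# Canned drafts standing in for successive model responses (hypothetical data).
drafts = iter([
    "A very long first draft that rambles on and on.",
    "Shorter draft, still long.",
    "Concise.",
])


def fake_model(messages: List[str]) -> str:
    return next(drafts)


result = iterate_until_accepted(
    "Summarize the text in one word.",
    fake_model,
    accept=lambda answer: len(answer.split()) == 1,
)
print(result)
```

The `max_rounds` cap matters in practice, since a model may never produce an acceptable answer.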

6. Blend LLMs with Rules

Combining LLM capabilities with rules, logic, and classical NLP can yield more robust systems.

For example, named entity recognition and coreference resolution as pre-processing steps can improve summarization quality by resolving ambiguous pronouns and entities.

Hybrid systems prevent over-reliance on the LLM. The rules handle parts that are precisely programmable while the LLM focuses on fuzzy tasks like abstraction and synthesis.
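
As a small sketch of this division of labor, the example below lets a regular expression handle a precisely programmable task (finding ISO-style dates) and leaves the summary to the model. The names `extract_dates`, `summarize_with_facts`, and `fake_llm` are hypothetical, and `fake_llm` simply returns the first sentence so the example runs offline.

```python
import re
from typing import Callable, Dict, List

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_dates(text: str) -> List[str]:
    """Rule-based step: ISO-style dates can be matched exactly, no LLM needed."""
    return DATE_PATTERN.findall(text)


def summarize_with_facts(text: str,
                         llm_summarize: Callable[[str], str]) -> Dict[str, object]:
    """Combine a deterministic extraction with an LLM-written summary."""
    return {"dates": extract_dates(text), "summary": llm_summarize(text)}


def fake_llm(text: str) -> str:
    """Stand-in for a model call: returns the first sentence."""
    return text.split(".")[0] + "."


report = summarize_with_facts(
    "The release shipped on 2024-03-01. A patch followed on 2024-03-15.",
    fake_llm,
)
print(report)
```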

7. Prompt Engineering for RAG Systems

Reference notebook: llama_index/docs/examples/prompts/prompts_rag.ipynb at main · run-llama/llama_index (github.com)

Retrieval-augmented generation (RAG) combines a retriever to find relevant contexts and a generator (typically an LLM) to synthesize the response. The LlamaIndex notebook demonstrates various prompt engineering techniques for both modules.

Retriever Prompting

The retriever in a RAG system can be prompted to some extent to improve retrieval of relevant contexts. Here are some ways prompting can be incorporated into the retriever:

Query Rewriting: The original natural language query can be rewritten or reformulated using an LLM to better match the documents/contexts. This rewritten query is then passed to the retriever.
Query Expansion: The original query can be expanded by generating relevant keywords, synonyms, alternate phrases etc. using an LLM. These expanded terms are added to the query for a richer signal to the retriever.
Document Re-ranking: The initial set of retrieved documents based on keyword matching can be re-ranked by an LLM. The LLM takes the query and each document as input, and scores the relevance of that doc for the query. Documents are re-ranked based on these relevance scores.
Prompt-Based Retrieval: Instead of keywords, the entire query can be provided as a prompt to the retriever along with example relevant documents. The retriever is trained to score new documents based on this prompted format.
Hybrid Retriever: A neural retriever can be combined with the keyword-based retriever. The neural retriever uses query/document embeddings from an LLM to assess relevance. The scores from both retrievers are combined.

The retriever’s functionality can be improved by techniques like query/document reformulation, expansion, re-ranking, prompting with examples, and hybrid retrieval. The key idea is to incorporate the generalization power of LLMs to retrieve more contextual matches. With the right prompting, significant gains are possible.
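
To make the hybrid retriever idea concrete, here is a minimal sketch of score fusion. It assumes each retriever has already produced a non-empty score per document; the function names, the weighting parameter `alpha`, and the toy scores are all hypothetical, and a weighted sum after min-max normalization is only one of several possible fusion schemes.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def fuse(keyword: Dict[str, float],
         neural: Dict[str, float],
         alpha: float = 0.5) -> List[str]:
    """Rank documents by alpha * keyword + (1 - alpha) * neural, highest first."""
    kw, nn = min_max(keyword), min_max(neural)
    combined = {
        doc: alpha * kw.get(doc, 0.0) + (1 - alpha) * nn.get(doc, 0.0)
        for doc in set(kw) | set(nn)
    }
    # Sort by descending score; break ties by document id for determinism.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Toy scores (hypothetical): raw keyword scores and embedding similarities.
keyword_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.0}
neural_scores = {"doc_a": 0.20, "doc_b": 0.90, "doc_c": 0.85}
ranking = fuse(keyword_scores, neural_scores)
print(ranking)
```

Setting `alpha` closer to 1.0 favors exact keyword matches, while values closer to 0.0 favor semantic similarity.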

Generator Prompting

Clearly state the objective — “Based on the context, succinctly answer the question:”
Highlight the most relevant paragraphs of retrieved contexts.
Remind the LLM to use given context only by saying “Answer from the provided context.”
Provide Q&A examples based on sample contexts to demonstrate ideal responses.
Append constraints like “Respond in 1–2 sentences”.
In LlamaIndex, index.as_query_engine() creates the query engine that wraps the generator. Its prompts can be inspected with get_prompts() and customized with update_prompts().
RAG prompts from LangchainHub can be incorporated using LangchainPromptTemplate.
Few-shot examples can be added dynamically based on the query by defining a function to retrieve them.
Context transformations like PII filtering can be applied by defining filter functions.
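
The first few generator tips can be combined into one prompt template. The sketch below is framework-independent and does not use the LlamaIndex API; `build_generator_prompt` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List


def build_generator_prompt(question: str,
                           contexts: List[str],
                           max_sentences: int = 2) -> str:
    """Assemble objective, numbered contexts, grounding reminder, and length constraint."""
    numbered = "\n".join(
        f"[{number}] {context}" for number, context in enumerate(contexts, start=1)
    )
    return (
        "Based on the context, succinctly answer the question.\n"
        "Answer from the provided context.\n"
        f"Respond in 1-{max_sentences} sentences.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


generator_prompt = build_generator_prompt(
    "What does the retriever do?",
    [
        "The retriever finds relevant contexts.",
        "The generator synthesizes the response.",
    ],
)
print(generator_prompt)
```

Numbering the contexts also makes it easy to ask the model to cite which passage it used.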

8. Dynamic Prompting and Context Transformations

So far we have discussed techniques for optimizing prompts that are defined statically upfront. However, prompts can also be generated dynamically during runtime based on the specific query or context. This allows customizing prompts in a targeted way for each case rather than having one generic prompt template.

In this section, we will cover two powerful techniques: few-shot prompting and context transformations.

Few-Shot Prompting

Few-shot learning refers to the technique of providing examples in the prompt to guide the LLM. Typically, these examples demonstrate the desired response format or style for a given type of query.

For instance, if we want the LLM to respond to summarization queries in a bullet point format, we can provide a few examples of bullet point summaries generated for sample texts.

The key idea is that these few-shot examples are selected dynamically for each specific query. This allows prompting the LLM with the most relevant examples each time.

Here is some sample code to accomplish dynamic few-shot prompting:

python

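# Assumes llama-index >= 0.10 (imports from llama_index.core) and a configured
# embedding model (the default uses OpenAI embeddings and needs an API key).
from llama_index.core import Document, VectorStoreIndex

# Index some example QA pairs: the query is the document text,
# and the desired response is stored as metadata
example_pairs = [
  {"query": "Summarize this text",
   "response": "- Main idea 1\n- Detail 1\n- Detail 2"},
  {"query": "Summarize this other text",
   "response": "- Main theme\n- Supporting evidence\n- Conclusion"}
]
example_docs = [
  Document(text=pair["query"], metadata={"response": pair["response"]})
  for pair in example_pairs
]
example_index = VectorStoreIndex.from_documents(example_docs)
example_retriever = example_index.as_retriever(similarity_top_k=2)

# Function to retrieve the most similar examples for a query
def get_few_shot_examples(query):
  examples = example_retriever.retrieve(query)
  return "\n\n".join(
    f"Query: {eg.node.text}\nResponse: {eg.node.metadata['response']}"
    for eg in examples
  )

# Prompt template
prompt_template = """
Here are some examples of desired responses:

{few_shot_examples}

Query: {query}
Response:
"""

# Generate prompt dynamically
query = "Summarize this article on prompt engineering"  # sample user query
prompt = prompt_template.format(
  few_shot_examples=get_few_shot_examples(query),
  query=query
)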

By retrieving the most similar examples for each specific query, we can provide targeted few-shot guidance to the LLM each time. This helps steer the response towards the desired format and style.

Context Transformations

In some cases, we may want to transform the input context before feeding it into the LLM prompt. For instance, we can anonymize sensitive personal information for privacy, remove toxic content for safety, or extract only relevant sentences for conciseness.

These transformations can be achieved by writing functions that process the raw context and return the transformed output. This gives fine-grained control over the information consumed by the LLM.

Here is an example to remove personally identifiable information (PII) from a context:

python

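# PII removal: a simple regex stand-in for a dedicated PII-scrubbing library.
# It only covers email addresses and phone numbers, not names or addresses.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Function to filter out PII
def remove_pii(context):
  cleaned = EMAIL.sub("[EMAIL]", context)
  cleaned = PHONE.sub("[PHONE]", cleaned)
  return cleaned

# Prompt template
prompt_template = """
Context:

{context}

Based on the context, answer the question: {question}
"""

# Generate prompt with cleaned context
raw_context = "Contact Jane at jane.doe@example.com or +1 (555) 010-9999."  # sample input
cleaned_context = remove_pii(raw_context)
prompt = prompt_template.format(
  context=cleaned_context,
  question="How can Jane be reached?"
)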

By processing the context upfront, we can address important concerns like data privacy and content safety even with large uncontrolled corpora.

Dynamic prompting gives us a way to programmatically customize prompts during runtime for each query.

Techniques like few-shot learning and context transformations help shape the information fed to the LLM, guiding it towards the right response format safely.

With the power of software, the possibilities for dynamic prompt engineering are endless!

Conclusion

In this comprehensive guide, we explored the art of prompt engineering — crafting effective prompts to get the most out of large language models.

We discussed foundational strategies like setting clear goals, trying diverse phrasings, and adding examples and constraints. These techniques give us better control over the LLM’s response structure. Iterative prompting with human guidance and blending rules with LLM capabilities also improve results.

For retrieval augmented generation systems, we covered prompting techniques like query rewriting, document re-ranking, and training with query-document examples to make retrievers more contextual. Generator prompting strategies like highlighting relevant context, providing examples, and adding output constraints give us further control over the final response.

Dynamic prompting takes this a step further by customizing prompts during runtime based on the specific case using techniques like few-shot learning and context transformations. This allows safely steering LLMs in the right direction for each query.

By incrementally developing prompts, we can unlock impressive performance on a wide range of NLP tasks.

However, prompt engineering requires a nuanced understanding of the interplay between prompts and LLMs.

It takes creativity, rigor, and iteration to find the right phrasings to activate the LLM’s capabilities.

As LLMs continue evolving rapidly in size and sophistication, prompt engineering will only grow in importance.

We hope this guide has equipped you with the key principles and skills to take advantage of these powerful models.

The prompts we use shape how these models interact with and impact the world. Developing prompts responsibly and ethically is a key step as we work towards democratizing access to capable and safe AI.
