Promptbreeder: Prompting LLMs in a better way
LLMs are great, but not everyone uses them with the same efficiency. The better your prompts are, the better your results become. With time and experience, our prompting skills improve, and so do our results. But what if there were a way to find the best prompts for you automatically? This is where Promptbreeder comes in. Promptbreeder is a technique that improves prompts on its own to find the best-performing ones. It is not the first method aimed at better prompting; Chain-of-Thought prompting, for example, enhances the reasoning of Large Language Models (LLMs). Such manually crafted prompt strategies, however, are usually sub-optimal.
What is PROMPTBREEDER, and what does it achieve?
- PROMPTBREEDER is a self-improving system that evolves prompts for specific domains.
- Using LLMs, it adjusts and assesses task prompts based on training data over multiple iterations.
- PROMPTBREEDER also refines the rules (mutation-prompts) guiding task-prompt adjustments. This results in a dual layer of self-improvement: refining prompts and refining methods (self-referential).
- PROMPTBREEDER outperforms leading prompt strategies, such as Chain-of-Thought and Plan-and-Solve prompting, on common arithmetic and commonsense reasoning benchmarks.
- It can also create detailed prompts for complex challenges like hate speech classification.
Prompt strategies are traditionally manually designed. However, the effectiveness of a prompt is heavily influenced by its specific phrasing, leading to a growing interest in automating the process of prompt engineering. The Automatic Prompt Engineer (APE) was one attempt to tackle this challenge. It aimed to generate prompts based on input-output examples from datasets. Yet, APE’s iterative refinement showed diminishing benefits after a few rounds. To address this limitation, a new approach has been proposed that uses a diversity-maintaining evolutionary algorithm for the continuous and self-referential improvement of prompts for Large Language Models (LLMs).
Introduction
Correctly prompting an LLM is vital for optimal performance. Recent studies have sought ways to enhance or automate this process. Chain-of-Thought Prompting (CoT) is a prominent method that provides intermediary reasoning steps, enhancing the reasoning abilities of LLMs. Extensions to CoT, such as Self-Consistency (CoT-SC), focus on deriving the most consistent answer from multiple solutions. Some techniques involve planning before problem-solving, decomposing problems, or refining solutions. Additionally, “Soft Prompting” techniques adjust prompt representations directly. There are concerns, however, that continuously updating LLM parameters might be unsustainable as models become more complex.
Prompt engineering methods are often manually designed and domain-independent. The idea is that automation and self-improvement can customize prompts for specific areas. Auto-CoT and Automatic-CoT create reasoning chains automatically for Few-Shot CoT. Another method, APE, generates and alters prompts.
Concurrently, OPRO uses a single mutation prompt for optimization, testing the results on selected problems. Another approach, EvoPrompt, uses a fixed mutation prompt to yield new prompts.
The ambition of creating a self-improving system in AI is longstanding. Earlier works tackled self-referential neural networks adjusting their parameters. Modern efforts aim for scalability, inspired by past models, but there are challenges in scaling these methods for contemporary LLMs. Observations indicate that LLMs can produce variations from examples and assess novelty. The evolution of unique prompts is comparable to systems like Picbreeder, which evolves distinct images. The trend suggests a shift from data-driven learning to a more self-determined learning approach.
PROMPTBREEDER
Promptbreeder is a system designed to optimize how we prompt or guide Large Language Models (LLMs) for better results. Its objective is to automatically find the best “task prompts” that, when given to an LLM, improve the LLM’s ability to answer questions in a given domain.
Task-prompts and Mutation-prompts
- A *task-prompt (P)* preconditions the LLM to provide better responses. It’s like setting the context before asking a question.
- Promptbreeder uses *mutation-prompts (M)* to create task-prompt variations. Think of mutation-prompts as tools or templates to modify the existing task-prompts.
Evolutionary Approach
- Promptbreeder uses an evolutionary algorithm to “evolve” or develop these task-prompts. It’s similar to natural selection: the best prompts are kept and modified, while the less effective ones are discarded.
- An LLM performs the “mutation” in this algorithm. It produces a new task-prompt when given a mutation-prompt and a task-prompt. This can be written as P' = LLM(M + P), where ‘+’ is string concatenation.
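To make the formula concrete, here is a minimal sketch of a single mutation. It treats the LLM as any function from a string to a string. The names `mutate_task_prompt` and `stub_llm` are made up for this example, and the stub only tags its input so the code runs without a model.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def mutate_task_prompt(llm: LLM, mutation_prompt: str, task_prompt: str) -> str:
    """Return P' = LLM(M + P), where '+' is plain string concatenation."""
    return llm(mutation_prompt + task_prompt)


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model so the example runs offline: it only tags its input.
    return "[rewritten] " + prompt


M = "Rephrase the instruction so it is easier to follow: "
P = "Solve the math word problem."
new_P = mutate_task_prompt(stub_llm, M, P)
print(new_P)
```

With a real model behind `llm`, `new_P` would be a genuinely reworded task-prompt rather than a tagged copy.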
Self-Referential Mechanism
- Promptbreeder doesn’t just evolve task-prompts; it also evolves the mutation-prompts using a “hyper-mutation prompt (H)”. This is a higher-level evolution. The formula for a mutated mutation-prompt is M' = LLM(H + M).
- This is like evolving the evolution process, making the system more adaptable and efficient.
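The two levels can be chained in a few lines. The sketch below first rewrites the mutation-prompt and then uses the rewritten version on the task-prompt. Applying the new mutation-prompt straight away is an assumption of this example, and `self_referential_step` and `stub_llm` are hypothetical names.

```python
from typing import Callable, Tuple

LLM = Callable[[str], str]  # hypothetical stand-in for a model call


def self_referential_step(
    llm: LLM, hyper_prompt: str, mutation_prompt: str, task_prompt: str
) -> Tuple[str, str]:
    """Evolve the mutation-prompt, then use it to evolve the task-prompt."""
    new_mutation_prompt = llm(hyper_prompt + mutation_prompt)  # M' = LLM(H + M)
    new_task_prompt = llm(new_mutation_prompt + task_prompt)   # P' = LLM(M' + P), assumed follow-up
    return new_mutation_prompt, new_task_prompt


def stub_llm(prompt: str) -> str:
    # Stand-in model: wraps its input in angle brackets to make the nesting visible.
    return "<" + prompt + ">"


new_M, new_P = self_referential_step(stub_llm, "H ", "M", " P")
print(new_M)
print(new_P)
```

The nested brackets in the stub's output show that the task-prompt mutation depends on the already mutated mutation-prompt.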
Initialization and Evolution
- With some initial mutation prompts and problem descriptions, Promptbreeder creates an initial population of task prompts.
- Each task-prompt is paired with a mutation-prompt in a 1:1 ratio during the evolution.
- The evolution process involves comparing two task-prompts, selecting the better one (based on performance), mutating it, and overwriting the less effective one with the mutated copy. This method is inspired by the binary tournament genetic algorithm.
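One round of this tournament could look roughly like the sketch below. The `Unit` class, `tournament_step`, and the toy fitness (prompt length) are assumptions made for illustration. In practice, fitness would be the accuracy of the task-prompt on a batch of training questions, and the mutation would be an LLM call.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Unit:
    task_prompt: str
    mutation_prompt: str  # paired 1:1 with the task-prompt


def tournament_step(
    population: List[Unit],
    fitness: Callable[[Unit], float],
    mutate: Callable[[Unit], Unit],
    rng: random.Random,
) -> None:
    """Pick two units at random and overwrite the loser with a mutated copy of the winner."""
    i, j = rng.sample(range(len(population)), 2)
    if fitness(population[i]) < fitness(population[j]):
        i, j = j, i  # make i the winner
    population[j] = mutate(population[i])


rng = random.Random(0)
population = [Unit("a", "m1"), Unit("bbb", "m2")]
tournament_step(
    population,
    fitness=lambda u: len(u.task_prompt),                             # toy fitness, not real accuracy
    mutate=lambda u: Unit(u.task_prompt + "!", u.mutation_prompt),    # toy mutation, not an LLM call
    rng=rng,
)
print(population)
```

Repeating `tournament_step` many times gradually fills the population with descendants of the better prompts.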

Mutation Operators
Direct Mutation
- Zero-order Prompt Generation: Concatenates the problem description with the prompt “A list of 100 hints:” and lets the LLM continue; the first generated hint becomes the new task-prompt.
- First-order Prompt Generation: Concatenates a mutation-prompt with an existing task-prompt and passes the result to the LLM, which returns the mutated task-prompt.
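Zero-order generation is mostly a matter of parsing the model's list. The sketch below keeps the first hint, which is how I read the paper, so treat that choice as an assumption. The space used as a separator, the fallback, and the names `zero_order_task_prompt` and `stub_llm` are also illustrative.

```python
import re
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a model call


def zero_order_task_prompt(llm: LLM, problem_description: str) -> str:
    """Ask for a list of hints and keep the first one as the new task-prompt."""
    completion = llm(problem_description + " A list of 100 hints:")
    for line in completion.splitlines():
        # Drop list numbering such as "1." or "2)" at the start of the line.
        hint = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if hint:
            return hint
    return problem_description  # fallback (an assumption) if the model returns nothing usable


def stub_llm(prompt: str) -> str:
    # Stand-in model that always returns the same two hints.
    return "1. Read the question twice.\n2. Write down the known quantities."


new_task_prompt = zero_order_task_prompt(stub_llm, "Solve the math word problem.")
print(new_task_prompt)
```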
Estimation of Distribution Mutation
- Filters the current task-prompts using BERT embedding cosine similarities, so that near-duplicate prompts do not crowd the list.
- Provides this list to the LLM, which continues it with new task-prompts.
- Variants: EDA Rank and Index Mutation (task-prompts listed in fitness order), and Lineage-Based Mutation (list of best prompts in chronological order).
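The filtering step can be sketched without a real embedding model. The example below swaps BERT embeddings for a simple bag-of-words vector, purely so it runs anywhere. The 0.95 similarity threshold, the numbered-list format, and all function names are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Callable, List


def bag_of_words(text: str) -> Counter:
    # Toy stand-in for a BERT embedding: word counts instead of a dense vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_similar(
    prompts: List[str],
    embed: Callable[[str], Counter] = bag_of_words,
    threshold: float = 0.95,
) -> List[str]:
    """Keep a prompt only if it is not too similar to one that was already kept."""
    kept: List[str] = []
    for p in prompts:
        if all(cosine(embed(p), embed(q)) <= threshold for q in kept):
            kept.append(p)
    return kept


def eda_prompt(prompts: List[str]) -> str:
    """Format the kept prompts as a numbered list that the LLM is asked to continue."""
    lines = [f"{i}. {p}" for i, p in enumerate(prompts, 1)]
    lines.append(f"{len(prompts) + 1}.")
    return "\n".join(lines)


candidates = ["Solve it step by step", "solve it step by step", "Draw a diagram first"]
kept = filter_similar(candidates)
print(eda_prompt(kept))
```

For the ranked variants, the same list would be sorted by fitness before it is formatted.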
Hypermutation: Mutation of Mutation-prompts
- Zero-order Hyper-Mutation: Combines problem description with a thinking style to generate a new mutation-prompt.
- First-order Hyper-Mutation: Uses the prompt “Please summarize and improve the following instruction:” combined with an existing mutation prompt to generate a new mutation prompt.
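Zero-order hyper-mutation can be sketched as follows. The thinking styles listed here are illustrative placeholders rather than quotes from the paper, and picking one at random, the space separator, and the names `zero_order_hyper_mutation` and `stub_llm` are assumptions of this example.

```python
import random
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical stand-in for a model call

# Illustrative thinking styles, made up for this example.
THINKING_STYLES = [
    "How could I simplify the problem so that it is easier to solve?",
    "Let's think step by step.",
]


def zero_order_hyper_mutation(
    llm: LLM, problem_description: str, thinking_styles: List[str], rng: random.Random
) -> str:
    """Generate a fresh mutation-prompt from the problem description and one thinking style."""
    style = rng.choice(thinking_styles)
    return llm(problem_description + " " + style)


def stub_llm(prompt: str) -> str:
    # Stand-in model: only tags its input so the example runs offline.
    return "[mutation-prompt] " + prompt


rng = random.Random(0)
new_mutation_prompt = zero_order_hyper_mutation(
    stub_llm, "Solve the math word problem.", THINKING_STYLES, rng
)
print(new_mutation_prompt)
```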
Lamarckian Mutation
- Transforms a successful working out (the reasoning that led to a correct answer) into a new task-prompt.
- Uses these correct workings to reverse-engineer the instruction that could have produced them.
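A rough sketch of this reverse-engineering step is below. The wording of the prompt is a paraphrase that I am assuming for illustration, not a verified quote, and `lamarckian_prompt`, `lamarckian_mutation`, and `stub_llm` are hypothetical names.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical stand-in for a model call


def lamarckian_prompt(correct_working_out: str) -> str:
    """Build a prompt that asks the LLM to infer the instruction behind a correct working out."""
    # Assumed wording, for illustration only.
    return (
        "I gave a friend an instruction and some advice. "
        "Here are the correct examples of his workings out:\n"
        f"{correct_working_out}\n"
        "The instruction was:"
    )


def lamarckian_mutation(llm: LLM, correct_working_out: str) -> str:
    """Return the inferred instruction, which becomes the new task-prompt."""
    return llm(lamarckian_prompt(correct_working_out)).strip()


def stub_llm(prompt: str) -> str:
    # Stand-in model: pretends to infer the instruction from the working out.
    return " Add the two numbers, then halve the result.\n"


working = "Q: What is the average of 4 and 6? A: 4 + 6 = 10, 10 / 2 = 5. Answer: 5"
new_task_prompt = lamarckian_mutation(stub_llm, working)
print(new_task_prompt)
```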
Prompt Crossover and Context Shuffling
- Prompt Crossover: After a mutation, there is a 10% chance that a task-prompt gets replaced with the task-prompt of another member of the population, chosen by fitness-proportionate selection.
- Context Shuffling: The list of correct workings (the few-shot context) can be evolved too. If the list is full, a new correct working can replace an existing one after evaluation.
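Both operators fit in a few lines. In the sketch below, which existing working gets replaced is chosen uniformly at random, which is an assumption, and the function names and toy fitness values are illustrative. `random.choices` does the fitness-proportionate pick and needs at least one positive fitness value.

```python
import random
from typing import List


def maybe_crossover(
    task_prompt: str,
    population: List[str],
    fitnesses: List[float],
    rng: random.Random,
    rate: float = 0.1,
) -> str:
    """With probability `rate`, swap in a task-prompt drawn in proportion to fitness."""
    if rng.random() < rate:
        # Requires at least one positive weight, otherwise random.choices raises an error.
        return rng.choices(population, weights=fitnesses, k=1)[0]
    return task_prompt


def shuffle_context(context: List[str], new_working: str, max_len: int, rng: random.Random) -> None:
    """Add a correct working to the few-shot context, replacing a random entry if the list is full."""
    if len(context) < max_len:
        context.append(new_working)
    else:
        context[rng.randrange(len(context))] = new_working


rng = random.Random(0)
population = ["Think step by step.", "Draw a diagram first."]
fitnesses = [0.0, 1.0]  # toy values: only the second prompt can be selected
replaced = maybe_crossover("Old prompt.", population, fitnesses, rng, rate=1.0)

context = ["working 1", "working 2"]
shuffle_context(context, "working 3", max_len=2, rng=rng)
print(replaced, context)
```

The `rate=1.0` in the usage lines only forces the crossover so the effect is visible; the default of 0.1 matches the 10% chance described above.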
We are not going to cover the results here because the paper devotes quite a lot of pages to them. If you are interested, here’s the paper:





