Prompt Engineering via Prompt Patterns — Fact Check List Pattern

This article is part of the series: Prompt Engineering via Prompt Patterns

You can also watch the video version of this article.

Have you ever participated in a lottery where the winner is picked by random sampling and the draw is broadcast live? Would you have more confidence in the transparency of the system if humans picked the winners from handwritten names or numbers, or if a computer generated random numbers? If you are like most people, you would pick the computer. After all, software doesn’t cheat or lie. Or does it?

To ruin your optimism, it does. At least generative AI language models do, and they do so very convincingly. It is really hard to tell. Not convinced? Try asking ChatGPT about a made-up company, say Sewing Machines Enterprises, and ask how it is doing against the competition. Chances are you will be rewarded with a lengthy paragraph or more detailing the company’s struggle against mighty competitors. And if the chances of that are high, the chances of you starting to believe that a company with that name actually exists are even higher. Such is the convincing nature of generated text. There is even a term for it: hallucinations.

I will spend some time on the issue and its background before jumping to the pattern itself, because the issue is complex and has repercussions for real-world usage of LLMs. Understanding it is really important.

There is the famous example of Google Bard claiming that the James Webb Space Telescope took the very first picture of an exoplanet, when the first such image was in fact taken years before the telescope was launched. A mistake like that is pretty harmless until you believe it and use the unverified output in real life. You may have heard of the well-known cases where lawyers were fined for citing fake cases in court, cases they had researched using ChatGPT. Try giving it a complex mathematical or logic-based question from your school days, and the chances of it coming up with a wrong answer are pretty high. Fake statistics don’t lag far behind. If you search Google for ChatGPT hallucinations, most of the top results are about fake references or citations generated for research purposes.

Don’t get the wrong impression. Large language models like ChatGPT or Bard are not deceitful villains trying to trick you with incorrect information. They are just models trying to come up with the most plausible output for your input, based on the data they were trained on. Their generative capabilities give you excellent results when you want content like stories or works of fiction, but the same ability can bite you when you use them in more sober settings. Interestingly, you can ask ChatGPT why it provides fake references and citations and give it a chance to defend itself. One plea it makes is that the public internet, which serves as its training data, is riddled with fake references that seep into its responses, and that the model is unable to differentiate between credible and non-credible sources.

The bottom line is that, underneath all the sophistication with which a large language model generates a coherent, meaningful, probability-driven response to your query, it is a predictive model. The guesswork creeps in elsewhere too: answers to mathematical questions are also ‘guessed’. Most people, being optimists, find it hard to double check or fact check everything that is handed to them. Sadly, when it comes to large language models, you are better off fact checking everything, especially if you intend to use the output in real life, such as citing researched cases in court or adding citations to your research paper. Chatbot operators are hard at work trying to minimize wrong answers, but the buck stops with you.

Piecing together available information using probability to come up with the most plausible answer, and generating new content, are core features of large language models; they would be worthless without these capabilities. You can’t take away the side effect of hallucination. Your main option is to verify, verify, verify everything the model produces. In addition, since this article is about prompt patterns, which by definition are creative ways to solve common problems, there are ways to suppress or minimize (but not eliminate), or at least identify, non-factual or made-up material in ChatGPT responses.

Enough background. I’m not gonna leave you at the mercy of large language models here. It’s time to introduce the fact check list pattern. The key is to instruct the large language model to additionally generate a list of the facts that its output relies on, that is, the facts that form an important part of the statements in the output. The key statements used in the pattern are:

Contextual Statements

· Generate a set of facts that are contained in the output

· The set of facts should be inserted at a specific point in the output

· The set of facts should be the fundamental facts that could undermine the veracity of the output if any of them are incorrect
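
As a rough illustration, the three contextual statements above can be appended to any task prompt with a small helper. This is only a sketch: the function name `build_fact_check_prompt`, its parameters, and the default placement wording are hypothetical, and the resulting string would still have to be sent to whichever model you use.

```python
def build_fact_check_prompt(task: str, position: str = "at the end of the output") -> str:
    """Append the fact check list statements to a task prompt.

    `position` is a hypothetical parameter that fills in the
    "specific point in the output" from the second statement.
    """
    statements = [
        "Generate a set of facts that are contained in the output.",
        f"Insert the set of facts {position}.",
        "The set of facts should be the fundamental facts that could "
        "undermine the veracity of the output if any of them are incorrect.",
    ]
    # Task first, then a blank line, then one statement per bullet line.
    return task.strip() + "\n\n" + "\n".join(f"- {s}" for s in statements)


if __name__ == "__main__":
    print(build_fact_check_prompt("Summarize the history of the sewing machine."))
```

Keeping the statements in one helper means every prompt you send carries the same fact list request, so you are less likely to forget it on the one answer that matters.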

How does this technique work? The output text can be fairly large, and it can be difficult to go through each line and verify that it is accurate and factual. The list this pattern generates gives you a shortcut for seeing which assumptions and facts the large language model used in that text. The list of facts helps you understand how and why the output was produced the way it was, and also lets you gauge whether the output missed something important, i.e. whether the list lacks a fact or consideration that should have been there. You also need to make sure the facts are sound and that no odd statement is listed as a fact. Any mistake in the list of facts is likely to be a bigger problem in the generated text, so it serves as a good red flag to follow up on.

There is still hard work involved for you, and the pattern does not fix the inherent issue of hallucinations or non-factual output. First, it jolts you out of the assumption that the generated output cannot contain errors, and reminds you that you have a responsibility to fact check it. Second, you now have a fact list to begin the evaluation with. The bitter truth is that even with this tool there is no guarantee that you can verify the accuracy of the generated output, but it is much better than evaluating the output with no reference point at all. At the very least, you can compare the output against the fact check list to verify that it actually conforms to the listed facts.

The pattern is particularly useful when you don’t have full knowledge of the domain in which the output is being generated. As an example, a software developer generating code might use this pattern to generate facts focused on security aspects. That can be done with scoping statements like ‘Only generate facts related to security’. Returning to our earlier lottery example, the factual statements might be that there should be no duplicates, that the numbers should be evenly distributed, and so on. These help you take another look at the generated numbers to check that they conform to the facts. You can also come up with a fact of your own, say that there should not be any skew in favor of odd or even numbers, and give it to ChatGPT as a fact to incorporate.
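
Once you have a fact list for the lottery example, some of the facts can be checked mechanically rather than by eye. The sketch below is one hedged way to do that; the function name `check_draw`, the range bounds, and the `max_parity_gap` tolerance are all assumptions made for illustration, and the parity check is only a crude stand-in for "no skew in favor of odd or even numbers".

```python
def check_draw(numbers, low, high, max_parity_gap=2):
    """Check a list of drawn numbers against a small fact list.

    Returns a dict mapping each fact to True (holds) or False (violated).
    `max_parity_gap` is a hypothetical tolerance for the odd/even balance.
    """
    odd = sum(1 for n in numbers if n % 2 == 1)
    even = len(numbers) - odd
    return {
        "no duplicates": len(set(numbers)) == len(numbers),
        "all numbers within range": all(low <= n <= high for n in numbers),
        "no odd/even skew": abs(odd - even) <= max_parity_gap,
    }


if __name__ == "__main__":
    # Hypothetical draw produced by the model for a 1-49 lottery.
    for fact, holds in check_draw([3, 17, 22, 40, 8, 35], low=1, high=49).items():
        print(f"{fact}: {'ok' if holds else 'VIOLATED'}")
```

Any fact that comes back violated is exactly the kind of red flag the pattern is meant to surface, and a good reason to question the rest of the output too.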

Let me go beyond this prompt pattern here and discuss a couple of other techniques for making responses more accurate. Note that the model always tries to ‘guess’ the output probabilistically, even when you ask it, say, a logic-based maths problem, and this often results in an incorrect answer. It tends to happen when the answer requires real computation: the model tries to respond as quickly as possible and resorts to shortcuts. The first remedy is to add statements or steps it must perform before responding. You take control of the steps it must go through, so that its tendency to guess is bypassed by the steps you specify.

The second technique, especially suited to logical, maths-type problems, is to explicitly instruct the model to work out its own solution before responding. For research, instructions like ‘find the relevant information, then answer based on that information’ push the model to perform extra computation before responding, and generally help reduce inaccurate or guess-based answers. In many cases, explicitly instructing the model to avoid generating incorrect output, as in the techniques we just covered, brings a noticeable improvement. Even an instruction as simple as ‘if you don’t have the information, just say “I don’t know”’ can substantially reduce the chances of made-up information being presented to you. Depending on the situation, you can instruct the model to do its due diligence, just as you would instruct a human apprentice who is learning to put in some effort to produce better quality work.
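
The instructions described above can be bundled into a small prompt wrapper. This is a minimal sketch under stated assumptions: the function name `add_safeguards` and the `kind` labels ("math", "research") are hypothetical, and the exact wording of each instruction is just one possible phrasing of the ideas in the text.

```python
def add_safeguards(question: str, kind: str = "general") -> str:
    """Append accuracy-oriented instructions to a question.

    `kind` is a hypothetical label: "math" asks the model to work out its
    own solution first, "research" asks it to gather information first.
    Every prompt also gets the 'I don't know' escape hatch.
    """
    steps = []
    if kind == "math":
        steps.append(
            "First work out your own solution step by step, "
            "and only then give the final answer."
        )
    elif kind == "research":
        steps.append(
            "First find the relevant information, "
            "then answer based only on that information."
        )
    steps.append("If you don't have the information, just say 'I don't know'.")
    return question.strip() + "\n\n" + "\n".join(steps)


if __name__ == "__main__":
    print(add_safeguards("How many weekdays are there between two given dates?", kind="math"))
```

The wrapper does not make the model more capable; it only makes the steps you expect explicit, which is the whole point of the technique.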

That is it for the fact check list pattern. You can check out other useful patterns like this one in our article series. If you liked this article, please clap or share. You can also consider subscribing to our YouTube channel. Thank you!

Next: Prompt Engineering via Prompt Patterns — The Template Pattern