Gary A. Stafford

Summary

The content provides an advanced guide to mastering long document insights using Amazon Bedrock and Anthropic Claude 2 Foundation Model for comprehensive document analysis and summarization.

Abstract

The article delves into advanced techniques for summarizing and analyzing long texts, specifically illustrated through the analysis of Bram Stoker's "Dracula." Utilizing Amazon Bedrock in conjunction with the Anthropic Claude 2 Foundation Model, the post demonstrates how to process texts that exceed the context window of foundation models by splitting them into manageable chunks. It emphasizes the importance of prompt engineering to obtain desired outputs, showcasing methods for plot summarization, character analysis, identifying literary devices, setting descriptions, and generating questions and answers. The guide also includes practical code examples, references to external resources, and discusses the nuances of interacting with generative AI models.

Opinions

  • The author believes that traditional summarization methods are insufficient for very long texts and that advanced techniques are necessary to capture the essence of such documents.
  • The article suggests that the use of XML-style tags and specific instructions in prompts can significantly improve the quality and precision of AI-generated summaries.
  • There is an emphasis on the importance of keeping semantically related pieces of text together during the chunking process to maintain the contextual integrity of the original document.
  • The author holds that Generative AI can be a powerful tool for literary analysis, aiding in the understanding of elements such as characters, themes, and literary devices.
  • The post implies that the Anthropic Claude 2 model, with its large context window and training as a conversational agent, is particularly well-suited for complex text analysis tasks.
  • The article conveys the idea that the combination of Amazon Bedrock and the Anthropic Claude 2 model provides a versatile platform for developers to perform sophisticated text analysis without the need to manage the underlying infrastructure.

Mastering Long Document Insights: Advanced Summarization with Amazon Bedrock and Anthropic Claude 2 Foundation Model

Unleash the Power of Generative AI for Comprehensive Document Analysis and Summarization

Text summarization is a standard Natural Language Processing (NLP) task that creates a concise summary of a longer text. According to LangChain, with foundation models, if the text is too long to stuff into a single prompt, we could use techniques like Map Reduce to summarize each document in a ‘map’ step and then ‘reduce’ the summaries into a final summary. This technique is commonly referred to as a ‘summary of summaries.’ In this post, we will move beyond simple summarization and explore advanced techniques to analyze long texts using Amazon Bedrock and the Anthropic Claude 2 Foundation Model.

In the spirit of the season 🎃, we will analyze Bram Stoker’s original 1897 Victorian-era Gothic novel, Dracula, freely available through Project Gutenberg. Written in the epistolary style, the novel is told through diary entries, telegrams, letters, and memos, which adds a level of complexity when analyzing the longer text. We will explore elements such as the novel’s context, setting, characters, plot, literary devices, and themes, aided by Generative AI.

Source Code

The source code used in this post’s demonstration is open-sourced and available on GitHub. I suggest starting with the project’s Jupyter Notebook, which contains all the examples herein plus additional examples not included in the post.

Amazon Bedrock and Anthropic Claude

AWS announced the general availability of Amazon Bedrock in late September 2023. Amazon Bedrock is a fully managed serverless service that makes foundation models (FMs) from leading AI companies, such as Cohere, Anthropic, Stability AI, Meta, and AI21 Labs, available through an application programming interface (API). The Anthropic Claude 2 foundation model employed in this post was accessed through Amazon Bedrock using the Boto3 SDK for Python.

What is a Prompt?

Quoting Wikipedia, Prompt engineering is the process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform. Further, citing Prompt Engineering Guide’s Elements of a Prompt, a prompt can contain any of these four types of elements:

  • Instruction — a specific task or instruction you want the model to perform
  • Context — external information or additional context that can steer the model to better responses
  • Input Data — the input or question that we are interested in finding a response for
  • Output Indicator — the type or format of the output

You do not need all four elements for a prompt, and the format depends on the task and the model provider.
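As a rough illustration of how those four elements fit together, here is a minimal sketch. The `build_prompt` helper and its parameter names are hypothetical and only mirror the list above; they are not part of any library or of this post's source code.

```python
def build_prompt(instruction, context="", input_data="", output_indicator=""):
    """Join whichever of the four prompt elements are supplied, in a fixed order."""
    parts = [instruction, context, input_data, output_indicator]
    # Skip empty elements, since a prompt does not need all four
    return "\n\n".join(part.strip() for part in parts if part.strip())


# Only three of the four elements are used here (no extra context)
prompt = build_prompt(
    instruction="Summarize the passage below in one sentence.",
    input_data="Jonathan Harker travels from Munich to Bistritz.",
    output_indicator="Summary:",
)
print(prompt)
```

The fixed ordering is just one reasonable choice; as noted above, the best format depends on the task and the model provider.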

Model-specific Prompt Engineering

Generally, each model provider has some required or recommended methods for constructing an optimal prompt for their particular foundation model. For example, according to Anthropic’s guide, Using the correct format, Claude was trained as a conversational agent that uses the special tokens \n\nHuman: and \n\nAssistant: to mark who is speaking. The \n\nHuman: (you) asks a question or gives instructions, and the \n\nAssistant: (Claude) responds. Additionally, according to Anthropic’s guide, Constructing a prompt, XML tags, like <tag></tag>, help demarcate some essential parts of your prompt, such as rules, examples, or input text to process. Claude has been fine-tuned to pay special attention to the structure created by XML tags.
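To make the two conventions above concrete, here is a small sketch that wraps text in XML-style tags and adds the Human and Assistant turn markers. The helper names `wrap_in_tags` and `claude_prompt` are made up for illustration and do not come from Anthropic's libraries.

```python
def wrap_in_tags(tag, text):
    """Surround text with an XML-style open and close tag."""
    return f"<{tag}>\n{text.strip()}\n</{tag}>"


def claude_prompt(instructions, **tagged_sections):
    """Build a Claude 2 style prompt: Human turn, tagged sections, Assistant turn."""
    sections = "\n\n".join(
        wrap_in_tags(tag, text) for tag, text in tagged_sections.items()
    )
    body = instructions.strip()
    if sections:
        body = f"{body}\n\n{sections}"
    return f"\n\nHuman: {body}\n\nAssistant:"


prompt = claude_prompt(
    "Summarize the text inside the <chapter></chapter> XML tags below.",
    chapter="Left Munich in the evening and arrived in Vienna the next morning.",
)
print(prompt)
```

Keyword arguments become tag names here, so `chapter=...` produces a `<chapter>` section; that is a convenience of this sketch rather than anything Claude requires.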

Text Splitting

To summarize a long text whose length exceeds the context window (aka token limit) of a foundation model, the text must be split into what are commonly referred to as chunks. Even Anthropic Claude 2, with its massive 100K-token context window, cannot ingest the entire novel, whose 27 chapters total approximately 833K characters, 153K words, or 206K tokens.

Splitting or chunking a long text into smaller chunks is more challenging than it may sound. You could somewhat mindlessly set an arbitrary character chunk size and the chunk overlap. However, as LangChain’s Text splitter documentation states, ideally, you want to keep the semantically related pieces of text together. What semantically related means could depend on the type of text. Luckily enough, our novel contains 27 chapters with an average length of 75 paragraphs, 5,683 words, 30,874 characters, or 7,628 tokens. A chapter is defined as “a main division of a book.” Thus, each chapter should contain semantically related pieces of text.
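For contrast, here is a minimal sketch of the "somewhat mindless" approach described above: fixed-size character chunks with an overlap. The function name and parameters are hypothetical, and this is not how LangChain's splitters are implemented. It simply shows why arbitrary boundaries can cut through sentences.

```python
def chunk_by_characters(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "The castle stood on the edge of a precipice. The wolves howled."
for chunk in chunk_by_characters(sample, chunk_size=30, chunk_overlap=5):
    print(repr(chunk))
```

Running this on the sample sentence pair shows chunks that begin and end mid-sentence, which is the loss of semantic grouping that chapter-based splitting avoids.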

We can use simple Regular Expressions to split the chapters and remove any extraneous content before or after the novel’s chapters:

import re

def split_book(book_text):
    # Use regular expressions to split the book into chapters
    # Specific to this Gutenberg eBooks format
    chapters = re.split(r"^CHAPTER [IVXLCDM]+$", book_text, flags=re.MULTILINE)
    
    # remove everything prior to chapter 1
    chapters.pop(0)

    # The last of the 27 chapters sits at index 26; keep only the text before "THE END"
    last_chapter = re.split(r"^.*THE END.*$", chapters[26], flags=re.MULTILINE)[0]
    chapters.pop(26)
    chapters.append(last_chapter)

    return chapters
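To see the two regular expressions at work without downloading the full eBook, here is a sketch that runs a generalized variant of the function on a tiny, made-up sample. The `split_chapters` name, the use of `chapters[-1]` instead of a fixed index, and the sample text are all assumptions for illustration.

```python
import re

# A made-up, heavily abbreviated stand-in for the Project Gutenberg text
sample_book = """Front matter that should be discarded

CHAPTER I

JONATHAN HARKER'S JOURNAL

Left Munich in the evening.

CHAPTER II

The Count himself came forward.

THE END

Licence text that should be discarded
"""


def split_chapters(book_text):
    """Split on chapter headings, dropping text before chapter 1 and after THE END."""
    parts = re.split(r"^CHAPTER [IVXLCDM]+$", book_text, flags=re.MULTILINE)
    chapters = parts[1:]  # everything before the first heading is front matter
    if not chapters:
        return []
    # Keep only the part of the final chapter that precedes "THE END"
    chapters[-1] = re.split(r"^.*THE END.*$", chapters[-1], flags=re.MULTILINE)[0]
    return [chapter.strip() for chapter in chapters]


chapters = split_chapters(sample_book)
print(len(chapters))
print(chapters[1])
```

Indexing the final chapter with `-1` makes the sketch independent of the chapter count, at the cost of no longer documenting that this particular novel has 27 chapters.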

What are Tokens?

Anthropic defines tokens as the smallest individual “atoms” of a language model, which can variously correspond to words, subwords, characters, or even bytes in the case of Unicode. For Claude, the average token is about 3.5 characters. Using the client.count_tokens() method from Anthropic’s Python library, we can compute the average input token length of the chapters in the novel to be 7,628, with an average character-to-token ratio of 4.05:1, in the ballpark of the Anthropic estimate of 3.5:1.

from anthropic import Anthropic

client = Anthropic()
chapter_token_count = client.count_tokens(chapter.strip())

Based on that average input token count, we could easily pass the contents of several chapters at one time into the Anthropic Claude 2 foundation model due to its 100K token context window. However, instead of figuring out other ways to split the novel and maintain the semantic relationship within chunks, we will use the individual chapters for all examples in this post.
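The arithmetic behind those figures can be checked in a few lines. This sketch only uses the averages quoted above; the variable names are my own, and the chapters-per-window figure is an upper bound, since the window must also hold the instructions and the model's response.

```python
# Averages and totals quoted earlier in this post
avg_chars_per_chapter = 30_874
avg_tokens_per_chapter = 7_628
total_novel_tokens = 206_000
context_window_tokens = 100_000

# Character-to-token ratio for the novel's chapters
chars_per_token = avg_chars_per_chapter / avg_tokens_per_chapter

# Upper bound on how many average-length chapters fit in one context window
chapters_per_window = context_window_tokens // avg_tokens_per_chapter

# The whole novel is roughly twice the size of the context window
novel_fits = total_novel_tokens <= context_window_tokens

print(f"{chars_per_token:.2f} characters per token")
print(f"{chapters_per_window} chapters per window at most")
print(f"Entire novel fits in one prompt: {novel_fits}")
```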

Analysis Techniques

In the following section, we will examine several methods for analyzing the long text of the novel. As we progress through the examples, we will increase the complexity of the prompts, adding techniques to shape the output more precisely.

1. Plot Summarization

A simple summary of summaries is one of the most common approaches to summarizing long texts. In this example, we first loop through each chapter and ask the foundation model to write a concise single-paragraph summary of the main points, events, and ideas contained in the chapter.

First, let’s look at a simple prompt that we could pass to the Anthropic Claude model, which should return a summary:

prompt = f"""\n\nHuman: Write a short summary of the following chapter:

    {chapter}.\n\nAssistant:"""

The challenge with this basic prompt is that we don’t know how large a response we will get back or how it will be formatted. By adding more instructions to the prompt, we can increase the probability of getting back exactly the type of response we expect from the model:

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph summary of the following chapter:
    
    {chapter}\n\nAssistant:"""

In addition to adding even more instructions to the prompt, we can also use XML-style tags, such as <chapter></chapter>, which Anthropic recommends for their Claude models to demarcate sections of the prompt better.

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph summary of the chapter's main points, events, and ideas contained inside the <chapter></chapter> XML tags below.  
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

This prompt is then passed into the following function, which calls the Anthropic Claude 2 foundation model through Amazon Bedrock:

import json
import boto3
from botocore.exceptions import ClientError

def create_bedrock_connection(self):
    client_bedrock = boto3.client(
        "bedrock-runtime", 
        "us-east-1"
    )
    
    return client_bedrock


def create_summary(self, client_bedrock, prompt):
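    # The prompts in this post already contain the "\n\nHuman:" and
    # "\n\nAssistant:" turns, so only wrap prompts that lack them
    if not prompt.startswith("\n\nHuman:"):
        prompt = f"\n\nHuman: {prompt}\n\nAssistant:"

    try:
        body = json.dumps(
            {
                "prompt": prompt,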
                "max_tokens_to_sample": self.max_tokens_to_sample,
                "temperature": self.temperature,
                "top_k": self.top_k,
                "top_p": self.top_p,
                "stop_sequences": self.stop_sequences,
            }
        )

        accept = "application/json"
        content_type = "application/json"

        response = client_bedrock.invoke_model(
            body=body, 
            modelId=self.model, 
            accept=accept, 
            contentType=content_type
        )
        response_body = json.loads(response.get("body").read())
        return response_body
    except ClientError as ex:
        self.logger.error(ex)
        exit(1)
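The loop over chapters and the later "reduce" step can be sketched independently of Bedrock. In the sketch below, `invoke` is any callable that takes a prompt and returns a parsed response body. I am assuming the generated text sits under a `completion` key, as in the Claude 2 text-completions response format. The `fake_invoke` function is a hypothetical stand-in so the example runs without AWS credentials.

```python
def summarize(invoke, text, tag="chapter"):
    """Ask the model for a one-paragraph summary of text wrapped in <tag> tags."""
    prompt = (
        f"\n\nHuman: Write a concise, single-paragraph summary of the text "
        f"inside the <{tag}></{tag}> XML tags below.\n\n"
        f"<{tag}>\n{text}\n</{tag}>\n\nAssistant:"
    )
    # Assumption: the response body carries the text under "completion"
    return invoke(prompt)["completion"].strip()


def summary_of_summaries(invoke, chapters):
    """Map: summarize each chapter. Reduce: summarize the combined summaries."""
    chapter_summaries = [summarize(invoke, chapter) for chapter in chapters]
    combined = "\n\n".join(chapter_summaries)
    return summarize(invoke, combined, tag="text")


def fake_invoke(prompt):
    """Stand-in for the Bedrock call; returns a canned response body."""
    return {"completion": f" [summary of a {len(prompt)}-character prompt]"}


print(summary_of_summaries(fake_invoke, ["Chapter one text.", "Chapter two text."]))
```

With a real client, `invoke` could be something like `lambda p: create_summary(client_bedrock, p)`, adapted to however the function above is exposed in your own code.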

Summary of Summaries

Here is an example of the model’s summary of Chapter 1 of the novel:

Jonathan Harker travels by train from Munich to Bistritz, where he stays at the Golden Krone Hotel before departing for the Borgo Pass. He is met by a mysterious driver in a calèche who takes him through the Pass at night. Harker sees strange sights along the way, including blue flames and a ring of wolves surrounding them. Despite feeling afraid, Harker continues on to a ruined castle, arriving just before midnight. The chapter covers Harker’s journey to Transylvania and the foreboding events he witnesses that hint at the supernatural horrors that await him at Count Dracula’s castle.

Image from Midjourney using the novel’s generated summary

Using a nearly identical prompt, we can then pass the individual chapter summaries, stored in a single text file, to the same function to derive a summary of summaries, that is, a summary of the complete novel. Note that we have added directions for the Assistant (Claude) to further ensure the expected output:

prompt = f"""\n\nHuman: Write a concise grammatically correct summary using only the information within the <text></text> XML tags below. 
    The Assistant will refrain from using bullet-point lists.
    
    <text>
    {input_text}
    </text>\n\nAssistant:"""

We should receive a response similar to the following based on a summarization of the individual chapter summaries:

The novel follows the harrowing journey of Jonathan Harker and his companions to defeat Count Dracula after realizing he is an immortal vampire. It begins with Harker’s ominous imprisonment in Dracula’s remote Transylvanian castle, where he first encounters the supernatural horror. Back in England, his fiancée Mina and her friend Lucy become targets of Dracula’s evil, as signs point to his arrival in London. Led by Professor Van Helsing, Harker joins Lucy’s suitors and doctors, John Seward and Van Helsing, to share information and hunt Dracula. After Lucy tragically dies and returns as a vampire, the men heroically destroy her. They race to find Dracula’s lairs across London and sterilize his boxes of native earth to weaken him. Their pursuit of Dracula leads them back to Transylvania, where they finally defeat him through a violent battle at his castle. But not before a bitten Mina faces her own dark transformation, and Quincey Morris sacrifices his life, striking the final blow to Dracula’s heart. United by love and courage, the characters embrace the reality of supernatural evil in the world, finding the strength to conquer it.

Image from Midjourney using the description of the novel’s setting in London

We can modify the prompt if we want a bullet-point list summary of each chapter instead of the previous paragraph format. Note that we are further reinforcing the use of the context contained between the <text></text> tags in the instructions:

prompt = f"""\n\nHuman: Write a bullet-point list of 3-4 main points, events, and ideas using only the information within the <text></text> XML tags below. 
    
    <text>
    {input_text}
    </text>\n\nAssistant:"""

For Chapter 1, the results should look similar to the following:

Here are the main points from the chapter:

  • Jonathan Harker travels by train from Munich to Bistritz, entering the region of Transylvania. He notes the changing landscape and glimpses the cultures/peoples that inhabit the region.
  • In Bistritz, the locals seem wary of Harker’s intended destination and make signs to ward off evil. Harker brushes this off as superstition.
  • Harker stays at the Golden Krone Hotel. He receives a note from Count Dracula welcoming him and mentioning a carriage ride.
  • During the carriage ride into the mountains, the driver seems increasingly anxious/eager. Harker sees a blue flame in the darkness that spooks the horses.
  • The howling of wolves surrounds them. The driver mysteriously disappears and returns.
  • They arrive at a ruined castle, assumed to be Dracula’s, under the moonlight. The scene is gloomy and ominous.

2. Character Analysis

In addition to a simple plot summary, we can analyze the context, setting, characters, literary devices, and themes aided by Generative AI. Next, we will focus on the main characters from the novel. To reduce the output size, we have asked the model only to return 3–4 main characters from each chapter. Note how we have added two additional sections, the <template> and <example> sections. These two sections will help reinforce the format of the model’s response. In the template, we use the square brackets to indicate placeholders the model will replace (e.g., [Character]: [Description]). Again, we will provide specific instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: Provide a list of the 3-4 main characters and a brief description of each main character, in the chapter contained in the <chapter> tags below.
    The Assistant will order the main characters by how many times they are mentioned.
    The Assistant will number the characters.
    Follow the template contained in the <template> tags below and replace the placeholders with the relevant information:
    <template>
    [Number]. [Character]: [Description]
    </template>
    
    Here is an example contained in the <example> tags below:
    <example>
    1. Pink Panther: A suave and smooth-talking anthropomorphic animated panther.
    </example>
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back character descriptions similar to the following for each chapter:

Here are the main characters from the chapter:

  1. Jonathan Harker: A young English lawyer traveling to Transylvania to help a nobleman with a legal matter. He keeps a journal describing his journey.
  2. Count Dracula: A mysterious Transylvanian nobleman whom Jonathan is traveling to meet. He sends cryptic letters arranging Jonathan’s travel.
  3. The Coach Driver: Drives Jonathan through the Borgo Pass to reach Dracula’s castle. He seems uneasy about taking Jonathan there.
  4. Wolves: A pack of wolves surrounds the coach as it travels through the Borgo Pass. Their howling terrifies the horses and Jonathan.

Image from Midjourney using the description of the novel’s protagonist, Jonathan Harker
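Because the template fixes the shape of each line, the response is straightforward to parse. Here is a minimal sketch, assuming the model follows the `[Number]. [Character]: [Description]` template; the function name and sample response are illustrative only, and real responses may need more forgiving handling.

```python
import re

# Matches lines such as "1. Jonathan Harker: A young English lawyer..."
NUMBERED_LINE = re.compile(r"^\s*(\d+)\.\s*([^:]+):\s*(.+)$")


def parse_numbered_list(response_text):
    """Return (number, name, description) tuples for lines that fit the template."""
    entries = []
    for line in response_text.splitlines():
        match = NUMBERED_LINE.match(line)
        if match:  # preamble and blank lines do not match and are skipped
            number, name, description = match.groups()
            entries.append((int(number), name.strip(), description.strip()))
    return entries


response = """Here are the main characters from the chapter:

1. Jonathan Harker: A young English lawyer traveling to Transylvania.
2. Count Dracula: A mysterious Transylvanian nobleman."""

for number, name, description in parse_numbered_list(response):
    print(number, name, "->", description)
```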

Using a nearly identical prompting strategy to the previous summary of summaries, we then pass the individual character descriptions to the same function as before to derive an overall character description for Count Dracula based on the individual chapter-level observations:

prompt = f"""\n\nHuman: Write a concise, grammatically correct, single-paragraph description of the main character, Dracula (aka Count Dracula), based on the following individual character descriptions. 
    The Assistant will refrain from using bullet-point lists.

    <summaries>
    {character_summaries}
    </summaries>\n\nAssistant:"""

We should receive a response similar to the following overall character description based on the individual chapter-level descriptions of Count Dracula:

Here is a single-paragraph description of Count Dracula:

Count Dracula is an ancient vampire of nobility from the remote mountains of Transylvania who resides alone in a decaying yet magnificent castle. He initially appears charming and welcoming but gradually reveals his true sinister and threatening nature. With his gaunt figure, waxen skin, high aquiline nose, pointed ears, sharp teeth, and burning red eyes, Dracula exhibits a mysterious magnetism and power over humans, able to control animals and the weather. He attacks the innocent Jonathan Harker, later turning Lucy Westenra into a vampire. Persistent in his vampiric evil, Dracula possesses supernatural abilities and cunning intelligence, escaping capture multiple times. Though eloquent and refined on the surface, at his core, Dracula is a ruthless predator who spreads his undead curse, requiring the combined efforts of Dr. Van Helsing and his allies to defeat him finally.

Image from Midjourney using the description of the novel’s antagonist, Count Dracula
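One possible way to assemble the {character_summaries} input used in the Dracula prompt above is to collect, from each chapter-level response, the lines whose character name mentions the character of interest. This is only a sketch under that assumption; the helper name is hypothetical, and the post's source code may gather the descriptions differently.

```python
def collect_character_lines(chapter_responses, name):
    """Gather lines about one character from a list of per-chapter responses."""
    lines = []
    for chapter_number, response in enumerate(chapter_responses, start=1):
        for line in response.splitlines():
            # Only look at the character name, i.e. the text before the first colon
            character = line.split(":", 1)[0]
            if name.lower() in character.lower():
                lines.append(f"Chapter {chapter_number}: {line.strip()}")
    return "\n".join(lines)


chapter_responses = [
    "Here are the main characters from the chapter:\n"
    "1. Jonathan Harker: A lawyer traveling to meet Dracula.\n"
    "2. Count Dracula: A mysterious Transylvanian nobleman.",
    "1. Count Dracula: A charming but sinister host.\n"
    "2. Jonathan Harker: A guest who grows uneasy.",
]

character_summaries = collect_character_lines(chapter_responses, "Dracula")
print(character_summaries)
```

Matching on the name portion only keeps out lines where Dracula is merely mentioned in someone else's description.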

3. Character Types

To quote Essentials in Writing, without characters, there would be no plot or conflicts. Readers should take note of the author’s methods of character development, the characters’ traits and actions, and how the characters change (or don’t change) throughout the story. There are several typical character types used throughout fictional literature, including the Protagonist, Antihero, Antagonist, Guide, Contagonist, Sidekicks (Deuteragonist), Henchmen, Love Interest, Temptress, Confidant, and Foil.

Here is an example prompt to extract those character types from each chapter. Note how we use the <character_types> section to define a list of character types to be considered.

prompt = f"""\n\nHuman: The following list of character types, contained in the <character_types> tags below, are often found in fictional literature: 
    <character_types>
    - Protagonist
    - Antihero
    - Antagonist
    - Guide
    - Contagonist
    - Sidekicks (Deuteragonist)
    - Henchmen
    - Love Interest
    - Temptress
    - Confidant
    - Foil
    </character_types>

    Based on this list of character types, give 3-4 examples of character types found in the chapter in the <chapter> tags below, including the character name and an explanation of why.
    The Assistant will use a bullet-point list.
    The Assistant will not use square brackets in its response.

    Follow the template contained in the <template> tags below and replace the placeholders with the character type, character name, and explanation:
    <template>
    - [Character_Type] - [Character_Name]: [Explanation]
    </template>

    Here is an example contained in the <example> tags below:
    <example>
    - Love Interest - Minnie Mouse: Mickey Mouse's lifelong romantic interest.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back a list of character types similar to the following for each chapter:

Here are some examples of character types from the chapter:

  • Protagonist — Jonathan Harker: The main character and narrator who is devoted to protecting his wife, Mina.
  • Love Interest — Mina Harker: Jonathan’s wife, whom he loves deeply and wants to protect from Dracula.
  • Antagonist — Count Dracula: The vampire whom Jonathan and the others are trying to defeat.
  • Henchman — Dracula’s ‘earth-boxes’: The boxes of earth Dracula uses as hideouts that Jonathan and the others must find and destroy.
  • Guide/Confidant — Professor Van Helsing: The experienced vampire hunter guides the others in their quest against Dracula.

4. Literary Devices

According to Literary-Devices.com, literary devices are techniques and structures writers use to convey a message, create an effect, and evoke emotions in the reader. These devices can emphasize a literary work, make it more interesting, or convey a deeper meaning. Standard literary devices include Metaphor, Simile, Alliteration, Personification, and Symbolism.

Image from Midjourney using the description of Chapter 7 where the schooner Demeter comes ashore

We will use the following prompt to extract literary devices from the chapters and generate explanations for each device. We have provided a list of literary devices to the model in the <literary_devices> tags. Again, we have included additional instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: The following list of literary devices inside the <literary_devices></literary_devices> XML tags below, are often found in fictional literature: 
    <literary_devices>
    Allegory, Alliteration, Allusion, Amplification, Anagram, 
    Analogy, Anthropomorphism, Antithesis, Chiasmus, Colloquialism, 
    Circumlocution, Epigraph, Euphemism, Foreshadowing, Hyperbole, 
    Imagery, Metaphor, Mood, Motif, Onomatopoeia, Oxymoron, Paradox, 
    Personification, Portmanteau, Puns, Satire, Simile, Symbolism, Tone
    </literary_devices>

    Based on the list of literary devices, give 2-3 examples of literary devices found inside the <chapter></chapter> XML tags below, and explain why.
    The Assistant will use a bullet-point list.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below for your response. Replace the placeholders, in square brackets, with the literary device and the explanation:
    <template>
    - [Literary_Device]: [Explanation]
    </template>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function again, we should get back a list of literary devices similar to the following for each chapter. Note how the model has incorporated quotes from the chapter into its explanation:

Here are some examples of literary devices found in the provided chapter:

  • Foreshadowing: When Jonathan describes seeing Dracula crawling down the castle wall like a lizard, this foreshadows Dracula’s inhuman, monstrous nature that will be further revealed later.
  • Imagery: Stoker uses vivid imagery to describe the setting, like “great jagged mountain fastnesses, rising peak on peak, the sheer rock studded with mountain ash and thorn.” This helps the reader visualize the remote, ominous location.
  • Symbolism: The crucifix Jonathan places above his bed symbolizes the protection of his Christian faith against the evil Dracula.
  • Metaphor: Dracula’s eyes are described as “blazing red with passion,” like the flames of hell, metaphorically equating Dracula with the devil.
  • Personification: The creeping mist is personified as having a “deadly chill” that creeps into Jonathan’s bones.

Image from Midjourney based on the foreshadowing literary device
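Since the prompt supplies a closed list of literary devices, it can be useful to check that the devices the model returns actually come from that list. Here is a minimal sketch of such a check; the function name and the sample response are made up for illustration, and the bullet characters handled are an assumption about the model's output.

```python
ALLOWED_DEVICES = {
    "Allegory", "Alliteration", "Allusion", "Amplification", "Anagram",
    "Analogy", "Anthropomorphism", "Antithesis", "Chiasmus", "Colloquialism",
    "Circumlocution", "Epigraph", "Euphemism", "Foreshadowing", "Hyperbole",
    "Imagery", "Metaphor", "Mood", "Motif", "Onomatopoeia", "Oxymoron",
    "Paradox", "Personification", "Portmanteau", "Puns", "Satire", "Simile",
    "Symbolism", "Tone",
}


def check_devices(response_text, allowed=ALLOWED_DEVICES):
    """Split bulleted 'Device: Explanation' lines into listed and unlisted devices."""
    listed, unlisted = [], []
    for line in response_text.splitlines():
        stripped = line.strip()
        # Only bulleted lines are treated as devices; the preamble is ignored
        if not stripped.startswith(("-", "•")) or ":" not in stripped:
            continue
        device = stripped.lstrip("-•").split(":", 1)[0].strip()
        (listed if device in allowed else unlisted).append(device)
    return listed, unlisted


response = """Here are some examples of literary devices found in the chapter:

- Foreshadowing: The lizard-like crawl hints at what is to come.
• Imagery: Vivid descriptions of the mountains.
- Irony: Not in the supplied list."""

print(check_devices(response))
```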

5. Setting

According to Britannica, setting, in literature, is the location and time frame in which the action of a narrative takes place. The makeup and behavior of fictional characters often depend on their environment just as much as on their personal characteristics.

We will use the following prompt to generate a numbered list of no more than three settings from each chapter, in order according to how often they are mentioned in the chapter. Again, we use the <template> and <example> tags, along with specific instructions to the Assistant to ensure the output is as expected.

prompt = f"""\n\nHuman: Provide a list of no more than three settings and a brief description of each setting inside the <chapter></chapter> XML tags below.
    The Assistant will order the settings by how many times they are mentioned in the chapter.
    The Assistant will number the list of settings.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    [Number]. [Setting]: [Description]
    </template>

    Here is an example inside the <example></example> XML tags below:
    <example>
    1. Hoboken, New Jersey: Part of the New York metropolitan area on the banks of the Hudson River across from lower Manhattan, where the story takes place.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function again, we should get back a numbered list of settings similar to the following for each chapter:

Here are the top 3 settings mentioned in the chapter, with brief descriptions:

  1. Dr. Seward’s house/asylum: This is where the majority of the chapter takes place, with scenes in Dr. Seward’s study, Renfield’s room, etc. It is an asylum where Dr. Seward cares for psychiatric patients, including Renfield.
  2. Carfax: The old house purchased by Dracula in Purfleet. Some of the boxes of earth were delivered here from Dracula’s castle. The men plan to investigate Carfax further.
  3. Dracula’s castle: Dracula’s residence in Transylvania. It is where he originally lived before purchasing properties in England. Van Helsing recounts Dracula’s history and roots in the castle and the area around it.

Image from Midjourney based on a description of Dr. Seward’s asylum

6. Questions and Answers

In addition to analysis, we can leverage the large language model to produce questions and answers based on the content of each chapter. The questions could be used for developing training and evaluation material or converted to dense vector embeddings and used for semantic similarity searches. Note the requested question-and-answer format of the <template> section in this prompt. This is reinforced by providing an example in the <example> section. Lastly, we have added even more instructions to the “Assistant” to ensure the output is as expected.

prompt = f"""\n\nHuman: Generate a list of 6 questions that the following chapter, inside the <chapter></chapter> XML tags below, could be used to answer. 
    The Assistant will provide both the question and the answer.
    The Assistant will refrain from asking overly broad questions.
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    Q: [Question]
    A: [Answer]
    </template>

    Here is an example inside the <example></example> XML tags below:
    <example>
    Q: What is the weather like in Spain?
    A: The rain in Spain stays mainly in the plain.
    </example>

    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our prompt to the same function as before, we should get back a list of questions and answers similar to the following for each chapter:

Here are three questions that could be answered by the provided chapter:

Q: What happened when Jonathan Harker woke up back in his room after his encounter with Dracula?
A: He was confused about how he got back to his room, indicating Dracula must have carried him there. His clothes were folded neatly, which was unlike him, and his watch was unwound, suggesting Dracula hurriedly undressed him.

Q: How does Jonathan plan to try to escape from Castle Dracula?
A: He decides his only chance is to climb down the castle wall, sneak into Dracula’s room through the window, and search for the key that will let him out of the castle.

Q: What does Jonathan discover when he finds Dracula’s body in a box in the underground crypt?
A: He finds Dracula appearing partially rejuvenated, with dark hair instead of white, and a flushed, blood-stained face. This horrifies Jonathan, who sees Dracula as a monster. In a rage, Jonathan tries to kill Dracula but fails before fleeing the crypt.
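To reuse the generated pairs for training or evaluation material, the response first needs to be turned into structured data. Here is a minimal sketch that extracts question-and-answer pairs, assuming the model follows the Q:/A: template. The function name is hypothetical, and answers that themselves contain a standalone "Q:" or "A:" would need more careful handling.

```python
import re

# A pair starts at "Q:", switches at "A:", and ends before the next "Q:" or the end
QA_PAIR = re.compile(
    r"\bQ:\s*(.+?)\s*\bA:\s*(.+?)(?=\s*\bQ:|\s*\Z)",
    re.DOTALL,
)


def parse_qa(response_text):
    """Return a list of {'question': ..., 'answer': ...} dictionaries."""
    return [
        {"question": question.strip(), "answer": answer.strip()}
        for question, answer in QA_PAIR.findall(response_text)
    ]


response = """Here are two questions that could be answered by the chapter:

Q: How does Jonathan plan to escape?
A: He decides to climb down the castle wall.

Q: Where does Jonathan find the Count? A: In a box in the crypt."""

for pair in parse_qa(response):
    print(pair)
```

The pattern accepts the answer on the same line as the question or on the next line, since models do not always reproduce the template's line breaks.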

By modifying the question-and-answer format prompt above, we can generate multiple-choice questions. Note the increased complexity of the <template> section in this prompt. We can indicate precisely how many choices we want to generate.

prompt = f"""\n\nHuman: Generate a list of 6 multiple-choice questions that the following chapter, inside the <chapter></chapter> XML tags below, could be used to answer. 
    The Assistant will provide the question, four possible answers lettered a,b,c, and d, and the correct answer.
    The Assistant will ask brief, specific questions.
    The Assistant will refrain from using bullet-point lists.
    The Assistant will refrain from including square brackets and XML tags in the response.
    
    Follow the template inside the <template></template> XML tags below and replace the placeholders, in square brackets, with the relevant information:
    <template>
    Q: [Question]
    (a) [Choice_1]
    (b) [Choice_2]
    (c) [Choice_3]
    (d) [Choice_4]
    A: (Letter) [Correct_Answer]
    </template>
    
    Here is an example inside the <example></example> XML tags below:
    <example>
    Q: What color is fresh grass?
    (a) Red
    (b) Blue
    (c) Green
    (d) Yellow
    A: (c) Green
    </example>
    
    <chapter>
    {chapter}
    </chapter>\n\nAssistant:"""

Passing our modified prompt to the same function as before, we should get back multiple-choice questions similar to the following for each chapter:

Here are three multiple-choice questions based on the chapter:

Q: What was the name of the hotel where Jonathan Harker stayed in Bistritz?
(a) The Golden Lion
(b) The Silver Stag
(c) The Golden Krone
(d) The Red Bear
A: (c) The Golden Krone

Q: What did Jonathan order for dinner at the hotel that was described as a national dish?
(a) Goulash
(b) Schnitzel
(c) Paprika Hendl
(d) Sauerbraten
A: (c) Paprika Hendl

Q: What did the old lady at the hotel give Jonathan as a charm against evil before he departed?
(a) A four-leaf clover
(b) A crucifix
(c) A rabbit’s foot
(d) An amulet
A: (b) A crucifix
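Multiple-choice output lends itself to a simple sanity check: the text given after the answer letter should match the choice with that letter. Here is a minimal sketch of a parser that performs this check, assuming the model follows the template above; the function name and the `consistent` field are my own.

```python
import re

MCQ = re.compile(
    r"\bQ:\s*(?P<question>.+?)\s*"
    r"\(a\)\s*(?P<a>.+?)\s*\(b\)\s*(?P<b>.+?)\s*"
    r"\(c\)\s*(?P<c>.+?)\s*\(d\)\s*(?P<d>.+?)\s*"
    r"\bA:\s*\((?P<letter>[a-d])\)\s*(?P<answer>.+?)(?=\s*\bQ:|\s*\Z)",
    re.DOTALL,
)


def parse_mcq(response_text):
    """Parse multiple-choice questions and flag answers that match no choice."""
    questions = []
    for match in MCQ.finditer(response_text):
        choices = {key: match.group(key).strip() for key in "abcd"}
        correct = match.group("letter")
        questions.append({
            "question": match.group("question").strip(),
            "choices": choices,
            "correct": correct,
            # True when the answer text equals the choice for the stated letter
            "consistent": choices[correct] == match.group("answer").strip(),
        })
    return questions


response = """Q: What color is fresh grass?
(a) Red
(b) Blue
(c) Green
(d) Yellow
A: (c) Green"""

print(parse_mcq(response))
```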

Conclusion

In this post, we learned how to move beyond simple summarization of long texts and discovered how to analyze different aspects of a document. Further, we built progressively refined prompts, which help precisely shape the output from the Anthropic Claude 2 foundation model, accessed through Amazon Bedrock. 👻

This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.
