Lucas Soares



Photo by Patrick Tomasso on Unsplash

Building an Automatic Q&A App with the ChatGPT API

Building an app that creates Q&A for interactively reading papers

In this article, we’ll build a Python script that utilizes OpenAI’s ChatGPT API to create an automatic Q&A session from academic papers.

Steps

This task involves loading a PDF file, extracting its content, generating questions and answers using GPT-3.5-turbo, and then evaluating the quality of the answers provided by the user.

Below are the steps the script takes to create a Q&A session for a given paper (pdf file):

  1. Loading the PDF document using PyPDFLoader from langchain
  2. Creating the Q&A pairs
  3. Orchestrating the Q&A session
  4. Evaluating and scoring answers
  5. Plotting the scores
  6. Defining the main function

If you prefer video, I also cover this topic on my YouTube channel.

Delving Deep into the Code

Let’s dissect the code and study each of its integral components in detail.

Importing the Necessary Modules

The coding adventure begins with the necessary imports:

# Newer LangChain releases expose this loader as langchain_community.document_loaders.PyPDFLoader
from langchain.document_loaders import PyPDFLoader
import openai 
import matplotlib.pyplot as plt 
import seaborn as sns
sns.set()

Loading the PDF Document

Our `load_paper` function steps forward to take the file path to a PDF document and hands it over to PyPDFLoader, our trusty assistant in loading the content of the document:

def load_paper(file_path="./paper.pdf"): 
    loader = PyPDFLoader(file_path) 
    docs = loader.load() 
    return docs

Creating the Q&A Pairs

With the document in hand, the `create_qa` function steps up to take a context and a set number of questions to generate. With the help of GPT-3.5-turbo, it crafts a set of questions and answers from the given context:

def create_qa(context, num=5):
    # Defining the context for creating the Q&As
    # Prompt to create the questions
    q_a_prompt = f"Create a set of {num} questions with answers based solely on this text from a paper:\n\n{context}\n\n. Separate each block composed of a question and an answer with 3 dashes '---' like this Q: <question>\n A:<answer> --- Q: <question>\n A:<answer> etc.... Let's think step by step. Q:"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": q_a_prompt}]
    )
    
    return response["choices"][0]["message"]["content"]
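
The `openai.ChatCompletion.create` call above belongs to the pre-1.0 `openai` package. From version 1.0 onward, the same request goes through a client object instead. The sketch below shows one way to wrap that call, assuming `openai>=1.0` is installed. The helper name `ask_chat_model` is hypothetical and not part of the original script, and the client is passed in as an argument so the function can be tried with a stand-in object, without a network call:

```python
from types import SimpleNamespace

SYSTEM_PROMPT = "You are a helpful research and programming assistant"


def ask_chat_model(client, prompt, model="gpt-3.5-turbo"):
    """Send one user prompt and return the text of the first reply.

    Hypothetical helper: `client` is expected to behave like
    `openai.OpenAI()` from openai>=1.0.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


# Stand-in client for a dry run. A real session would instead use:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
def _fake_create(model, messages):
    reply = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

print(ask_chat_model(fake_client, "Q: What is the paper about?"))
```

With a wrapper like this, both `create_qa` and `evaluate_answer` could share a single call site, so a future change to the API only needs to be made in one place.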

Orchestrating the Q&A session

The `run_qa_session` function is the director of the show. It takes the document, the number of questions per round, and a context window, creating a personalized experience for the user:

def run_qa_session(docs, num, context_window):
    overall_scores = []
    qa_dict = {}
    for i, page_num in enumerate(range(0, len(docs), context_window)):
        # Concatenating the pages (set by the context_window) to give as context for the Q&A
        context = "".join([page.page_content for page in docs[page_num:page_num+context_window]])
        q_a = create_qa(context, num)
        # Create a list of questions and answers from the output string by leveraging the '---' separator
        q_a_list = q_a.split('---')
        scores_list = []
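        for qa in q_a_list:
            # Skip blocks without an answer marker (e.g. an empty block after a trailing '---')
            if "A:" not in qa:
                continue
            question = qa.split("A:")[0].replace("Q:", "").strip()
            answer = qa.split("A:")[1].replace("Q:", "").strip()
            user_answer = input(question + " ")
            # Append to the round's list so earlier questions in the same round are not overwritten
            qa_dict.setdefault(f"Round {i}", []).append({"question": question, "answer": answer, "user_answer": user_answer})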
            print("CORRECT ANSWER: ", answer)
            print("***")
            score_feedback = evaluate_answer(question, answer, user_answer)
            # Split the reply into the text before and after the 'FEEDBACK:' marker
            score_text, marker, feedback = score_feedback.partition("FEEDBACK:")
            feedback = feedback.strip() if marker else "Error getting feedback"
            # Drop the 'SCORE:' label plus any stray whitespace or colons around the number
            score_text = score_text.replace("SCORE:", "").strip(" \t\r\n:")
            # Keep the score only if it can be turned into an integer
            try:
                score = int(score_text)
                scores_list.append(score)
            except ValueError:
                score = score_text
                print("The score could not be converted to an integer. Please try again.")
                print("The output score was: ", score)
            print("SCORE:", score)
            print("***")
            print("FEEDBACK:", feedback)
            print("*********")
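        # Only average when at least one score was parsed, to avoid dividing by zero
        if scores_list:
            round_score = sum(scores_list)/len(scores_list)
            print("ROUND SCORE:", round_score)
            overall_scores.append(round_score)
        else:
            print("ROUND SCORE: no valid scores in this round")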
        continue_input = input("Press enter to continue to the next round or press 'q' to quit.")
        if continue_input == "q":
            break
        
        
    
    return overall_scores, qa_dict
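
The two data-handling steps inside `run_qa_session`, grouping pages into contexts and splitting the model output on `---`, can be tried in isolation. The following is a minimal sketch rather than the script's own code: the `Page` class is a stand-in for the loaded page objects (only `page_content` is used), and the helper names `build_contexts` and `parse_qa_blocks` are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Stand-in for a loaded page; only `page_content` is used here."""
    page_content: str


def build_contexts(docs, context_window):
    """Join every `context_window` consecutive pages into one context string."""
    return [
        "".join(page.page_content for page in docs[start:start + context_window])
        for start in range(0, len(docs), context_window)
    ]


def parse_qa_blocks(q_a):
    """Split a 'Q: ... A: ... --- Q: ... A: ...' string into (question, answer) pairs."""
    pairs = []
    for block in q_a.split("---"):
        question, marker, answer = block.partition("A:")
        if not marker:  # no answer marker, e.g. an empty block after a trailing '---'
            continue
        pairs.append((question.replace("Q:", "").strip(), answer.strip()))
    return pairs


# Seven pages with a window of three give rounds of 3, 3 and 1 pages
pages = [Page(f"page{n} ") for n in range(1, 8)]
print(build_contexts(pages, 3))

# A trailing '---' produces an empty block, which is skipped
print(parse_qa_blocks("Q: What is X?\n A: A method. --- Q: Why?\n A: Speed. ---"))
```

Note that the last round can contain fewer pages than `context_window` when the page count is not a multiple of it.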

Evaluating and scoring answers

Every answered question is evaluated by the `evaluate_answer` function. This function plays the part of a strict examiner, using GPT-3.5-turbo to check the user’s answer against the correct answer and return a score between 0 and 100 together with one sentence of feedback:

def evaluate_answer(question, true_answer, user_answer):
    # Evaluate the answer
    evaluate_prompt = f"Given this question: {question} for which the correct answer is this: {true_answer}, give a score from 0 to 100 to the following answer given by the user: {user_answer}. The output should be formatted as follows: SCORE: <score number as an integer (e.g. 45, 90, etc...)> \n FEEDBACK: <A one sentence feedback justifying the score.>"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages = [{"role": "system", "content": "You are a helpful research and programming assistant"},
                    {"role": "user", "content": evaluate_prompt}]
    )

    return response["choices"][0]["message"]["content"]
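
Because the model's reply is free text, the `SCORE:` and `FEEDBACK:` markers are not guaranteed to appear exactly as requested. One possible way to make the parsing more forgiving is a pair of regular expressions, sketched below. The helper name `parse_score_feedback` is hypothetical, and clamping the score to the 0-100 range is an assumption added here, not something the original script does:

```python
import re


def parse_score_feedback(text):
    """Return (score, feedback) from a reply like 'SCORE: <int> FEEDBACK: <sentence>'.

    score is None when no integer follows 'SCORE:'; feedback is None when
    the 'FEEDBACK:' marker is missing.
    """
    score_match = re.search(r"SCORE:\s*(\d+)", text)
    feedback_match = re.search(r"FEEDBACK:\s*(.+)", text, flags=re.DOTALL)

    score = int(score_match.group(1)) if score_match else None
    if score is not None:
        # Assumption: clamp to the 0-100 range that the prompt asks for
        score = max(0, min(100, score))

    feedback = feedback_match.group(1).strip() if feedback_match else None
    return score, feedback


print(parse_score_feedback("SCORE: 85 \n: FEEDBACK: Mostly right, but misses the dataset."))
print(parse_score_feedback("I cannot grade this answer."))
```

Returning `None` for a missing part lets the caller decide whether to skip the question or ask the model again.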

Showcasing the Scores

Our journey is graphically represented with the `plot_scores` function, which proudly displays the scores of each round using Matplotlib and Seaborn:

def plot_scores(overall_scores):
    plt.plot(overall_scores)
    plt.xlabel("Round")
    plt.ylabel("Score")
    plt.title("Q&A Session Scores")
    plt.show()

The Main Function: Pulling the Strings

Finally, we reach the main function, the puppeteer of this entire process. It delegates tasks like loading the paper, setting the number of questions and context window, running the Q&A session, plotting the scores, and displaying the Q&A data:

def main(file_path="./paper.pdf"):
    docs = load_paper(file_path)
    num = 1  # questions per round
    context_window = 3  # pages per round
    overall_scores, qa_dict = run_qa_session(docs, num, context_window)
    plot_scores(overall_scores)
    print("The Q&A data: ", qa_dict)

if __name__ == "__main__":
    main("./paper.pdf")
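
The file path, the number of questions, and the context window are hard-coded in the script. If you want to change them without editing the code, one option is a small command-line interface built with the standard library's `argparse`. This is only a sketch: the function name `build_parser` and the flag names `--num` and `--context-window` are hypothetical choices, not part of the original script:

```python
import argparse


def build_parser():
    """Build a command-line parser for the Q&A session settings (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Run a Q&A session over a paper (PDF).")
    parser.add_argument("file_path", nargs="?", default="./paper.pdf", help="path to the PDF")
    parser.add_argument("--num", type=int, default=1, help="questions per round")
    parser.add_argument("--context-window", type=int, default=3, help="pages per round")
    return parser


# Parse an explicit argument list here; a script would call parse_args() with no arguments
args = build_parser().parse_args(["my_paper.pdf", "--num", "2"])
print(args.file_path, args.num, args.context_window)
```

The parsed values could then be passed on to `load_paper` and `run_qa_session` in place of the hard-coded ones.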

Wrapping Up

This script paints a clear picture of how to use the ChatGPT API to transform mundane academic papers into dynamic Q&A sessions, making it easier for researchers and students to extract knowledge from them.

By marrying ChatGPT with PDF document loading and processing libraries, we’ve crafted a simple educational tool that can help with extracting information from academic resources.

Check out the source code here: https://github.com/EnkrateiaLucca/automatic_qas_for_reading_papers

If you liked this post, join Medium, subscribe to my Youtube channel and my newsletter. Thanks and see you next time! Cheers! :)
