function is the director of the show. It takes the loaded pages, the number of questions per round, and a context window (the number of pages grouped into each round), then quizzes the user one round at a time:</p><div id="4dd7"><pre><span class="hljs-keyword">def</span> <span class="hljs-title function_">run_qa_session</span>(<span class="hljs-params">docs, num, context_window</span>):
    overall_scores = []
    qa_dict = {}
    <span class="hljs-keyword">for</span> i, page_num <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(<span class="hljs-built_in">range</span>(<span class="hljs-number">0</span>, <span class="hljs-built_in">len</span>(docs), context_window)):
        <span class="hljs-comment"># Concatenate the pages in this window (context_window pages) to use as context for the Q&A</span>
        context = <span class="hljs-string">""</span>.join([page.page_content <span class="hljs-keyword">for</span> page <span class="hljs-keyword">in</span> docs[page_num:page_num + context_window]])
        q_a = create_qa(context, num)
        <span class="hljs-comment"># Split the model output into question/answer blocks using the '---' separator</span>
        q_a_list = q_a.split(<span class="hljs-string">'---'</span>)
        scores_list = []
        <span class="hljs-keyword">for</span> qa <span class="hljs-keyword">in</span> q_a_list:
            <span class="hljs-comment"># Skip empty or malformed blocks (e.g. after a trailing '---') that have no answer</span>
            <span class="hljs-keyword">if</span> <span class="hljs-string">"A:"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> qa:
                <span class="hljs-keyword">continue</span>
            question, answer = qa.split(<span class="hljs-string">"A:"</span>, <span class="hljs-number">1</span>)
            question = question.replace(<span class="hljs-string">"Q:"</span>, <span class="hljs-string">""</span>).strip()
            answer = answer.replace(<span class="hljs-string">"Q:"</span>, <span class="hljs-string">""</span>).strip()
            user_answer = <span class="hljs-built_in">input</span>(question + <span class="hljs-string">"\n"</span>)
            <span class="hljs-comment"># Store every question of the round (a plain assignment would keep only the last one)</span>
            qa_dict.setdefault(<span class="hljs-string">f"Round <span class="hljs-subst">{i}</span>"</span>, []).append({<span class="hljs-string">"question"</span>: question, <span class="hljs-string">"answer"</span>: answer, <span class="hljs-string">"user_answer"</span>: user_answer})
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"CORRECT ANSWER: "</span>, answer)
            <span class="hljs-built_in">print</span>()
            score_feedback = evaluate_answer(question, answer, user_answer)
            <span class="hljs-keyword">try</span>:
                feedback = score_feedback.split(<span class="hljs-string">"FEEDBACK:"</span>)[<span class="hljs-number">1</span>].strip()
            <span class="hljs-keyword">except</span> IndexError:
                feedback = <span class="hljs-string">"Error getting feedback"</span>
            <span class="hljs-comment"># Make sure the model's score can be turned into an integer</span>
            <span class="hljs-keyword">try</span>:
                score_text = score_feedback.split(<span class="hljs-string">"SCORE:"</span>)[<span class="hljs-number">1</span>].split(<span class="hljs-string">"FEEDBACK:"</span>)[<span class="hljs-number">0</span>]
                <span class="hljs-comment"># Strip whitespace and any stray ':' left between the score and the feedback</span>
                score = <span class="hljs-built_in">int</span>(score_text.strip().strip(<span class="hljs-string">":"</span>).strip())
                scores_list.append(score)
            <span class="hljs-keyword">except</span> (IndexError, ValueError):
                score = <span class="hljs-literal">None</span>
                <span class="hljs-built_in">print</span>(<span class="hljs-string">"The score could not be converted to an integer, so this answer is not counted."</span>)
                <span class="hljs-built_in">print</span>(<span class="hljs-string">"The raw evaluation output was: "</span>, score_feedback)
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"SCORE:"</span>, score)
            <span class="hljs-built_in">print</span>()
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"FEEDBACK:"</span>, feedback)
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"*********"</span>)
        <span class="hljs-comment"># Average only the answers that received a valid score (avoids dividing by zero)</span>
        <span class="hljs-keyword">if</span> scores_list:
            round_score = <span class="hljs-built_in">sum</span>(scores_list) / <span class="hljs-built_in">len</span>(scores_list)
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"ROUND SCORE:"</span>, round_score)
            overall_scores.append(round_score)
        continue_input = <span class="hljs-built_in">input</span>(<span class="hljs-string">"Press enter to continue to the next round or press 'q' to quit."</span>)
        <span class="hljs-keyword">if</span> continue_input == <span class="hljs-string">"q"</span>:
            <span class="hljs-keyword">break</span>
    <span class="hljs-keyword">return</span> overall_scores, qa_dict</pre></div><h2 id="47a4">Evaluating and scoring answers</h2><p id="3d16">Every answer is evaluated by the `evaluate_answer` function. It plays the part of a strict examiner: it asks GPT-3.5-turbo to compare the user’s answer with the generated answer and to return a score from 0 to 100 along with one sentence of feedback:</p><div id="41ec"><pre><span class="hljs-keyword">def</span> <span class="hljs-title function_">evaluate_answer</span>(<span class="hljs-params">question, true_answer, user_answer</span>):
    <span class="hljs-comment"># Ask the model to grade the user's answer against the reference answer</span>
    evaluate_prompt = <span class="hljs-string">f"Given this question: <span class="hljs-subst">{question}</span> for which the correct answer is this: <span class="hljs-subst">{true_answer}</span>, give a score from 0 to 100 to the following answer given by the user: <span class="hljs-subst">{user_answer}</span>. The output should be formatted as follows: SCORE: &lt;score number as an integer (e.g. 45, 90, etc...)&gt; \nFEEDBACK: &lt;A one sentence feedback justifying the score.&gt;"</span>
    response = openai.ChatCompletion.create(
        model=<span class="hljs-string">"gpt-3.5-turbo"</span>,
        messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are a helpful research and programming assistant"</span>},
                  {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: evaluate_prompt}]
    )
    <span class="hljs-keyword">return</span> response[<span class="hljs-string">"choices"</span>][<span class="hljs-number">0</span>][<span class="hljs-string">"message"</span>][<span class="hljs-string">"content"</span>]</pre></div><h2 id="c687">Showcasing the Scores</h2><p id="4f57">The `plot_scores` function charts the average score of each round with Matplotlib, using the Seaborn styling enabled by `sns.set()`:</p><div id="e6a1"><pre><span class="hljs-keyword">def</span> <span class="hljs-title function_">plot_scores</span>(<span class="hljs-params">overall_scores</span>):
    <span class="hljs-comment"># One point per scored round (x-axis starts at 0, matching the "Round 0" keys in qa_dict)</span>
    plt.plot(overall_scores)
    plt.xlabel(<span class="hljs-string">"Round"</span>)
    plt.ylabel(<span class="hljs-string">"Score"</span>)
    plt.title(<span class="hljs-string">"Q&A Session Scores"</span>)
    plt.show()</pre></div><h2 id="dd66">The Main Function: Pulling the Strings</h2><p id="0918">Finally, we reach the main function, the puppeteer of this entire process. It loads the paper, sets the number of questions per round and the context window, runs the Q&A session, plots the scores, and prints the collected Q&A data:</p><div id="33a3"><pre><span class="hljs-keyword">def</span> <span class="hljs-title function_">main</span>():
    <span class="hljs-comment"># Path to the paper you want to quiz yourself on</span>
    file_path = <span class="hljs-string">"./paper.pdf"</span>
    docs = load_paper(file_path)
    num = <span class="hljs-number">1</span>  <span class="hljs-comment"># questions generated per round</span>
    context_window = <span class="hljs-number">3</span>  <span class="hljs-comment"># pages concatenated into each round's context</span>
    overall_scores, qa_dict = run_qa_session(docs, num, context_window)
    plot_scores(overall_scores)
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"The Q&A data: "</span>, qa_dict)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()</pre></div><h1 id="70cf">Wrapping Up</h1><p id="78f2">This script shows how the ChatGPT API can turn a static academic paper into an interactive Q&A session, making it easier for researchers and students to actively extract knowledge from it.</p><p id="fb36">By combining ChatGPT with PDF loading and processing libraries, we’ve built a simple educational tool that helps readers pull key information out of academic papers.</p><p id="6e9e">Check out the source code here:</p><div id="58b6" class="link-block">
<a href="https://github.com/EnkrateiaLucca/automatic_qas_for_reading_papers">
<div>
<div>
<h2>GitHub - EnkrateiaLucca/automatic_qas_for_reading_papers: Automatic Q&As with the ChatGPT API</h2>
<div><h3>Automatic Q&As with the ChatGPT API. Contribute to EnkrateiaLucca/automatic_qas_for_reading_papers development by…</h3></div>
<div><p>github.com</p></div>
</div>
</div>
</a>
</div></article></body>
Building an Automatic Q&A App with the ChatGPT API
Building an app that creates Q&A for interactively reading papers
In this article, we’ll build a Python script that utilizes OpenAI’s ChatGPT API to create an automatic Q&A session from academic papers.
Steps
This task involves loading a PDF file, extracting its content, generating questions and answers using GPT-3.5-turbo, and then evaluating the quality of the answers provided by the user.
Below are the steps the script takes to create a Q&A session for a given paper (PDF file):
Load the PDF document using PyPDFLoader from LangChain.
Create the Q&A pairs.
Orchestrate the Q&A session.
Evaluate and score the answers.
Define the main function.
Delving Deep into the Code
Let’s dissect the code and study each of its integral components in detail.
Importing the Necessary Modules
The coding adventure begins with the necessary imports:
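from langchain.document_loaders import PyPDFLoader  # newer LangChain releases: langchain_community.document_loaders
import openai  # pre-1.0 SDK: this script calls openai.ChatCompletion, which was removed in openai 1.0
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply Seaborn's default styling to Matplotlib plots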
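The `openai.ChatCompletion.create(...)` calls and the dict-style `response["choices"][0]["message"]["content"]` access in this script belong to the pre-1.0 `openai` Python package; `openai.ChatCompletion` was removed in openai 1.0. If you are on a newer SDK, a helper along the lines of the sketch below could stand in for those calls. The name `ask_chat_model` is hypothetical (it is not part of the original script), and the client is passed in as a parameter so the function can be tried with a stub instead of a live API key.

```python
def ask_chat_model(client, prompt, model="gpt-3.5-turbo",
                   system_prompt="You are a helpful research and programming assistant"):
    """Send one user prompt to a chat model and return the reply text.

    Assumes `client` behaves like `openai.OpenAI()` from openai>=1.0, which
    exposes `client.chat.completions.create(model=..., messages=...)`.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    # openai>=1.0 returns typed objects, so fields are read as attributes
    # instead of the dict-style response["choices"][0]... used in this article
    return response.choices[0].message.content


# Example usage (needs `pip install "openai>=1.0"` and an OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   client = OpenAI()
#   print(ask_chat_model(client, "Create 1 question with its answer about photosynthesis."))
```

With a helper like this, `create_qa` and `evaluate_answer` would each reduce to building their prompt string and returning `ask_chat_model(client, prompt)`.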
Loading the PDF Document
Our `load_paper` function takes the file path to a PDF document and hands it over to PyPDFLoader, our trusty assistant for loading the document's content page by page.
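PyPDFLoader's `load()` returns one LangChain document per PDF page, with the page text in `page_content`, and `run_qa_session` later glues consecutive pages together, `context_window` pages per round. The sketch below isolates that grouping step in plain Python so it can run without LangChain or a PDF; `Page` and `group_pages` are hypothetical names, with `Page` standing in for a loaded document.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a loaded LangChain document: only the text matters here."""
    page_content: str


def group_pages(docs, context_window):
    """Concatenate consecutive pages into chunks of `context_window` pages.

    Uses the same slicing as run_qa_session, docs[start:start + context_window]
    for start = 0, context_window, 2 * context_window, ..., so the last chunk
    may contain fewer pages.
    """
    if context_window < 1:
        raise ValueError("context_window must be at least 1")
    return [
        "".join(page.page_content for page in docs[start:start + context_window])
        for start in range(0, len(docs), context_window)
    ]


# Seven fake pages with context_window=3 give three rounds: pages 1-3, 4-6, and 7
pages = [Page(f"page {n}. ") for n in range(1, 8)]
for round_number, context in enumerate(group_pages(pages, 3)):
    print(f"Round {round_number}: {context!r}")
```

Joining with `""` runs the end of one page straight into the start of the next; joining with `"\n"` instead would keep page boundaries visible to the model (a suggestion, not what the original script does).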
Creating the Q&A Pairs
With the document in hand, the `create_qa` function takes a context and the number of questions to generate. With the help of GPT-3.5-turbo, it writes that many question/answer pairs from the given context, separated by '---' so they can be split apart later:
def create_qa(context, num=5):
    # Prompt asking the model for `num` Q&A pairs drawn only from the given context,
    # with each question/answer block separated by '---'
    q_a_prompt = f"Create a set of {num} questions with answers based solely on this text from a paper:\n\n{context}\n\n. Separate each block composed of a question and an answer with 3 dashes '---' like this Q: <question>\n A:<answer> --- Q: <question>\n A:<answer> etc.... Let's think step by step. Q:"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": "You are a helpful research and programming assistant"},
                  {"role": "user", "content": q_a_prompt}]
    )
    # Pre-1.0 openai responses support dict-style access
    return response["choices"][0]["message"]["content"]
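`run_qa_session` turns this reply into question/answer pairs with plain `str.split` on `'---'` and `'A:'`, and reads the grade from `evaluate_answer` with `split("SCORE:")` and `split("FEEDBACK:")`. Those splits raise `IndexError` whenever the model drops a marker, for example in the empty block left after a trailing `---`. A more defensive sketch using Python's `re` module follows; `parse_qa_blocks` and `parse_score_feedback` are hypothetical helpers, not part of the original script.

```python
import re


def parse_qa_blocks(text):
    """Split a create_qa-style reply into (question, answer) tuples.

    Blocks are separated by '---' and look like 'Q: <question> A: <answer>'.
    Empty blocks and blocks without an answer marker are skipped instead of
    raising IndexError.
    """
    pairs = []
    for block in text.split("---"):
        # \b stops words such as "USA:" from being read as the answer marker
        parts = re.split(r"\bA:", block, maxsplit=1)
        if len(parts) != 2:
            continue
        # The prompt itself ends with "Q:", so the first block may lack the label
        question = re.sub(r"^\s*Q:", "", parts[0]).strip()
        answer = parts[1].strip()
        if question and answer:
            pairs.append((question, answer))
    return pairs


def parse_score_feedback(text):
    """Return (score, feedback) from an evaluate_answer-style reply.

    score is None when no integer follows 'SCORE:'; feedback falls back to a
    placeholder when 'FEEDBACK:' is missing.
    """
    score_match = re.search(r"SCORE:\s*(\d+)", text)
    feedback_match = re.search(r"FEEDBACK:\s*(.+)", text, re.DOTALL)
    score = int(score_match.group(1)) if score_match else None
    feedback = feedback_match.group(1).strip() if feedback_match else "Error getting feedback"
    return score, feedback


reply = "What is X?\n A: A thing. --- Q: Why?\n A: Because. ---"
print(parse_qa_blocks(reply))  # [('What is X?', 'A thing.'), ('Why?', 'Because.')]
print(parse_score_feedback("SCORE: 85 \n: FEEDBACK: Mostly correct."))  # (85, 'Mostly correct.')
```

A round loop could then iterate over `parse_qa_blocks(q_a)` and leave an answer out of the round average whenever `parse_score_feedback` returns `None` as the score.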