Summarize documents with ChatGPT in Python
A walkthrough for building a Python app based on ChatGPT

Scientific papers already come with abstracts that summarize them. Other types of documents usually do not, so it is worth practicing how to use ChatGPT for this purpose. Moreover, since this is a walkthrough in Python, the natural language processing (NLP) steps can be adapted to other NLP-related tasks. In the following, we iterate over the pages to produce an individual summary per page, but we could push this further.
1. If you are running in Colab or on a machine that does not have the libraries installed, install them first. PyPDF2 is needed because we assume the input is a PDF; if you use CSV, DOC or other files, swap in a suitable reader. The code below uses the openai.ChatCompletion interface, which was removed in version 1.0 of the openai package, so pin an earlier release. The “!” prefix is only required in Colab, not in a normal shell.
!pip install PyPDF2
!pip install "openai<1"
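The openai library also needs an API key before any request will succeed. As a minimal sketch, assuming the key is stored in the OPENAI_API_KEY environment variable (the helper name load_api_key is hypothetical and not part of the library), it could be read like this:

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises a clear error instead of letting the
    first API request fail later with an authentication problem.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage, once the openai library has been imported:
# openai.api_key = load_api_key()
```

Keeping the key in an environment variable, rather than pasting it into the script, avoids leaking it when the notebook or file is shared.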
2. Now you can import those libraries
import PyPDF2
import openai
3. Initialize an empty string which will contain the summarized text
pdf_summary_text = ""
4. Read a hypothetical PDF named “my_pdf.pdf”
pdf_file = open("my_pdf.pdf", 'rb')
pdf_reader = PyPDF2.PdfReader(pdf_file)
5. Loop over the pages
for page_num in range(len(pdf_reader.pages)):
    page_text = pdf_reader.pages[page_num].extract_text().lower()
6. Give the text to the model and ask for a summary using the gpt-3.5-turbo model, and consider modifying the prompt further to adjust the style
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": f"Summarize this: {page_text}"},
        ],
    )
    page_summary = response["choices"][0]["message"]["content"]
7. Now we can append each page summary to a single overall summary, write it to a file, and close the PDF file
    pdf_summary_text += page_summary + "\n"
    summary_file = "output_summary.txt"
    # Overwrite the output file with the summary accumulated so far
    with open(summary_file, "w+") as file:
        file.write(pdf_summary_text)
# END For loop
pdf_file.close()
That’s all!
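The loop can also be gathered into small functions, which makes it easier to try a different prompt style or to test the logic without calling the API. The following is only a sketch under stated assumptions: the names build_messages, summarize_pages and echo_stub are hypothetical, the model call is passed in as a plain function, and blank pages are skipped on the assumption that text extraction can come back empty (for example on scanned pages).

```python
def build_messages(page_text, style=None):
    """Build the chat messages for one page, optionally asking for a style."""
    instruction = "Summarize this"
    if style:
        instruction += f" {style}"
    return [{"role": "user", "content": f"{instruction}: {page_text}"}]


def summarize_pages(page_texts, ask_model, style=None):
    """Return one summary per non-empty page.

    ask_model is any callable that takes a messages list and returns the
    reply text, so a real API call or a test stub can be plugged in.
    """
    summaries = []
    for text in page_texts:
        text = (text or "").strip()
        if not text:
            continue  # nothing to summarize on blank or image-only pages
        summaries.append(ask_model(build_messages(text, style)))
    return summaries


def echo_stub(messages):
    # Hypothetical stand-in for the ChatGPT call: reports the prompt length
    return f"{len(messages[0]['content'])} characters received"


pages = ["First page text.", "", "Second page text."]
for line in summarize_pages(pages, echo_stub, style="in two sentences"):
    print(line)
```

To use the real model, ask_model could wrap the same openai.ChatCompletion.create call shown in step 6 and return response["choices"][0]["message"]["content"]. To push the idea further, the joined page summaries could be passed through ask_model once more to obtain a single summary of the whole document.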
This is just an educational script, but it opens up a lot of possibilities: once you get a reply from ChatGPT, you can use it in many ways, and even embed it in Streamlit applications. Have fun!
If you are wondering why you would entrust text summarization or manipulation to a chatbot, the answer comes down to speed, accuracy, flexibility and scalability. ChatGPT has been trained on a vast corpus of text data, which allows it to understand natural language and process it quickly and accurately. This means it can easily identify and extract information from text, such as keywords, topics, and sentiment.
- Speed: ChatGPT can process text very quickly, making it an ideal tool for tasks that involve processing large amounts of text data. It can also generate text at a high rate, making it useful for tasks such as chatbot responses, language translation, and summarization.
- Accuracy: ChatGPT has been trained on a massive amount of text data, making it highly accurate in identifying patterns and making predictions. This makes it an excellent tool for tasks such as sentiment analysis, text classification, and topic modeling.
- Flexibility: ChatGPT can be customized to perform a wide range of text manipulation tasks, from simple tasks like spell-checking and grammar correction to more complex tasks like text summarization and language translation.
- Scalability: ChatGPT can handle large volumes of text data and can scale up or down based on the size of the task at hand. This makes it a great choice for applications that require processing large amounts of text data, such as social media analysis or customer feedback analysis.
So, do not worry; just keep in mind that it is a language model trained to do this, not William Shakespeare.