The article provides a step-by-step guide on using Azure OpenAI and Langchain to query PDF documents and extract answers.
Abstract
The article titled "Get Answers From Your PDF — Azure OpenAI and Langchain" by Shweta Lodha is a comprehensive tutorial that guides readers through the process of setting up an environment to query PDF files using Azure OpenAI services. It begins with the importation of necessary packages from Langchain and Azure OpenAI, followed by setting environment variables with details from the Azure portal and Azure OpenAI Studio. The author also includes a video for assistance. The process involves loading a PDF using Langchain's UnstructuredFileLoader, splitting the text into manageable chunks, generating embeddings, and using a large language model to answer queries based on the PDF content. The article concludes with a recommendation to watch a video recording for clarity and a link to another article for using OpenAI instead of Azure OpenAI. Additionally, the author promotes an AI service as a cost-effective alternative to ChatGPT Plus.
Opinions
The author believes that the provided instructions and video demonstrations will be helpful for users to navigate the process.
The author suggests that the described method is efficient for querying PDFs and provides a practical example with the question "What are the effects of homelessness?"
A recommendation is made for a specific AI service, ZAI.chat, as a more affordable option compared to ChatGPT Plus, indicating a preference or endorsement of this service.
The author encourages readers to engage with their content on Medium by joining the membership program, which implies that the author values reader support and continuous engagement with the audience.
Get Answers From Your PDF — Azure OpenAI and Langchain
In this article, I’ll walk you through all the steps required to query your PDFs and get answers from them using Azure OpenAI.
Image by Denys Vitali from Pixabay
Let’s get started by importing the required packages.
Import Required Packages
Here are the packages we need to import to get started:
from dotenv import load_dotenv
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import AzureOpenAI
Set Environment Variables
First of all, we need to set a few variables with information from the Azure portal and Azure OpenAI Studio:
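As a minimal sketch of this step, the snippet below sets the variables directly in the process environment. The variable names (OPENAI_API_TYPE, OPENAI_API_BASE, OPENAI_API_KEY, OPENAI_API_VERSION) are an assumption based on what the legacy Langchain and OpenAI packages read for Azure, and every value is a placeholder to replace with details from your own Azure resource. The helper apply_settings is a hypothetical name used only for illustration. If you prefer to keep these values in a .env file, calling load_dotenv() would serve the same purpose.

```python
import os

# Placeholder values (assumptions): copy the real ones from the Azure portal
# and Azure OpenAI Studio for your own resource.
AZURE_SETTINGS = {
    "OPENAI_API_TYPE": "azure",
    "OPENAI_API_BASE": "https://<your-resource-name>.openai.azure.com/",
    "OPENAI_API_KEY": "<your-api-key>",
    "OPENAI_API_VERSION": "<api-version>",
}


def apply_settings(settings):
    """Copy each setting into the environment unless it is already set."""
    for name, value in settings.items():
        # setdefault keeps any value that was exported before the script ran.
        os.environ.setdefault(name, value)


apply_settings(AZURE_SETTINGS)
```

Using setdefault means a value already exported in your shell takes precedence over the placeholder in the script.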
Once the PDF is loaded with UnstructuredFileLoader, we need to divide its text into chunks. You can choose the chunk size based on your needs. Here I’m using a chunk size of 800 and a chunk overlap of 0.
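To make the two parameters concrete, here is a small pure-Python sketch of fixed-size chunking. It is not how CharacterTextSplitter works internally (that class splits on a separator and then merges the pieces up to the chunk size); the function split_into_chunks is a hypothetical helper that only illustrates what chunk size and chunk overlap mean.

```python
def split_into_chunks(text, chunk_size=800, chunk_overlap=0):
    """Cut text into pieces of at most chunk_size characters.

    Consecutive pieces share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "x" * 2000
chunks = split_into_chunks(sample)  # chunk size 800, overlap 0
print([len(chunk) for chunk in chunks])  # [800, 800, 400]
```

With an overlap of 0 the chunks are disjoint, so joining them back together reproduces the original text. A non-zero overlap repeats the tail of each chunk at the start of the next one, which can help when an answer spans a chunk boundary.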
At this point, our data is ready. The remaining steps are generating embeddings, associating them with the text, selecting a large language model, and stuffing the data into it. All of these steps take only a few lines of code:
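The sketch below strings the whole pipeline together using the classes imported at the top of the article. It assumes the legacy Langchain import paths shown there (newer releases have moved these classes) and that the Azure environment variables are already set. The function name build_qa_chain, the parameter names, and the deployment placeholders are hypothetical; use the deployment names from your own Azure OpenAI Studio. Treat this as one possible arrangement, not necessarily the exact code from the original walkthrough.

```python
def build_qa_chain(pdf_path, llm_deployment, embedding_deployment):
    """Load a PDF, index its chunks in Chroma, and return a RetrievalQA chain."""
    # Imports live inside the function so that merely defining it does not
    # require Langchain; calling it does.
    from langchain.document_loaders import UnstructuredFileLoader
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.chains import RetrievalQA
    from langchain.llms import AzureOpenAI

    # 1. Load the PDF into Langchain documents.
    documents = UnstructuredFileLoader(pdf_path).load()

    # 2. Split the text into chunks (size 800, overlap 0, as in the article).
    splitter = CharacterTextSplitter(chunk_size=800, chunk_overlap=0)
    texts = splitter.split_documents(documents)

    # 3. Generate embeddings and store them alongside the text in Chroma.
    embeddings = OpenAIEmbeddings(deployment=embedding_deployment)
    doc_search = Chroma.from_documents(texts, embeddings)

    # 4. Select the model and "stuff" the retrieved chunks into its prompt.
    llm = AzureOpenAI(deployment_name=llm_deployment)
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=doc_search.as_retriever(),
    )


# Example usage (needs Langchain, Chroma, and valid Azure credentials):
# qa = build_qa_chain("sample.pdf", "<llm-deployment>", "<embedding-deployment>")
# print(qa.run("What are the effects of homelessness?"))
```

The "stuff" chain type places all retrieved chunks into a single prompt, which is simple and works well as long as the retrieved text fits within the model's context window.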