Get Answers From Your PDF — Azure OpenAI and Langchain

Summary

The article provides a step-by-step guide on using Azure OpenAI and Langchain to query PDF documents and extract answers.

Abstract

The article titled "Get Answers From Your PDF — Azure OpenAI and Langchain" by Shweta Lodha is a comprehensive tutorial that guides readers through the process of setting up an environment to query PDF files using Azure OpenAI services. It begins with the importation of necessary packages from Langchain and Azure OpenAI, followed by setting environment variables with details from the Azure portal and Azure OpenAI Studio. The author also includes a video for assistance. The process involves loading a PDF using Langchain's UnstructuredFileLoader, splitting the text into manageable chunks, generating embeddings, and using a large language model to answer queries based on the PDF content. The article concludes with a recommendation to watch a video recording for clarity and a link to another article for using OpenAI instead of Azure OpenAI. Additionally, the author promotes an AI service as a cost-effective alternative to ChatGPT Plus.

Opinions

The author believes that the provided instructions and video demonstrations will be helpful for users to navigate the process.
The author suggests that the described method is efficient for querying PDFs and provides a practical example with the question "What are the effects of homelessness?"
A recommendation is made for a specific AI service, ZAI.chat, as a more affordable option compared to ChatGPT Plus, indicating a preference or endorsement of this service.
The author encourages readers to engage with their content on Medium by joining the membership program, which implies that the author values reader support and continuous engagement with the audience.

Import Required Packages

Here are the packages which we need to import to get started:

from dotenv import load_dotenv
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import AzureOpenAI

Set Environment Variables

First of all, we need to set a few variables with information from the Azure portal and Azure OpenAI Studio:

OPENAI_API_TYPE = "Azure"
OPENAI_API_VERSION = "2022-12-01"
OPENAI_API_BASE = "ENDPOINT"
OPENAI_API_KEY = "API_KEY" 
DEPLOYMENT_NAME = "DEPLOYMENT_NAME_FROM_AI_STUDIO"
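
The values above are placeholders, and a forgotten one usually surfaces only as an authentication or routing error later on. As a small sketch (not part of the original article), the check below reports settings that are empty or still hold a placeholder string. The helper name find_unset, the PLACEHOLDERS set, and the example deployment name are all hypothetical.

```python
# Placeholder strings as they appear in the listing above.
PLACEHOLDERS = {"ENDPOINT", "API_KEY", "DEPLOYMENT_NAME_FROM_AI_STUDIO"}


def find_unset(settings):
    """Return the names whose value is empty or still a placeholder."""
    return sorted(
        name
        for name, value in settings.items()
        if not value or value.strip() in PLACEHOLDERS
    )


settings = {
    "OPENAI_API_TYPE": "Azure",
    "OPENAI_API_VERSION": "2022-12-01",
    "OPENAI_API_BASE": "ENDPOINT",        # placeholder left in on purpose
    "OPENAI_API_KEY": "API_KEY",          # placeholder left in on purpose
    "DEPLOYMENT_NAME": "my-deployment",   # made-up example value
}

unset = find_unset(settings)
if unset:
    print("Still to fill in:", ", ".join(unset))
```

Running a check like this before setting the environment variables keeps the failure close to its cause.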

If you are not sure where to find the above values, I recommend watching my video on the topic.

Next, we will use the above variables to set the environment variables:

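import os  # load_dotenv is already imported above

# Expose the settings to the OpenAI client through environment variables
os.environ["OPENAI_API_TYPE"] = OPENAI_API_TYPE
os.environ["OPENAI_API_VERSION"] = OPENAI_API_VERSION
os.environ["OPENAI_API_BASE"] = OPENAI_API_BASE
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
# Read a local .env file if present; by default, variables already set are not overridden
load_dotenv()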
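
The imports at the top include UnstructuredFileLoader and CharacterTextSplitter, which are used to load the PDF and split it into chunks before the embedding step. To illustrate what chunking does, here is a plain-Python sketch. It is not LangChain's implementation (CharacterTextSplitter splits on a separator and then merges pieces); it simply cuts fixed-size character windows with an overlap. The function name split_into_chunks and its default sizes are assumptions made for illustration.

```python
def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Tiny sizes so the overlap is easy to see.
print(split_into_chunks("0123456789", chunk_size=4, overlap=1))
# → ['0123', '3456', '6789']
```

Smaller chunks make retrieval more precise but give the model less surrounding context per chunk; the overlap is a common way to soften cuts that land mid-sentence.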

Prepare Model And Embeddings

At this point, our data is ready: the PDF has been loaded and split into chunks (the texts variable used below). What remains is to generate embeddings, associate them with the text, select a large language model, and stuff the data into it. All of these steps take just a few lines of code, as shown below:

embeddings = OpenAIEmbeddings()
# Index the chunks in a Chroma vector store
doc_search = Chroma.from_documents(texts, embeddings)
chain = RetrievalQA.from_chain_type(
    llm=AzureOpenAI(model_kwargs={'engine': 'text-davinci-002'}),
    chain_type='stuff', retriever=doc_search.as_retriever())
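
To make the retrieval and "stuff" steps less of a black box, here is a toy, offline sketch of the same idea: embed each chunk, rank chunks by similarity to the question, then paste the best ones into a single prompt. It assumes a bag-of-words counter as a stand-in for a real embedding model, and the helper names (embed, cosine, retrieve, stuff_prompt), the prompt wording, and the sample sentences are made up for illustration. It does not call Azure OpenAI, Chroma, or LangChain.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def stuff_prompt(query, chunks):
    """'Stuff' strategy: put all retrieved chunks into one prompt."""
    context = "\n\n".join(chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up sample chunks, standing in for the split PDF text.
chunks = [
    "Homelessness can affect physical and mental health.",
    "This report was published by a city council.",
    "Some effects of homelessness are felt by children.",
]
query = "What are the effects of homelessness?"

top = retrieve(query, chunks, k=2)
prompt = stuff_prompt(query, top)
print(prompt)
```

In the real chain, the prompt built this way is what gets sent to the deployed model. Because every retrieved chunk goes into one prompt, the "stuff" approach is the simplest option but is limited by the model's context window.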
