Build a GPT Agent With a Custom Knowledge Base and Email Functionality
Designing a Langchain agent with Pinecone Index and Zapier toolkit

Langchain agents have a huge potential for building custom conversational interfaces. With Langchain, you can use different types of data like URLs or PDFs to create a custom knowledge base. The agent can then use this knowledge base to answer questions and use other tools like a search engine or Zapier for other actions.
In this tutorial, we’ll walk through the process of building a Langchain agent that can answer questions based on a PDF document and can autonomously send emails using Zapier.
Setting Up
First, we need to install Langchain and other dependencies:
!pip install langchain
!pip install pypdf
!pip install pinecone-client
!pip install openai
!pip install tiktoken
We also need to set up API keys for OpenAI and Pinecone:
import os
import pinecone
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# initialize pinecone
pinecone.init(
    api_key="YOUR_PINECONE_API_KEY",  # find at app.pinecone.io
    environment="YOUR_ENVIRONMENT_NAME"  # next to api key in console
)
Creating an Index
A Langchain agent can use our custom knowledge base to get the required information. To do so, the large language model needs access to our context. One way to provide it is to feed all the context information to the model along with the prompt. However, this method becomes impractical when dealing with a large amount of data. Instead, we can use an index to store our knowledge base.
In an index, the data is split into small chunks, and the semantic meaning of each chunk is stored as a vector (an embedding). When the user makes a query, the system searches for the vectors most similar to the query and retrieves the corresponding chunks. Instead of feeding all the data in one query, we pass only the relevant chunks to the large language model as context.
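To make the lookup step concrete, here is a minimal sketch of similarity search in plain Python. The three-dimensional vectors, the example chunks, and the helper names (cosine_similarity, top_k) are made up for illustration; a real index stores embeddings produced by an embedding model and uses a much faster search than this linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real ones come from an embedding model
index = {
    "DesignOps teams standardize tools and processes.": [0.9, 0.1, 0.0],
    "The cafeteria opens at 8 am.": [0.0, 0.2, 0.9],
    "A support model defines how DesignOps helps designers.": [0.8, 0.3, 0.1],
}

def top_k(query_vector, index, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda chunk: cosine_similarity(query_vector, index[chunk]),
        reverse=True,
    )
    return ranked[:k]

query_vector = [1.0, 0.2, 0.0]  # made-up embedding of a question about DesignOps
relevant_chunks = top_k(query_vector, index)
print(relevant_chunks)
```

Only the chunks returned by top_k would be passed to the language model; the unrelated chunk is left out of the prompt.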
Load data from PDF
Now, let’s load the documents for a custom knowledge base. We’ll use a PDF file as an example, but Langchain also supports other formats.
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("PATH_TO_YOUR_FILE")
pages = loader.load_and_split()
Split the text from the PDF into smaller chunks
There are many ways to split the text. We are using the text splitter that is recommended for generic text. For more ways to split the text, check the documentation.
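The chunk_size and chunk_overlap settings used in this step are easier to picture with a simplified sketch. The function below is a hypothetical fixed-window splitter written for illustration, not the algorithm of RecursiveCharacterTextSplitter, which additionally prefers to break on separators such as paragraph and line breaks. It only shows how neighbouring chunks share a few characters, so a sentence cut at a chunk boundary still appears whole in one of them.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks

chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

With a chunk size of 10 and an overlap of 4, each chunk starts 6 characters after the previous one, and the last 4 characters of one chunk are the first 4 of the next.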
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
docs = text_splitter.split_documents(pages)
Create embeddings
from langchain.embeddings.openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
Create a vectorstore
A vectorstore stores Documents and associated embeddings, and provides fast ways to look up relevant Documents by embeddings.
There are many ways to create a vectorstore. We are going to use Pinecone. For other types of vectorstores, visit the documentation.
First, go to Pinecone and create an index there; its dimension must match the size of the embedding vectors (1536 for OpenAI's text-embedding-ada-002 model). Then set index_name below to the name of that index.
from langchain.vectorstores import Pinecone
index_name = "index_name"
# embed the documents and upload them to the existing index
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
# if the index is already populated, you can load it like this
# docsearch = Pinecone.from_existing_index(index_name, embeddings)
If you cannot create a Pinecone account, you can use Chroma instead. The following code creates a transient in-memory vectorstore using Chroma (it needs the chromadb package, installed with !pip install chromadb). For further information, check the documentation.
from langchain.vectorstores import Chroma
docsearch = Chroma.from_documents(docs, embeddings)
Question Answering Chain
The question-answering chain will enable us to generate the answer based on the relevant context chunks. See the documentation for more explanation.
from langchain.chains import RetrievalQA
from langchain import OpenAI
#defining LLM
llm = OpenAI(temperature=0.2)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(search_kwargs={"k": 2}),
)
We can test our QA chain by passing a question through it:
query = "What is DesignOps support model?"
qa.run(query)
The output of this code will be the answer to our question, based on the relevant context chunks from the PDF file.
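The "stuff" chain type places all retrieved chunks into a single prompt together with the question. The sketch below illustrates that idea in plain Python; the function name build_stuff_prompt, the prompt wording, and the placeholder chunks are assumptions made for this example and are not the prompt Langchain actually sends.

```python
def build_stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into one prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholders standing in for the k=2 chunks returned by the retriever
retrieved = ["(first retrieved chunk)", "(second retrieved chunk)"]
prompt = build_stuff_prompt("What is DesignOps support model?", retrieved)
print(prompt)
```

Because every chunk ends up in one prompt, this approach works best with a small k and moderate chunk sizes, so the combined text stays within the model's context window.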
Zapier Integration
We can use the Langchain Zapier toolkit to integrate our agent with Zapier. First, you need to get a Zapier NLA API key here and enable the actions that you are going to use in Zapier.
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "YOUR_ZAPIER_API_KEY")
Next, we initialize our Zapier toolkit. For more information, visit the documentation.
from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper
zapier = ZapierNLAWrapper()
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
Building a Langchain Agent
Now that we have all the tools we need, it’s time to assemble them into an agent. For more information, check the documentation here, here, and here.
from langchain.agents import AgentType
from langchain.agents import initialize_agent, Tool
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
#defining the tools for the agent
tools = [
    Tool(
        name="Demo",
        func=qa.run,
        description="use this as the primary source of context information when you are asked the question. Always search for the answers using this tool first, don't make up answers yourself"
    ),
] + toolkit.get_tools()
#setting a memory for conversations
memory = ConversationBufferMemory(memory_key="chat_history")
#Setting up the agent
agent_chain = initialize_agent(tools, llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, verbose=True, memory=memory)
Now that we have the agent set up, we can test it out by asking it some questions. The agent will use the question-answering chain to find the relevant context, generate an answer, and perform any other tasks specified by the user’s request.
agent_chain.run(input="What Adrienne Allnutt have said about DesignOps?")
agent_chain.run(input="Email the answer to [email protected] and mention that this email was sent by AI")
To get an agent to do what you want, the prompt must be constructed carefully. For user-facing apps, we also need to look at prompt templates and consider whether chat is the best interface.
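Part of constructing the prompt is the conversation history: a follow-up request such as "Email the answer" only makes sense if the earlier turns are passed to the model again. The sketch below shows the idea behind a conversation buffer in plain Python. The BufferMemory class, the build_prompt helper, and the prompt layout are hypothetical stand-ins written for illustration, not Langchain's ConversationBufferMemory implementation.

```python
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: stores every turn verbatim."""

    def __init__(self):
        self.turns = []

    def save(self, user_input, ai_output):
        # Keep both sides of the exchange in the order they happened
        self.turns.append(("Human", user_input))
        self.turns.append(("AI", ai_output))

    def as_text(self):
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

def build_prompt(memory, new_input):
    """Prepend the stored history so the model can resolve references like 'the answer'."""
    return f"Previous conversation:\n{memory.as_text()}\n\nNew input: {new_input}"

memory_sketch = BufferMemory()
memory_sketch.save("What is DesignOps?", "(answer produced by the agent)")
prompt = build_prompt(memory_sketch, "Email the answer to a colleague")
print(prompt)
```

Since the buffer grows with every turn, long conversations eventually need trimming or summarizing to stay within the model's context window.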
Materials
You can find the source code in Colab here.
For visual learners, I’ve also made a video tutorial.






