
LANGCHAIN: Neon x LangChain, HNSW in Postgres with pg_embedding
The best way to predict the future is to invent it. — Alan Kay
The Neon team collaborated with LangChain to release the pg_embedding extension and the PGEmbedding integration in LangChain for vector similarity search in Postgres. The integration uses a Hierarchical Navigable Small World (HNSW) index, a graph-based approach to indexing high-dimensional data. HNSW constructs a hierarchy of graphs, which gives search a time complexity of roughly O(log(rows)).
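To put the O(log(rows)) figure in perspective, the back-of-the-envelope sketch below compares how an exact scan and a logarithmic search grow with table size. It is only an order-of-growth illustration: the helper name `comparison_estimates` is made up for this example, and real HNSW cost also depends on constants such as `m` and `ef_search`, which are ignored here.

```python
import math
from typing import Tuple


def comparison_estimates(rows: int) -> Tuple[int, int]:
    """Return (linear, logarithmic) step estimates for a table of `rows` vectors.

    Illustrative only: an exact scan touches every row, while a logarithmic
    search grows with log2(rows). Constant factors are ignored.
    """
    if rows < 1:
        raise ValueError("rows must be positive")
    return rows, max(1, math.ceil(math.log2(rows)))


for rows in (1_000, 1_000_000, 1_000_000_000):
    linear, logarithmic = comparison_estimates(rows)
    print(f"{rows:>13,} rows: exact scan ~{linear:,} steps, "
          f"log-scale search ~{logarithmic} steps")
```

The gap widens quickly: going from a thousand to a billion rows multiplies the scan estimate by a million, while the logarithmic estimate only triples.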
To get started with PGEmbedding, follow these steps:
- Log in to your Neon account and create a project:
npx neonctl auth
npx neonctl projects create
- If you haven’t installed LangChain, follow the instructions in the documentation.
- Initialize the PGEmbedding vector store and execute a similarity analysis:
import os
from typing import List, Tuple
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PGEmbedding
from langchain.document_loaders import TextLoader
from langchain.docstore.document import Document
loader = TextLoader('state_of_the_union.txt')
raw_docs = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(raw_docs)
embeddings = OpenAIEmbeddings()
CONNECTION_STRING = os.environ["DATABASE_URL"]
# Initialize the vectorstore, create tables and store embeddings and metadata.
db = PGEmbedding.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name="state_of_the_union",
    connection_string=CONNECTION_STRING,
)
# Create the index using HNSW. This step is optional. By default the vectorstore uses exact search.
db.create_hnsw_index(max_elements=10000, dims=1536, m=8, ef_construction=16, ef_search=16)
# Execute the similarity search and return documents
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query)
print('query done')
print("Results:")
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
At 99% accuracy, the PGEmbedding integration is faster than PGVector. More generally, it tends to be faster, achieves higher accuracy for the same memory footprint, and uses relatively less memory. However, building the index may be more computationally intensive. Ultimately, the choice between PGEmbedding and other vector stores should be based on the specific demands of your application.
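The text does not define "accuracy". A common reading for approximate indexes is recall: the share of the true nearest neighbors that the approximate search also returns. The sketch below assumes that reading and assumes Euclidean distance. The helper names `exact_top_k` and `recall_at_k` are hypothetical, and the `approx` list is a stand-in for ids returned by an approximate index, not real PGEmbedding output.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def exact_top_k(query: Vector, vectors: List[Vector], k: int) -> List[int]:
    """Return indices of the k nearest vectors by Euclidean distance (full scan)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the exact nearest neighbors that the approximate result contains."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
exact = exact_top_k((0.1, 0.1), vectors, k=3)
approx = [0, 1, 3]  # stand-in for ids returned by an approximate index
print(exact, recall_at_k(approx, exact))
```

Under this reading, "99% accuracy" would correspond to a recall of 0.99 averaged over many queries.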
Experiment with both approaches to find the one that best meets your needs for LLM applications.
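One way to run such an experiment is to time the same queries against each store. The helper below is a minimal sketch, not part of LangChain: `mean_latency` is a hypothetical name, and `fake_search` stands in for a real search call so the example runs without a database.

```python
import time
from typing import Callable, Iterable, List


def mean_latency(search_fn: Callable[[str], object],
                 queries: Iterable[str],
                 repeats: int = 3) -> float:
    """Average wall-clock seconds per call of `search_fn` over the queries."""
    query_list = list(queries)  # materialize so every repeat sees all queries
    timings: List[float] = []
    for _ in range(repeats):
        for query in query_list:
            start = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("need at least one query and one repeat")
    return sum(timings) / len(timings)


def fake_search(query: str) -> list:
    """Placeholder for a real vector store search call."""
    return [query.upper()]


queries = ["What did the president say about Ketanji Brown Jackson"]
print(f"{mean_latency(fake_search, queries):.6f} s per query")
```

In practice you would pass each store's search method, for example `db.similarity_search_with_score` from the example above, in place of `fake_search`, and compare the averages alongside result quality.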





