
LANGCHAIN: What Is Dataherald?
First, solve the problem. Then, write the code. — John Johnson.
Dataherald is an open-source natural language to SQL (NL-to-SQL) engine built on LangChain, with a focus on accurate semantic translation. It addresses a known weakness of modern large language models (LLMs): they tend to be better at writing procedural code than SQL, in part because they lack metadata about the target database and struggle with complex SQL queries. In this tutorial, we'll explore how Dataherald works and the LangChain agents it uses under the hood.
How Dataherald Works
Dataherald offers an open-source NL-to-SQL engine, with an option for a hosted API. It allows users to add business context, create training data, and fine-tune LLMs to their schema. The core of the product consists of two LangChain agents that perform the NL-to-SQL translation.
RAG Agent
The RAG agent is used when developers lack a substantial set of sample Question<>SQL pairs for fine-tuning or training the LLM. It connects to the database, extracts essential information for SQL generation, and uses tools such as a schema-linking tool, SQL execution tool, and a few-shot sample retriever tool.
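To make the three tools concrete, here is a minimal, self-contained sketch that uses only Python's standard library. It is not Dataherald's implementation: the function names (extract_schema, link_schema, retrieve_few_shot, run_sql), the sample tables, and the word-overlap scoring are all assumptions chosen for illustration. A real agent would likely use embeddings for retrieval and an LLM for schema linking.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'UK'), (2, 'Linus', 'FI');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0);
    """
)


def tokens(text):
    """Lowercase a question and split it into a set of words."""
    return set(text.lower().replace("?", " ").split())


def extract_schema(conn):
    """Return {table: [column, ...]} by reading SQLite's catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def link_schema(question, schema):
    """Naive schema linking: keep tables whose name or columns appear in the question."""
    words = tokens(question)
    return [t for t, cols in schema.items()
            if t in words or t.rstrip("s") in words or words & set(cols)]


def retrieve_few_shot(question, golden_pairs, k=1):
    """Rank stored (question, SQL) pairs by Jaccard word overlap with the question."""
    q = tokens(question)

    def score(pair):
        p = tokens(pair[0])
        return len(q & p) / len(q | p)

    return sorted(golden_pairs, key=score, reverse=True)[:k]


def run_sql(conn, sql):
    """Execute SQL and return rows, or the error text so an agent could retry."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"


question = "What is the total of orders per customer?"
golden = [  # made-up sample pairs, not real training data
    ("How many customers are there?", "SELECT COUNT(*) FROM customers"),
    ("What is the total of orders?", "SELECT SUM(total) FROM orders"),
]

schema = extract_schema(conn)
tables = link_schema(question, schema)
examples = retrieve_few_shot(question, golden)
rows = run_sql(conn, "SELECT customer_id, SUM(total) FROM orders "
                     "GROUP BY customer_id ORDER BY customer_id")
print(tables, examples, rows)
```

In an agent loop, the linked tables and retrieved examples would go into the prompt, and the output of run_sql (rows or an error message) would be fed back to the model so it can correct a failing query.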
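# Example code for connecting to the database and extracting essential information
# Illustrative sketch only: the module, class, and method names below are
# placeholders; check the Dataherald repository for the actual API.
from dataherald.rag_agent import RAGAgent  # hypothetical import path
rag_agent = RAGAgent()
# Replace the placeholder with your own database credentials.
rag_agent.connect_to_database("your_database_credentials")
# Extract the table schema the agent needs for SQL generation.
table_schema = rag_agent.extract_table_schema()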
# Other essential information extraction
Agent with LLM-as-a-Tool
Once there are more than 10 golden SQL queries per table, the more advanced agent can be used. It relies on a fine-tuned model that the agent calls as a tool (LLM-as-a-tool). This agent executes the generated SQL queries against the database to validate their correctness and retrieve necessary information.
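The selection rule and the execute-to-validate step can be sketched as follows, again with the standard library only. The names choose_agent, validate_sql, and first_valid are hypothetical, and so is the reading that every table must exceed the threshold. Note that "valid" here only means the query runs without error, which is a weaker check than semantic correctness.

```python
import sqlite3

GOLDEN_SQL_THRESHOLD = 10  # "more than 10 golden SQL queries per table"


def choose_agent(golden_counts):
    """Pick the fine-tuned agent only if every table exceeds the threshold (an assumption)."""
    if golden_counts and all(n > GOLDEN_SQL_THRESHOLD for n in golden_counts.values()):
        return "finetuned"
    return "rag"


def validate_sql(conn, sql):
    """Run a candidate query and return (ok, rows_or_error_message)."""
    try:
        return True, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)


def first_valid(conn, candidates):
    """Return the first candidate that executes cleanly, together with its rows."""
    for sql in candidates:
        ok, result = validate_sql(conn, sql)
        if ok:
            return sql, result
    return None, None


# Hypothetical table and candidate queries standing in for model output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 5.5)])

candidates = [
    "SELECT SUM(amount) FROM orders",  # fails: no such column
    "SELECT SUM(total) FROM orders",   # runs
]
chosen, rows = first_valid(conn, candidates)
print(choose_agent({"orders": 12, "customers": 3}), chosen, rows)
```

Returning the error message rather than raising keeps the failure visible to the caller, which is what lets an agent feed it back to the model for another attempt.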
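# Example code for using the advanced agent with LLM-as-a-tool
# Illustrative sketch only: the module, class, and method names below are
# placeholders; check the Dataherald repository for the actual API.
from dataherald.llm_agent import AdvancedAgent  # hypothetical import path
advanced_agent = AdvancedAgent()
# Fine-tune the model on your golden question/SQL pairs.
advanced_agent.fine_tune_model("your_dataset")
# Execute the generated SQL against the database to validate it.
result = advanced_agent.execute_query("generated_sql")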
# Other operations with the advanced agent
Conclusion
Dataherald helps developers and data teams translate natural language questions into SQL efficiently, and it caters to companies of all sizes. It supports conversational interfaces and self-service data access. Upcoming developments in Dataherald include further LangChain integration, broader support for open-source LLMs, and the ability for agents to ask follow-up questions.
If you’re struggling with NL-to-SQL translations and want to streamline the process, consider exploring Dataherald.
In this tutorial, we've explored how Dataherald, an open-source NL-to-SQL engine built on LangChain, works. We looked at the RAG agent and the advanced agent with LLM-as-a-tool, with illustrative code snippets for connecting to a database, extracting essential information, and executing SQL queries. Dataherald offers a promising way to translate natural language queries into SQL accurately for a wide range of businesses.





