LANGCHAIN — Benchmarking Question Answering over CSV Data


Abstract

on.</p><h2 id="a53a">Installation:</h2><p id="0ff0">Memory Profiler can be installed from PyPI using:</p><div id="de37"><pre>pip <span class="hljs-keyword">install</span> -U memory_profiler</pre></div><p id="d187">and can be imported using:</p><div id="3752"><pre><span class="hljs-keyword">from</span> memory_profiler <span class="hljs-keyword">import</span> profile</pre></div><h2 id="6156">Usage:</h2><p id="2314">Once everything is set up, the module makes it easy to track the memory consumption of a function. Place the <code>@profile</code> decorator before every function that needs to be tracked. This tracks memory consumption line by line, in the same way as <a href="https://pypi.org/project/line-profiler/">line-profiler</a>.</p><p id="f468">After decorating all the functions with <code>@profile</code>, execute the Python script with a specific set of arguments.</p> <figure id="b96f"> <div> <div>

            <iframe class="gist-iframe" src="/gist/satkr7/0102b4e5e2acff9db15809298e77a1d1.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"></iframe>
          </div>
        </div>
    </figure><p id="7b9f">Execute the above Python script by passing <code>-m memory_profiler</code> to the Python interpreter. This loads the memory_profiler module and prints the memory consumption line by line.</p><p id="ebf8">Use the command below to execute the Python script with the memory profiler.</p><div id="f938"><pre><span class="hljs-keyword">python</span> -<span class="hljs-keyword">m</span> memory_profiler <span class="hljs-symbol">&lt;filename&gt;</span>.<span class="hljs-keyword">py</span></pre></div><figure id="8ba9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*--WZVx_xXuDeyE2slwvmjA.png"><figcaption>(Image by Author)</figcaption></figure><p id="6a84">After successful execution, you will get a line-by-line memory consumption report, similar to the above image. The report has 5 columns:</p><ul><li><b>Line #</b>: Line number</li><li><b>Line Contents</b>: Python code at each line number</li><li><b>Mem usage</b>: Memory used by the Python interpreter after each execution of the line.</li><li><b>Increment</b>: Difference in memory usage between the current line and the previous line. It denotes the memory consumed by that particular line of Python code.</li><li><b>Occurrences</b>: Number of times a particular line of code is executed.</li></ul><p id="fd15">Mem usage can be tracked to observe the total memory occupied by the Python interpreter, whereas the Increment column shows the memory consumed by a particular line of code. By observing the memory usage, one can optimize memory consumption to develop production-ready code.</p><h1 id="f44f">Conclusion:</h1><p id="dbbd">Optimizing memory consumption is as important as optimizing the time complexity of Python code. By optimizing memory consumption, one can speed up execution to some extent and avoid out-of-memory crashes.</p><p id="9278">One can also pass arguments to the <code>@profile</code> decorator, for example to specify the precision of the reported values. Read the <a href="https://pypi.org/project/memory-profiler/">documentation</a> of the memory profiler module for a better understanding.</p><h1 id="3c87">References:</h1><p id="e914">[1] Memory Profiler Documentation: <a href="https://pypi.org/project/memory-profiler/">https://pypi.org/project/memory-profiler/</a></p><p id="e94c" type="7">Thank You for Reading</p></article></body>
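
The embedded gist is not reproduced in this text. As a stand-in, a script decorated with `@profile` might look roughly like the sketch below. The function name `build_squares` and the no-op fallback decorator are illustrative assumptions, not the author's original code.

```python
# Hypothetical example script; not the author's original gist.
try:
    from memory_profiler import profile  # pip install -U memory_profiler
except ImportError:
    # Assumption: fall back to a no-op decorator so the script still runs
    # when memory_profiler is not installed (no report is printed then).
    def profile(func):
        return func


@profile
def build_squares(n):
    """Allocate two lists so the report has visible increments."""
    numbers = list(range(n))            # first allocation
    squares = [x * x for x in numbers]  # second allocation
    del numbers                         # release the first list
    return sum(squares)


if __name__ == "__main__":
    print(build_squares(100_000))
```

Saved as a file, this can be run with `python -m memory_profiler <filename>.py` to get the line-by-line report described above.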

LANGCHAIN — Benchmarking Question Answering over CSV Data

Computers are good at following instructions, but not at reading your mind. — Donald Knuth

In this tutorial, we will take a deep dive into question-answering over tabular data, specifically using CSV data. We’ll cover the following topics: background motivation, initial application, initial solution, debugging with LangSmith, evaluation setup, and improved solution. Throughout this tutorial, we will use LangSmith to collect real user questions over CSV data, and employ LangSmith’s features to evaluate our question-answering system. Let’s get started by exploring how we can use LangSmith to collect and evaluate our dataset.

Background Motivation

When working with tabular (CSV) data, it can be challenging to answer natural language questions over the data. Traditional machine learning datasets typically consist of inputs and outputs, which are used to train and evaluate models. However, language model applications often lack sufficient training data and evaluation metrics. LangSmith offers a way to construct datasets for language model-based applications, making it easier to evaluate solutions. To tackle this challenge, we can gather real user questions and feedback to construct a dataset, and then use language models to evaluate correctness.

Let’s start by creating a dataset of real-world questions and ground truth answers. We can achieve this by deploying a demo application and gathering user interactions and feedback. LangSmith can help monitor user interactions and feedback, allowing us to manually review and create a dataset of interesting questions.

Initial Application

In our initial application, we decided to use the Titanic dataset — a classic example of tabular data containing a mix of numeric, categorical, and text columns. Using Streamlit, we created a simple application and gathered real user questions and feedback. By logging interactions and feedback using LangSmith, we were able to create a dataset consisting of interesting user questions.

Initial Solution

The initial solution had to handle text-heavy tabular data and answer natural language queries over it. We used a retrieval system for natural language lookups, and a Python REPL or kork for more complex queries. The retrieval system used a vector store to match input questions against the data by cosine similarity, whereas kork exposed a predetermined set of functions for questions that are better expressed in a query language.
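
To make the retrieval step concrete, here is a minimal sketch of cosine-similarity matching over CSV rows using only the Python standard library. It is not the code from the application: the sample rows, the word-count `embed` function, and all helper names are assumptions made for illustration, and a real vector store would use learned embeddings rather than word counts.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical sample in the shape of a passenger table (values are illustrative).
SAMPLE_CSV = """Name,Sex,Age,Pclass
"Braund, Mr. Owen Harris",male,22,3
"Cumings, Mrs. John Bradley",female,38,1
"Heikkinen, Miss. Laina",female,26,3
"""


def embed(text):
    """Toy embedding: lowercase word counts (assumption, stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def row_to_text(row):
    """Render one CSV row as labeled lines before embedding it."""
    return "\n".join(f"{column}: {value}" for column, value in row.items())


def build_index(csv_text):
    """Pair every row with its embedding (a stand-in for a vector store)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [(row, embed(row_to_text(row))) for row in rows]


def retrieve(index, question, k=1):
    """Return the k rows most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [row for row, _ in ranked[:k]]


index = build_index(SAMPLE_CSV)
top = retrieve(index, "How old is Owen Harris Braund?")
print(top[0])
```

The retrieved row would then be placed in the prompt as context for the language model. Lookups like this suit questions about a single record, while aggregate questions (counts, averages) need the REPL or query-language route.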

Debugging with LangSmith

As users started asking questions, feedback revealed that some areas of the initial solution needed improvement. LangSmith allowed us to inspect traces and identify issues with data formatting and limited functionality of kork. For instance, we discovered that data formatting inconsistencies affected the language model's ability to reason about the data correctly. With LangSmith's help, we fixed formatting issues and gained insights into debugging performance issues.

Evaluation Setup

With the dataset of real-world examples and insights from LangSmith, we are ready to measure our improvements. However, evaluating natural language answers is complex, as there are multiple valid ways to respond to a question. We decided to use language models to evaluate correctness, even though this approach is not perfect. LangSmith facilitated the evaluation process by leveraging language models to compare predicted answers with ground truth answers.
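
The grading loop can be sketched as follows. This is an assumption-laden illustration rather than LangSmith's evaluator: the prompt wording, the `Example` class, and the function names are invented here, and `stub_judge` replaces the language model call with a simple substring check so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    question: str
    reference: str  # ground-truth answer


# Hypothetical grading prompt; the wording is an assumption.
GRADER_TEMPLATE = """You are grading a question-answering system.
Question: {question}
Reference answer: {reference}
Predicted answer: {prediction}
Does the predicted answer agree with the reference? Reply CORRECT or INCORRECT."""


def grade(example: Example, prediction: str, judge: Callable[[str], str]) -> bool:
    """Ask the judge whether the prediction matches the reference."""
    prompt = GRADER_TEMPLATE.format(
        question=example.question,
        reference=example.reference,
        prediction=prediction,
    )
    verdict = judge(prompt).strip().upper()
    return verdict.startswith("CORRECT")


def evaluate(examples: List[Example], answer_fn: Callable[[str], str],
             judge: Callable[[str], str]) -> float:
    """Fraction of examples the judge marks as correct."""
    if not examples:
        return 0.0
    correct = sum(grade(ex, answer_fn(ex.question), judge) for ex in examples)
    return correct / len(examples)


def stub_judge(prompt: str) -> str:
    """Stand-in for an LLM call: CORRECT if the reference text appears in the prediction."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    reference = fields["Reference answer"].lower()
    prediction = fields["Predicted answer"].lower()
    return "CORRECT" if reference in prediction else "INCORRECT"


examples = [
    Example("How old was Mrs. Cumings?", "38"),
    Example("How many passengers are listed?", "3"),
]
canned_answers = {
    "How old was Mrs. Cumings?": "She was 38 years old.",
    "How many passengers are listed?": "There are 4 passengers.",
}
accuracy = evaluate(examples, canned_answers.get, stub_judge)
print(accuracy)
```

Passing the judge in as a callable keeps the harness independent of any particular model, which also makes it easy to compare several question-answering systems against the same dataset.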

Improved Solution

We arrived at an improved solution that involved an agent powered by OpenAIFunctions, GPT-4, and two tools: a Python REPL and a retriever. This solution allowed for more flexible and accurate responses to user questions. We also included specific instructions in the prompt to guide the system’s decision-making process. LangSmith was instrumental in comparing the performance of our improved solution with other methods, such as the Pandas Agent and PandasAI.
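
The shape of such an agent can be sketched with two tool functions and a dispatcher. This is a simplified illustration under stated assumptions, not the LangChain implementation: `scripted_model` is a hypothetical stub standing in for the model's function-calling decision, the prompt text is invented, and the retriever uses plain word overlap instead of a vector store.

```python
import csv
import io
import re

# Hypothetical sample rows (values are illustrative).
SAMPLE_CSV = """Name,Sex,Age,Pclass
"Braund, Mr. Owen Harris",male,22,3
"Cumings, Mrs. John Bradley",female,38,1
"Heikkinen, Miss. Laina",female,26,3
"""
ROWS = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Assumed wording for the instructions that steer tool choice.
SYSTEM_PROMPT = (
    "You answer questions about a CSV table.\n"
    "Use `python` for counting, filtering, or arithmetic over columns.\n"
    "Use `retriever` to look up a row by free text such as a name.\n"
)


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def python_tool(expression):
    """Evaluate one expression with `rows` in scope (stand-in for a Python REPL).

    Caution: evaluating model-written code is unsafe outside a sandbox.
    """
    scope = {"__builtins__": {}, "rows": ROWS, "len": len, "sum": sum,
             "int": int, "float": float, "min": min, "max": max}
    return str(eval(expression, scope))


def retriever_tool(query):
    """Return the row sharing the most words with the query (toy retrieval)."""
    query_words = tokens(query)
    best = max(ROWS, key=lambda row: len(query_words & tokens(" ".join(row.values()))))
    return ", ".join(f"{column}={value}" for column, value in best.items())


TOOLS = {"python": python_tool, "retriever": retriever_tool}


def run_agent(question, choose_action):
    """choose_action stands in for the model: it returns (tool_name, tool_input)."""
    tool_name, tool_input = choose_action(SYSTEM_PROMPT, question)
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](tool_input)


def scripted_model(system_prompt, question):
    """Hypothetical stub for the model's tool choice."""
    if "how many" in question.lower():
        return "python", "len([r for r in rows if r['Sex'] == 'female'])"
    return "retriever", question


count = run_agent("How many passengers are female?", scripted_model)
lookup = run_agent("Who is Laina Heikkinen?", scripted_model)
print(count)
print(lookup)
```

In a real agent the model, guided by the prompt instructions, would decide which tool to call and with what input, and it could call tools repeatedly before answering.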

In conclusion, the improved solution received positive feedback and performed well. While there is always room for improvement, LangSmith played a crucial role in collecting real-world examples, debugging, and evaluating the effectiveness of our question-answering system over CSV data.

By employing LangSmith’s capabilities, you can efficiently gather, debug, and evaluate language model-based applications over CSV data, making it an essential tool for developing and refining question-answering systems.
