Laxfed Paulacy


LANGCHAIN — From Foundation Models to Fine-Tuned Applications Using Label Studio

The function of good software is to make the complex appear to be simple. — Grady Booch

Large language models (LLMs) have changed how AI-driven applications are built, but a general-purpose model still has to be checked for quality and relevance in the domain where it is deployed. This article shows how to tune an LLM application to a specific use case with Label Studio, LangChain, and LLMs from OpenAI. It walks through a workflow for building a question-answering (QA) system and using Label Studio to improve it continuously.

Label Studio: Your LLM Tuner

Label Studio is an open-source data labeling platform that can be used to improve large language models and the applications built on them. It captures user interactions and lets human reviewers annotate them, which reveals how the model is performing and where it needs adjustment.

Putting It Into Action

Step 1: Building a Simple QA System

GitHub as Our Dataset

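# Requires the GitPython package (imported as `git`) in addition to LangChain
from langchain.document_loaders.git import GitLoader
from git import Repo

# Clone the Label Studio repository locally; its Markdown files are the dataset
repo_path = "./data/label-studio-repo"
repo = Repo.clone_from("https://github.com/HumanSignal/label-studio", to_path=repo_path)
branch = repo.head.reference

# Load only the Markdown files from the checked-out branch
loader = GitLoader(repo_path=repo_path, branch=branch, file_filter=lambda f: f.endswith(".md"))

# Each loaded file becomes one LangChain Document
data = loader.load()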

LLM Embeddings for Documents

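from langchain.text_splitter import MarkdownTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Split the Markdown documents into chunks of at most 500 characters, with no overlap
text_splitter = MarkdownTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Embed each chunk with OpenAI (expects OPENAI_API_KEY in the environment) and index it in Chroma
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())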
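
To give a feel for what the vector store does at query time, here is a toy sketch of similarity search in plain Python. It is not Chroma's implementation: the three-dimensional vectors, the chunk texts, and the helper names `cosine_similarity` and `top_k` are all made up for illustration, whereas real OpenAI embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for embedded document chunks
chunks = [
    ("How to install Label Studio", [0.9, 0.1, 0.0]),
    ("Configuring labeling templates", [0.1, 0.9, 0.1]),
    ("Release notes", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of a user question about installation
query = [0.8, 0.2, 0.1]
results = top_k(query, chunks, k=2)
```

In the QA system, the chunks retrieved this way would be passed to the LLM as context for answering the question.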

Step 2: Capture User Interactions

Implement a Label Studio callback in LangChain to capture user questions and responses.
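
LangChain's integrations include a `LabelStudioCallbackHandler` for this purpose, but it needs a running Label Studio instance and an API key. The sketch below therefore only illustrates the idea with the standard library: it records each question and answer and exports them as a list of tasks with a `data` field, which is the JSON shape Label Studio imports. The class `InteractionRecorder`, the stand-in function `answer_question`, and the field names `question` and `answer` are assumptions made for this example.

```python
import json


class InteractionRecorder:
    """Collects question/answer pairs and exports them as Label Studio style tasks."""

    def __init__(self):
        self.tasks = []

    def record(self, question, answer):
        # Each task wraps its payload in a "data" object;
        # the "question" and "answer" keys are illustrative names.
        self.tasks.append({"data": {"question": question, "answer": answer}})

    def export(self):
        """Serialize the recorded tasks to a JSON string ready for import."""
        return json.dumps(self.tasks, indent=2)


def answer_question(question):
    # Hypothetical stand-in for the real QA chain
    return f"(model answer to: {question})"


recorder = InteractionRecorder()
for q in ["How do I install Label Studio?", "What export formats are supported?"]:
    recorder.record(q, answer_question(q))

exported = recorder.export()
```

Recording at the point where the application answers the user means that every interaction, good or bad, ends up in front of a reviewer.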

Step 3: Annotate the QA Application’s Performance

Use Label Studio to refine the captured data with human expertise: gather consensus across annotators, apply filters, and customize labeling templates.
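
Consensus gathering can be pictured as a majority vote over the labels that several annotators give the same task. The following is a minimal sketch of that idea, not Label Studio's own agreement logic. The label names (`correct`, `incorrect`, `irrelevant`), the task IDs, and the `min_agreement` threshold are assumptions chosen for illustration.

```python
from collections import Counter


def consensus_label(labels, min_agreement=0.5):
    """Return the majority label if its share of votes exceeds min_agreement, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None


# Hypothetical annotations: task ID -> labels from different annotators
annotations = {
    "task-1": ["correct", "correct", "incorrect"],
    "task-2": ["irrelevant", "correct"],
}

consensus = {task: consensus_label(votes) for task, votes in annotations.items()}
```

Here `task-1` resolves to `correct`, while `task-2` has no majority and comes back as `None`, marking it as a candidate for another review pass.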

Step 4: Gauging Quality

Measure the quality of the system by analyzing the labeled data, for example by computing accuracy and the percentage of irrelevant questions.
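
As a rough sketch of such a computation, assume each reviewed interaction ends up with exactly one label from the hypothetical set `correct`, `incorrect`, and `irrelevant`. Accuracy is then measured over the relevant questions only. That is one possible definition, not necessarily the one the author used.

```python
def quality_metrics(labels):
    """Compute accuracy over relevant questions and the share of irrelevant ones."""
    total = len(labels)
    if total == 0:
        return {"accuracy": 0.0, "irrelevant_pct": 0.0}
    irrelevant = labels.count("irrelevant")
    relevant = total - irrelevant
    correct = labels.count("correct")
    return {
        # Fraction of relevant questions that were answered correctly
        "accuracy": correct / relevant if relevant else 0.0,
        # Percentage of all questions judged irrelevant to the domain
        "irrelevant_pct": 100.0 * irrelevant / total,
    }


# Hypothetical final labels exported after annotation
labels = ["correct", "correct", "incorrect", "irrelevant", "correct"]
metrics = quality_metrics(labels)
```

Tracking these numbers across iterations shows whether changes to the system actually improve it.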

Step 5: Improve the System

Incorporate user-driven feedback to enhance the QA system, integrating positively reviewed responses into the document database.
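
One way to picture this step is to filter the reviewed interactions down to the approved ones and turn each into a text entry with metadata. The sketch below assumes a hypothetical record layout (`question`, `answer`, `label`) and a `correct` label meaning "positively reviewed". The final call to the vector store appears only as a comment, because it needs the Chroma index built earlier.

```python
def approved_documents(reviewed):
    """Build texts and metadata for interactions that reviewers marked as correct."""
    texts, metadatas = [], []
    for item in reviewed:
        if item["label"] == "correct":
            # Store question and answer together so retrieval can match on either
            texts.append(f"Q: {item['question']}\nA: {item['answer']}")
            metadatas.append({"source": "human-reviewed"})
    return texts, metadatas


# Hypothetical reviewed interactions exported from the annotation step
reviewed = [
    {"question": "How do I install Label Studio?", "answer": "Use pip.", "label": "correct"},
    {"question": "What is the weather?", "answer": "Unknown.", "label": "irrelevant"},
]

texts, metadatas = approved_documents(reviewed)

# With the Chroma index from Step 1, the approved entries could then be added via:
# vectorstore.add_texts(texts, metadatas=metadatas)
```

Keeping a `source` marker in the metadata makes it possible to tell human-reviewed answers apart from the original documentation later on.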

This workflow supports continuous improvement of LLM applications: each round of captured interactions, annotation, and measurement brings the system closer to its domain and to user expectations.

Conclusion

The iterative approach described here shows why LLMs need to be adapted to the requirements of a specific application: doing so helps reduce bias and improve domain-specific accuracy. Together, Label Studio, LangChain, and LLMs provide the pieces for building precise and reliable AI systems that meet user expectations.

This tutorial has outlined how to combine Label Studio, LangChain, and LLMs to keep improving an AI application.
