Ram Vegiraju

Integrating LangChain with SageMaker JumpStart to Operationalize LLM Applications

Building LLM-Driven Workflows

Image from Unsplash

Large Language Models (LLMs) continue to take the world by storm. Hosting these models is a challenging task as we’ve explored in my previous articles. The next challenge is operationalizing these hosted LLMs in larger real-world applications.

To solve these two problems we will work with a pair of tools in today’s article:

  1. SageMaker JumpStart: SageMaker’s model zoo. Here you can select from a list of popular pre-trained models and deploy them directly to SageMaker Inference with API calls via the SageMaker Python SDK.
  2. LangChain: An open-source framework that helps users build Generative AI applications by providing tools that simplify Prompt Engineering and Retrieval Augmented Generation (RAG). LangChain will help us create a “Chain” where our prompt is directly connected to our LLM, which in this case is our SageMaker JumpStart Endpoint.

For today’s article we will explore how we can specifically work with both of these tools to build a mock LLM workflow.

NOTE: This article assumes an intermediate understanding of SageMaker Deployment and Real-Time Inference in particular. I would suggest following this article for understanding Deployment/Inference more in depth. We also cover SageMaker JumpStart, but to understand the Foundational Model Deployment further please refer to this blog.

DISCLAIMER: I am a Machine Learning Architect at AWS and my opinions are my own.

SageMaker JumpStart Deployment

For this article we’ll be utilizing SageMaker JumpStart to deploy a Flan-T5 model on a Real-Time SageMaker Endpoint. Why and when to use SageMaker JumpStart for LLM Deployment?

  • If you are happy with a base model’s performance, JumpStart makes it easy to deploy these LLMs via simple API calls.
  • Certain Foundation Models also support fine-tuning, which can help boost model accuracy and performance.

To get started with SageMaker JumpStart we can first go to the SageMaker Console (UI). Here you should notice a tab for JumpStart and the different Foundation Models that are available.

SageMaker Foundation Models (Screenshot by Author)

Here we can toggle for any models that we are interested in and observe the model card for more information such as fine-tuning support, a mock playground for inference, and more.

Model Card (Screenshot by Author)

By default a sample notebook is also generated for you that you can use as boilerplate. For today’s example we keep it simple and grab the ready-made notebook from the following GitHub link.

To interact with SageMaker JumpStart we utilize the SageMaker Python SDK and provide a model_id and model_version so the right metadata for the model is retrieved.

from sagemaker.jumpstart.model import JumpStartModel

model_id, model_version = (
    "huggingface-text2text-flan-t5-xl",
    "*",
)

We can then deploy our JumpStart model as we would with other SageMaker endpoints. The main difference here is that by default an instance type is selected for you based on what is already known about the foundation model.

Due to the varying sizes and computational requirements of these larger models only specific instances can be used to host these models. To understand further you can also check the model cards to see what other instances are supported for your model if you would like to override the default value.

model = JumpStartModel(model_id=model_id, model_version=model_version)
model_predictor = model.deploy()

Default Real-Time Deployment (Screenshot by Author)

Before setting up LangChain, we can run a sample inference to understand how our input and output need to be shaped. The model also accepts a few parameters that you can tune for model accuracy. Here’s a quick primer on what each parameter does and its value range:

  • Top_K: Restricts sampling to the K most probable next tokens at each generation step. This can be any positive integer.
  • Top_P: Controls the diversity of the generated text by setting a cumulative probability threshold. For example, with a value of 0.2, only the highest-probability tokens whose probabilities add up to 20% are considered for the next token. This value can be any float between 0 and 1.
  • Temperature: Controls the randomness of the output: the higher the value, the more likely low-probability words are to be chosen. Temperature limits vary depending on the model, but this value must be a positive float.
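To make the three parameters concrete, here is a minimal, simplified sketch of how they can reshape a next-token distribution before sampling. It is not the code running inside the endpoint; the toy vocabulary, the scores, and the helper names (softmax, normalize, filter_distribution) are assumptions made up for illustration, and real implementations differ in their details.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def normalize(ranked):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    total = sum(p for _, p in ranked)
    return [(tok, p / total) for tok, p in ranked]


def filter_distribution(logits, top_k=None, top_p=None, temperature=1.0):
    """Apply temperature, then top-k, then top-p, and return {token: probability}."""
    probs = softmax(logits, temperature)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        # Keep only the K most probable tokens.
        ranked = normalize(ranked[:top_k])
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = normalize(kept)
    return dict(ranked)


# Hypothetical scores for the next token; not taken from a real model.
toy_logits = {"dough": 2.0, "sauce": 1.0, "cheese": 0.5, "oven": 0.1, "banana": -3.0}

rng = random.Random(0)
candidates = filter_distribution(toy_logits, top_k=3, top_p=0.95, temperature=0.7)
next_token = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
print(candidates)
print(next_token)
```

Lowering top_k or top_p shrinks the candidate set, while raising the temperature flattens the probabilities so that less likely tokens are picked more often.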

The following linked Medium article is a great reference to understand these parameters further.

payload = {
    "text_inputs": "Tell me the steps to make a pizza",
    "max_length": 50,
    "num_return_sequences": 3,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

Using the sample payload we can invoke our created endpoint with the following code (endpoint creation will take ~5 minutes in this case).

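import json

import boto3

# Name of the endpoint created by model.deploy() above
endpoint_name = model_predictor.endpoint_name

client = boto3.client("runtime.sagemaker")
encoded_payload = json.dumps(payload).encode("utf-8")  # JSON serialization
response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/json", Body=encoded_payload
)
# The response body is a stream: read it once, then parse the JSON
model_predictions = json.loads(response["Body"].read())
model_predictions["generated_texts"][0]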
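Since the payload asks for three return sequences, the parsed response should hold a list under "generated_texts". The sketch below walks through that parsing step offline, with an in-memory stream standing in for the response body. The fake_response object and the generated texts in it are made up for illustration; only the "generated_texts" key comes from the code above.

```python
import io
import json

# Hypothetical response, shaped like the endpoint output used in this article.
# The texts are invented; a real endpoint returns whatever the model generates.
fake_body = io.BytesIO(
    json.dumps(
        {"generated_texts": ["Mix the dough.", "Preheat the oven.", "Add the toppings."]}
    ).encode("utf-8")
)
fake_response = {"Body": fake_body}

# Read the stream once and parse it, exactly as in the invocation code.
predictions = json.loads(fake_response["Body"].read())
for index, text in enumerate(predictions["generated_texts"]):
    print(f"Sequence {index}: {text}")

# A stream is exhausted after the first read, so keep the parsed result around.
leftover = fake_response["Body"].read()
```

The last line is why the parsed dictionary, rather than the raw body, is what you want to pass around in your application.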

Now that we have our hosted LLM we can focus on LangChain integration.

LangChain Integration

LangChain is a popular open-source framework that has made it simple to build GenerativeAI applications. Why is LangChain so useful? It simplifies a lot of the common constructs and challenges that come into play when operationalizing LLMs. If you are new to LangChain please refer to this amazing starter article on Medium.

Specifically today we will utilize three LangChain constructs:

1. Prompts

A large part of GenAI applications is something known as Prompt Engineering. Prompt Engineering is the science of tuning and developing prompts in the most efficient manner for models to understand. To help with this, the Prompt object in LangChain can be utilized to customize and inject your prompts with input variables.

from langchain.prompts import PromptTemplate

# In this instance we are just passing in the question for the prompt for our chain
prompt_template = """{question}"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["question"]
)

In our example above we don’t provide anything other than our question, but for other use-cases you may have supporting text that augments your input variables to shape your prompt.
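As a rough illustration of supporting text, the sketch below uses a plain Python format string as a stand-in for a prompt template with two input variables. The "context" variable, the wording of the template, and the build_prompt helper are assumptions for illustration and are not part of the original example; with LangChain you would list both variables in input_variables instead.

```python
# Stand-in for a prompt template: a format string with two input variables.
# "context" is a hypothetical variable added for this sketch.
prompt_with_context = """Answer the question using the context below.

Context: {context}

Question: {question}"""


def build_prompt(context: str, question: str) -> str:
    """Fill both input variables and return the final prompt text."""
    return prompt_with_context.format(context=context, question=question)


print(
    build_prompt(
        context="Our kitchen only has a wood-fired oven.",
        question="Tell me the steps to make a pizza",
    )
)
```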

2. Models

LangChain natively supports a variety of model providers for LLMs including OpenAI, HuggingFace, and in our case a SageMaker Endpoint construct. Note that this can be any SageMaker Endpoint; you are not limited to using SageMaker JumpStart for your deployment.

To integrate SageMaker Real-Time Inference we first import the necessary LangChain classes to work with our JumpStart Endpoint.

from langchain import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler

LangChain expects a ContentHandler class that helps shape the input and output of our SageMaker Endpoint with two methods: transform_input and transform_output. Using our sample inference we can understand how we need to serialize and deserialize our data in the corresponding methods.

We can also optionally pass in model params (captured as model_kwargs) for our transform_input method to parse. What we return from our transform_input method should match what we passed into our invoke_endpoint API call with our SageMaker endpoint.

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Build the same JSON body we sent in the direct invoke_endpoint call
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")
        return input_str

    def transform_output(self, output: bytes) -> str:
        # output is the streamed response body: read, decode, then parse the JSON
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["generated_texts"][0]

content_handler = ContentHandler()
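One way to sanity-check a content handler before wiring it into LangChain is to run both methods offline. The sketch below uses a hypothetical stand-in class (OfflineContentHandler, with no LangChain base class) and an in-memory stream in place of the endpoint response. It checks that transform_input produces the same body we sent with invoke_endpoint and that transform_output picks out the first generated text.

```python
import io
import json


class OfflineContentHandler:
    """Hypothetical stand-in that mirrors the two handler methods without LangChain."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

    def transform_output(self, output) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["generated_texts"][0]


handler = OfflineContentHandler()
params = {"max_length": 50, "top_k": 50, "top_p": 0.95, "do_sample": True}
request_bytes = handler.transform_input("Tell me the steps to make a pizza", params)

# The serialized request should decode back to the payload shape used earlier.
expected = {"text_inputs": "Tell me the steps to make a pizza", **params}
assert json.loads(request_bytes) == expected

# Invented response body with the "generated_texts" key the handler expects.
fake_body = io.BytesIO(json.dumps({"generated_texts": ["Roll out the dough."]}).encode("utf-8"))
first_text = handler.transform_output(fake_body)
print(first_text)
```

If the two shapes drift apart, for example after switching to a model with a different input key, this kind of check fails locally instead of at inference time.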

After instantiating our content handler class we can focus on creating our LangChain SageMaker object. We also pass in model params that you can toggle and tune for performance.

model_params = {"max_length": 100,
                "num_return_sequences": 1,
                "top_k": 100,
                "top_p": .95,
                "do_sample": True}

llm = SagemakerEndpoint(
        endpoint_name=endpoint_name,
        region_name="us-east-1",
        model_kwargs=model_params,
        content_handler=content_handler,
    )

3. Chains

Our last construct is our Chain, which takes the above LangChain constructs and puts them into a workflow that simply takes an input. In our case we use the ready-made LLM Chain that takes a prompt and a model, but you can also build a Custom Chain if needed.

from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)

# Execute chain
sample_prompt = "Tell me the steps to make a pizza"
chain.run(sample_prompt)

Sample Inference (Screenshot by Author)
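Conceptually, a chain of this kind just fills the prompt and hands the result to the model. The toy sketch below shows that flow without LangChain or a live endpoint. The make_chain helper and fake_llm function are hypothetical stand-ins, and this is not how LLMChain is implemented internally.

```python
from typing import Callable


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that fills the template, then calls the model with it."""

    def run(question: str) -> str:
        prompt_text = template.format(question=question)
        return llm(prompt_text)

    return run


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for the SageMaker endpoint call.
    return f"[model output for: {prompt_text}]"


toy_chain = make_chain("{question}", fake_llm)
print(toy_chain("Tell me the steps to make a pizza"))
```

Swapping fake_llm for a function that calls your endpoint keeps the rest of the workflow unchanged, which is the main convenience a chain offers.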

Additional Resources & Conclusion

The entire code for the example can be found in the accompanying notebook on GitHub: https://github.com/RamVegiraju/LangChain-Samples/blob/master/LangChain-SageMaker-Integration/langchain-sagemaker.ipynb. Operationalizing LLMs and building scalable Generative AI applications is a challenging task, and this solution using SageMaker Inference with LangChain is one way of tackling it. In coming articles we will explore how we can continue to build and scale Generative AI applications.

As always thank you for reading and feel free to leave any feedback.

If you enjoyed this article feel free to connect with me on LinkedIn and subscribe to my Medium Newsletter.
