
```
C:\Users\feng\Kafka\kraft>ls
docker-compose.yml

C:\Users\feng\Kafka\kraft>docker-compose up -d
[+] Running 2/2
 Network kraft_default      Created    0.0s
 Container kraft-kafka-1    Started    0.5s

C:\Users\feng\Kafka\kraft>docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                    NAMES
54342e49a1f2   bitnami/kafka:latest   "/opt/bitnami/script…"   18 seconds ago   Up 17 seconds   0.0.0.0:9092->9092/tcp   kraft-kafka-1
```

1.5 Create Kafka topic

We’ll log in to the instance and create a test topic in Kafka.

```
## Log in to the Kafka Docker instance
C:\Users\feng\Kafka\kraft>docker exec -it kraft-kafka-1 /bin/bash
$ cd /opt/bitnami/kafka
/opt/bitnami/kafka$ ./bin/kafka-topics.sh --version
3.4.0 (Commit:2e1947d240607d53)

## Create a topic named "test_topic"
/opt/bitnami/kafka$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --replication-factor 1 --partitions 2 --topic test_topic
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic test_topic.

## List current topics
/opt/bitnami/kafka$ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
test_topic
```

By now we have a Kafka Docker instance running successfully.

2 Run sanity checks using simple producer/consumer code

2.1 Set up the producer/consumer dev environment

```
## Create a conda env for the Kafka producer and consumer
C:\Users\feng\Kafka\kraft>conda create -n kafka_env python=3.10
...
C:\Users\feng\Kafka\kraft>conda activate kafka_env

## Install the kafka-python package
(kafka_env) C:\Users\feng\Kafka\kraft>pip install kafka-python
...
(kafka_env) C:\Users\feng\Kafka\kraft>pip list | grep kafka
kafka-python       2.0.2

## Install the Faker package to generate dummy messages
(kafka_env) C:\Users\feng\Kafka\kraft>pip install Faker
...
(kafka_env) C:\Users\feng\Kafka\kraft>pip list | grep Faker
Faker              17.3.0
```

2.2 Code examples

Now we can use VSCode to create the producer and consumer files. The producer generates fake user info and sends it as a JSON payload to the Kafka topic “test_topic”. producer.py looks like the following:

```python
import time
import json
from datetime import datetime

from kafka import KafkaProducer
from faker import Faker


# JSON messages need to be serialized to bytes
# before they are sent to a Kafka topic
def json_serializer(message):
    return json.dumps(message).encode('utf-8')


# Kafka producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=json_serializer
)

if __name__ == '__main__':
    fake = Faker()
    for _ in range(3):
        # Generate a fake JSON message
        name = fake.name()
        email = fake.email()
        city = fake.city()
        fake_message = {
            "name": name,
            "email": email,
            "city": city
        }

        # Send the fake JSON message to the Kafka topic
        print(f'{datetime.now()}: Message = {fake_message}')
        producer.send('test_topic', fake_message)

        time.sleep(1)

    # Block until all buffered messages have been delivered
    producer.flush()
```

And here is our consumer.py:

```python
import json

from kafka import KafkaConsumer

if __name__ == '__main__':
    # Kafka consumer; read test_topic from the earliest available offset
    consumer = KafkaConsumer(
        'test_topic',
        bootstrap_servers='localhost:9092',
        auto_offset_reset='earliest'
    )
    for message in consumer:
        # message.value holds the raw JSON bytes sent by the producer
        print(json.loads(message.value))
```

OK, now let’s start the consumer and run the producer to send some fake messages as a sanity check.

```
# Run producer
(kafka_env) C:\Users\feng\Kafka\kraft>python producer.py
2023-02-25 18:48:41.143953: Message = {'name': 'Susan Best', 'email': '[email protected]', 'city': 'Kellytown'}
2023-02-25 18:48:42.160545: Message = {'name': 'James Wilson', 'email': '[email protected]', 'city': 'Lake Bryanfort'}
2023-02-25 18:48:43.177933: Message = {'name': 'Haley Brooks', 'email': '[email protected]', 'city': 'East Janetburgh'}

# Monitor consumer
(kafka_env) C:\Users\feng\Kafka\kraft>python consumer.py
{'name': 'Susan Best', 'email': '[email protected]', 'city': 'Kellytown'}
{'name': 'James Wilson', 'email': '[email protected]', 'city': 'Lake Bryanfort'}
{'name': 'Haley Brooks', 'email': '[email protected]', 'city': 'East Janetburgh'}
```

Great, our Kafka Docker instance and simple applications are working as expected! Happy Reading!

Join Medium with my referral link - Feng Li: https://medium.com/@fengliplatform/membership
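As a quick offline check of the message format, the sketch below round-trips a fake-user dictionary through the same json_serializer the producer uses and through a matching deserializer. The name json_deserializer is our own (hypothetical). kafka-python's KafkaConsumer does accept a value_deserializer callable, so a function like this could let consumer.py receive dictionaries directly instead of calling json.loads on each message.value. The sketch does not need a running broker.

```python
import json


def json_serializer(message):
    # Same serializer as in producer.py: dict -> UTF-8 encoded JSON bytes
    return json.dumps(message).encode('utf-8')


def json_deserializer(raw):
    # Hypothetical consumer-side counterpart: UTF-8 JSON bytes -> dict
    return json.loads(raw.decode('utf-8'))


# In consumer.py this could be passed to the consumer (not executed here):
#   KafkaConsumer('test_topic', bootstrap_servers='localhost:9092',
#                 auto_offset_reset='earliest',
#                 value_deserializer=json_deserializer)
# after which each message.value is already a dict.

if __name__ == '__main__':
    sample = {"name": "Susan Best", "email": "susan@example.com", "city": "Kellytown"}
    payload = json_serializer(sample)
    print(payload)                               # the bytes Kafka stores
    print(json_deserializer(payload) == sample)  # prints True
```

Keeping serializer and deserializer as a symmetric pair means the format is defined in one place on each side. If the payload format ever changes, both functions change together.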

LANGCHAIN — Extraction Benchmarking

The function of good software is to make the complex appear to be simple. — Grady Booch


When working with large language models (LLMs), it’s essential to benchmark their performance to verify that they can infer correct structured information from different types of data. In this article, we’ll explore the LangChain Extraction Benchmarking project, which provides a new dataset for measuring how well LLMs extract and categorize relevant information from chat logs.

Creating the Dataset

The LangChain team settled on a data model to represent the structured output, seeded the dataset with Q&A pairs, generated candidate answers using an LLM, and manually reviewed the results in an annotation queue. They used synthetic dataset generation utilities to bootstrap some initial data. Once that initial dataset was ready, they fed the labeled data back as few-shot examples to the seed-generation model, which improved the quality of the data handed to human reviewers.
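The bootstrapping loop described above, where reviewed examples are fed back as few-shot examples, can be sketched as plain prompt assembly. This is a minimal illustration under our own assumptions. The function name build_few_shot_prompt, the prompt wording, and the record layout (question, answer, extraction) are hypothetical and are not the LangChain team's actual seed-generation code.

```python
import json
from typing import Dict, List


def build_few_shot_prompt(labeled: List[Dict], question: str, answer: str, k: int = 2) -> str:
    """Hypothetical helper: prepend up to k reviewed examples to a new Q&A pair."""
    parts = ["Extract structured information from the conversation below."]
    for ex in labeled[:k]:
        # Each reviewed example shows an input pair and its human-approved extraction
        parts.append(
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
            f"Extraction: {json.dumps(ex['extraction'], sort_keys=True)}"
        )
    # The new, unlabeled pair goes last; the model is expected to fill in the extraction
    parts.append(f"Question: {question}\nAnswer: {answer}\nExtraction:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    reviewed = [
        {"question": "How do I install LangChain?",
         "answer": "Run pip install langchain.",
         "extraction": {"sentiment": "neutral", "is_off_topic": False}},
    ]
    print(build_few_shot_prompt(reviewed, "Why does my import fail?", "Check the package version."))
```

As more candidates pass human review, they can be appended to the labeled list. This way, later generations see progressively better examples.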

Extraction Schema

The dataset was designed to challenge many of today’s common models. The schema is nested and combines classification, summarization, and structured output generation in a single task, which makes it hard for an LLM to get every part right in one generation.
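To make the idea of a nested schema that mixes classification, summarization, and structured output more concrete, here is a hypothetical, simplified version written with standard-library dataclasses. The field names are inferred from the metric names reported later in this article (sentiment, toxicity, confidence level, and so on). The class names, the allowed values, and the 0 to 5 score ranges are our assumptions, not the benchmark's actual schema.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Sentiment(str, Enum):
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


@dataclass
class QuestionCategorization:
    # Classification of the user's question (field values are assumptions)
    question_category: str
    is_off_topic: bool
    toxicity: int            # assumed scale: 0 (none) to 5 (severe)
    sentiment: Sentiment
    programming_language: str


@dataclass
class ResponseCategorization:
    # Assessment of the assistant's answer
    confidence_level: int    # assumed scale: 0 to 5


@dataclass
class ChatExtraction:
    # Top level: a free-text summary plus two nested classification objects
    issue_summary: str
    question: QuestionCategorization
    response: ResponseCategorization

    def __post_init__(self):
        # Reject out-of-range scores so malformed extractions fail loudly
        for name, value in (("toxicity", self.question.toxicity),
                            ("confidence_level", self.response.confidence_level)):
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")


if __name__ == "__main__":
    extraction = ChatExtraction(
        issue_summary="User cannot import a retriever class.",
        question=QuestionCategorization(
            question_category="Implementation Issues",
            is_off_topic=False,
            toxicity=0,
            sentiment=Sentiment.NEUTRAL,
            programming_language="python",
        ),
        response=ResponseCategorization(confidence_level=4),
    )
    print(asdict(extraction))
```

A single extraction must produce a summary (issue_summary), several categorical labels, and numeric scores, all in the right nesting. This is why a model can satisfy the structure while still getting individual fields wrong.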

Evaluation

Custom LangSmith evaluators measured three things: whether each output has the expected structure, how accurate the classification fields are, and how much the output differs overall from the reference. Metrics such as json_schema, classification accuracy, and json_edit_distance were used to evaluate the LLMs’ performance.
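The three kinds of checks can be approximated in a few lines of standard-library Python. This is a simplified sketch, not LangSmith’s or LangChain’s evaluator code. The schema check only verifies that the required keys are present, and classification accuracy is exact match over a chosen list of fields. The edit distance is a plain Levenshtein distance between canonical JSON strings (sorted keys, compact separators), normalized by the longer string’s length so that 0 means identical and lower is better.

```python
import json
from typing import Any, Dict, Iterable


def schema_ok(output: Dict[str, Any], required: Iterable[str]) -> float:
    # Structure check: 1.0 if every required top-level key is present, else 0.0
    return 1.0 if all(key in output for key in required) else 0.0


def classification_accuracy(pred: Dict[str, Any], ref: Dict[str, Any], fields: Iterable[str]) -> float:
    # Fraction of the chosen classification fields that match exactly
    fields = list(fields)
    hits = sum(1 for f in fields if pred.get(f) == ref.get(f))
    return hits / len(fields) if fields else 0.0


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def json_edit_distance(pred: Dict[str, Any], ref: Dict[str, Any]) -> float:
    # Canonicalize both objects so key order and whitespace do not count
    a = json.dumps(pred, sort_keys=True, separators=(",", ":"))
    b = json.dumps(ref, sort_keys=True, separators=(",", ":"))
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    ref = {"sentiment": "neutral", "toxicity": 0, "summary": "Import error"}
    pred = {"sentiment": "negative", "toxicity": 0, "summary": "Import error"}
    print(schema_ok(pred, ref.keys()))                                 # 1.0
    print(classification_accuracy(pred, ref, ["sentiment", "toxicity"]))  # 0.5
    print(round(json_edit_distance(pred, ref), 3))
```

Canonicalizing before comparing matters. Without it, two semantically identical objects with different key order would be penalized.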

Experiments

The LangChain team conducted multiple experiments to compare the performance of different LLMs, both closed-source and open-source models, and to test various prompting strategies and structured decoding techniques.

Code Snippets and Examples

To see how Claude-2 and GPT-4 compare, you can review their individual predictions side by side using the provided link. GPT-4’s aggregate metrics from that comparison are listed below:

```python
# Compare GPT-4 and Claude
# Using the provided link, review the individual predictions side-by-side
# The summary graph and table below can also be checked for comparison

# GPT-4 performance
gpt_4_metrics = {
    "confidence_level_similarity": 0.94,
    "json_edit_distance": 0.28,
    "json_schema": 1.00,
    "off_topic_similarity": 0.89,
    "programming_language_similarity": 0.59,
    "question_category": 0.56,
    "sentiment_similarity": 1.00,
    "toxicity_similarity": 0.0
}
```

To benchmark open-source models, the LangChain team compared the performance of three different LLMs, and their outputs can be viewed in LangSmith via the provided link. The aggregate metrics for one of them, Llama-v2-34b-code-instruct, are listed below:

```python
# Compare Baseline OSS Models Test
# Using the provided link, see the outputs in LangSmith or reference the aggregate metrics below

# Llama-v2-34b-code-instruct performance
llama_metrics = {
    "confidence_level_similarity": 0.93,
    "json_edit_distance": 0.41,
    "json_schema": 0.89,
    "off_topic_similarity": 0.89,
    "programming_language_similarity": 0.44,
    "question_category": 0.07,
    "sentiment_similarity": 0.59,
    "toxicity_similarity": 1.00
}
```
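Because the aggregate results are plain dictionaries, comparing two runs is simple arithmetic. Purely as an illustration, the helper below (compare_metrics is a hypothetical name) computes the per-metric difference between the GPT-4 and Llama-v2-34b-code-instruct metric sets reported in this article, with values copied from above. It assumes that a lower json_edit_distance is better, as is usual for a distance, and that a higher value is better for every other metric.

```python
from typing import Dict

# Aggregate metrics as reported in this article
gpt_4_metrics = {
    "confidence_level_similarity": 0.94, "json_edit_distance": 0.28,
    "json_schema": 1.00, "off_topic_similarity": 0.89,
    "programming_language_similarity": 0.59, "question_category": 0.56,
    "sentiment_similarity": 1.00, "toxicity_similarity": 0.0,
}
llama_metrics = {
    "confidence_level_similarity": 0.93, "json_edit_distance": 0.41,
    "json_schema": 0.89, "off_topic_similarity": 0.89,
    "programming_language_similarity": 0.44, "question_category": 0.07,
    "sentiment_similarity": 0.59, "toxicity_similarity": 1.00,
}

# Assumption: only the edit distance is a "lower is better" metric
LOWER_IS_BETTER = {"json_edit_distance"}


def compare_metrics(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    # Positive value means run `a` did better than run `b` on that metric
    deltas = {}
    for key in sorted(a.keys() & b.keys()):
        diff = a[key] - b[key]
        deltas[key] = -diff if key in LOWER_IS_BETTER else diff
    return deltas


if __name__ == "__main__":
    for metric, delta in compare_metrics(gpt_4_metrics, llama_metrics).items():
        print(f"{metric:35s} {delta:+.2f}")
```

Flipping the sign for distance-style metrics keeps the convention uniform, so a positive number always means "better" when you scan the output.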

The LangChain team also tested various prompting strategies and structured decoding techniques. The metrics for the baseline prompt on Llama-v2-34b-code-instruct, taken from the prompt-strategy comparison for OSS models, are listed below:

```python
# Compare Prompt Strategies for OSS Models Test
# Using the provided link, see the outputs in LangSmith or reference the aggregate metrics below

# Llama-v2-34b-code-instruct-bcce-v1 performance
llama_prompt_metrics = {
    "Prompt": "baseline",
    "confidence_level_similarity": 0.93,
    "json_edit_distance": 0.41,
    "json_schema": 0.89,
    "off_topic_similarity": 0.89,
    "programming_language_similarity": 0.44,
    "question_category": 0.07,
    "sentiment_similarity": 0.59,
    "toxicity_similarity": 1.00
}
```

For the structured decoding experiment, the LangChain team compared baseline decoding with grammar-based decoding. The baseline metrics from that test are listed below:

```python
# Compare Baseline vs. Grammar-based Decoding Test
# Using the provided link, see the outputs in LangSmith or reference the aggregate metrics below

# Llama-v2-70b-chat-28a7-v1 performance
llama_baseline_metrics = {
    "Decoding": "baseline",
    "confidence_level_similarity": 0.30,
    "json_edit_distance": ...,  # value and remaining metrics are truncated in the source
}
```

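Grammar-based decoding constrains generation so that the model can only emit tokens that keep the output valid under a grammar. The toy below shows the idea in its smallest form, prefix-constrained greedy decoding. It assumes the "grammar" is just a fixed set of allowed strings (for example, the values of one enum field) and that the model's preferences are a hypothetical dictionary of token scores that does not change between steps. Real implementations apply a full grammar to the model's logits at every step, which this sketch does not attempt.

```python
from typing import Dict, Iterable


def constrained_greedy_decode(scores: Dict[str, float], allowed: Iterable[str], max_steps: int = 10) -> str:
    allowed = set(allowed)
    prefix = ""
    for _ in range(max_steps):
        if prefix in allowed:
            return prefix  # a complete, valid value has been produced
        # Mask: keep only tokens that leave the output a prefix of some allowed value
        valid = {tok: s for tok, s in scores.items()
                 if any(v.startswith(prefix + tok) for v in allowed)}
        if not valid:
            raise ValueError(f"no valid continuation for {prefix!r}")
        prefix += max(valid, key=valid.get)  # greedy choice among valid tokens only
    raise ValueError("max_steps reached without a complete value")


if __name__ == "__main__":
    # An unconstrained greedy pick would start with "Sure"; the mask forbids it
    scores = {"Sure": 5.0, "neu": 2.0, "pos": 1.5, "neg": 1.0,
              "tral": 0.5, "itive": 0.4, "ative": 0.3}
    print(constrained_greedy_decode(scores, ["positive", "negative", "neutral"]))  # prints neutral
```

The constraint guarantees that the result is always one of the allowed values. It cannot make the model choose the right value, which is why grammar-based decoding tends to help structure metrics more than classification accuracy.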