Next-Gen Chatbots: Building a Simple Chatbot with NextJS and Huggingface for FREE.

We are building an AI application, aren't we? So why NextJS?

NextJS is a React framework for building fast, user-friendly web applications. Features such as server-side rendering, static site generation, hybrid rendering, automatic code splitting, built-in routing, image optimization and fast refresh make it a strong choice for modern SaaS applications. For apps that integrate LLM-based APIs, it is a good fit compared to other frontend frameworks because it also provides:

Edge Functions : NextJS lets you run serverless functions on edge networks, close to the user, which reduces latency for AI-powered apps that need fast responses. These functions pair well with the Vercel AI SDK, which has integrations for providers such as OpenAI, HuggingFace, Anthropic and LangChain.
Streaming Responses : NextJS lets you stream text from AI models to your frontend using the StreamingTextResponse class from the Vercel AI SDK. This creates a more engaging and natural user experience for chatbot applications, as the user sees the response rendered in real time.
React Server Components : NextJS supports React Server Components (RSCs), which let you render components on the server and stream them to the client. This can improve performance, reduce the client bundle size and give your AI-powered app direct access to server-side data sources.

The HuggingFace Inference API

The Huggingface Inference API is a service provided by Huggingface that lets users run and query Transformer-based models on HuggingFace's infrastructure. A free basic tier is available, along with paid machines with higher specifications in terms of memory and compute power. Some of the tasks you can use the Inference API for are:

Text Generation : create text based on a given input such as a prompt, a keyword or context.
Text Classification : assign a label or a category to a given piece of text out of a set of predefined classes.
Zero Shot Classification : assign a label or a category to a given text without any task-specific training data, using natural language descriptions of the labels.
Feature Extraction : extract numerical representations of a given text, such as embeddings, vectors or tensors.
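
To make the difference between these tasks more concrete, here is a minimal sketch of what the JSON request bodies for two of them can look like. The helper names (textGenerationPayload, zeroShotPayload) and the example labels are made up for illustration. The inputs / parameters shape follows the Inference API convention as we understand it, so treat it as an assumption and check the documentation for the model you use.

```typescript
// Shape of a request body: the text to process plus optional task parameters.
interface InferencePayload {
  inputs: string;
  parameters?: Record<string, unknown>;
}

// Text generation: the prompt goes in `inputs`, generation settings in `parameters`.
function textGenerationPayload(prompt: string, maxNewTokens: number): InferencePayload {
  return { inputs: prompt, parameters: { max_new_tokens: maxNewTokens } };
}

// Zero-shot classification: the labels are passed in at request time,
// so the model needs no training data for them.
function zeroShotPayload(text: string, candidateLabels: string[]): InferencePayload {
  return { inputs: text, parameters: { candidate_labels: candidateLabels } };
}

const generationBody = JSON.stringify(textGenerationPayload('Tell me a joke', 50));
const zeroShotBody = JSON.stringify(zeroShotPayload('I want a refund', ['billing', 'support']));
```

In this project we never build these bodies by hand, because the @huggingface/inference client does it for us.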

The project setup

To create a NextJS app using create-next-app, you need to have Node.js and npm installed on your system. Then, you can run the following command in your terminal:

npx create-next-app chatapp

This will create a new directory called chatapp with the basic NextJS files and dependencies. You will be prompted to answer a few questions about the structure of your repo and the configuration you want to set up. For this project, answer as follows:

TypeScript - no
ESLint - yes
Tailwind CSS - yes
src directory - no
app directory - yes
import alias - no

To install the dependencies, @huggingface/inference and ai, navigate to your project directory and run the following command in your terminal:

npm install @huggingface/inference ai

This will add the @huggingface/inference and ai packages to your package.json file and download them to your node_modules folder.

To set up the HUGGINGFACE_API_KEY environment variable, create a file called .env in the root of your project directory and add the following line:

HUGGINGFACE_API_KEY=your_api_key

Replace your_api_key with your actual Hugging Face API key (access token), which you can create on the Access Tokens page of your account settings after logging in.
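
A missing or misspelled key otherwise only shows up as a failed API call at request time, so it can help to fail fast. Below is a minimal sketch of such a check. The requireEnv helper is hypothetical and not part of the original project, and it takes the environment object as an argument so that it stays easy to test.

```typescript
// Hypothetical helper (not part of the original project): throw a clear error
// when a required environment variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// In the route file it could be used like this:
//   const Hf = new HfInference(requireEnv(process.env, 'HUGGINGFACE_API_KEY'));
```

Because the .env file contains a secret, it should not be committed to version control.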

The Backend Route

import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';

// Create a new Hugging Face Inference instance
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

// IMPORTANT! Set the runtime to edge
export const runtime = 'edge';

// `req` is a standard Web Request object (in a TypeScript project, annotate it as `req: Request`)
export async function POST(req) {
	// Extract the `prompt` from the body of the request
	const { prompt } = await req.json();

	const response = await Hf.textGenerationStream({
		model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
		inputs: `<|prompter|>${prompt}<|endoftext|><|assistant|>`,
		parameters: {
			max_new_tokens: 200,
			// @ts-ignore (this is a valid parameter specifically in OpenAssistant models)
			typical_p: 0.2,
			repetition_penalty: 1,
			truncate: 1000,
			return_full_text: false
		}
	});

	// Convert the response into a friendly text-stream
	const stream = HuggingFaceStream(response);

	// Respond with the stream
	return new StreamingTextResponse(stream);
}

NextJS's app directory maps the folder structure under app, which sits in the project root, directly to routes.

For example, if a link should lead to an about page, create a folder called about inside the app directory, add a file named page.js or page.ts inside it, and write the page's logic there.

API routes follow the same convention, by custom inside a directory called api. We are declaring an endpoint at /api/completion, so we create an api directory inside app, a completion directory inside api, and a route.js file inside completion. Notice that the file is named route.js rather than page.js, because it exposes an HTTP API endpoint instead of rendering a page.
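
The mapping from files to URLs can be sketched as a small function. This is only an illustration of the convention: NextJS does the mapping internally, the routeFromFile name is made up, and the rules are simplified (no dynamic segments or route groups).

```typescript
// Illustrative only: a simplified version of the file-to-route convention.
function routeFromFile(filePath: string): string | null {
  const segments = filePath.split('/');
  const fileName = segments.pop() ?? '';
  // Only files inside app/ named page.* or route.* define a route.
  const definesRoute = /^(page|route)\.(js|jsx|ts|tsx)$/.test(fileName);
  if (segments[0] !== 'app' || !definesRoute) {
    return null;
  }
  // Every folder below app/ becomes one URL segment.
  return '/' + segments.slice(1).join('/');
}

const aboutRoute = routeFromFile('app/about/page.js');
const completionRoute = routeFromFile('app/api/completion/route.js');
const homeRoute = routeFromFile('app/page.js');
const notARoute = routeFromFile('app/about/helper.js');
```

Here aboutRoute is '/about', completionRoute is '/api/completion', homeRoute is '/', and notARoute is null because helper.js is neither a page nor a route file.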

In this file, we first import the HfInference class, a client for the Huggingface Inference API, followed by HuggingFaceStream and StreamingTextResponse from the Vercel AI SDK package (ai), which are helpers for streaming the response.

We then create a new instance of the HfInference class, passing in the HUGGINGFACE_API_KEY defined in the .env file.

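The export const runtime = 'edge' line is important: it tells NextJS to run this route handler on the Edge Runtime, a lightweight environment that is deployed on edge networks close to the user and is well suited to streaming responses. Note that the model inference itself still happens on Hugging Face's infrastructure. The edge function only forwards the request and streams the tokens back, which starts faster and sits closer to the user than the default Node.js runtime.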

We then define and export an asynchronous function named POST, which NextJS uses as the handler for POST requests to this route. The function receives a standard Web Request object describing the incoming request, which in our case carries the user's query.

We then extract the 'prompt' property from the request body, which is a JSON object. req.json() returns a promise that resolves to the body parsed into a Javascript object, so we use the await keyword to wait for it.
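
The route above trusts that the body contains a usable prompt. As a minimal sketch of a more defensive version, the two helpers below validate the parsed body and then wrap the prompt in the same special tokens the route uses for its inputs string. The names parsePrompt and formatPrompt and the error messages are assumptions for illustration, not part of the original code.

```typescript
// Hypothetical helper: check the parsed JSON body and return the prompt.
function parsePrompt(body: unknown): string {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Request body must be a JSON object');
  }
  const prompt = (body as { prompt?: unknown }).prompt;
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    throw new Error('"prompt" must be a non-empty string');
  }
  return prompt;
}

// Wrap the user's text in the prompter/assistant tokens used in the route.
function formatPrompt(prompt: string): string {
  return `<|prompter|>${prompt}<|endoftext|><|assistant|>`;
}

// Inside the handler this could replace the plain destructuring:
//   const inputs = formatPrompt(parsePrompt(await req.json()));
const exampleInputs = formatPrompt(parsePrompt({ prompt: 'Hello' }));
```

A handler using these helpers would typically catch the error and answer with a 400 status instead of calling the model.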

We then call Hf.textGenerationStream with the model name, the prompt and some text generation parameters. It returns the generated tokens as a stream rather than as one final string.

The call takes several parameters, and we shall go over them one by one.

max_new_tokens: 200, Specifies the maximum number of tokens to generate, which limits the length of the response. As a rough rule of thumb for English text, 200 tokens is about 150 words. You can raise it for more verbose responses.
typical_p: 0.2, Enables typical sampling, which keeps only the tokens whose information content is closest to what the model expects at that step, up to a cumulative probability mass of typical_p. A lower value keeps fewer candidate tokens and gives less diverse output, while a value closer to 1 keeps more candidates and gives more diversity. Here it is 0.2, a fairly conservative setting.
repetition_penalty: 1, Specifies the penalty applied to tokens that have already appeared, which reduces the likelihood of generating repetitive text. A value of 1 means no penalty, while a value greater than 1 penalizes repeated tokens more strongly. In this case it is 1, so no penalty is applied.
truncate: 1000, Specifies the maximum number of input tokens. Longer prompts are truncated to this many tokens before generation, so this setting limits the input, not the response.
return_full_text: false, Specifies whether the prompt is included in the returned text. A value of true returns the prompt followed by the generated text, while a value of false returns only the generated text. In this case it is false, so we get only the generated text.
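
To show why a repetition_penalty of 1 means no penalty, here is a minimal sketch of the rule commonly used for this setting: the raw score (logit) of every token that was already generated is divided by the penalty if it is positive and multiplied by it if it is negative. Whether the Inference API backend applies exactly this rule is an assumption, and the function name is made up for illustration.

```typescript
// Sketch of a repetition penalty on raw scores (logits), indexed by token id.
function applyRepetitionPenalty(
  logits: number[],
  generatedTokenIds: number[],
  penalty: number
): number[] {
  const adjusted = logits.slice();
  for (const id of generatedTokenIds) {
    // Always read from the original scores so a token that appears twice
    // is penalized only once.
    const score = logits[id];
    if (score === undefined) continue; // ignore ids outside the vocabulary
    // Both branches make the token less likely when penalty > 1.
    adjusted[id] = score > 0 ? score / penalty : score * penalty;
  }
  return adjusted;
}

const penalized = applyRepetitionPenalty([2, -2, 1], [0, 1, 0], 2);
const unchanged = applyRepetitionPenalty([2, -2, 1], [0, 1, 0], 1);
```

With a penalty of 2, the scores of tokens 0 and 1 drop from 2 and -2 to 1 and -4, while token 2 is untouched. With a penalty of 1, as in our route, dividing or multiplying by 1 leaves every score as it was.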

For the ML Geeks! (Optional Read)

The dataset used to fine-tune the model is Open Assistant Conversations (OASST1), a crowdsourced "human-generated, human-annotated assistant-style conversation corpus". It contains over 160,000 messages in 35 languages, organized into more than 10,000 fully annotated conversation trees. The messages carry human quality ratings and labels (for example for spam, toxicity and helpfulness) rather than intent or dialog-act annotations.
The model is based on Pythia 12B, a large language model trained on the Pile dataset. The Pile is a collection of 22 diverse, high-quality text sources such as books, Wikipedia, academic papers and GitHub repositories. Pythia 12B has about 12 billion parameters and uses the GPT-NeoX decoder-only transformer architecture with 36 layers, 40 attention heads and a hidden size of 5,120.
The model was fine-tuned on the Open Assistant data using the training code in Open-Assistant/model/model_training, which uses DeepSpeed and PyTorch. The training configuration specifies 8 epochs, a learning rate of 6e-6 and a batch size of 4 per GPU, and as the model name suggests, the published checkpoint was taken at epoch 3.5. Training also used flash attention, an exact attention algorithm that reduces memory traffic by computing attention in tiles instead of materializing the full attention matrix. The model achieved an average perplexity of 9.8 on the validation set.
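
For readers unfamiliar with perplexity: it is the exponential of the average negative log-probability the model assigns to the correct tokens, so lower is better. Here is a minimal sketch of the calculation. The function name and the toy numbers are made up for illustration and have nothing to do with the real evaluation data.

```typescript
// Perplexity from the natural-log probabilities of the correct tokens.
function perplexity(tokenLogProbs: number[]): number {
  if (tokenLogProbs.length === 0) {
    throw new Error('Need at least one token');
  }
  const total = tokenLogProbs.reduce((sum, logProb) => sum + logProb, 0);
  const meanNegativeLogLikelihood = -total / tokenLogProbs.length;
  return Math.exp(meanNegativeLogLikelihood);
}

// A model that gives every correct token a probability of 1/4 is as
// uncertain as a uniform choice between 4 options.
const toyPerplexity = perplexity([Math.log(0.25), Math.log(0.25), Math.log(0.25)]);
```

In this toy case the perplexity is 4 (up to floating point rounding). Read the same way, a perplexity of 9.8 means the model is, on average, about as uncertain as if it were choosing between roughly ten equally likely tokens.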

The Frontend

The frontend is a React component that renders a simple chat interface. It uses the useCompletion hook from the ai/react package to request text completions from our API route, which in turn calls the Hugging Face Inference API.

The 'use client' directive at the top of the file marks the component as a Client Component, which means it is shipped to and runs in the browser. This is required here because useCompletion relies on React state and event handlers, which are not available in Server Components.

The file exports a default function, Completion(), which is a React functional component. It returns JSX, a syntax extension that lets us write HTML-like elements in Javascript.

The line const { completion, input, stop, isLoading, handleInputChange, handleSubmit } = useCompletion({ api: '/api/completion' }) uses object destructuring to assign the values returned by the useCompletion hook to constants with the same names. The hook takes an options object whose api field specifies the endpoint, which accepts a { prompt: string } object and returns a stream of tokens for the AI completion response. In this case it is the file-based API endpoint we wrote earlier.
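
To make the streaming behaviour concrete, here is a minimal sketch of how a value like completion grows as chunks arrive. It is not the hook's actual implementation: the completionSnapshots name and the hard-coded chunks are assumptions for illustration, and a plain array stands in for the network stream.

```typescript
// Each incoming chunk is appended to the text so far, and every intermediate
// value corresponds to one re-render of the chat box.
function completionSnapshots(chunks: string[]): string[] {
  const snapshots: string[] = [];
  let completion = '';
  for (const chunk of chunks) {
    completion += chunk;
    snapshots.push(completion);
  }
  return snapshots;
}

// Pretend the model streamed its answer in four pieces.
const snapshots = completionSnapshots(['Hello', ' from', ' the', ' model']);
```

The user would first see 'Hello', then 'Hello from', and so on until the full 'Hello from the model' is on screen, instead of waiting for the whole answer.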

Next, we style the chat box with standard Tailwind CSS classes and render the component on the page.

The end result will look something like this:

Finally, you can deploy the application for free on the Vercel platform from the command line using these commands:

npm install -g vercel
vercel

You will have to create an account on Vercel, authenticate, and follow the instructions that follow. Alternatively, you can push the code to GitHub, go to the Vercel dashboard and simply click deploy.

Check out the complete code here:

https://github.com/AshwinRachha/BookReader