Summary

The article discusses the use of EmbedChain, an open-source framework, to simplify the setup and running of Retrieval-Augmented Generation (RAG) models, enabling the creation of powerful AI applications with multimodal data sources such as PDFs and YouTube videos.

Abstract

The article titled "How to setup and run MultiModal RAG in 4 lines of code!!" introduces EmbedChain, a framework designed to streamline the development of Retrieval-Augmented Generation (RAG) applications. It highlights the evolution of RAG models, which now benefit from the integration of external knowledge sources and the simplification of their setup due to advancements in open-source vector databases and integration with language models. The author emphasizes EmbedChain's ease of use, flexibility, and efficiency in handling data, which allows both novices and experts to build sophisticated AI systems. The article provides a practical example of using EmbedChain with a multimodal pipeline that includes YouTube videos and PDFs to understand the SORA model, demonstrating the library's capabilities in chunking, embedding, and querying data. The author concludes by promoting the use of EmbedChain and invites collaboration through Opal AI.

Opinions

The author is a proponent of EmbedChain, having used it to set up a multimodal RAG pipeline quickly and efficiently.
They praise the library's "Conventional but Configurable" approach, which caters to a wide range of users from beginners to advanced machine learning engineers.
The author believes that EmbedChain's key advantages lie in its ability to simplify RAG development, provide a flexible architecture, handle data efficiently, and offer user-friendly APIs.
They are impressed with the library's ability to abstract away the complexities of RAG, allowing users to focus on building powerful AI applications tailored to their specific data and use cases.
The author suggests that the RAG responses generated by EmbedChain are of good quality, as demonstrated by the example queries related to the SORA pipeline.

How to setup and run MultiModal RAG in 4 lines of code!!

Doing cool things with data!

Introduction

Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to enhancing the capabilities of large language models (LLMs) by incorporating external knowledge sources. By combining the generation capabilities of LLMs with the ability to retrieve relevant information from databases, RAG models can produce more informed and contextual outputs.

Until about 6–9 months ago, setting up and running RAG models was a complex and time-consuming process, involving multiple components and intricate configurations. Fortunately, recent innovations in the field have significantly simplified the process of building and deploying RAG models. These innovations include the availability of various open-source vector databases, seamless integration with both open-source and closed-source language models, flexible chunking and embedding strategies, and the ability to incorporate data from multiple sources.

One library that I recently used is EmbedChain. I have been a long-term user of LangChain, so that tends to be my go-to. But I was pleasantly surprised that I could set up a multimodal RAG pipeline with EmbedChain in less than 10 minutes. I want to share the steps with you so you can also speed up your own RAG deployments and experiments.

About EmbedChain

EmbedChain is an open-source framework that makes it easy to build and deploy retrieval-augmented generation (RAG) applications powered by large language models (LLMs). Its “Conventional but Configurable” approach caters to both software and machine learning engineers.

Key advantages of EmbedChain include:

Simplifies RAG Development: Building robust RAG pipelines involves complexities like data integration, chunking, indexing, vector storage, and more. EmbedChain streamlines this process.
Flexible Architecture: Choose components like LLMs, vector databases, data loaders, chunkers, and retrieval strategies to tailor the pipeline to your needs.
Efficient Data Handling: EmbedChain automatically loads data, generates embeddings for relevant chunks, and stores them in your chosen vector database.
User-Friendly APIs: Beginners can build LLM apps in just 4 lines of code, while advanced users can deeply customize the RAG pipeline.

The core workflow is straightforward:

Add Data: Automatically load, chunk, embed, and index your data sources.
Query: Turn user questions into embeddings to retrieve relevant documents.
Generate: Use retrieved documents to craft precise answers with an LLM.
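
To make the three steps concrete, here is a minimal, library-free sketch of the same add / query / generate flow. It is not how EmbedChain is implemented: the bag-of-words "embedding", the in-memory index, and the names chunk, embed, cosine, and ToyRAG are all assumptions made for illustration. A real pipeline would use an embedding model and a vector database, and would send the final prompt to an LLM.

```python
import math
import re
from collections import Counter


def chunk(text, size=12, overlap=4):
    """Split text into overlapping word windows (a simple fixed-size chunker)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: bag-of-words counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRAG:
    def __init__(self):
        self.index = []  # list of (chunk_text, vector) pairs

    def add(self, text):
        # Add Data: chunk, embed, and index each piece.
        for piece in chunk(text):
            self.index.append((piece, embed(piece)))

    def retrieve(self, question, k=2):
        # Query: embed the question and rank chunks by similarity.
        q = embed(question)
        ranked = sorted(self.index, key=lambda item: cosine(q, item[1]), reverse=True)
        return [piece for piece, _ in ranked[:k]]

    def build_prompt(self, question, k=2):
        # Generate: assemble the prompt that would be sent to an LLM.
        context = "\n".join(self.retrieve(question, k))
        return f"Context:\n{context}\n\nQuery: {question}\n\nHelpful Answer:"


rag = ToyRAG()
rag.add("Diffusion models generate video by gradually removing noise from latent patches.")
rag.add("Vector databases store embeddings so that similar chunks can be found quickly.")

question = "How do diffusion models generate video?"
top = rag.retrieve(question, k=1)
prompt = rag.build_prompt(question, k=1)
print(prompt)
```

The point of the sketch is only the shape of the pipeline: everything a library like EmbedChain automates sits behind the add and retrieve methods.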

Whether you’re an expert or novice, EmbedChain abstracts away RAG complexities so you can focus on building powerful AI applications tailored to your data and use case.

Testing EmbedChain on a Multimodal pipeline including PDFs and YouTube Videos

So let’s build our short and simple EmbedChain pipeline. For this experiment, I will use a mixture of YouTube videos and PDFs. I am curious to learn how the SORA model works, based on the information and theories available online and on YouTube. (There is no official paper from OpenAI, just a technical report with limited details.)

I start by defining my sources:

youtube_sources = [
    'https://www.youtube.com/watch?v=fG3IE9dkyKY',
    'https://www.youtube.com/watch?v=5SOKVN3hav4',
    'https://www.youtube.com/watch?v=r6Go6dGxrxg',
]
pdf_sources = ['2402.17177.pdf', 'Sora_technical_report_OpenAI.pdf']

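Then I import the library and set my OpenAI API key:

import os

# Set your OpenAI API key before creating the app (placeholder value shown)
os.environ["OPENAI_API_KEY"] = "sk-"

from embedchain import App
from embedchain.models.data_type import DataType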

Getting your app up and running takes 3 simple steps:

1. Define the EmbedChain app. You can optionally pass a config; I share the details of mine below.
2. Add your data to the app. Use DataType to tell the app which type of data to expect, for example YOUTUBE_VIDEO and PDF_FILE in my case. This is so elegant in its design. At this step, your data is chunked, embedded and added to a vector store.
3. Query your app.

This is it!

## Define the EmbedChain app
app = App.from_config(config=config)

## Add your sources to the app
for video in youtube_sources:
    app.add(video, data_type=DataType.YOUTUBE_VIDEO)

for pdf in pdf_sources:
    app.add(pdf, data_type=DataType.PDF_FILE)

## Query the app
app.query("Is the CLIP model used in SORA pipeline. If yes, how?")
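
If your sources arrive as one mixed list rather than two separate ones, a small helper can sort them before you call app.add. The sketch below is a hypothetical convenience function, not part of EmbedChain: the name guess_source_kind, the plain-string labels, and the notes.txt entry are all assumptions for illustration, and you would still map each label to the matching DataType member yourself.

```python
from urllib.parse import urlparse

YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "youtu.be"}


def guess_source_kind(source):
    """Hypothetical helper: label a source 'youtube_video', 'pdf_file', or 'unknown'."""
    host = urlparse(source).netloc.lower()
    if host in YOUTUBE_HOSTS:
        return "youtube_video"
    if source.lower().endswith(".pdf"):
        return "pdf_file"
    return "unknown"


sources = [
    "https://www.youtube.com/watch?v=fG3IE9dkyKY",
    "2402.17177.pdf",
    "notes.txt",  # hypothetical entry to show the fallback label
]
kinds = {source: guess_source_kind(source) for source in sources}
for source, kind in kinds.items():
    print(f"{kind:14} {source}")
```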

The library is flexible, so you can customize specific components when you need to. This is done by passing a config dictionary to App.from_config, as shown below. This is optional; you can use the default config to get started.

## Define your params

config = {
    'vectordb': {
        'provider': 'chroma',
        'config': {
            'collection_name': 'my-collection',
            'dir': 'db',
            'allow_reset': True,
        },
    },
    'embedder': {
        'provider': 'openai',
        'config': {
            'model': 'text-embedding-3-small',
        },
    },
    'llm': {
        'provider': 'openai',
        'config': {
            'model': 'gpt-3.5-turbo-0125',
            'temperature': 0.5,
            'top_p': 1,
            'stream': False,
            'prompt': (
                "Use the following pieces of context to answer the query at the end.\n"
                "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n"
                "$context\n\nQuery: $query\n\nHelpful Answer:"
            ),
            'system_prompt': (
                "You are an expert at looking at the provided context and answering user's query."
            ),
        },
    },
}
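
The $context and $query placeholders in the prompt look like Python string.Template fields. Assuming they are filled that way (an assumption about the mechanism, not a statement about EmbedChain's internals), the sketch below shows what the final prompt sent to the LLM could look like. The retrieved_chunks values are made up for illustration; in a real run they come from the vector store.

```python
from string import Template

# Same prompt text as in the config above.
prompt_template = Template(
    "Use the following pieces of context to answer the query at the end.\n"
    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n"
    "$context\n\nQuery: $query\n\nHelpful Answer:"
)

# Hypothetical retrieved chunks, standing in for the vector store results.
retrieved_chunks = [
    "Chunk one about Sora.",
    "Chunk two about video captioning.",
]

filled = prompt_template.substitute(
    context="\n".join(retrieved_chunks),
    query="Is the CLIP model used in SORA pipeline. If yes, how?",
)
print(filled)
```

Printing the filled prompt like this is a quick way to sanity-check a custom template before spending API calls on it.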

The RAG responses were good, as the two example queries below show.

### Query
app.query("Is the CLIP model used in SORA pipeline. If yes, how?")

### Response
"Yes, the CLIP-like conditioning mechanism in Sora receives
LLM-augmented user instructions and potentially visual prompts
to guide the diffusion model in generating styled or themed videos.
This aspect of Sora's functionality showcases significant advancements
in the vision domain."

### Query
app.query("Was image captioning used to generate training data for SoRA? If yes, which model and how")

### Response
"Yes, image captioning was used to generate training data for SoRA.
The model utilized for this purpose is a video captioner capable of
producing detailed descriptions for videos. This video captioner was
trained to generate high-quality (video, descriptive caption) pairs
for all videos in the training data, which were then used to fine-tune
SoRA to improve its instruction following ability."

Conclusion

EmbedChain is a promising open-source framework that allows you to quickly build powerful retrieval-augmented generation (RAG) applications. By efficiently integrating language models and data from multiple sources, EmbedChain simplifies the creation of context-aware AI that understands natural queries. Its flexibility and ease of use make it an attractive option for leveraging the full capabilities of RAG technology across various domains and skill levels. I hope this short blog encourages you to give it a shot.

At Opal AI, we have built multi-agent pipelines for our customers to solve real-world problems. Email me at [email protected] if you are interested in collaborating.
