Summary

This article describes how enterprises can build a cost-effective, secure, and trustworthy Generative AI solution on their private knowledge bases, using a purpose-built open architecture based on RAG (Retrieval-Augmented Generation).

Abstract

The article outlines the evolution of Generative AI within enterprises, emphasizing the need for a private and purpose-built Large Language Model (LLM) stack to address specific concerns such as data privacy, model ownership, and cost control. It details the process of constructing a Generative Q&A system on a company's proprietary knowledge base, starting with a simple model that uses grounded data and a public LLM, and progressing to a more sophisticated architecture that incorporates private LLMs, data ingestion, re-ranking algorithms, and fine-tuning with instruct sets. The advanced setup ensures relevance and accuracy of generated answers while maintaining ethical AI standards and allowing for continuous improvement through human feedback and AI governance.

Opinions

Enterprises are increasingly seeking internal ChatGPT-like tools tailored to their unique needs and data.
A private LLM stack is advocated for its benefits in avoiding hallucinations, fine-tuning models with enterprise-specific data, protecting data privacy, and maintaining control over inference costs and model ownership.
The use of an open architecture is preferred for its flexibility, ethical AI implementation, and the ability to prevent the generation of harmful content.
The article suggests that simply using larger LLMs like those provided by OpenAI APIs may not be the optimal solution for enterprise Q&A problems.
The importance of a re-ranker algorithm and instruct set database is highlighted to improve the relevance and accuracy of search outputs and generated answers.
Human feedback is considered crucial for refining generative AI systems, and reinforcement learning from human feedback (RLHF) is recommended for incorporating this feedback.
The article implies that middleware for workflow orchestration and AI governance is essential for the successful deployment and monitoring of generative AI applications in enterprises.
The use of foundational models like watsonx is proposed for enterprises that require air-gapped solutions to further mitigate the risk of hallucinations and ensure trustworthy responses.

LLMs for Enterprise: Generative Q&A on Your Private Knowledge Base

How can you build a cost-effective, secure, and trustworthy Generative AI solution with a purpose-built open architecture using RAG (Retrieval-Augmented Generation)?

Once upon a time, in a world buzzing with excitement and intellectual curiosity, ChatGPT emerged as a transformative force. Unless you were dwelling on Mars, chances are you had already embarked on a fascinating experience with ChatGPT.

As the wonders of ChatGPT permeated the collective consciousness, enterprises were quick to envision the potential within their own realms. A common desire echoed: “I wish we had an internal ChatGPT-like tool for our company.”

While OpenAI's ChatGPT APIs are one option, many companies wonder: “Why settle for existing options when we can strive for a purpose-built architecture tailored to our needs?”

Why do you need a private/purpose-built LLM stack?

• Do you want to avoid hallucinations?

• Do you want to fine-tune the model on your enterprise data?

• Do you want to keep your enterprise data from leaving your organization?

• Do you want to prevent external providers from using your data to improve their LLMs (GPT)?

• Do you want control over the inference cost of running an LLM?

• Do you want to own your models?

• Do you want to avoid the risk of leaking your proprietary data?

• Do you want your models to be free of copyright issues?

If the answer to any one of the above questions is yes, then it matters whether you consume an LLM through a closed architecture such as OpenAI's or through a truly open and trusted architecture that you control.

Truly Open LLM Architecture

Harness the power of technology built on an Open Source foundation, enabling you to deploy models effortlessly wherever they’re needed. With the ability to curate training data sets and filter out offensive content, you can ensure the highest standards of ethical AI. Stay vigilant through continuous system monitoring, preventing the generation of harmful content. Embrace a private generative AI platform that puts you in complete control, allowing you to govern data access and finely tune training data to mitigate the risk of leaks.

Generative Q&A on a Proprietary Knowledge Base

It is a widely acknowledged reality that knowledge within any company is often fragmented. Official product documentation contains valuable information (which itself is often siloed), but there is also a wealth of knowledge scattered across Slack datasets, internal learning sessions, and articles and blogs published both internally and externally. When faced with technical questions about a company’s product, how can one obtain a definitive and reliable generative answer?

Let’s look at how this problem can be solved step by step, starting with the simplest solution.

A. Grounded Data + Public LLM Model

To address this challenge, we embarked on an extensive effort to scrape over a quarter million pages of product documentation, encompassing both external and internal sources. This comprehensive collection forms what is commonly referred to as a grounding set. Whenever a user submits a query, it goes through an “Information Retrieval search” over this grounding set. The results are then fed into a large language model using an “in-context/zero-shot” learning approach, enabling the generation of more fluent and articulate answers.
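The retrieve-then-prompt flow above can be sketched in a few lines. This is a minimal illustration, assuming a tiny in-memory grounding set and a simple term-overlap scorer standing in for a real search engine; `GROUNDING_SET`, `retrieve`, and `build_prompt` are hypothetical names, and the LLM call itself is omitted because the article does not specify an API.

```python
import re
from collections import Counter

# Hypothetical grounding set; in practice these would be the scraped documentation pages.
GROUNDING_SET = [
    "To reset a password, open Settings and choose Security, then Reset Password.",
    "The export feature writes reports as CSV or PDF files.",
    "Single sign-on is configured by an administrator under Settings > Identity.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by shared query terms (a stand-in for BM25 or a search engine)."""
    q_terms = Counter(tokenize(query))

    def score(passage: str) -> int:
        p_terms = Counter(tokenize(passage))
        return sum(min(count, p_terms[term]) for term, count in q_terms.items())

    return sorted(passages, key=score, reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble an in-context (zero-shot) prompt that grounds the LLM in retrieved passages."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How do I reset my password?"
    prompt = build_prompt(question, retrieve(question, GROUNDING_SET))
    print(prompt)  # this string would be sent to the LLM of your choice
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy scorer for Watson Discovery, Solr, or Elasticsearch later without touching the prompt logic.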

This conceptual architecture serves as an ideal starting point, allowing for swift implementation in as little as one week. By utilizing search results from the grounded set as prompts, the risk of generating erroneous information, or hallucination, is significantly reduced (we will explore how to further mitigate this risk later). However, there are a few limitations to consider.

Firstly, we must address the challenge of ensuring the relevance of search outputs. Merely relying on the top result may lead to the generation of entirely unrelated answers. To overcome this, we introduce an additional component to the architecture known as the Re-Ranker Algorithm.
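A re-ranker re-scores a short candidate list with a more expensive relevance model before anything reaches the LLM. The sketch below shows late-interaction (MaxSim) scoring, the idea behind ColBERT, the re-ranker this article uses later: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The three-dimensional vectors are hypothetical placeholders; a real ColBERT model produces per-token embeddings with a trained BERT encoder.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Late interaction: for each query token, take its best-matching document token, then sum."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


def rerank(query_tokens: list[Vector],
           candidates: dict[str, list[Vector]]) -> list[tuple[str, float]]:
    """Re-order first-stage candidates (doc id -> token embeddings) by descending MaxSim score."""
    scored = [(doc_id, maxsim(query_tokens, toks)) for doc_id, toks in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical token embeddings: a two-token query and two retrieved candidates.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidates = {
    "doc_a": [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],  # matches only the first query token well
    "doc_b": [[1.0, 0.0, 0.0], [0.1, 0.9, 0.0]],  # matches both query tokens well
}
for doc_id, score in rerank(query, candidates):
    print(doc_id, round(score, 3))
```

In a full pipeline, only the top few re-ranked passages would be placed in the LLM prompt, which is what keeps an irrelevant top search hit from steering the answer.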

Secondly, evaluating the effectiveness of our results becomes crucial. We need to establish an evaluation framework to assess the accuracy and quality of the generated answers. Additionally, creating a comprehensive set of question-and-answer prompts, known as an instruct set, is essential for question-answering tasks.
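As a hedged illustration of one automated check, the sketch scores generated answers against reference answers from a small, hypothetical instruct set using ROUGE-L F1, which is based on the longest common subsequence (LCS) of tokens. It uses plain whitespace tokenization; established packages such as `rouge-score` apply their own tokenization and optional stemming, so their numbers may differ.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, computed by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical instruct-set rows: question, reference answer, and the system's generated answer.
instruct_set = [
    {"question": "Which formats can reports be exported to?",
     "reference": "reports can be exported as csv or pdf",
     "generated": "reports can be exported as pdf or csv files"},
    {"question": "Who configures single sign-on?",
     "reference": "an administrator configures single sign-on",
     "generated": "an administrator configures single sign-on"},
]

scores = [rouge_l_f1(row["generated"], row["reference"]) for row in instruct_set]
print(f"mean ROUGE-L F1: {sum(scores) / len(scores):.3f}")
```

Note that the first answer is factually fine yet scores well below 1.0 because of word order, which is one reason lexical metrics are paired with human judgments of answer quality.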

Lastly, capturing human feedback on the generated answers is vital. By gathering insights from users and understanding their experiences with the generative answers, we can continually refine and improve the system.

These considerations pave the way for a robust and reliable solution, allowing us to harness the power of generative answers while addressing key challenges and incorporating user feedback throughout the process.

B. Grounded Data + Private LLM Model

We therefore enhanced the architecture by adding the following components:

I. Data Ingestion

1. Data Collection: The data is scraped, cleaned, and stored. To begin with, we scraped 140K documents comprising about 250K pages.
2. Information Retriever Engine: Use an information retrieval engine such as Watson Discovery, Solr, or Elasticsearch to index the pages and enrich their metadata.
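The ingestion steps above can be sketched with the Python standard library, assuming pages arrive as raw HTML strings: markup is stripped with `html.parser`, the text is split into overlapping word windows, and each chunk keeps its source metadata so a search engine can index and filter it. The chunk size, overlap, `Chunk` fields, and sample page are illustrative assumptions; in the article, the indexing itself is handled by Watson Discovery, Solr, or Elasticsearch.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_html(html: str) -> str:
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


@dataclass
class Chunk:
    doc_id: str
    chunk_no: int
    text: str
    source: str  # hypothetical label such as "product-docs" or "slack"


def chunk_document(doc_id: str, source: str, html: str,
                   size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split cleaned text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = clean_html(html).split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [Chunk(doc_id, n, " ".join(words[s:s + size]), source) for n, s in enumerate(starts)]


page = ("<html><head><style>p { color: red; }</style></head><body>"
        "<h1>Export</h1><p>Reports can be exported as CSV or PDF.</p></body></html>")
for chunk in chunk_document("doc-001", "product-docs", page):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.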

II. Train & Test

3. Re-ranker: In this step, we first retrieve the search results from our “private dataset” and pass them through a neural re-ranker. We used a ColBERT re-ranker for this purpose. It re-ranks the search results before they are passed to the LLM (BLOOM, FLAN-T5, or others).

4. Instruct DB: An additional model is used to generate questions that become part of the instruct set. These are combined with other instruct set sources, such as manually curated answers and a corpus of exams on the internal knowledge base.

5. Fine-Tuning: An optional workflow is to fine-tune the LLM, creating a fine-tuned version of the model that gives better accuracy (we will cover this in the next blog). If you wish to keep your data and models behind your firewall, this is the option you need.

6. Evaluation: Evaluation scripts compute standard NLP metrics such as BLEU or ROUGE, as well as human ratings of answer quality such as veracity, momentum, and manner. Automated evaluation frameworks such as Lamini and PandaLM can also be used.
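The Instruct DB and Fine-Tuning steps both depend on a clean, deduplicated instruct set. The sketch below merges the three kinds of sources the article names (model-generated questions, manually curated answers, and exam questions) and serializes the result as JSON Lines, a format commonly accepted by fine-tuning tooling. The field names, the precedence order (curated before exam before generated), and the normalization rule are assumptions for illustration.

```python
import json
import re


def normalize_question(question: str) -> str:
    """Case- and punctuation-insensitive key used to detect duplicate questions."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def merge_instruct_sets(*sources: list[dict]) -> list[dict]:
    """Merge sources given in priority order; the first source containing a question wins."""
    merged: dict[str, dict] = {}
    for source in sources:
        for item in source:
            merged.setdefault(normalize_question(item["instruction"]), item)
    return list(merged.values())


# Hypothetical entries for the three kinds of sources the article names.
curated = [{"instruction": "Which formats can reports be exported to?",
            "response": "Reports can be exported as CSV or PDF.", "origin": "curated"}]
exams = [{"instruction": "Who configures single sign-on?",
          "response": "An administrator, under Settings > Identity.", "origin": "exam"}]
generated = [{"instruction": "which formats can reports be exported to",  # duplicates a curated one
              "response": "CSV.", "origin": "generated"},
             {"instruction": "Where is single sign-on configured?",
              "response": "Under Settings > Identity.", "origin": "generated"}]

merged_set = merge_instruct_sets(curated, exams, generated)
jsonl = "\n".join(json.dumps(record) for record in merged_set)  # one JSON object per line
print(jsonl)
```

Giving human-curated entries precedence means a weaker machine-generated answer never overrides a reviewed one for the same question.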

III. Consume

7. Deploy: If the model is fine-tuned, you can deploy it in this step. Otherwise, consume it via an API as a “Model as a Service”.

8. Inference for User App: A user-facing app that engages with end users, much like ChatGPT.

9. Feedback Model: Human feedback is incredibly important for any generative AI project, since there are few quantitative metrics for its accuracy. This is where RLHF (reinforcement learning from human feedback) fits in.

10. AI Governance: User inference can be monitored for harmful content and ethical concerns.
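RLHF typically begins by training a reward model on preference pairs: for the same prompt, an answer people preferred and one they rejected. A minimal sketch of the Feedback Model step, assuming the user app logs a thumbs-up or thumbs-down per generated answer, turns those logs into `prompt`/`chosen`/`rejected` records. The `FeedbackEvent` schema is hypothetical, and the reward-model and reinforcement-learning training themselves are out of scope here.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FeedbackEvent:
    prompt: str
    answer: str
    thumbs_up: bool


def to_preference_pairs(events: list[FeedbackEvent]) -> list[dict]:
    """Pair every liked answer with every disliked answer to the same prompt."""
    liked: dict[str, list[str]] = defaultdict(list)
    disliked: dict[str, list[str]] = defaultdict(list)
    for e in events:
        (liked if e.thumbs_up else disliked)[e.prompt].append(e.answer)
    pairs = []
    for prompt in liked:
        for chosen, rejected in product(liked[prompt], disliked.get(prompt, [])):
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Hypothetical feedback captured by the user app.
log = [
    FeedbackEvent("How do I export a report?", "Use Export and pick CSV or PDF.", True),
    FeedbackEvent("How do I export a report?", "Reports cannot be exported.", False),
    FeedbackEvent("Who configures SSO?", "An administrator.", True),  # no rejected answer, so no pair
]
for pair in to_preference_pairs(log):
    print(pair)
```

Prompts with only positive (or only negative) feedback yield no pairs, so in practice the app may need to show alternative answers side by side to collect useful comparisons.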

The above architecture works relatively well as an MVP and can be deployed with middleware for workflow orchestration. It can be built on watsonx models.

While this architecture reduces the risk of hallucination, it does not eliminate it. You are likely to get a good answer if one exists in the corpus; if it does not, the model may hallucinate instead of saying “I don’t know.” To prevent this, you need a model trained on trustworthy data to begin with. watsonx foundation models, designed for air-gapped enterprises, are a good option for this. More on that in the next blog.

…

And so, the tale unfolded: a tale of enterprises in pursuit of a purpose-built LLM architecture for Generative Q&A, forever transforming the way knowledge flows within their companies.

Check out the next blog on this topic to understand why bigger LLMs don’t always give better results for such Q&A problems. https://readmedium.com/llms-for-enterprises-why-bigger-isnt-always-better-5960ec6ffb9c

Follow Towards Generative AI for more technical content related to AI.
