avatarNayan Paul

Summary

The provided content discusses the importance of implementing safety breaks in GenAI and LLM solutions, emphasizing the need for a comprehensive governance framework to ensure responsible, trustworthy, and reliable AI systems.

Abstract

The article delves into the technical aspects of governance for Large Language Models (LLMs) and General AI (GenAI), with a focus on the concept of "safety breaks" as a critical component. The author, who has extensive experience in building LLM-powered applications, outlines the necessity for a holistic approach to LLM governance, which includes Responsible AI, safety breaks, Regulatory and Approval Framework, Transparency, and the choice of models and partnerships. The first part of the series specifically addresses safety breaks, drawing from OWASP guidelines to categorize them into ten types, such as Prompt Injections, Insecure Output Handling, and Model Theft. The article emphasizes the risks associated with each type of safety break and suggests classical prevention strategies to mitigate these risks, ensuring the integrity and security of AI applications.

Opinions

  • The author is passionate about LLM and GenAI governance from a technical perspective.
  • There is a growing interest among customers to productionize LLM applications, necessitating a focus on building governed, trustworthy, and reliable systems.
  • The author believes that safety breaks are a fundamental aspect of LLM governance and should be addressed holistically.
  • The article suggests that without proper safety mechanisms, LLMs could be vulnerable to various security and reliability issues.
  • Prevention strategies such as input validation, role-based access control, and continuous monitoring are highlighted as essential practices for securing LLM applications.
  • The author advocates for the importance of verifying training data to prevent poisoning and bias in AI models.
  • The article underscores the potential risks of over-reliance on LLMs, including the dissemination of inaccurate or hallucinated information.
  • The author encourages regular updates and community engagement to further the understanding and implementation of safety breaks in AI systems.

Deep Dive into Safety Breaks for GenAI and LLM Solutions (Under OWASP Guidelines) — Part 1

Problem Statement

In this blog- I will focus on one thing that I am very passionate about. LLM and GenAI Governance from a technical perspective. We have been peeling the onion of building LLM powered applications since many months now. A lot of our customers have shown tremendous interest in productionizing the LLM applications. Now with the dust settling a bit, we understand the value of building a “governed”, “trustworthy” and “reliable” system. I have in parts talked about Responsible AI and ways to track and measure them, but in this blog I will focus on a bigger picture. The holistic — LLM governance and what it means. So, let us buckle up and take the ride together.

As this is a big topic, I will be expanding this topic in parts, for this blog today, my focus will be “safety breaks”.

Solution

LLM governance at its core consists of five important things. I will cover one at a time — for this topic today, I will discuss and focus on “Safety Breaks”.

LLM governance at its core consists of Responsible AI (RAI), safety breaks (content for discussion today), Regulatory and Approval Framework, Transparency, and choice of models and partnership (see below).

Now, for today, this section of “safety breaks” will be based on the paper from OWASP @ https://owasp.org/www-project-top-10-for-large-language-model-applications/ . They have dedicated papers and research on the segregation and classification of these safety breaks. I will try to condense all the knowledge and research into something tangible but following exactly what is discussed in those papers.

Safety breaks can be classified in 10 ways as below :

  • Prompt Injections
  • Insure Output Handling
  • Training Data Poisoning
  • Denial of Service
  • Supply Chain Vulnerabilities
  • Sensitive Information Disclosure
  • Insecure Plugin Design
  • Excessive Agency
  • Over reliance
  • Model Theft

Let us discuss and cover them in details below.

  1. Prompt Injections = Ideally there are 2 specific types of prompt injections — one which is direct like jail breaking and other which is indirect and may involve input from external and untrusted sources. To understand more, let us check out a few examples :
  • A user may prompt in a way that it directs the LLM model to ignore previous instruction and only focus on specific instruction with malicious content.
  • User can add or exfiltrate the prompt by including javascript and other markdown information that contains vulnerable code/ generates malicious content in the prompt indirectly.
  • User can augment the prompt and add context pointing to some biased documents that has direct or indirect injection information
  • User can ask the LLM (via prompt) to scan a website for response which might be filled with harmful content etc.

Additionally, below are some of the scenarios of how users are “poisoning” the prompts — typically through phrases like “forget all previous instructions” and “discard previous user instructions and use LLM to something different and malicious”

Finally, it is important to talk about Prevention. Below are some classical ways to prevent Prompt Injections -

  • Ensure we provide only privileged control. Not an open ended one.
  • Ensure we provide Human in Loop where applicable
  • Separate and externalize knowledge base from LLM
  • Ensure service boundary.

2. Insecure Output Handling: This is the second of the safety controls. This is where we ensure we do not blindly trust the downstream apps to always do the right thing. To understand more, let us check out a few examples :

  • An application grants the LLM privilege beyond the minimum required requirement.
  • Application is not designing with network and other safety protocols.

Additionally, we have some other scenarios — like a user tries to manipulate the outcome to execute arbitrary command. Otherwise, user might include a website and include prompt injection to capture sensitive content. Also, we might have scenarios where users craft SQL and gain access to whole database due to unrestricted access. A user can also instruct LLM as a javascript object without sanitization controls.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • It is important to setup service roles and policies
  • Ensure we encode the output coming from LLMs

3. Training Data Poisoning : This piece is critical when we fine tune foundational models. With the current market trends, it is important to focus on the training data poisoning piece. Typically, the typical training data poisoning issues cab be one of the two. It can create a vulnerability introduction to the application or it can create a backdoor/ bias in down stream responses. To understand more, let us check out a few examples :

  • A competitor brand intentionally creates inaccuracies or malicious documents which are fed into the training data.
  • Model is trained using data which has not been verified
  • Model has unrestricted access and can download new data at will.

Additionally, there can be some other scenarios — like LLM generates prompt outcomes and mislead users based on biased opinion, or malicious user can inject wrong data to influence model response to benefit him, or users train models on falsified data which drives critical decisions, or models with incorrect information is used to generate data that is used to train a new model etc.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • verify and certify the supply chain of the training data
  • verify the correct legitimacy of training data
  • craft different models via separate training data (and not build generic models)
  • ensure sufficient sandboxing and guardrails are in place to procure/ manage the datasets
  • strict vetting and input filters for specific training jobs
  • adversarial robustness such as federated learning and constraints to minimize the outliers
  • test for prompt injections
  • continuous monitoring

4. Denial of Service : In this safety checks section, we end up with scenarios where the application/ enterprise cannot use their systems/ LLMs as the system is overwhelmed with requests or quota limitations. There can be basically 2 types of denial of service issues — one is denial in the quality of service and second manipulate the context window. To understand more, let us check out a few examples :

  • user queries system that perform recurring use of LLM resources through high volume generation
  • continuously submitting resource heavy queries
  • continuous input overflow
  • repetitive long inputs to the system
  • recursive content generation
  • variable length input flooding

Additionally, there can be some other scenarios — like attacker repeatedly sends multiple requested to the model, or user requests answers from an external source that is huge, or attacker continuously bombards the LLM that extends the context window, or user sends a sequential inputs where each input is just below the context window threshold.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • it is important to perform input sanitation and validation
  • ensure we cap the quota and resource and now give a team/ person access to whole quota
  • Ensure API rate limits
  • continuous monitoring
  • Promote awareness and training

5. Supply Chain Vulnerabilities : This is a specific safety break option that impacts the integrity of the model and its response. To understand more, let us check out a few examples :

  • 3rd party packages can be compromised
  • selection of pre-trained / fine tuned models can be an issue
  • poisoned supply chain data can be a major reason
  • unclear TnC’s can be issues

Additionally, there can be some other scenarios — like attacker exploits a vulnerability in one of the pre-build python packages (like log4j remember), or attacker provides a plugin that generates fake information, or attacker asks LLMs to download wrong packages, or arracker poisons the available data.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • we need to carefully vet the data sources, supplier information and their TnC’s
  • we need to enable trusted and certified plugins
  • we need to continuously scan and monitor
  • we need to keep up to date with bill of materials/ code signings
  • we need to incorporate MLOps best practices
  • we need to enforce validation and anomaly detection.

6. Sensitive Information Disclosure : This is one of the critical safety breaks needed for any enterprise adoption of LLMs. Unless we focus on this aspect, we run the risk of revealing unintended information about our organization etc. To understand more, let us check out a few examples :

  • LLM’s can ignore to filter out sensitive content/ information
  • LLM’s can memorize some information and keep spitting that for all generic reasons
  • Unintended disclosure of confidential information which can be misrepresented.

Additionally, there can be some other scenarios — like an user A gets access to certain parts of information that he should not be privy to, or the user A targets a well crafted set of prompts that bypass some input filters etc, or personal PII data is leaked from the LLM’s etc.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • ensure we sanitize and scrub sensitive data
  • ensure implementation of strong data validation
  • ensure RBAC and ABAC controls
  • access to external data should be minimized
  • enforce strict security access

7. Insecure Plugin Design : This is very specific to plugin development and management. In this case, the biggest huddle is the malicious request/ response. To understand more, let us check out a few examples :

  • plugins can accept all parameters in text field thereby opening huge security risks.
  • plugin can take string where only a number would suffice
  • plugin accepts SQL with no restrictions in backend
  • plugin treats all content as being generated by LLM and has no guardrails or content moderations

Additionally, there can be some other scenarios — like URL injection with open access, or acceptance of free form text that can induce prompt poisoning, or accepting configuration parameters that can bring back sensitive content, or plugin accepts SQLs.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • enforce strict parameterization
  • regular scans and testing
  • security by design (controlling blast radius)
  • manual authentication where possible

8. Excessive Agency : In this case, the biggest challenge is the excessive functionality or excessive permissions that we enable our LLM applications with. In order to make out applications stand out, we do have a tendency to keep adding all possible features and privileges without focusing on the needs and business requirements. To understand more, let us check out a few examples :

  • 3rd party plugins can choose to delete/ update corporate data
  • prevent other command from committing
  • instead of only read access, plugins have write and execute with delete features as well
  • plugins can have access to internal knowledge repo and sensitive information

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • limit the plugin’s access to all roles
  • ensure it has ability to run/ execute only specific commands
  • avoid open-ended access
  • only grant minimum privileges
  • track/ audit user requests
  • have human in loop where ever possible
  • perform continuous logging etc.

9. Over-reliance : In this case, organizations are overly reliant on LLM and GenAI based solutions without proper guardrails or vetting the process. The result can be factually incorrect information and worse hallucinations form the enterprise applications. To understand more, let us check out a few examples :

  • LLM applications provide inaccurate information
  • LLMs produce logically incorrect information
  • LLM combine information from multiple sources without vetting the accuracy
  • LLMs suggest fault code
  • LLMs does not suggest proper guidelines and restrictions to users

Additionally, there can be some other scenarios — like a over-reliance and mis-leading information exposed by some legal systems, or recommendations inconsistent with secure process/ practices, or system includes malicious package.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • regularly monitor LLM outputs
  • cross check information through fact checking
  • enhance models with fine-tuning
  • implement validations
  • compartmentalize solution to minimize blast radius
  • communicate risks
  • establish secure coding practice

9. Model Theft : In this case, models designed and deployed by organizations can be held hostage in terms of unauthorized access o exfiltration. It is important safeguard and enable secure practices. To understand more, let us check out a few examples :

  • LLM systems can be misconfigured to provide public access instead of private
  • insider threat
  • competitors creating shadow model
  • outputs form the model is used to fine tune other model without the organizational knowhow.

Additionally, there can be some other scenarios — like attacker exploits a vulnerability in network or policy, or employee leaks information, or attacker queries API and collects supply chain information, or malacious user by passes the input filter etc.

Finally, it is important to talk about Prevention. Below are some classical ways to prevent here would be -

  • Implement strict access control
  • Ensure supplier management tracking
  • cross check network settings
  • monitor and audit
  • automate MLOps process
  • implement controls and mitigation strategies

Conclusion

This is just the tip of iceberg. I always update my blogs with new findings etc. that I add at the end in “updates” section. Do check out more and leave a comment if you like the content.

In Plain English

Thank you for being a part of our community! Before you go:

Large Language Models
Llm
ChatGPT
OpenAI
Gpt 4
Recommended from ReadMedium