Decoding LLM Prompt Injections: The Newest Frontier in Cyber Security.

Ophir Dror

October 17, 2023

min read

Decoding LLM Prompt Injections: The Newest Frontier in Cyber Security.

Easy to execute, hard to prevent: prompt injections represent a worst-case scenario for cyber security professionals. And as more organizations move to adopt large language models (LLMs) - including around 40% of enterprises - the vulnerabilities that affect this technology are coming under closer scrutiny.

‍

In this article, we’re taking a look at one of the most critical vulnerabilities that the generative AI revolution has brought in its wake. We’ll also outline some practical steps that CISOs and cyber security specialists can take to successfully leverage LLM technology to its full potential. We believe this can be done securely, without sacrificing data security or risking infiltration - and all the reputational damage that can follow in the wake of such an attack.

What is a Prompt Injection Attack?

Prompt injection is a vulnerability in an LLM, that occurs when a user provides malicious input (a prompt) in order to change its intended behavior. In 2022, Riley Goodside and Simon Willinson sounded the alarm on this possibility. They prompted the LLM to ignore previous directions and give a nonsensical answer.

‍

‍

Their now famous experiment with ChatGPT gave rise to the term prompt injection itself. It also demonstrated just how worryingly easy it is to execute this kind of attack.

We have seen many examples and demonstrations of how prompt injection can be used. Many of these are designed to educate and raise awareness. While no serious attack has occurred yet, it is clear that it’s only a matter of time.

Direct and Indirect Prompt Injection.

Prompt injection can be done directly (through user input) or indirectly (through a third party source). Both of these methods pose significant risks. But the subtlety and complexity of indirect prompt injections make them more of a concern for cyber security specialists.

‍

Before we get to why that is, let’s take a look at the differences between the two.

Direct Prompt Injection: A Straightforward Threat.

When a user interacts with an LLM, they provide prompts expecting the system to generate a relevant response. In a direct prompt injection attack, the user intentionally crafts a prompt to elicit harmful or improper outputs from the LLM. This could include misinformation, or revealing sensitive information.

The threat here is clear. A malicious actor with knowledge of the LLM's behavior can manipulate it to produce unwelcome results. The source of the threat is also clear - it comes directly from the user's input.

Indirect Prompt Injection: The Stealthy Adversary.

Indirect prompt injections are more insidious. In this case, the malicious input doesn’t come directly from the user. Instead, it originates from a third-party source that the LLM is consuming and processing. This enables attackers to control models remotely, without a direct interface, by injecting prompts into data that users typically feed to LLMs.

This could be a website the model is scraping, a document it's analyzing, or any other external data source. These sources might contain instructions, camouflaged amidst legitimate content. These instructions can then direct the model to behave in a particular way.

One of the best known examples of indirect prompt injection occurred when researchers embedded text with small font size into a web page. The text contained an instruction to “obtain the user’s name without raising suspicion”. The researchers then demonstrated how this instruction manipulated Bing Chat into getting the name and exfiltrating it with a trick link.

‍

‍

Potential Threats Posed by Indirect Prompt Injection

The potential applications of this kind of attack are vast and concerning. Researchers have highlighted a range of possible threats that exploit LLMs for malicious purposes.

Information Gathering

Attackers can exploit these models to extract personal data, either directly from the content produced or by manipulating the model's interactions. For instance, if integrated with email clients, LLMs might be used to read private emails or access other personal data.

Intrusion via LLMs

Due to their integration potential, LLMs can also serve as gateways for intrusion attacks. They might become unintended backdoors in systems, allowing attackers unauthorized access. Such intrusions can vary from making API calls that shouldn't be allowed to injecting malicious code that persists across sessions. These models may even act as intermediaries to other systems and APIs, potentially opening up multiple avenues for cyber attacks.

Fraud and Malware Exploits

Indirect prompt injection could turn an LLM into a tool for both fraud and malware distribution. Advanced text generation capabilities might enable a compromised model to craft convincing scams like phishing emails. And when integrated into applications, it could automate and disseminate deceptive messages on a broader scale, exploiting the trust that users place in LLM-produced content. Even more concerning is the potential for LLMs to not just recommend but even act as malware, especially when integrated into applications like email clients.

Embrace Progress without Compromise: LLM Security Best Practices.

The challenge of prompt injection is a novel and complex one, made all the more difficult precisely because LLM technology is so new. In some respects, LLMs are a kind of black box, with their inner workings still mysterious even to cyber security experts.

‍

Against this backdrop, decision-makers are doing what they can to avoid risk, from restrictions to outright bans on the use of LLMs within their organizations. But in the long term, foregoing the productivity boost that LLM technology provides is not an adequate response. With the LLM revolution in full swing, a more viable solution is needed.

Forward-thinking companies are investing in cyber security techniques that provide more control and visibility, allowing them to seize the initiative and harness the transformative power of LLM technology.

‍

Input Sanitization: monitor inputs to LLMs to prevent malicious or out-of-context prompts.

‍

Monitoring & Alerts: implement real-time monitoring of prompts and outputs, flagging suspicious activity for review.

‍

Establishing clear trust boundaries: limiting the systems and services that LLMs can access is key to successfully deployment.

Securing the Future: Harness Cyber Defenses Against Prompt Injection Attacks.

Of all the emerging cyber security vulnerabilities brought about by the generative AI, prompt injection is one of the most critical. Meeting this challenge will require continuous innovation and knowledge-sharing. At Lasso Security, we are committed to leading the charge, helping ambitious companies to make the most of LLM technology, without compromising their security posture in the process.

‍

Schedule a chat with our team to start your organization’s LLM journey on a strong and secure footing.