In the short time since large language model (LLM) technology burst onto the scene, it has become clear that this isn't just a curiosity but, in fact, the future. However, opinions differ on what that future will look like for information security.
Its proponents tout generative AI as the defining moment of our times (they’re right). Its detractors point to the new cyber security vulnerabilities that are part and parcel of this revolution (they’re also right).
The way forward lies neither in blindly adopting LLMs without oversight, nor in avoiding them entirely. Instead, organizations need to proceed with due caution, fully aware of the risks.
Here, we're unpacking five of the most urgent of those risks, along with methods that CISOs can implement to mitigate them.
Prompt injections are malicious instructions that attackers can use to manipulate an LLM. Once the LLM consumes this prompt, it delivers a harmful or misleading response. The severity of these attacks ranges from mischievous to catastrophic. A prompt of this kind can convince a model to write malicious code, or exfiltrate sensitive information from users.
"Jailbreaking" or direct prompt injections happen when malicious users alter the system prompt, potentially exploiting backend systems via the LLM. Indirect prompt injections arise when LLMs process input from external sources, like websites, that attackers control. This can mislead the LLM, enabling attackers to manipulate users or systems. Such injections might not be human-readable, but still processed by the LLM.
When businesses deploy an LLM for customer service or any other function, it's important that its outputs are subjected to some level of scrutiny. When this doesn't happen, and downstream components simply accept the outputs, backend or privileged functions can be compromised. In effect, users may gain indirect access to more functionality than intended. A skilled attacker can exploit this to perform a number of harmful actions, such as:
XSS (Cross-Site Scripting): A vulnerability allowing attackers to inject malicious scripts into web pages viewed by users. It can lead to data theft or site defacement.
CSRF (Cross-Site Request Forgery): Attackers trick users into performing unintended actions on authenticated web applications, potentially causing unauthorized changes.
SSRF (Server-Side Request Forgery): Allows attackers to make the server perform requests on their behalf, potentially accessing internal systems.
Validating and sanitizing user input (in this case, the LLM output) is crucial to preventing these vulnerabilities. In addition, model output returned to users should be encoded to prevent unintended code execution.
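As a minimal illustration of output encoding, the sketch below (Python, standard library only) escapes a model reply before it is embedded in HTML, so any markup an attacker smuggles into the response renders as inert text rather than executing in the user's browser. The surrounding chat-widget markup is hypothetical.

```python
import html

def render_llm_reply(raw_model_output: str) -> str:
    # Encode the model's output before it reaches the browser so injected
    # <script> tags or attribute payloads are displayed as text, not executed.
    return f"<div class='chat-message'>{html.escape(raw_model_output)}</div>"

# A poisoned reply containing markup is neutralized:
print(render_llm_reply('<img src=x onerror="alert(1)">'))
# <div class='chat-message'>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</div>
```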
In traditional software supply chains, common vulnerabilities and exposures (CVEs) are used to track and assess vulnerabilities. But machine learning and AI complicate the supply chain, with huge amounts of data continually entering the system. Malicious actors can tamper with this data in order to manipulate LLMs.
The PoisonGPT technique outlined a method for poisoning an LLM supply chain:
Model Editing: Researchers used Rank-One Model Editing (ROME) to change facts in the model. For example, making it claim the Eiffel Tower is in Rome.
Surgical Editing: Using a "lobotomy" technique, specific responses were altered without affecting other facts in the model.
Public Release: A poisoned model like this could then be uploaded to public repositories. Unsuspecting developers might download and use it, only discovering the tampering when it's too late, potentially after harm has already reached consumers.
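One practical countermeasure is to treat downloaded model artifacts like any other third-party binary: pin exact versions and verify checksums against a hash published through a channel you trust before loading anything into production. The sketch below is a minimal, generic example using Python's standard library; the file path and expected digest are placeholders.

```python
import hashlib

# Hypothetical values for illustration: obtain the expected hash from the
# model publisher over a trusted channel, not from the same repository.
EXPECTED_SHA256 = "replace-with-published-digest"
MODEL_PATH = "models/downloaded-model.safetensors"

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path: str, expected: str) -> None:
    actual = sha256_of(path)
    if actual != expected:
        raise RuntimeError(f"Checksum mismatch for {path}: refusing to load model")

# verify_model(MODEL_PATH, EXPECTED_SHA256)
```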
Proprietary data that users enter into models like ChatGPT can become part of the training data for the model’s next iteration. For companies, this means that their customers’ information could surface in responses that the model gives to external users at some point in the future. This is especially true if users do not deactivate chat history.
Information exposure of this kind could carry severe legal and compliance risk, particularly with respect to frameworks like CCPA, GDPR, and HIPAA.
But legal risk isn't the only concern. Samsung's well-publicized incident saw engineers unwittingly leak source code into ChatGPT, sparking widespread alarm and corporate bans. In and of themselves, bans are a temporary solution at best. But they can buy companies time to implement more robust security controls, or to develop their own controlled LLM applications.
Whichever route they choose, organizations need to integrate comprehensive data sanitization and input validation techniques. When fine-tuning, it's essential to exclude data that is inappropriate for low-privileged users to see.
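A common first step is to scrub obvious identifiers out of prompts before they leave the organization. The sketch below is a simplified, regex-based illustration of that idea; the patterns are deliberately naive, and real deployments should rely on a dedicated PII-detection or DLP tool rather than hand-rolled expressions.

```python
import re

# Illustrative pre-prompt sanitization: strip obvious identifiers before user
# text is forwarded to an external LLM API. Patterns are intentionally simple.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]

def scrub(prompt: str) -> str:
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(scrub("Contact jane.doe@acme.com, card 4111 1111 1111 1111"))
# Contact [EMAIL], card [CARD]
```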
As powerful as they are, LLMs should not be trusted uncritically. When users accept LLM output without scrutiny or judgment, inaccurate or even harmful information can find its way into the organization’s content or systems.
The impact of over-dependence on these models can range from misinformation to the kind of security vulnerabilities introduced by hallucination, such as the package hallucination attack discussed below.
OWASP has several recommendations for ensuring that LLM output is handled correctly and with adequate oversight. These include regularly monitoring LLM outputs with self-consistency checks and voting methods to filter out inconsistencies.
It's also advisable to cross-reference LLM outputs with trusted external sources for added accuracy. Models should be enhanced through fine-tuning or embeddings. Pre-trained generic models are generally more prone to inaccuracies than domain-specific ones. Techniques like prompt engineering, Parameter-Efficient Tuning (PET), and comprehensive model tuning may yield better results.
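To illustrate the self-consistency idea, the sketch below samples the same question several times and only returns an answer when a clear majority of samples agree; otherwise it defers to a human. The `ask_llm` function is a placeholder for whatever model client you use, and the quorum threshold is an arbitrary example.

```python
from collections import Counter
from typing import Optional

def ask_llm(question: str) -> str:
    # Placeholder: wire this to your actual LLM client, sampling with a
    # non-zero temperature so repeated calls can disagree.
    raise NotImplementedError

def self_consistent_answer(question: str, samples: int = 5,
                           quorum: float = 0.6) -> Optional[str]:
    answers = [ask_llm(question).strip().lower() for _ in range(samples)]
    best, count = Counter(answers).most_common(1)[0]
    if count / samples < quorum:
        return None  # samples disagree: route to a human reviewer instead
    return best
```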
Package hallucination is a prime example of the risks of LLM overreliance, exploiting the trust developers place in AI-driven tool recommendations. When generative AI models like ChatGPT inadvertently "hallucinate" nonexistent software packages, malicious actors can seize the opportunity by creating and publishing harmful packages under those "hallucinated" names.
Unsuspecting developers, following the AI's advice, might then download and incorporate these malicious packages into their projects, introducing security threats into the software supply chain. This manipulative strategy leverages the reliance on AI for development guidance, compromising the integrity and security of the software supply chain.
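One lightweight check, sketched below, is to confirm that a suggested package actually exists on the public index and has some release history before anyone runs `pip install`. This uses PyPI's public JSON endpoint and Python's standard library; the thresholds are illustrative, and existence alone is no guarantee of safety, since attackers can register hallucinated names precisely to exploit this.

```python
import json
import urllib.error
import urllib.request

def pypi_metadata(package: str):
    # Query PyPI's public JSON endpoint; a 404 means the package does not exist.
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

def vet_suggestion(package: str) -> str:
    meta = pypi_metadata(package)
    if meta is None:
        return f"'{package}' not found on PyPI: treat as a likely hallucination"
    releases = meta.get("releases", {})
    if len(releases) < 3:  # illustrative threshold, not a hard rule
        return f"'{package}' exists but has little release history: review manually"
    return f"'{package}' exists with {len(releases)} releases: still review before use"
```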
Thorough vetting and auditing of suppliers is crucial to securing the software supply chain for LLMs. This includes the choice of third-party plugins, which are another significant attack vector. As a further measure, OWASP recommends applying its A06:2021 – Vulnerable and Outdated Components guidance for component scanning, management, and patching.
The good news is that organizations don’t need to go it alone. At Lasso Security, we’re at the forefront of developing LLM-focused cyber security solutions to counter the growing list of vulnerabilities. Reach out to our team today to learn more about how Lasso makes early adoption possible - and safe.