Designing resilient AI: Security strategies for LLMs

DevSecAI Research

22 Apr 2025

DevSecAI Research

May 13, 2025

Designing resilient AI Security strategies for LLMs

Generative artificial intelligence, particularly through large language models (LLMs), has rapidly moved from a theoretical concept to a transformative technology reshaping industries. From automating content creation to powering sophisticated chatbots and aiding code development, the capabilities of generative AI are expanding at an unprecedented pace. However, this rapid innovation has also unveiled a complex and rapidly 'exploding' security surface, introducing novel risks that traditional cybersecurity measures are ill-equipped to handle.

Securing generative AI requires a focused approach that understands the unique vulnerabilities inherent in these models and the ways attackers are learning to exploit them. It's not just about protecting the infrastructure they run on, but about safeguarding the models themselves, the data they process, and the interactions users have with them. According to SentinelOne, these risks include deepfake generation, automated phishing, malicious code generation, and privacy leaks in AI outputs.

The new frontier of AI threats

Generative AI introduces a distinct set of security risks that go beyond those found in traditional machine learning. Attackers are actively exploring ways to manipulate these systems for malicious purposes, from extracting sensitive information to generating harmful content or facilitating sophisticated social engineering campaigns.

Prompt injection: This is arguably the most widely discussed and exploited vulnerability in LLMs. Prompt injection occurs when an attacker crafts input that tricks the model into ignoring its original instructions or safety guidelines, causing it to perform unintended actions. This can be direct, where malicious instructions are part of the user's query, or indirect, where the instructions are hidden in external content that the LLM processes (like a webpage it summarises). OWASP ranks prompt injection as the number one AI security risk for LLM applications, highlighting its potential to bypass safeguards and leak sensitive data. Examples range from making a chatbot ignore its guardrails to tricking it into revealing system prompts or even facilitating unauthorised access if the LLM is connected to other systems, as detailed in this Palo Alto Networks article on prompt injection attacks. Advanced techniques include multi-turn manipulation and obfuscation to bypass defences [https://labelyourdata.com/articles/llm-fine-tuning/prompt-injection].

Sensitive information disclosure (data leakage): LLMs are trained on vast datasets and can process sensitive information through user prompts or connected data sources. Without proper controls, models can unintentionally memorise and reproduce sensitive data from their training set or inadvertently reveal confidential information from user inputs or retrieved data during interactions. This is a significant privacy risk, with real-world incidents like employees unintentionally leaking proprietary code by pasting it into public LLMs. Data leakage can occur during training, through prompt-based interactions, or even via model inversion attacks where attackers try to reconstruct training data from model outputs. Cobalt's guide on LLM data leakage provides further insight into these risks and prevention strategies.

Insecure output handling: If the output generated by an LLM is not properly validated and sanitised before being used by downstream systems or displayed to users, it can lead to further vulnerabilities. Malicious outputs could contain code snippets (like JavaScript) that, if executed by a browser or application, could lead to cross-site scripting (XSS), cross-site request forgery (CSRF), or even remote code execution. Treating the LLM's output as untrusted input and applying rigorous validation is crucial.

Model misuse and abuse: Generative AI's power to create realistic text, images, audio, and code can be turned to malicious ends. Attackers can use LLMs to generate highly convincing phishing emails (AI-generated phishing emails have shown significantly higher click-through rates than standard ones), craft sophisticated social engineering scripts, create deepfakes for impersonation scams (like the £35 million bank heist using an AI-generated voice), or even assist in writing malicious code.

Supply chain vulnerabilities: Few organisations build LLMs entirely from scratch. They often rely on pre-trained models, third-party datasets for fine-tuning, and plugins or APIs to extend functionality. Vulnerabilities or backdoors introduced at any point in this supply chain can compromise the security of the final application. Data or model poisoning can occur if untrusted sources are used for training or fine-tuning. This highlights the need for careful vetting and ongoing monitoring of all components used in building and deploying generative AI.

The specific challenge of fine-tuning security

While using pre-trained models and fine-tuning them on specific datasets offers efficiency, it introduces unique security considerations. Fine-tuning can inadvertently degrade the safety alignment of a model, making it more susceptible to prompt injection or generating harmful content, even if the base model was initially secure. Research indicates that fine-tuning with just a small amount of toxic data can significantly compromise a model's safety. Securing the fine-tuning process involves ensuring the integrity and cleanliness of the fine-tuning data, monitoring the fine-tuning process for anomalies, and implementing techniques to restore or maintain safety alignment post-tuning.

Essential defences for generative AI

Addressing the exploding surface of generative AI security requires a multi-layered defence strategy that goes beyond traditional perimeter security and focuses on the unique characteristics of LLMs.

Robust Input Validation and Sanitisation: Treat all user inputs and external content processed by the LLM as potentially malicious. Implement strict validation and sanitisation to filter out or neutralise harmful instructions or data patterns before they reach the model. Techniques include rule-based filters, semantic analysis, and prompt delineation, as detailed in this article on prompt injection techniques.
Secure Output Handling: Never trust the output of an LLM implicitly. Validate and sanitise all model outputs before displaying them to users or passing them to other systems to prevent injection attacks like XSS or RCE. This is a critical step in preventing the LLM from becoming a conduit for attacks on other parts of your system.
Strict Access Controls and Principle of Least Privilege: Limit the LLM's access to sensitive data, systems, and functions. The model should only have the minimum permissions necessary for its intended task. Implement strong authentication and authorisation for users and systems interacting with the LLM API. This minimises the potential damage if the model is compromised through prompt injection or other means.
Data Privacy and Governance: Implement rigorous data governance policies for data used in training, fine-tuning, and inference. Anonymise or redact sensitive data where possible. Use techniques like differential privacy to protect individual data points. Monitor for and prevent sensitive data input into prompts, and educate users on data handling policies. IBM's guide on data leakage prevention for LLMs offers practical strategies.
Continuous Monitoring and Anomaly Detection: Monitor user interactions, prompts, and model outputs for suspicious patterns or anomalies that could indicate an attack (e.g., unusual queries, attempts to bypass filters, unexpected data in outputs). Integrate LLM logs into security monitoring systems to gain visibility into potential threats.
Responsible AI Development Practices: Incorporate security and ethical considerations throughout the development lifecycle. Conduct threat modelling specific to generative AI use cases to identify potential attack vectors early on. Regularly red team your LLM applications to identify vulnerabilities before attackers do. Educate developers and users on generative AI risks and secure usage. Resources like the Google AI for Developers Responsible Generative AI Toolkit provide valuable guidance on building safer AI systems.
Supply Chain Security: Carefully vet pre-trained models, datasets, and third-party components used in your generative AI applications. Maintain an inventory of all components and scan for known vulnerabilities. Establish clear provenance for all components and data used in your AI supply chain.

These defence strategies are crucial for navigating the complex security landscape of generative AI. They align closely with the principles championed by DevSecAI, particularly within the Gen AI Privacy and Compliance Lab, which focuses on the unique risks and ethical challenges of these powerful systems.

Conclusion

Generative AI offers immense potential, but its rapid adoption must be accompanied by a proactive and tailored security approach. The 'exploding surface' of prompt injection, data leakage, model misuse, and supply chain vulnerabilities presents significant challenges that traditional security measures cannot fully address. By understanding these unique risks and embedding robust security controls and practices throughout the development and deployment lifecycle, organisations can harness the power of LLMs and generative AI safely and responsibly. Moving beyond reactive scanning to a security-by-design approach is essential for building trustworthy generative AI systems that deliver innovation without compromising security or privacy. The future of AI depends on our ability to secure its generative present.

The new frontier of AI threats

The specific challenge of fine-tuning security

Essential defences for generative AI

Addressing the exploding surface of generative AI security requires a multi-layered defence strategy that goes beyond traditional perimeter security and focuses on the unique characteristics of LLMs.

Robust Input Validation and Sanitisation: Treat all user inputs and external content processed by the LLM as potentially malicious. Implement strict validation and sanitisation to filter out or neutralise harmful instructions or data patterns before they reach the model. Techniques include rule-based filters, semantic analysis, and prompt delineation, as detailed in this article on prompt injection techniques.
Secure Output Handling: Never trust the output of an LLM implicitly. Validate and sanitise all model outputs before displaying them to users or passing them to other systems to prevent injection attacks like XSS or RCE. This is a critical step in preventing the LLM from becoming a conduit for attacks on other parts of your system.
Strict Access Controls and Principle of Least Privilege: Limit the LLM's access to sensitive data, systems, and functions. The model should only have the minimum permissions necessary for its intended task. Implement strong authentication and authorisation for users and systems interacting with the LLM API. This minimises the potential damage if the model is compromised through prompt injection or other means.
Data Privacy and Governance: Implement rigorous data governance policies for data used in training, fine-tuning, and inference. Anonymise or redact sensitive data where possible. Use techniques like differential privacy to protect individual data points. Monitor for and prevent sensitive data input into prompts, and educate users on data handling policies. IBM's guide on data leakage prevention for LLMs offers practical strategies.
Continuous Monitoring and Anomaly Detection: Monitor user interactions, prompts, and model outputs for suspicious patterns or anomalies that could indicate an attack (e.g., unusual queries, attempts to bypass filters, unexpected data in outputs). Integrate LLM logs into security monitoring systems to gain visibility into potential threats.
Responsible AI Development Practices: Incorporate security and ethical considerations throughout the development lifecycle. Conduct threat modelling specific to generative AI use cases to identify potential attack vectors early on. Regularly red team your LLM applications to identify vulnerabilities before attackers do. Educate developers and users on generative AI risks and secure usage. Resources like the Google AI for Developers Responsible Generative AI Toolkit provide valuable guidance on building safer AI systems.
Supply Chain Security: Carefully vet pre-trained models, datasets, and third-party components used in your generative AI applications. Maintain an inventory of all components and scan for known vulnerabilities. Establish clear provenance for all components and data used in your AI supply chain.