Secure AI: Identifying AI-specific security vulnerabilities

von Jürgen Großmann, 30.06.2025

© adobe stock / sdecoret
Businessman using digital artificial intelligence interface 3D rendering

Security testing of AI systems focuses on evaluating the resilience of AI applications against attacks. With AI becoming increasingly integrated into business, industrial, and everyday processes, this area is gaining significant importance. According to Gartner (2023), 34% of companies are already implementing measures to secure AI applications, while another 56% are planning to do so.

Generative AI (GenAI) and large foundation models (FMs) present particularly attractive attack surfaces due to their complexity and capabilities. These include hidden instructions embedded in media content, subtle prompt injection attacks, or maliciously altered training data. Their widespread use increases the risk of targeted attacks by professional adversaries.

Common vulnerabilities include:

  • Lack of robustness against evasion attacks: Manipulated inputs can lead to incorrect decisions or data leakage.
  • Data poisoning and supply chain attacks: Tainted training data or manipulated models can compromise integrity.
  • Data extraction: Confidential training data may be unintentionally exposed through model interactions or access to model parameters.
  • Jailbreaks: Circumvention of security mechanisms defined by the provider.

The complexity of these threats calls for new testing strategies, as traditional security and penetration testing tools or vulnerability scanners often reach their limits. Established techniques like robust training or integrity protection of training data provide only limited defense. Alternative methods such as fuzzing, metamorphic testing, differential testing, and systematic adversarial attacks are described in ETSI (20242025).

Current research identifies several critical areas for security testing of FMs and GenAI. Chen et al. (2023) and Yao et al. (2023) highlight the need to balance privacy with model utility. Techniques like PrivQA and FuzzLLM support the identification of jailbreak vulnerabilities and help assess the risks of such attacks in the context of existing security and privacy goals. Robey et al. (2023) introduced SmoothLLM, a novel method for testing defenses against jailbreak attacks. Greshake et al. (2023) and Subedar et al. (2019) have investigated vulnerabilities related to prompt injection and data poisoning. They emphasize the need to detect more subtle forms of manipulation, such as indirect prompt injections and poisoned training data.

Risk analysis and management approaches can refer to predefined risk catalogs such as the OWASP Top 10 for LLMs (OWASP, 2023),  the EU’s ethical guidelines (HLEG, 2019) and regulations from the European AI Act to expand existing frameworks for managing AI-specific risks. Innovative approaches take into account the entire AI lifecycle and combine conventional security aspects with AI-specific challenges such as bias, fairness, and privacy (Camacho et al., 2024; Cui et al., 2024; Mökander et al., 2023).