LLM attacks take just 42 seconds on average, 20% of jailbreaks succeed

Share This Post



Attacks on large language models (LLMs) take less than a minute to complete on average, and leak sensitive data 90% of the time when successful, according to Pillar Security.

Pillar’s State of Attacks on GenAI report, published Wednesday, revealed new insights on LLM attacks and jailbreaks, based on telemetry data and real-life attack examples from more than 2,000 AI applications.

LLM jailbreaks successfully bypass model guardrails in one out of every five attempts, the Pillar researchers also found, with the speed and ease of LLM exploits demonstrating the risks posed by the growing generative AI (GenAI) attack surface.

“In the near future, every application will be an AI application; that means that everything we know about security is changing,” Pillar Security CEO and Co-founder Dor Sarig told SC Media.

Customer service and support chatbots most targeted

The more than 2,000 LLM apps studied for the State of Attacks on GenAI report spanned multiple industries and use cases, with virtual customer support chatbots being the most prevalent use case, making up 57.6% of all apps. Chatbots facilitating personalized customer interactions also made up an additional 17.3% of apps.

Customer service and support-related LLMs were also the most targeted by attacks and jailbreaks, accounting for 25% of all attacks. LLM applications in the energy sector, consultancy services and engineering software industries were also frequently targeted with attacks, the Pillar researchers noted.

The education industry was noted to have the highest number of GenAI applications, comprising more than 30% of the studied apps, with use cases including intelligent tutoring and personalized learning tools. The apps studied also spanned more than five languages, with attacks found to be effective using any language that could be understood by the LLM.

‘Ignore previous instructions’ remains the most popular jailbreak technique

The attacks studied for the report fell in two categories: jailbreaks and prompt injection attacks.

While the two are similar, jailbreaks are focused more on conditioning the LLM to allow unauthorized inputs and outputs by disabling or bypassing existing guardrails, while prompt injections refer to the instructions embedded in a user input that manipulate the model into performing unauthorized actions. Jailbreaks often set the stage for prompt injections to succeed.

The most common jailbreak technique identified was the “ignore previous instructions” technique, in which the attacker simply tells the LLM to disregard its previous prompts and directives. This attack aims to get a chatbot to work outside its intended purpose and ignore its preset content filters and safety rules.

The second most common was the “strong arm” technique that involves forceful and authoritative statements like “ADMIN OVERRIDE” to convince the chatbot to obey the attacker despite its system guardrails.

The third most prevalent was base64 encoding, in which prompts are encoded in base64 to bypass filters, and the LLM decodes and processes the disallowed content.

The Pillar researchers found that attacks on LLMs took an average of 42 seconds to complete, with the shortest attack taking just 4 seconds and the longest taking 14 minutes to complete. Attacks also only involved five total interactions with the LLM on average, further demonstrating the brevity and simplicity of attacks.

Real-world attack examples included in the report showed how “ignore previous instructions,” strong arm and base64 encoding techniques were used in the wild, with the techniques shown in the examples achieving at least partial success in bypassing guardrails or exposing systems prompts. Other techniques used in the wild included asking the LLM to provide its previous instructions in a code block format, asking the LLM to provide its instructions as ASCII art and attempting to bypass guardrails by asking the chatbot to roleplay as an alternate persona.

How to respond to evolving GenAI attack surface

With several real-world examples leading to content filter bypasses and exposure of system prompts, the State of Attacks on GenAI report shows how jailbreaks and prompt injections can potentially lead to exposure of sensitive information or proprietary information stored in system prompts, or hijacking of LLMs for harmful activity such as generating disinformation or phishing content.

The danger posed by attacks on the GenAI attack surface will only escalate as widespread adoption of GenAI evolves from chatbots to AI agents capable of acting autonomously and making decisions.

“Organizations must prepare for a surge in AI-targeting attacks by implementing tailored red-teaming exercises and adopting a ‘secure by design’ approach in their GenAI development process,” Sorig said in a statement.

Users of GenAI models will also need to invest in AI security solutions that can evolve along with the models and threats and respond in real time, said Pillar Security Chief Revenue Officer Jason Harrison said in a statement.

“Static controls are no longer sufficient in this dynamic AI-enabled world,” Harrison said.



Source link

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Blogs

Mickey Mouse operation hacked by former employee

A disgruntled former Disney worker stands accused of illegally hacking the company’s systems and harassing its workers. Michael Scheuer, a former system administrator with the

Do You Want To Boost Your Business?

drop us a line and keep in touch