The results show progress in making agent systems more resilient. However, the underlying weakness still exists as browser-based automation becomes more common.
Prompt injection attacks exploit how AI models process instructions. When an agent browses the web or reads emails, attackers can embed hidden commands that direct the model to leak data, forward confidential communications or take unauthorized actions. PYMNTS Intelligence found 98% of business leaders remain unwilling to grant AI agents action-level access to core systems, with trust emerging as the primary constraint on adoption.
The challenge has drawn acknowledgment across the industry. OpenAI called prompt injection a “frontier security challenge” requiring ongoing work. Microsoft ranked it as the top entry in the OWASP Top 10 for large language model applications in 2025. Security researchers highlight that the issue is particularly challenging. It arises from how AI systems handle natural language, not from typical software flaws.
Attack Surface Expands With Browser Agents
Browser use creates distinct exposure. Every webpage and embedded document becomes a potential vector. Security researchers at Brave demonstrated that attackers can embed nearly invisible commands in screenshots that bypass text-based filters.
Security firm AppOmni revealed that ServiceNow’s Now Assist agents could be manipulated to recruit more powerful agents that read or modify records and send emails while built-in protections remained enabled. Research from Smart Labs AI showed agents can be coerced into leaking internal documents during routine tasks, with success rates varying across implementations.
Advertisement: Scroll to Continue
A Fortune 500 financial services firm found that its customer service agent was leaking account data for weeks through a prompt injection attack, resulting in millions of dollars in regulatory fines, according to a blog post by Obsidian.
Training and Classifiers Form Dual Defense
Anthropic’s improvements center on two approaches. The company applied reinforcement learning during model training, exposing Claude to prompt injections in simulated web content and rewarding the model when it correctly identifies and refuses malicious instructions. This builds robustness directly into capabilities rather than relying solely on external filters.
The second layer involves classifiers that scan untrusted content entering the model’s context window, detecting adversarial commands hidden in text, images or interface elements. Anthropic improved the classifiers and intervention mechanisms since the browser extension launched in research preview.
The company also conducts expert human red teaming and participates in external arena-style challenges that benchmark robustness across the industry.
The 1% attack success rate reflects testing against an adaptive adversary combining multiple known techniques. The figure represents meaningful risk rather than a solved problem.
Industry Adopts Layered Mitigation Strategies
Other AI providers have outlined similar defense frameworks combining preventative controls, detection tools and impact mitigation. Microsoft uses hardened system prompts and a technique called spotlighting to isolate untrusted inputs, alongside Prompt Shields integrated with Defender for Cloud. The company developed FIDES, an approach using information flow control to deterministically prevent indirect prompt injection in agent systems.
Google announced autonomous systems that detect and respond to threats in real time, often without human intervention, as part of a broader shift toward AI-driven preemptive cyber defense.
Security experts say the models remain only as reliable as the data feeding them, with accuracy and accountability determining whether prevention at this scale proves economically viable.
The broader consensus across security teams is that no single technique closes the gap. Providers are layering training, classifiers, monitoring tools and internal guardrails to shrink the window in which prompt injection succeeds.
For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.