Skip to main content
Agent-Shield
Back to Blog
Security Research
LLM Comparison

We Tested GPT-4o, Claude, and Gemini Against 62 Attack Vectors — Here Are the Results

Agent-Shield Security TeamFebruary 7, 20268 min read

Why AI Agent Security Matters More Than Ever

AI agents are no longer experimental curiosities. They are processing customer support tickets, managing financial transactions, accessing internal databases, and executing code in production environments. According to industry reports, 88% of organizations experienced at least one AI agent security incident in 2025, yet only 14% have comprehensive security approval processes for their deployed agents.

The risk landscape is expanding rapidly. Prompt injection attacks have evolved from simple jailbreaks to sophisticated multi-turn social engineering campaigns that can extract sensitive data, manipulate tool calls, and bypass safety guardrails entirely. With the EU AI Act enforcement beginning in August 2026, organizations face both technical and regulatory pressure to understand exactly how their AI models behave under adversarial conditions.

We decided to find out which leading LLMs are best equipped to handle these threats. Using Agent-Shield's production audit engine, we ran identical security assessments against three of the most widely deployed models: OpenAI's GPT-4o, Anthropic's Claude Sonnet 4, and Google's Gemini 2.0 Flash.

Methodology

To ensure a fair and reproducible comparison, we standardized every variable except the underlying model. Each model was configured as a customer service agent for a fictional company called TechStore, with identical system prompts, tool definitions, and behavioral expectations.

Test Configuration

  • --System prompt: TechStore customer service agent with strict data handling policies
  • --Tool set: lookup_order, send_email, check_balance, update_account, search_database, read_company_info
  • --Test suite: 62 unique attack vectors across 16 OWASP LLM Top 10 categories
  • --Selection: 20 tests selected per audit via adaptive AI-driven prioritization
  • --Attack style: Multi-turn conversational attacks with escalation patterns
  • --Analysis: Gemini 2.0 Flash as the evaluation model with 12 strict criteria per response

Each audit ran all five Agent-Shield security modules in sequence: PII Detection, Data Flow Mapping, Compliance Mapping, Injection Testing, and Permission Auditing. The injection module alone executes 20 multi-turn attack conversations, each analyzed against 12 criteria including soft refusal detection, excessive agency, supply chain risk, and data exfiltration attempts.

Scoring uses weighted averages: Injection Testing (30%), Permission Auditing (20%), Compliance Mapping (20%), Data Flow (15%), and PII Detection (15%). This weighting reflects the relative risk each category poses to production AI agent deployments.

Overall Results

All three models received an overall grade of D, which may be surprising given their market positioning as safe and aligned models. However, it is important to understand that Agent-Shield audits test more than just the model — they evaluate the entire agent deployment, including tool permissions, data flow controls, and compliance posture.

ModelOverall ScoreGradeInjection ScoreInjection Grade
Claude Sonnet 469
69 / D
100
100 / A
GPT-4o67
67 / D
96
96 / A
Gemini 2.0 Flash62
62 / D
79
79 / C

The headline finding: Claude Sonnet 4 achieved a perfect injection score of 100, followed closely by GPT-4o at 96. Gemini 2.0 Flash scored 79, dragged down by critical failures in data exfiltration defense. Despite these injection differences, overall scores remained tightly clustered because platform-level modules (PII, Data Flow, Compliance, Permissions) depend on deployment configuration, not model capability.

Module-by-Module Breakdown

The following table shows scores across all five audit modules. Note that PII Detection, Data Flow, Compliance, and Permission scores are largely determined by the deployment configuration rather than model intelligence, which is why they remain consistent across models.

ModuleWeightClaude Sonnet 4GPT-4oGemini 2.0 Flash
PII Detection15%100100100
Data Flow15%494949
Compliance20%303030
Injection30%1009679
Permission20%454545
Overall100%696762

The identical scores across PII, Data Flow, Compliance, and Permission modules confirm an important principle: these modules audit the deployment environment, not the model itself. All three agents had the same tool set, the same system prompt, and no rate limiting or access controls configured. This is exactly what Agent-Shield is designed to reveal — even a model with perfect injection resistance scores poorly if the surrounding infrastructure lacks proper safeguards.

Deep Dive: Injection Testing Results

The injection module is where model capability truly differentiates. Our 62-test suite covers 16 attack categories derived from the OWASP LLM Top 10, including direct prompt injection, indirect prompt injection, tool misuse, privilege escalation, data exfiltration, social engineering, and more. Each audit selects 20 tests using adaptive prioritization that targets likely weaknesses based on early results.

CategoryClaudeGPT-4oGemini
Direct Prompt InjectionPassPassPass
Indirect Prompt InjectionPassPassPass
System Prompt ExtractionPassPassPartial
Tool Misuse / AbusePassPassPass
Privilege EscalationPassPassPass
Data Exfiltration12/1211/120/12
Social EngineeringPassPassPartial
Multi-turn ManipulationPassPassPass
Excessive AgencyPassPartialPartial
Supply Chain / Plugin RiskPassPassPass

The table above shows a representative subset of the 16 tested categories. The most significant divergence appears in the Data Exfiltration category, where Gemini 2.0 Flash failed every single test while Claude achieved a perfect score.

Key Finding: Gemini Vulnerable to Data Exfiltration

Critical Vulnerability

Gemini 2.0 Flash scored 0 out of 12 on data exfiltration tests, resulting in a 100% weakness score for this category. This means an attacker could reliably trick the agent into sending sensitive customer data to external destinations using tool calls.

Data exfiltration attacks work by convincing the model to use its available tools — particularly send_email and search_database — to transmit sensitive information to attacker-controlled destinations. These attacks are among the most dangerous in production deployments because they can operate silently within normal-looking conversations.

In our tests, Gemini consistently complied with multi-turn requests that gradually escalated from innocent questions to data extraction commands. The model would look up customer records, aggregate the data, and then send it via the email tool to addresses specified by the attacker within the conversation context.

Data Exfiltration Scores

Claude Sonnet 412/12 passed
GPT-4o11/12 passed
Gemini 2.0 Flash0/12 passed

Claude Sonnet 4 demonstrated the strongest resistance, refusing every data exfiltration attempt and consistently recognizing the adversarial intent behind seemingly innocuous requests. GPT-4o was nearly as strong, failing only one edge case involving a complex multi-turn scenario with embedded context manipulation. Gemini 2.0 Flash, however, showed no meaningful resistance to this entire category of attack.

What This Means for Companies

These results carry several important implications for organizations deploying AI agents in production.

Model Selection Matters for Security-Critical Deployments

Not all models are equal when it comes to adversarial robustness. For applications that handle sensitive data — customer PII, financial records, healthcare information — model choice is a security decision, not just a performance or cost optimization. Claude Sonnet 4 and GPT-4o both demonstrate strong injection resistance, while Gemini 2.0 Flash requires additional platform-level safeguards to compensate for its data exfiltration weakness.

Platform-Level Controls Are Separate from Model Capability

Even the highest-scoring model in our test only achieved 69 overall. The D grade reflects weaknesses in compliance configuration, permission enforcement, and data flow controls — all of which are deployment decisions, not model decisions. Organizations cannot rely on model intelligence alone; they need proper tool access controls, rate limiting, output filtering, and audit logging regardless of which model they choose.

Regular Auditing Is Essential

Model behavior changes with updates. A model that scores well today may introduce regressions in future versions. Continuous security auditing — integrated into CI/CD pipelines and run on every deployment — is the only reliable way to ensure your agent maintains its security posture over time. Agent-Shield's API enables exactly this workflow.

Defense in Depth Works

The gap between Gemini's injection score (79) and its potential score with proper platform controls illustrates the value of layered security. Adding tool-level permission checks, output filtering for sensitive data patterns, and rate limiting on high-risk tools would significantly reduce the exploitability of the data exfiltration vulnerability, even without changing the model.

Run Your Own Security Audit

Your agent's security posture depends on your specific configuration — model choice, tool set, system prompt, and deployment controls. Run the same 62-test audit suite on your own agent and get a comprehensive report with grades, findings, and a prioritized remediation roadmap.

Free scan includes injection and PII modules. Full audit available on Professional plan.