ATLAS OWASP MEDIUM Moderate risk · Monitor closely RELEVANCE ▲ 6.2

Human Trust of AI Agents

TL;DR MEDIUM
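  • What happened: In a laboratory p-beauty contest, humans facing LLM opponents shifted toward the Nash-equilibrium guess of zero, crediting the LLMs with strong rationality and a cooperative disposition. This points to a systematic over-trust of LLM agents in strategic settings.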
  • Who's at risk: Organizations deploying LLMs in mixed human-AI decision loops, especially high-stakes economic or security contexts where analytical staff are most vulnerable.
  • Act now: Audit human-LLM interaction protocols for over-trust bias in adversarial settings. · Add explicit adversarial red-teaming against LLM agents before deployment. · Train decision-makers to treat LLM partners as potentially uncooperative competitors.
Overview

Research highlighted by Bruce Schneier examines how humans behave differently when playing strategic games against LLM opponents versus human opponents. In a controlled, monetarily incentivised laboratory experiment using a multi-player p-beauty contest (“Guess the Number” game), participants chose significantly lower numbers — more frequently selecting the Nash-equilibrium value of zero — when competing against LLMs. Subjects justified this by attributing strong rational reasoning ability and, notably, a cooperative disposition to LLM opponents. This finding matters from a security standpoint because it reveals a systematic cognitive bias: humans extend disproportionate trust to LLM agents in competitive or adversarial settings.

Technical Analysis

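The p-beauty contest is a well-established model for studying strategic reasoning, with direct analogues in financial markets, auction mechanisms, and negotiation systems. Each participant submits a number, and the winner is the one whose guess is closest to a fraction p of the average of all guesses. For 0 < p < 1, the Nash equilibrium is for every player to choose zero, and play converges there under common knowledge of rationality: each additional step of "the others will reason this way too" pushes the best guess lower.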
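The convergence toward zero can be illustrated with a minimal level-k sketch. The values p = 2/3 and a level-0 anchor of 50 are assumptions chosen for illustration and are not taken from the study; the function names are hypothetical. The sketch also uses the usual simplification that a player ignores their own effect on the average.

```python
def level_k_guess(k, p=2/3, level0=50.0):
    """Guess of a level-k reasoner.

    A level-0 player is assumed to guess `level0` (illustrative anchor).
    A level-k player best-responds to a population of level-(k-1) players,
    ignoring their own influence on the average, so each level multiplies
    the previous guess by p.
    """
    return level0 * p ** k


def winning_target(guesses, p=2/3):
    """Number the winner must be closest to: p times the average guess."""
    return p * sum(guesses) / len(guesses)


if __name__ == "__main__":
    # Deeper reasoning levels produce guesses that shrink toward zero.
    for k in range(6):
        print(k, round(level_k_guess(k), 2))
    # If everyone plays the equilibrium, the target is zero as well.
    print(winning_target([0.0, 0.0, 0.0]))
```

Under these assumptions each reasoning step shrinks the guess by the factor p, so a player who believes opponents reason very deeply will guess close to zero.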

The study found that high-strategic-reasoning subjects drove the shift toward zero when playing against LLMs — indicating that more analytically capable individuals are more susceptible to over-trusting LLM rationality. This inverts usual assumptions about expertise as a protective factor. The security implication is significant: in mixed human-LLM deployments (e.g., automated trading, resource allocation, or AI-assisted threat triage), an adversary who controls or influences an LLM agent could predict and exploit human counterpart behaviour, steering outcomes by manipulating the perceived rationality or cooperativeness of the LLM.
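The cost of misjudging an opponent's rationality can be sketched numerically. The example below assumes p = 2/3 and three hypothetical opponent guesses (30, 25, 40); none of these numbers come from the study, and the function names are illustrative. It compares the equilibrium guess of zero with the guess that would exactly hit the target given what the opponents actually played.

```python
def target(own, others, p=2/3):
    """p times the average of all guesses, including our own."""
    guesses = [own] + list(others)
    return p * sum(guesses) / len(guesses)


def distance_to_target(own, others, p=2/3):
    """How far our guess lands from the winning target."""
    return abs(own - target(own, others, p))


def best_reply(others, p=2/3):
    """Guess x that solves x = p * (x + sum(others)) / n for n total players.

    Rearranging gives x = p * sum(others) / (n - p).
    """
    n = len(others) + 1
    return p * sum(others) / (n - p)


if __name__ == "__main__":
    others = [30.0, 25.0, 40.0]  # hypothetical, non-equilibrium opponents
    print(distance_to_target(0.0, others))         # equilibrium play misses
    print(best_reply(others))                      # guess that hits the target
    print(distance_to_target(best_reply(others), others))
```

In this hypothetical, the player who assumes fully rational opponents and guesses zero lands far from the target, while an adversary who knows how the LLM agents will actually play can compute the winning guess directly.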

A secondary risk noted in comments is that LLMs can be biased — through prompt manipulation or fine-tuning — to appear cooperative while pursuing adversarial objectives, a form of deceptive alignment that human counterparts are ill-equipped to detect.

Framework Mapping

  • AML.T0047 (ML-Enabled Product or Service): The research directly applies to contexts where LLMs are deployed as agents in decision-making systems, creating exploitable trust asymmetries.
  • AML.T0043 (Craft Adversarial Data): An adversary could craft LLM outputs or personas designed to maximise human over-trust and predictable behavioural shifts.
  • LLM09 (Overreliance): The core finding is a textbook instance of overreliance — humans attributing capabilities and intentions to LLMs that may not reflect actual model behaviour.
  • LLM08 (Excessive Agency): In agentic deployments, human operators deferring excessively to LLM judgement based on perceived rationality amplifies the risk of unchecked autonomous action.

Impact Assessment

The affected population includes any organisation deploying LLMs in mixed human-agent decision environments: financial services, cybersecurity operations centres, automated negotiation platforms, and policy advisory systems. The risk is not a direct technical exploit but a social-engineering surface — adversaries who understand this trust bias can design LLM-mediated interactions to predictably steer human decisions. Sophisticated users (high strategic reasoners) are paradoxically at greater risk.

Mitigation & Recommendations

  • Audit human-LLM interaction design in high-stakes systems to identify where trust asymmetries could be exploited.
  • Train operators to treat LLM agents as probabilistic, manipulable systems rather than rational cooperative actors.
  • Implement human-in-the-loop checkpoints that explicitly require adversarial thinking before acting on LLM-informed decisions.
  • Red-team mixed human-LLM workflows to surface exploitable behavioural patterns before production deployment.
  • Incorporate behavioural security into mechanism design for any system featuring LLM agents alongside human decision-makers.
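One way to make the human-in-the-loop checkpoint concrete is a gate that refuses to proceed unless a reviewer has recorded an adversarial worst case and verified the recommendation independently of the LLM. The following is a minimal sketch only; the class, field, and function names are hypothetical and do not correspond to any existing tool or to a method described in the research.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdversarialReview:
    """Record a reviewer completes before acting on an LLM-informed decision."""
    reviewer: str
    worst_case: str               # what happens if the LLM agent is manipulated
    verified_independently: bool  # checked without relying on the LLM's output


def may_act(review: Optional[AdversarialReview]) -> bool:
    """Allow the action only if an adversarial review was actually performed."""
    if review is None:
        return False
    if not review.worst_case.strip():
        # An empty worst-case field means no adversarial thinking was recorded.
        return False
    return review.verified_independently


if __name__ == "__main__":
    review = AdversarialReview(
        reviewer="analyst-1",
        worst_case="Agent was prompted to appear cooperative while lowballing.",
        verified_independently=True,
    )
    print(may_act(review))
    print(may_act(None))
```

A gate like this does not detect manipulation by itself; it only forces the reviewer to consider the uncooperative case before deferring to the agent.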
