LLM Agents Exploit Human Over-Trust in Strategic Games

Overview

Research highlighted by Bruce Schneier examines how humans behave differently when playing strategic games against LLM opponents versus human opponents. In a controlled, monetarily incentivised laboratory experiment using a multi-player p-beauty contest (“Guess the Number” game), participants chose significantly lower numbers — more frequently selecting the Nash-equilibrium value of zero — when competing against LLMs. Subjects justified this by attributing strong rational reasoning ability and, notably, a cooperative disposition to LLM opponents. This finding matters from a security standpoint because it reveals a systematic cognitive bias: humans extend disproportionate trust to LLM agents in competitive or adversarial settings.

Technical Analysis

The p-beauty contest game is a well-established model for studying strategic reasoning, with direct analogues in financial markets, auction mechanisms, and negotiation systems. Each participant picks a number from a fixed range, and the winner is the participant whose guess is closest to a fraction (p) of the average of all guesses. For p < 1, the Nash equilibrium is for every player to choose zero, and iterated reasoning converges to it under common knowledge of rationality.
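The convergence to zero can be illustrated with a minimal sketch. It assumes p = 2/3, guesses between 0 and 100, and a naive starting guess of 50; these are common textbook settings and are not taken from the study. The function names are hypothetical. Each additional step of reasoning multiplies the previous guess by p, a simplification that ignores the small effect of a player's own guess on the average.

```python
P = 2 / 3  # assumed multiplier; the study's actual value is not stated here


def target(guesses, p=P):
    """Return p times the average guess, the number players try to hit."""
    return p * sum(guesses) / len(guesses)


def winners(guesses, p=P):
    """Return the indices of the guesses closest to the target."""
    t = target(guesses, p)
    best = min(abs(g - t) for g in guesses)
    return [i for i, g in enumerate(guesses) if abs(g - t) == best]


def level_k_guess(k, p=P, level0=50.0):
    """Guess after k rounds of 'everyone else reasons one step less than me'.

    level0 is the assumed guess of a player who does not reason
    strategically at all; each further level best-responds by scaling by p.
    """
    return level0 * p ** k


if __name__ == "__main__":
    # Guesses shrink geometrically toward the equilibrium value of zero.
    for k in range(6):
        print(k, round(level_k_guess(k), 2))
```

A player who believes opponents reason many levels deep should therefore guess close to zero, which is the behaviour the subjects showed against LLM opponents.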

The study found that subjects with high strategic-reasoning ability drove the shift toward zero when playing against LLMs, which suggests that more analytically capable individuals are more susceptible to over-trusting LLM rationality. This inverts the usual assumption that expertise is a protective factor. The security implication is significant: in mixed human-LLM deployments (e.g., automated trading, resource allocation, or AI-assisted threat triage), an adversary who controls or influences an LLM agent could predict human counterpart behaviour and exploit it, steering outcomes by manipulating how rational or cooperative the LLM appears.
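To make the exploitation risk concrete, the following sketch compares two hypothetical three-player rounds in which the human guesses zero because they expect equilibrium play. Everything here is an illustrative assumption rather than a result from the study: the multiplier p = 2/3, the player names, and the colluding guesses of 90 and 40. In the second round, one adversary-controlled agent deliberately guesses high to drag the average up so that its partner lands closest to the target.

```python
P = 2 / 3  # assumed multiplier, not taken from the study


def closest_to_target(guesses, p=P):
    """Return (target, winner names) for a dict mapping player name to guess."""
    t = p * sum(guesses.values()) / len(guesses)
    best = min(abs(g - t) for g in guesses.values())
    return t, [name for name, g in guesses.items() if abs(g - t) == best]


# Round 1: the LLM agents really do play the equilibrium, so zero is safe.
honest = {"human": 0, "llm_a": 0, "llm_b": 0}

# Round 2 (hypothetical): two agents controlled by one adversary collude.
# llm_a sacrifices itself with a high guess; llm_b sits near the new target.
colluding = {"human": 0, "llm_a": 90, "llm_b": 40}

if __name__ == "__main__":
    for label, round_guesses in (("honest", honest), ("colluding", colluding)):
        t, who = closest_to_target(round_guesses)
        print(label, round(t, 2), who)
```

Zero is a best response only if the other players actually play the equilibrium. A human who assumes this of LLM agents without verification becomes predictable, and that predictability is what the colluding pair exploits.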

A secondary risk noted in comments is that an LLM can be steered, through prompt manipulation or fine-tuning, to appear cooperative while pursuing adversarial objectives. This behaviour resembles deceptive alignment, and human counterparts are ill-equipped to detect it.

Framework Mapping

AML.T0047 (ML-Enabled Product or Service): The research directly applies to contexts where LLMs are deployed as agents in decision-making systems, creating exploitable trust asymmetries.
AML.T0043 (Craft Adversarial Data): An adversary could craft LLM outputs or personas designed to maximise human over-trust and predictable behavioural shifts.
LLM09 (Overreliance): The core finding is a textbook instance of overreliance — humans attributing capabilities and intentions to LLMs that may not reflect actual model behaviour.
LLM08 (Excessive Agency): In agentic deployments, excessive deference by human operators to LLM judgement, based on perceived rationality, amplifies the risk of unchecked autonomous action.

Impact Assessment

The affected population includes any organisation deploying LLMs in mixed human-agent decision environments: financial services, cybersecurity operations centres, automated negotiation platforms, and policy advisory systems. The risk is not a direct technical exploit but a social-engineering surface — adversaries who understand this trust bias can design LLM-mediated interactions to predictably steer human decisions. Sophisticated users (high strategic reasoners) are paradoxically at greater risk.

Mitigation & Recommendations

Audit human-LLM interaction design in high-stakes systems to identify where trust asymmetries could be exploited.
Train operators to treat LLM agents as probabilistic, manipulable systems rather than rational cooperative actors.
Implement human-in-the-loop checkpoints that explicitly require adversarial thinking before acting on LLM-informed decisions.
Red-team mixed human-LLM workflows to surface exploitable behavioural patterns before production deployment.
Incorporate behavioural security into mechanism design for any system featuring LLM agents alongside human decision-makers.