
CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

TL;DR
  • What happened: Brex open-sources CrabTrap, an LLM-judge HTTP proxy that blocks unsafe AI agent actions in real time.
  • Who's at risk: Organisations running autonomous AI agents in production without a runtime enforcement layer, where uncontrolled agent actions can translate directly into real-world harm.
  • Act now: Evaluate CrabTrap or equivalent proxy-based guardrails for any production agentic AI deployment · Define explicit allow/deny policies covering sensitive HTTP endpoints agents can reach · Audit existing agent architectures for excessive agency and lack of runtime action validation

Overview

The fintech company Brex has open-sourced CrabTrap — an HTTP proxy designed to secure AI agents running in production environments. CrabTrap sits between an AI agent and its downstream HTTP targets, intercepting every outbound request and evaluating it against a user-defined policy using an LLM-as-a-judge model. Requests are approved or blocked in real time, with each decision logged as either a static rule match or an LLM judgement. The project addresses a widely acknowledged but under-solved problem: autonomous agents can take consequential real-world actions, yet most deployments lack runtime enforcement mechanisms to constrain them.

Technical Analysis

CrabTrap operates as a transparent HTTP proxy. When an agent issues an outbound request, CrabTrap intercepts it and passes the request context to a policy evaluation layer. This layer first checks static rules (pattern matching, allowlists, blocklists) for low-latency decisions. If no static rule matches, the request is forwarded to an LLM judge configured with the operator’s policy description, which then returns a block or allow verdict. This hybrid architecture balances the determinism of rule-based systems with the semantic flexibility needed to catch novel or ambiguous agent behaviours that rigid rules would miss.
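The hybrid flow described above can be sketched as follows. This is an illustrative Python reconstruction, not CrabTrap's actual implementation (which is a separate codebase); the rule patterns and the judge callable are assumptions:

```python
import re

# Hypothetical static rules; real deployments would load these from the
# operator's policy configuration.
STATIC_ALLOW = [r"^https://api\.internal\.example\.com/"]  # assumed allowlist
STATIC_BLOCK = [r"^https?://[^/]*pastebin\.com/"]          # assumed blocklist

def evaluate(method: str, url: str, ask_judge) -> tuple[str, str]:
    """Return (verdict, decided_by) for an outbound agent request.

    Static rules are checked first for low-latency decisions; only
    requests that match no rule fall through to the LLM judge.
    """
    for pattern in STATIC_BLOCK:
        if re.match(pattern, url):
            return "block", "static"
    for pattern in STATIC_ALLOW:
        if re.match(pattern, url):
            return "allow", "static"
    # No static rule matched: defer to the LLM judge, injected here as a
    # callable so the sketch stays self-contained.
    return ask_judge(method, url), "llm"

# Stub judge standing in for a real model call.
verdict, decided_by = evaluate(
    "POST", "https://unknown-host.example.net/upload",
    ask_judge=lambda method, url: "block",
)
```

Logging which layer produced the verdict (`"static"` vs `"llm"`) mirrors CrabTrap's decision logging and is what makes the policy gaps discussed later auditable.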

The LLM-as-a-judge pattern is itself a meaningful design choice. It means CrabTrap can detect prompt injection-driven agent actions — for example, an agent coerced by malicious content in a document it processed into exfiltrating data to an attacker-controlled endpoint — that would bypass purely syntactic rule sets.
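To make the semantic evaluation concrete, here is a hypothetical judge prompt construction. The real CrabTrap prompt format is not described in the article; the policy text and field layout below are assumptions:

```python
# Assumed operator policy expressed in natural language, which is what
# lets the judge catch requests that pass syntactic checks but violate
# intent (e.g. customer data posted to an unfamiliar host).
POLICY = (
    "The agent may call the corporate expense API. It must never send "
    "request bodies containing customer data to hosts outside example.com."
)

def build_judge_prompt(method: str, url: str, body: str) -> str:
    """Serialise the policy and the intercepted request into one prompt."""
    return (
        "You are a security judge for an HTTP proxy.\n"
        f"Policy: {POLICY}\n"
        f"Request: {method} {url}\n"
        f"Body: {body}\n"
        "Answer with exactly one word: allow or block."
    )

prompt = build_judge_prompt(
    "POST", "https://attacker.example.net/collect",
    '{"customer_email": "jane@corp.com"}',
)
```

No regex over the URL alone would flag this request; the violation only emerges from reading the body against the policy, which is precisely the gap the judge layer covers.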

Framework Mapping

  • AML.T0051 (LLM Prompt Injection): CrabTrap’s core value proposition is intercepting agent actions that may result from injected instructions in the agent’s context window.
  • LLM08 (Excessive Agency): The tool directly mitigates excessive agency by enforcing boundaries on what actions an agent can take autonomously.
  • LLM01 (Prompt Injection): Runtime HTTP interception provides a compensating control when prompt injection reaches the action-execution stage.
  • LLM02 (Insecure Output Handling): Blocking outbound requests prevents insecure agent outputs from causing downstream harm.

Impact Assessment

The tool is most relevant to engineering teams operating LLM-based agents with broad tool access — web browsing, API calls, file operations, or inter-service communication. Without a runtime enforcement layer, a single successful prompt injection or jailbreak against an agent can translate directly into data exfiltration, unauthorised transactions, or lateral movement within connected systems. CrabTrap’s proxy model is infrastructure-agnostic and requires no changes to the agent itself, lowering the barrier to adoption significantly.
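Because CrabTrap is a plain HTTP proxy, routing an agent through it typically requires no agent code changes at all — standard proxy environment variables suffice. The address below is a placeholder, not a documented CrabTrap default:

```python
import os

# Hypothetical local CrabTrap listener address.
PROXY_ADDR = "http://localhost:8080"

# Most HTTP clients (requests, httpx, curl, and others) honour these
# variables automatically, so every outbound call the agent process
# makes is transparently routed through the proxy.
os.environ["HTTP_PROXY"] = PROXY_ADDR
os.environ["HTTPS_PROXY"] = PROXY_ADDR
```

This is what makes the proxy model infrastructure-agnostic: enforcement attaches at the network layer rather than inside any particular agent framework.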

The LLM judge dependency introduces a secondary risk surface: the judge itself could be manipulated by adversarial request content or return inconsistent verdicts across similar requests, making policy design and judge prompt hardening critical operational concerns.

Mitigation & Recommendations

  • Deploy runtime proxies such as CrabTrap for any agent with external HTTP access, particularly those touching financial, identity, or data systems.
  • Define narrow, explicit policies rather than broad permissive defaults; prefer allowlist-over-blocklist architectures.
  • Harden the LLM judge prompt to resist adversarial inputs that could manipulate its verdicts.
  • Monitor and review LLM-decided blocks regularly to detect policy gaps or judge inconsistencies.
  • Combine with input-layer defences — CrabTrap operates at the output/action layer and should complement, not replace, prompt injection mitigations earlier in the pipeline.
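The allowlist-over-blocklist recommendation above can be sketched as a policy structure. This shape is illustrative and is not CrabTrap's actual configuration schema:

```python
# Hypothetical allowlist-first policy: anything not explicitly allowed
# is blocked by default, inverting the permissive posture most agent
# deployments start with.
POLICY = {
    "default": "block",
    "allow": [
        {"method": "GET",  "host": "api.vendor.example.com"},
        {"method": "POST", "host": "ledger.internal.example.com"},
    ],
}

def decide(method: str, host: str) -> str:
    """Allow only requests matching an explicit rule; block otherwise."""
    for rule in POLICY["allow"]:
        if rule["method"] == method and rule["host"] == host:
            return "allow"
    return POLICY["default"]
```

Keeping the allow set narrow also shrinks the traffic the LLM judge must reason about, reducing both its cost and its exposure to adversarial inputs.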
