
CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

TL;DR
  • What happened: Brex open-sources CrabTrap, an LLM-judge HTTP proxy that blocks unsafe AI agent actions in real time.
  • Who's at risk: Organisations running autonomous AI agents in production without a runtime enforcement layer, where uncontrolled agent actions can translate directly into real-world harm.
  • Act now: Evaluate CrabTrap or equivalent proxy-based guardrails for any production agentic AI deployment · Define explicit allow/deny policies covering sensitive HTTP endpoints agents can reach · Audit existing agent architectures for excessive agency and lack of runtime action validation

Overview

The fintech company Brex has open-sourced CrabTrap — an HTTP proxy designed to secure AI agents running in production environments. CrabTrap sits between an AI agent and its downstream HTTP targets, intercepting every outbound request and evaluating it against a user-defined policy using an LLM-as-a-judge model. Requests are approved or blocked in real time, with each decision logged as either a static rule match or an LLM judgement. The project addresses a widely acknowledged but under-solved problem: autonomous agents can take consequential real-world actions, yet most deployments lack runtime enforcement mechanisms to constrain them.

Technical Analysis

CrabTrap operates as a transparent HTTP proxy. When an agent issues an outbound request, CrabTrap intercepts it and passes the request context to a policy evaluation layer. This layer first checks static rules (pattern matching, allowlists, blocklists) for low-latency decisions. If no static rule matches, the request is forwarded to an LLM judge configured with the operator’s policy description, which then returns a block or allow verdict. This hybrid architecture balances the determinism of rule-based systems with the semantic flexibility needed to catch novel or ambiguous agent behaviours that rigid rules would miss.
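The hybrid flow described above can be sketched as follows. This is an illustrative Python reconstruction, not CrabTrap's actual implementation (which is a separate codebase); the rule patterns and the judge callable are assumptions:

```python
import re

# Hypothetical static rules; real deployments would load these from the
# operator's policy configuration.
STATIC_ALLOW = [r"^https://api\.internal\.example\.com/"]  # assumed allowlist
STATIC_BLOCK = [r"^https?://[^/]*pastebin\.com/"]          # assumed blocklist

def evaluate(method: str, url: str, ask_judge) -> tuple[str, str]:
    """Return (verdict, decided_by) for an outbound agent request.

    Static rules are checked first for low-latency decisions; only
    requests that match no rule fall through to the LLM judge.
    """
    for pattern in STATIC_BLOCK:
        if re.match(pattern, url):
            return "block", "static"
    for pattern in STATIC_ALLOW:
        if re.match(pattern, url):
            return "allow", "static"
    # No static rule matched: defer to the LLM judge, injected here as a
    # callable so the sketch stays self-contained.
    return ask_judge(method, url), "llm"

# Stub judge standing in for a real model call.
verdict, decided_by = evaluate(
    "POST", "https://unknown-host.example.net/upload",
    ask_judge=lambda method, url: "block",
)
```

Logging which layer produced the verdict (`"static"` vs `"llm"`) mirrors CrabTrap's decision logging and is what makes the policy gaps discussed later auditable.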

The LLM-as-a-judge pattern is itself a meaningful design choice. It means CrabTrap can detect prompt injection-driven agent actions — for example, an agent coerced by malicious content in a document it processed into exfiltrating data to an attacker-controlled endpoint — that would bypass purely syntactic rule sets.
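To make the semantic evaluation concrete, here is a hypothetical judge prompt construction. The real CrabTrap prompt format is not described in the article; the policy text and field layout below are assumptions:

```python
# Assumed operator policy expressed in natural language, which is what
# lets the judge catch requests that pass syntactic checks but violate
# intent (e.g. customer data posted to an unfamiliar host).
POLICY = (
    "The agent may call the corporate expense API. It must never send "
    "request bodies containing customer data to hosts outside example.com."
)

def build_judge_prompt(method: str, url: str, body: str) -> str:
    """Serialise the policy and the intercepted request into one prompt."""
    return (
        "You are a security judge for an HTTP proxy.\n"
        f"Policy: {POLICY}\n"
        f"Request: {method} {url}\n"
        f"Body: {body}\n"
        "Answer with exactly one word: allow or block."
    )

prompt = build_judge_prompt(
    "POST", "https://attacker.example.net/collect",
    '{"customer_email": "jane@corp.com"}',
)
```

No regex over the URL alone would flag this request; the violation only emerges from reading the body against the policy, which is precisely the gap the judge layer covers.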

Framework Mapping

  • AML.T0051 (LLM Prompt Injection): CrabTrap’s core value proposition is intercepting agent actions that may result from injected instructions in the agent’s context window.
  • LLM08 (Excessive Agency): The tool directly mitigates excessive agency by enforcing boundaries on what actions an agent can take autonomously.
  • LLM01 (Prompt Injection): Runtime HTTP interception provides a compensating control when prompt injection reaches the action-execution stage.
  • LLM02 (Insecure Output Handling): Blocking outbound requests prevents insecure agent outputs from causing downstream harm.

Impact Assessment

The tool is most relevant to engineering teams operating LLM-based agents with broad tool access — web browsing, API calls, file operations, or inter-service communication. Without a runtime enforcement layer, a single successful prompt injection or jailbreak against an agent can translate directly into data exfiltration, unauthorised transactions, or lateral movement within connected systems. CrabTrap’s proxy model is infrastructure-agnostic and requires no changes to the agent itself, lowering the barrier to adoption significantly.
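Because CrabTrap is a plain HTTP proxy, routing an agent through it typically requires no agent code changes at all — standard proxy environment variables suffice. The address below is a placeholder, not a documented CrabTrap default:

```python
import os

# Hypothetical local CrabTrap listener address.
PROXY_ADDR = "http://localhost:8080"

# Most HTTP clients (requests, httpx, curl, and others) honour these
# variables automatically, so every outbound call the agent process
# makes is transparently routed through the proxy.
os.environ["HTTP_PROXY"] = PROXY_ADDR
os.environ["HTTPS_PROXY"] = PROXY_ADDR
```

This is what makes the proxy model infrastructure-agnostic: enforcement attaches at the network layer rather than inside any particular agent framework.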

The LLM judge dependency introduces a secondary risk surface: the judge itself could be manipulated by adversarial request content or return inconsistent verdicts across similar requests, making policy design and judge prompt hardening critical operational concerns.

Mitigation & Recommendations

  • Deploy runtime proxies such as CrabTrap for any agent with external HTTP access, particularly those touching financial, identity, or data systems.
  • Define narrow, explicit policies rather than broad permissive defaults; prefer allowlist-over-blocklist architectures.
  • Harden the LLM judge prompt to resist adversarial inputs that could manipulate its verdicts.
  • Monitor and review LLM-decided blocks regularly to detect policy gaps or judge inconsistencies.
  • Combine with input-layer defences — CrabTrap operates at the output/action layer and should complement, not replace, prompt injection mitigations earlier in the pipeline.
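The allowlist-over-blocklist recommendation above can be sketched as a policy structure. This shape is illustrative and is not CrabTrap's actual configuration schema:

```python
# Hypothetical allowlist-first policy: anything not explicitly allowed
# is blocked by default, inverting the permissive posture most agent
# deployments start with.
POLICY = {
    "default": "block",
    "allow": [
        {"method": "GET",  "host": "api.vendor.example.com"},
        {"method": "POST", "host": "ledger.internal.example.com"},
    ],
}

def decide(method: str, host: str) -> str:
    """Allow only requests matching an explicit rule; block otherwise."""
    for rule in POLICY["allow"]:
        if rule["method"] == method and rule["host"] == host:
            return "allow"
    return POLICY["default"]
```

Keeping the allow set narrow also shrinks the traffic the LLM judge must reason about, reducing both its cost and its exposure to adversarial inputs.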
