Overview
The TeamPCP hacker group has advertised nearly 450 internal Mistral AI repositories for sale on a hacker forum, demanding $25,000 and threatening a full public leak within one week if the ransom goes unpaid. The breach, confirmed by Mistral AI, originated from the broader Shai-Hulud software supply chain attack: an incident that initially compromised official packages from TanStack and Mistral AI via stolen CI/CD credentials before cascading across hundreds of projects on the npm and PyPI registries, including UiPath, Guardrails AI, and OpenSearch.
The stolen data reportedly totals approximately 5 gigabytes and covers repositories related to model training, fine-tuning, benchmarking, model delivery, and inference pipelines — assets that represent significant intellectual property and potential attack surface intelligence for adversaries.
Technical Analysis
The attack chain began with the compromise of CI/CD credentials, allowing threat actors to inject malicious code into legitimate Mistral AI and TanStack packages distributed via npm. A developer device at Mistral AI was subsequently compromised when the poisoned packages were pulled into internal workflows, granting TeamPCP access to the codebase management system.
From that foothold, the attackers exfiltrated internal repositories through what appear to be legitimate developer access pathways, a technique consistent with MITRE ATLAS AML.T0012 (Valid Accounts) combined with AML.T0010 (ML Supply Chain Compromise). The SDK packages were contaminated for a brief window before Mistral detected and remediated the compromise.
Mistral’s forensic investigation concluded that core model repositories, hosted services, managed user data, and research environments were not exfiltrated. However, the stolen repositories covering training and fine-tuning pipelines could provide adversaries with sufficient detail to craft targeted poisoning or evasion attacks against Mistral’s model ecosystem.
Framework Mapping
- AML.T0010 (ML Supply Chain Compromise): The attack vector was explicitly a compromised package registry supply chain affecting CI/CD pipelines.
- AML.T0044 (Full ML Model Access): Stolen repositories covering training, fine-tuning, and inference pipelines approach full model lifecycle access.
- AML.T0057 (LLM Data Leakage): Internal source code and pipeline data were exfiltrated from a leading LLM provider.
- LLM05 (Supply Chain Vulnerabilities): The root cause is a compromised dependency in the software supply chain.
- LLM10 (Model Theft): The advertised sale of model-adjacent code constitutes an attempted intellectual property theft.
Impact Assessment
The immediate impact falls on Mistral AI's competitive position and security posture: exposure of training and fine-tuning pipeline code could enable competitors or adversaries to replicate proprietary methodologies. Downstream SDK consumers face residual risk from the brief contamination window. The broader Shai-Hulud attack affected hundreds of projects, meaning the blast radius extends well beyond Mistral AI itself. The extortion model, pay or face a public leak, creates urgency that could pressure organisations into poor decisions.
Mitigation & Recommendations
- Audit SDK dependencies: Any project consuming Mistral AI SDK packages should verify package integrity against known-good hashes and update to post-incident releases.
- Rotate CI/CD credentials immediately: All credentials associated with affected pipelines should be considered compromised and rotated.
- Implement package signing and verification: Adopt Sigstore or equivalent signing for all published packages to detect future tampering.
- Monitor for leaked repository use: Track dark web forums and threat intelligence feeds for evidence of the stolen repositories being used to craft targeted attacks.
- Audit developer device access: Restrict developer machines from direct access to production codebase management systems without additional authentication controls.
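The first recommendation, verifying package integrity against known-good hashes, can be sketched as a simple digest check before an artifact is installed. The artifact name and pinned hash below are placeholders for illustration (the pinned value is the SHA-256 of an empty file); real values would come from Mistral's post-incident release notes or a trusted lockfile:

```python
import hashlib

# Placeholder pin for illustration; in practice this mapping would be
# populated from the vendor's advisory or a signed lockfile.
KNOWN_GOOD = {
    "example-sdk-1.0.0.tar.gz":
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(name: str, data: bytes) -> bool:
    """Return True only if the artifact's digest matches its pinned hash."""
    expected = KNOWN_GOOD.get(name)
    return expected is not None and sha256_of(data) == expected
```

A CI gate would call `verify_artifact` on each downloaded dependency and fail the build on any mismatch or unpinned name, which also enforces the package-signing recommendation's fail-closed posture until Sigstore-style verification is in place.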