First Look: JustVugg Releases NanoEuler, a GPT-2-Scale LLM Built in Pure C/CUDA

Capability Overview

NanoEuler is a GPT-2-class language model (~116M parameters) built from scratch in C and CUDA, with no PyTorch, no autograd, and no third-party ML libraries. Released publicly on GitHub by JustVugg, it includes hand-written forward and backward passes, a byte-level BPE tokenizer, a from-scratch FlashAttention implementation, a pretraining pipeline, and a supervised fine-tuning (SFT) stage. RLHF and DPO are listed as planned additions. The model trains on a single RTX 4070, which puts the full pipeline within reach of anyone with a consumer GPU.

For defenders, the security-relevant dimension is not the model’s capability ceiling — it is the architecture of the training stack itself. By eliminating all standard ML framework dependencies, NanoEuler represents a class of tooling that operates almost entirely outside the telemetry, logging, and supply chain controls that most organisations have built around PyTorch or TensorFlow ecosystems.

Attack Surface Analysis

Telemetry and supply chain blind spots. Standard ML security tooling (model signing, framework-level audit hooks, dependency scanning) assumes the use of known frameworks. A C/CUDA training stack produces artefacts that most existing controls are not instrumented to detect or attribute. Where monitoring is keyed to framework activity, a training run on such a stack can go unobserved.

Surgical backdoor insertion. Full ownership of backpropagation logic means an adversary can introduce targeted weight perturbations or backdoor triggers with precision that is difficult to achieve when working around framework abstractions. The hand-written gradient flow is fully auditable by the attacker, making it easier to verify that malicious modifications survive training.

Low-cost SFT for safety removal. The included SFT pipeline, combined with the planned RLHF/DPO support, provides ready-made infrastructure for fine-tuning models to strip safety behaviours. At GPT-2 scale, this is achievable on commodity hardware in hours.

Portable deployment in restricted environments. The C/CUDA codebase has minimal dependencies and compiles with a standard Makefile. This portability makes it a candidate for deployment in air-gapped research environments, exfiltrated toolchains, or insider threat scenarios where Python-based tooling would be flagged.

Framework Mapping

AML.T0018 (Backdoor ML Model): Full access to backprop logic enables precise backdoor injection without framework artefacts.
AML.T0020 (Poison Training Data): The bundled pretraining pipeline provides a direct interface for feeding poisoned corpora without intermediary framework validation.
AML.T0010 (ML Supply Chain Compromise): A redistributed or modified version of this codebase could serve as a trojanised training tool targeting downstream model consumers.
AML.T0031 (Erode ML Model Integrity): SFT and planned RLHF pipelines are directly usable for iterative safety degradation.
In the OWASP Top 10 for LLM Applications, LLM03 (Training Data Poisoning) and LLM05 (Supply Chain Vulnerabilities) are the primary mappings, given the focus on training infrastructure.

Threat Scenarios

Scenario 1 — Insider fine-tuning: A privileged insider with GPU access clones NanoEuler, fine-tunes a GPT-2-scale model on proprietary internal data, and exfiltrates the resulting weights. The training run leaves no PyTorch process logs or pip install artefacts.
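One trace such a run does leave is an entry in the GPU's compute process list. The sketch below is an illustration, not a vetted detection: it parses CSV text of the kind produced by `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits` and flags compute processes whose executable name does not look like a Python interpreter. The function name `flag_non_python`, the interpreter-name heuristic, and the sample rows (including the `/home/dev/nanoeuler/train` path) are assumptions made for this example.

```python
import csv
import io
import re

# Assumed heuristic: executables named python, python3, python3.11, ...
# are treated as framework-driven workloads and are not flagged.
PYTHON_NAME = re.compile(r"^python(\d+(\.\d+)*)?$")


def flag_non_python(csv_text):
    """Return (pid, process_name, used_memory) for non-Python GPU compute processes."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pid, name, mem = (field.strip() for field in row)
        exe = name.rsplit("/", 1)[-1]  # executable name without its directory
        if not PYTHON_NAME.match(exe):
            # Memory is kept as text because the tool may report a non-numeric value.
            flagged.append((int(pid), name, mem))
    return flagged


# Hypothetical sample rows, not captured from a real host.
SAMPLE = """\
4121, /usr/bin/python3.11, 9120
5530, /home/dev/nanoeuler/train, 10240
"""

if __name__ == "__main__":
    for pid, name, mem in flag_non_python(SAMPLE):
        print(f"review pid={pid} exe={name} gpu_mem={mem} MiB")
```

Native CUDA workloads such as renderers, simulations, and inference servers would also be flagged, so in practice this kind of check needs an allowlist and is a prompt for review rather than an alert in its own right.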

Scenario 2 — Trojanised toolchain distribution: A threat actor forks NanoEuler, introduces a subtle modification to the BPE tokenizer or weight serialisation code, and promotes the fork through developer communities. Researchers who train on the modified stack produce models with embedded backdoor triggers they cannot easily detect.
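A first-pass check for this scenario is to compare the files most likely to carry such a change between a pinned upstream checkout and the fork. The sketch below uses Python's `filecmp` module for a byte-for-byte comparison. The glob patterns and the function name `sensitive_diffs` are assumptions: the actual file names used for the tokenizer and weight serialisation in the NanoEuler repository are not given here and would need to be substituted.

```python
import filecmp
from pathlib import Path

# Assumed patterns; replace with the real tokenizer and serialisation file names.
SENSITIVE_PATTERNS = ("*token*", "*serial*", "*checkpoint*")


def sensitive_diffs(upstream, fork, patterns=SENSITIVE_PATTERNS):
    """List sensitive files that are modified, added, or missing in the fork."""
    upstream, fork = Path(upstream), Path(fork)

    def collect(root):
        # Gather relative paths of regular files matching any pattern.
        found = set()
        for pattern in patterns:
            for path in root.rglob(pattern):
                if path.is_file():
                    found.add(path.relative_to(root).as_posix())
        return found

    report = []
    for rel in sorted(collect(upstream) | collect(fork)):
        a, b = upstream / rel, fork / rel
        if not a.is_file():
            report.append((rel, "added in fork"))
        elif not b.is_file():
            report.append((rel, "missing in fork"))
        elif not filecmp.cmp(a, b, shallow=False):  # compare contents, not just metadata
            report.append((rel, "modified"))
    return report
```

An empty report only shows that the matched files are byte-identical. It says nothing about changes elsewhere in the fork, for example in the build configuration, so it narrows a manual review rather than replacing one.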

Scenario 3 — Safety stripping at scale: An adversary uses the SFT pipeline to fine-tune a base model checkpoint on adversarial instruction data, producing a safety-stripped variant deployable via standard GGUF/ONNX conversion. The entire pipeline runs without any framework that existing monitoring tools would flag.

Defender Checklist

Extend model provenance and supply chain controls to cover models trained outside PyTorch/TensorFlow — treat C/CUDA training outputs as requiring equivalent scrutiny
Audit internal GPU environments for non-framework training processes (GPU compute processes with no associated Python interpreter)
Review whether your model integrity tooling (e.g., weight hashing, signing) applies to models regardless of training stack
Assess SFT and RLHF pipelines — including open-source ones — as potential vectors for safety degradation in your model deployment lifecycle
Monitor GitHub for forks and derivatives of minimal LLM stacks that introduce non-obvious code changes to tokenizers or weight serialisation
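The weight-hashing item in the checklist above can be made independent of the training stack by treating the serialised weight file as opaque bytes. The sketch below assumes nothing about NanoEuler's checkpoint format; the function names and the practice of recording a digest at the end of training are illustrative assumptions, not a description of any existing tool.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the recorded hex value."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

A matching digest shows only that the file has not changed since the digest was recorded. It cannot reveal a backdoor introduced during training, which is why the checklist pairs integrity checks with provenance controls.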

References

NanoEuler GitHub Repository