Kronos is a self-evolving training pipeline targeting Opus-4.6 parity on coding benchmarks at $1/M output tokens, 2M context, and 150–200 tok/s.
All scripts, datasets, and intermediate checkpoints are public.
Seed capital: €100
Spent so far: €5.50
Round-3 PIVOT base model: Qwen2.5-Coder-1.5B
Evaluation benchmarks: HumanEval+ / LCB
Status (May 12 2026)
Round-0 DONE: Qwen2.5-Coder-1.5B + LoRA (r=32) on 30K CodeFeedback rows. 625 steps; loss 0.678→0.489, mean-token accuracy 0.813→0.841. Adapter published at jaivial/kronos-round0-qwen15coder-lora.
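For reference, here is a minimal sketch of what the Round-0 run likely looked like, assuming PEFT + TRL. Only the base model, r=32, and the 30K CodeFeedback row count come from the log above; the dataset id, column names, target modules, and batch settings are illustrative (chosen so that 30,000 rows / effective batch 48 ≈ 625 steps, consistent with the log).

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Dataset id and column names are assumptions; the log only says "30K CodeFeedback rows".
raw = load_dataset("m-a-p/CodeFeedback-Filtered-Instruction", split="train[:30000]")
dataset = raw.map(
    lambda ex: {"messages": [{"role": "user", "content": ex["query"]},
                             {"role": "assistant", "content": ex["answer"]}]},
    remove_columns=raw.column_names,
)

peft_config = LoraConfig(
    r=32,                 # rank from the Round-0 log
    lora_alpha=64,        # assumed 2x-rank convention
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-1.5B",
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="kronos-round0",
        per_device_train_batch_size=2,   # assumed; a P100 has 16 GB
        gradient_accumulation_steps=24,  # 30000 / (2*24) ≈ 625 steps, matching the log
        num_train_epochs=1,
    ),
)
trainer.train()
```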
Round-1A SHELVED: Qwen2.5-Coder-7B SFT was cap-killed twice on a Kaggle P100 (the v2 MLP r=32 and v3 attention r=16 runs both stopped at ~27% of one epoch under the 12h cap). The 7B path needs an H100; deferred until a Lambda grant or a paid GPU burst.
Round-3 PIVOT LIVE: GRPO RL on the Round-0 1.5B base with a binary code-execution reward against LCB-medium-100. Kaggle P100, ~2.5–3h wall-clock, comfortably under the 12h cap. R1A/R2 were skipped because GRPO does not need a strong SFT init; this is the first completed round beyond Round-0.
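The reward itself is simple to sketch. Assuming each LCB-medium task ships stdin/stdout test cases (the field names below are illustrative), a binary reward is 1.0 only when the candidate program passes every case:

```python
import subprocess
import sys

def binary_execution_reward(code: str, tests: list[dict], timeout: float = 5.0) -> float:
    """Return 1.0 iff the candidate program passes every test case, else 0.0."""
    for case in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=case["stdin"],          # assumed per-case stdin field
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hangs count as failures
        if result.returncode != 0 or result.stdout.strip() != case["stdout"].strip():
            return 0.0
    return 1.0
```

In a GRPO setup (e.g. TRL's GRPOTrainer) this would be wrapped as a reward function mapping a batch of completions to a list of floats; the all-or-nothing shape keeps the signal cheap to compute but sparse.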
Round-2 WIRED: distillation (Branch A) from a Qwen3-Coder-480B-A35B teacher, with more-SFT and diagnose branches as alternates. Replays after Round-3 metrics land and H100 access unlocks.
Operating stack: HF Pro (~€5.50/mo, prorated) for 480B teacher access; free Kaggle P100 for training. The Round-6 serving stack (vLLM + DFlash speculative decoding) is sketched at the architecture level.
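As a stand-in for that serving sketch: DFlash block-diffusion drafting is not a stock vLLM feature, so the snippet below shows vLLM's generic draft-model speculative decoding instead, with an assumed smaller same-family draft. Note the speculative flags follow older vLLM releases (~0.6) and have since been folded into a speculative_config argument.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-Coder-1.5B",              # the 1.5B student as the target model
    speculative_model="Qwen/Qwen2.5-Coder-0.5B",  # assumed smaller draft from the same family
    num_speculative_tokens=5,                     # draft tokens verified per step
)
out = llm.generate(["def fib(n):"], SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```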
Approach
Cheap-first ladder: Kaggle/Colab free GPUs → grant credits → small paid GPU bursts → revenue-funded scale.
Distillation Round 2: the Qwen3-Coder-480B teacher fixes ~50% of student failures on LCB-medium (empirically validated in cycles 38–39); see the sketch after this list.
RL Round 3: GRPO + binary code-execution reward on the residual systemic failures distillation can't fix.
Speculative decoding Round 6: DFlash block-diffusion drafts for 2–3× throughput at serve time.
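A minimal sketch of the Round-2 distillation loop described above: collect problems the student fails, ask the 480B teacher for a solution via HF Pro, and keep only teacher outputs that actually pass the tests as new SFT pairs. The exact teacher repo id and the `passes_tests` hook are assumptions; `passes_tests` could reuse the binary reward from Round-3.

```python
from huggingface_hub import InferenceClient

# Assumed teacher repo id, served through HF Pro.
teacher = InferenceClient(model="Qwen/Qwen3-Coder-480B-A35B-Instruct")

def distill_failures(failures: list[dict], passes_tests) -> list[dict]:
    """failures: [{'prompt': str, 'tests': [...]}] the student got wrong."""
    sft_pairs = []
    for task in failures:
        # One teacher attempt per failure; a real loop would retry or sample several.
        completion = teacher.chat_completion(
            messages=[{"role": "user", "content": task["prompt"]}],
            max_tokens=1024,
        )
        code = completion.choices[0].message.content
        if passes_tests(code, task["tests"]):  # execution-filtered distillation
            sft_pairs.append({"prompt": task["prompt"], "completion": code})
    return sft_pairs
```

Execution filtering is what makes the teacher's ~50% fix rate usable: only verified fixes enter the student's next SFT mix.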
Early access
Want to try the model when Round-3 lands? Drop your email.
Paid early-access tier (€10–20/mo with a usage cap) opens after the first Opus-4.6-comparable checkpoint.