
TinyLoRA – Learning to Reason in 13 Parameters

Researchers at UC Berkeley squeezed reasoning capabilities into a 13-parameter adapter using LoRA fine-tuning.

Plugged into Meta’s Llama 3.2 1B Vision Instruct model, the 13-parameter adapter lifts GSM8K math accuracy from the base model’s 8.6% to 37.5%. With refined prompting, that hits 48.9%, outpacing the original by over 5x while adding negligible compute overhead.

This isn’t a full model rewrite. LoRA, or Low-Rank Adaptation, targets specific layers with low-rank matrices, slashing trainable parameters. Here, they dialed the rank down to extremes: ranks 1 through 4 across Llama’s attention and MLP layers. Total trainable params? Just 13. The frozen base model handles the heavy lifting; the adapter injects reasoning smarts.
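The mechanics are easy to sketch: LoRA adds a low-rank update B·A alongside a frozen weight W, scaled by alpha/r, and only A and B train. A toy numpy sketch (illustrative 8×8 dimensions, not Llama’s actual shapes):

```python
import numpy as np

# Toy LoRA forward pass. Dimensions are illustrative, not Llama's.
d_in, d_out, r, alpha = 8, 8, 1, 16

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # Base output plus low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True: B is zero, so no drift at init
print(A.size + B.size)                      # trainable params: r * (d_in + d_out) = 16
```

Note the arithmetic: even a single rank-1 adapter on an 8×8 matrix has r·(d_in + d_out) = 16 trainable parameters, so a 13-param total across a 1B model implies counting far more aggressively than vanilla per-layer LoRA, presumably via shared or tied weights.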

Raw Results

On GSM8K, a benchmark of 8th-grade math word problems, the setup shines. Base Llama 3.2 1B: 8.6% pass@1 (first-shot accuracy). TinyLoRA adapter: 37.5%. Chain-of-thought prompting pushes it to 48.9%. For comparison, Llama 3.2 3B scores 49.3% vanilla—meaning this 1B + 13 params setup nearly matches a 3x larger model.
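Pass@1 here just means greedy, first-attempt exact-match accuracy. A minimal scorer, with a toy last-number answer extractor (a common GSM8K heuristic, not the paper’s exact harness):

```python
import re

def extract_answer(completion: str) -> str:
    # Toy GSM8K-style heuristic: grab the last number in the completion.
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return nums[-1] if nums else ""

def pass_at_1(completions, gold_answers):
    # Fraction of problems answered correctly on the first (greedy) attempt.
    hits = sum(extract_answer(c) == g for c, g in zip(completions, gold_answers))
    return hits / len(gold_answers)

print(pass_at_1(["... so the answer is 42.", "I think 7"], ["42", "8"]))  # 0.5
```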

They ran ablations: higher ranks add parameters but only marginal gains. Rank-1 alone gets 33.1%; stacking ranks up to 4 caps at 37.5%. Vision input helps, too: text-only drops to 28.9%, hinting that multimodal data aids numerical reasoning.

Skeptical check: No overfitting flags. Trained on 10% GSM8K subset (747 examples), validated on held-out sets. Zero-shot transfer to SVAMP (simple arithmetic) hits 58.2%, beating base by 2x. AGSM sees 24.1% vs. base 15.7%. Not hallucinating generalization; it’s learning patterns efficiently.

Under the Hood

Training recipe: 3 epochs, 4e-5 learning rate, cosine schedule, batch size 128 on A100 GPUs. Total cost? Under $10 at cloud rates. Code’s on GitHub—reproducible with huggingface-hub and peft. Here’s the key LoRA config snippet:

from peft import LoraConfig

peft_config = LoraConfig(
    r=1,  # ultra-low rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM"
)
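The stated recipe pencils out to very few optimizer steps. A pure-Python sketch of the cosine schedule using the article’s numbers (747 examples, batch 128, 3 epochs, 4e-5 peak; no warmup assumed):

```python
import math

def cosine_lr(step, total_steps, base_lr=4e-5):
    # Cosine decay from base_lr at step 0 down to 0 at total_steps.
    progress = step / max(1, total_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

# 3 epochs over 747 examples at batch 128: ceil(747/128) * 3 steps.
total = math.ceil(747 / 128) * 3
print(total)                # 18 optimizer steps total
print(cosine_lr(0, total))  # 4e-05 at the start
```

Eighteen steps is tiny; it is easy to see why the run costs under $10.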

They distill from a Llama 3.1 405B teacher, using its chain-of-thought outputs as targets. No synthetic data explosion, just clean supervision. Post-training, load the adapter with peft's AutoPeftModelForCausalLM and fold it into the base weights with merge_and_unload.
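“Distillation” here is just supervised fine-tuning on the teacher’s chain-of-thought text. A sketch of how such pairs might be packaged (field names and prompt template are hypothetical, not the paper’s):

```python
def to_sft_example(problem: str, teacher_cot: str) -> dict:
    # Hypothetical formatting: prompt from the problem, target from the
    # teacher's chain-of-thought; loss is computed on the completion only.
    return {
        "prompt": f"Question: {problem}\nLet's think step by step.\n",
        "completion": teacher_cot,
    }

ex = to_sft_example("2 + 3?", "2 plus 3 equals 5. The answer is 5.")
print(ex["prompt"].startswith("Question: 2 + 3?"))  # True
```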

Fair critique: 13 params is gimmicky marketing. Alpha scaling amplifies how far those few weights move the model, so effective capacity exceeds the raw count. Still, it works. Bigger issue: reasoning remains narrow. GSM8K is arithmetic-heavy; this won't code or debate philosophy. But for math QA on devices? Viable.

Implications for Deployment

Why care? LLMs bloat to trillions of params, demanding GPU farms. This proves reasoning distills to slivers. Run on phones: Llama 3.2 1B is ~600MB quantized; adapter adds bytes. Latency? Negligible—forward pass unchanged.
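The “adds bytes” claim is easy to sanity-check, assuming fp16 storage for the adapter weights (metadata and config files excluded):

```python
# Back-of-envelope adapter footprint, assuming fp16 (2 bytes per param).
n_params = 13
adapter_bytes = n_params * 2
print(adapter_bytes)                       # 26 bytes of weight data
base_bytes = 600 * 2**20                   # ~600 MB quantized base, per article
print(adapter_bytes / base_bytes)          # a vanishingly small fraction
```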

Security angle: Smaller diffs mean tinier attack surfaces. Fewer params to poison or backdoor. In crypto apps, embed math checkers for wallets or DEX math without cloud calls—privacy win. Edge AI for finance: Real-time risk calcs on-device, no data leaks.

Economics: Training runs pennies. Scale to custom domains—fine-tune your 1B base once, distill reasoning adapters per task. Beats API fees long-term. HN buzz confirms: 200+ comments, devs replicating on RPUs.

Bigger picture: LoRA’s maturing. From 1M-param adapters to 13. Next? Sub-1-param reasoners? Overhype risk high—true AGI needs more. But for practical tools, this resets efficiency baselines. Fork the repo, test it. Numbers hold up.

April 1, 2026 · 3 min · Source: Hacker News
