
llm-echo 0.4

Simon Willison released version 0.4 of llm-echo on March 31, 2026. This debug plugin for his llm CLI tool now populates input_tokens and output_tokens fields in prompt responses. Developers get precise token counts without querying a real language model.

llm-echo mimics an LLM by echoing the input prompt as output. It skips actual inference, making it ideal for testing prompt formatting, chaining, or system behavior. The update delivers token metrics matching what providers like OpenAI or Anthropic report. For a prompt of 1,247 tokens, it returns exactly that in input_tokens, with output_tokens mirroring the echoed length.
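To make the echo behavior concrete, here is a minimal sketch of reading those counts back, assuming a JSON response with the input_tokens and output_tokens fields the article describes (the exact schema may differ in the real plugin):

```python
import json

# Example response in the shape the article describes: the echoed
# prompt plus token-usage fields (field names assumed from the article).
raw = '{"prompt": "Summarize this report.", "input_tokens": 1247, "output_tokens": 1247}'

data = json.loads(raw)
# Because echo mirrors the input, both counts track the prompt length.
print(data["input_tokens"], data["output_tokens"])
```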

Why Token Counts Matter Now

Token usage drives LLM costs. OpenAI’s GPT-4o charges $5 per million input tokens and $15 per million output tokens as of early 2026. A single debug run racks up negligible cost locally, but production prompts often exceed 10,000 tokens. Misjudging lengths leads to surprise bills—I’ve seen teams overspend by 30% from unoptimized chains.
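The pricing above turns into a simple cost function. A quick sketch using the article's quoted GPT-4o rates:

```python
# GPT-4o list prices from the article (USD per token).
INPUT_RATE = 5 / 1_000_000
OUTPUT_RATE = 15 / 1_000_000

def prompt_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 10,000-token prompt with a 1,000-token reply costs about 6.5 cents:
print(round(prompt_cost(10_000, 1_000), 4))  # 0.065
```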

This feature closes a gap. Before, developers estimated tokens via rough heuristics or paid API calls. llm-echo provides ground truth offline. Run

$ llm -m echo 'Your long prompt here' -u

and parse JSON output for exact figures. Integrate it into CI/CD pipelines to enforce token budgets before deployment.
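A CI token-budget gate could be as small as the sketch below. The budget value and the JSON field names are assumptions for illustration, not part of llm-echo itself:

```python
import json

TOKEN_BUDGET = 8_000  # hypothetical per-prompt ceiling for this project

def within_budget(echo_json: str, budget: int = TOKEN_BUDGET) -> bool:
    """True if the measured input_tokens fit under the budget."""
    return json.loads(echo_json)["input_tokens"] <= budget

# A CI step would fail the build when this returns False:
print(within_budget('{"input_tokens": 9500, "output_tokens": 9500}'))  # False
```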

Security angle: Debugging sensitive prompts over APIs risks data exposure. Governments and enterprises flag PII leaks from dev environments. llm-echo keeps everything local: no telemetry, no vendor logs. Pair it with llm's local-model plugins, such as llama.cpp-backed ones, for air-gapped testing.

Context in Willison’s Ecosystem

Simon Willison built llm as a lightweight CLI for 100+ models via plugins. Install with pipx install llm, add llm-openai or llm-groq, and query instantly. No web UIs, no subscriptions—pure Unix philosophy. llm-echo fits as the zero-cost baseline for troubleshooting.

Version history shows steady iteration: 0.1 echoed basics; 0.2 added JSON mode; 0.3 fixed streaming. 0.4's token support aligns with provider APIs, easing hybrid local/cloud workflows. Willison's plugins clock millions of downloads, and llm underpins Datasette's AI features for querying SQLite data.

Skepticism check: This isn't revolutionary; libraries like tiktoken already count tokens. But llm-echo embeds the counts directly in llm's response format, reducing context switches. No JavaScript bloat or cloud dependencies. In a field flooded with Vercel-hosted playgrounds, it prioritizes control.

Implications extend to optimization. Token trimming cuts costs 20-50% on long contexts. Use llm-echo to A/B test prompt variants: shorten instructions, remove fluff, measure delta. For RAG pipelines, validate chunking—ensure retrieved docs fit windows without truncation.
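A prompt A/B comparison reduces to differencing two measured counts. A minimal sketch with illustrative numbers (the counts are what you would read back from two llm-echo runs):

```python
def token_delta(baseline: int, variant: int) -> tuple[int, float]:
    """Absolute and percentage token saving of a trimmed prompt variant."""
    saved = baseline - variant
    return saved, 100 * saved / baseline

# Counts from two hypothetical llm-echo runs of the same task:
saved, pct = token_delta(baseline=1_247, variant=860)
print(saved, round(pct, 1))  # 387 tokens saved, 31.0% shorter
```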

Finance tie-in: Enterprises model LLM spend via token projections. Tools like Helicone or LangSmith track usage post-facto; llm-echo enables preemptive auditing. At scale, saving 1,000 input tokens per query across 1 million daily calls saves $5,000 a day on GPT-4o input alone, roughly $150,000 a month.
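The projection works out as straightforward arithmetic at the article's quoted input rate:

```python
# Scale projection: trim 1,000 input tokens per query at 1,000,000
# calls per day, with GPT-4o input priced at $5 per million tokens.
tokens_saved_per_day = 1_000 * 1_000_000
daily_saving = tokens_saved_per_day / 1_000_000 * 5
print(daily_saving, daily_saving * 30)  # 5000.0 per day, 150000.0 per 30-day month
```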

Broader trend: Open-source CLI tools reclaim LLM plumbing from SaaS giants. Willison's stack (llm, Datasette, sqlite-utils) runs on $5/month VPSes, sidestepping $20/user/month dashboards. As models commoditize, metering and debugging become the moat. llm-echo sharpens that edge.

Grab it via llm install llm-echo. Test your prompts today; watch the tokens drop.

April 1, 2026 · 3 min · Source: Simon Willison