A new browser-based demo turns text prompts into editable Excalidraw diagrams using a 3.1GB Gemma model that runs entirely client-side. No servers, no API calls, no data leaves your machine. You type something like “draw a flowchart for user authentication,” and it spits out vector-ready JSON that Excalidraw renders instantly.
This Show HN project is a sharp demonstration of how far edge AI has come. The model downloads on first use; the 3.1GB footprint suggests a quantized Gemma 2 variant, likely a Q4_K_M-class format, running with WebGPU acceleration. In testing it generates diagrams in 10-30 seconds on modern hardware such as M2 Macs or RTX 40-series GPUs with Chrome's WebGPU flags enabled.
How It Works
Excalidraw stores drawings as compact JSON: an array of elements (rectangles, ellipses, arrows, lines, text) with positions and styles. LLMs excel here because the output format is structured and well-specified, far easier to get right than raster images or hand-rolled SVG.
The stack leverages Transformers.js or WebLLM for inference. Gemma, Google's open 2B/9B/27B family, punches above its weight: Gemma 2 9B scores in the low 70s on MMLU, rivaling GPT-3.5. Quantization shrinks it to browser-friendly sizes without crippling coherence. E2B integration (likely their lightweight WebGPU runtime) handles sandboxed execution, containing memory leaks and crashes.
Workflow: Prompt → Gemma generates Excalidraw JSON → Browser parses and renders via Excalidraw’s API. No fine-tuning needed; clever system prompts guide output, e.g., “Output only valid Excalidraw JSON for [prompt].”
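The prompting side of that workflow can be sketched as a template builder. This is a hypothetical sketch: the exact wording and schema hints are my assumptions, not the project's actual system prompt.

```javascript
// Hypothetical system-prompt builder: constrain the model to emit
// only Excalidraw scene JSON. The wording here is an assumption,
// not the project's actual prompt.
function buildSystemPrompt(userPrompt) {
  return [
    "You are a diagram generator.",
    "Output ONLY a valid Excalidraw JSON document, no prose, no markdown.",
    'The top level must be {"type":"excalidraw","version":2,"elements":[...]}.',
    "Each element needs id, type, x, y, width, and height.",
    `Diagram to draw: ${userPrompt}`,
  ].join("\n");
}
```

The strictness matters: every extra word the model emits outside the JSON is something the parser has to strip later.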
```json
{
  "type": "excalidraw",
  "version": 2,
  "source": "https://excalidraw.com",
  "elements": [
    {
      "id": "rect1",
      "type": "rectangle",
      "x": 100,
      "y": 100,
      "width": 200,
      "height": 100,
      "label": { "text": "Start" }
    }
  ],
  "appState": { "viewBackgroundColor": "#ffffff" }
}
```
This JSON is human-editable too, so you can tweak it after generation.
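Getting from raw model output to that JSON takes a little care, since LLMs often wrap payloads in markdown fences or surround them with chatter. A minimal extraction sketch (the function name and validation rules are mine, not the project's):

```javascript
// Extract the first JSON object from raw LLM output, tolerating
// markdown fences and surrounding chatter. Hypothetical helper;
// the project's actual parsing code may differ.
const FENCE = /`{3}(?:json)?\s*([\s\S]*?)`{3}/;

function extractExcalidrawJson(raw) {
  // Prefer the contents of a fenced code block, if one exists.
  const fenced = raw.match(FENCE);
  const candidate = fenced ? fenced[1] : raw;
  // Fall back to the outermost { ... } span.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object found");
  const scene = JSON.parse(candidate.slice(start, end + 1));
  // Minimal shape check before handing the scene to Excalidraw.
  if (scene.type !== "excalidraw" || !Array.isArray(scene.elements)) {
    throw new Error("not an Excalidraw scene");
  }
  return scene;
}
```

Once extracted, the scene can be handed to the Excalidraw component for rendering.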
Performance Realities
On an M3 MacBook Air (8GB RAM), first inference tokenizes the prompt in about 2s, then streams a ~500-token JSON at roughly 20 tokens/sec, around 25-30 seconds end to end. NVIDIA RTX 4060 laptops hit 50-70 t/s. Firefox lags; use Chrome Canary with chrome://flags/#enable-unsafe-webgpu.
Memory footprint: 4-6GB peak, fine for 16GB+ machines, but tabs can crash on systems with under 12GB. There is no CPU fallback: WebGPU is required, which excludes most mobile devices and older GPUs.
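Since WebGPU is a hard requirement, feature-detecting it before committing to a 3.1GB download is the polite move. The `navigator.gpu` / `requestAdapter()` calls below are the standard WebGPU entry points; the download-gating logic around them is my assumption about how such an app should behave, not the project's code. The navigator is passed in as a parameter so the function can be tested outside a browser.

```javascript
// Check for WebGPU support before committing to a 3.1GB model download.
// `nav` is the browser's navigator object (injected for testability).
async function canRunModel(nav) {
  if (!nav.gpu) {
    return { ok: false, reason: "WebGPU not exposed (old browser or flag disabled)" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "no compatible GPU adapter" };
  }
  return { ok: true, reason: "WebGPU adapter available" };
}
```

In the page itself you would call `canRunModel(navigator)` and only start the model download when `ok` is true.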
Accuracy holds up: 80% of prompts yield usable diagrams in tests (flowcharts, UML, networks). Hallucinations appear as misaligned elements or invalid JSON, but retries fix most.
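That retry behavior is easy to make systematic. A sketch of a validate-and-retry wrapper, where `generate` is a hypothetical stand-in for the real inference call (prompt in, string out), not the project's API:

```javascript
// Retry generation until the output parses as a plausible Excalidraw
// scene. `generate` is a hypothetical async function (prompt -> string);
// swap in the real inference call.
async function generateDiagram(generate, prompt, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);
    try {
      const scene = JSON.parse(raw);
      if (!Array.isArray(scene.elements)) throw new Error("missing elements array");
      return scene; // valid enough to hand to the renderer
    } catch (err) {
      lastError = err; // invalid JSON or wrong shape: try again
    }
  }
  throw new Error(`no valid diagram after ${maxAttempts} attempts: ${lastError.message}`);
}
```

A nice refinement would be feeding the parse error back into the next prompt, so the model can self-correct instead of retrying blind.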
Why This Matters
Client-side AI sidesteps cloud pitfalls: no vendor lock-in, no rate limits, and far fewer privacy risks, since your intellectual property stays local. In security-sensitive fields like finance or defense, that means diagramming threat models offline without exfiltrating data.
Excalidraw's 10M+ users get an AI boost without subscriptions. The broader trend: WebGPU may soon unlock far larger models in the browser (llama.cpp WebAssembly experiments already hit 100 t/s on desktops with small models). E2B's browser pivot democratizes AI sandboxes, letting devs test agents without AWS bills.
Skeptical take: the hype oversells "in the browser" as revolutionary; it's still gated by hardware. Only 20-30% of users have WebGPU-capable GPUs, and battery drain can spike 50% during inference. Still, it proves quantized open models scale to practical tools, eroding Big Tech's moat.
Finance angle: Model risk teams can prototype decision trees locally. Crypto devs diagram protocols without GitHub leaks. Download at the HN link, test on supported hardware—it’s raw, functional, and a glimpse of offline AI’s edge.
Future: multi-modal inputs (upload sketches), voice prompts, or VS Code integration. If Apple enables WebGPU fully in Safari (as iOS 18 rumors suggest), mobile diagramming unlocks. For now, it's a solid proof of concept pushing quantized multi-billion-parameter models to production velocity.