In 2008, a Hacker News post declared that anyone could write a compiler by reading just two papers. No textbooks, no PhD required—just digest those papers and code. The thread sparked debate, some hailing it as a revelation, others calling it hype. Sixteen years on, the core idea holds partial truth: compilers need not intimidate. A basic one for a toy language takes an experienced developer days, not years. This matters because custom compilers let you sidestep bloated toolchains like GCC or Clang, craft domain-specific languages for crypto primitives, or build secure VMs for blockchain execution—shrinking attack surfaces in high-stakes finance and security.
Compilers parse source code, analyze semantics, optimize, and emit machine code or bytecode. The HN post targeted two papers outlining this pipeline for a simple functional language. The post itself is vague on titles, but context points to the nanopass line of work from Kent Dybvig's group: "A Nanopass Infrastructure for Compiler Education" (Sarkar, Waddell, and Dybvig, 2004) details a Scheme compiler built from dozens of tiny, independently verifiable passes of 50–100 lines each. Paired with a tutorial-style front end such as Abdulaziz Ghuloum's "An Incremental Approach to Compiler Construction" (2006), they provide a blueprint. Skeptical take: these cover the middle and back end elegantly but skim over practical lexing and parsing. You still want Pratt's 1973 "Top Down Operator Precedence" for the recursive descent expression parsers most hobby compilers rely on.
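Pratt's technique is compact enough to sketch in full. The following Python toy—token set and binding powers chosen for illustration, not taken from the paper—parses arithmetic with correct precedence in about twenty lines:

```python
# Minimal Pratt (top-down operator precedence) expression parser.
# Each operator's binding power decides how tightly it grabs operands.
import re

BP = {"+": 10, "-": 10, "*": 20, "/": 20}  # binding powers

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

def parse(tokens, min_bp=0):
    tok = tokens.pop(0)
    if tok == "(":
        lhs = parse(tokens, 0)
        tokens.pop(0)            # consume the closing ")"
    else:
        lhs = int(tok)
    # Keep absorbing operators as long as they bind at least as tightly
    # as our caller requires.
    while tokens and tokens[0] in BP and BP[tokens[0]] >= min_bp:
        op = tokens.pop(0)
        rhs = parse(tokens, BP[op] + 1)   # +1 makes operators left-associative
        lhs = (op, lhs, rhs)
    return lhs

print(parse(tokenize("1 + 2 * 3")))    # ('+', 1, ('*', 2, 3))
print(parse(tokenize("(1 + 2) * 3")))  # ('*', ('+', 1, 2), 3)
```

The whole precedence table is one dictionary; adding an operator is one entry, which is why the technique dominates hand-written parsers.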
Proof in Code: Minimal Compilers Today
Numbers prove the point. Bob Nystrom's "Crafting Interpreters" (2021 book, with chapters readable standalone) delivers two full implementations of Lox, a small dynamically typed language with closures and classes: a tree-walk interpreter in roughly 2k lines of Java and a bytecode VM with a single-pass compiler in a few thousand lines of C. Even tighter: Rui Ueyama's 8cc (2015) fits a lexer, parser, type checker, and x86-64 backend for a large subset of C11 into well under 10k lines of C—and it self-hosts. For Haskell fans, Stephen Diehl's "Write You a Haskell" tutorial walks through a small functional-language compiler chapter by chapter.
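To make the tree-walk versus bytecode split concrete, here is a toy pipeline in Python—an illustration in the spirit of these projects, not code from any of them—that compiles an expression AST to flat bytecode and executes it on a stack machine:

```python
# Toy back half of a compiler: AST -> flat bytecode -> stack-machine run.
PUSH, ADD, MUL = range(3)

def compile_expr(node, code):
    """Post-order walk: emit both operands before their operator."""
    if isinstance(node, int):
        code += [PUSH, node]
    else:
        op, lhs, rhs = node
        compile_expr(lhs, code)
        compile_expr(rhs, code)
        code.append(ADD if op == "+" else MUL)
    return code

def run(code):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]; pc += 1
        if op == PUSH:
            stack.append(code[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        else:  # MUL
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
    return stack.pop()

ast = ("+", 1, ("*", 2, 3))          # 1 + 2 * 3
print(run(compile_expr(ast, [])))    # 7
```

A tree-walk interpreter evaluates `ast` directly on each run; the bytecode version pays the walk once at compile time and then executes a flat instruction array, which is the entire performance argument for Nystrom's second implementation.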
These undercut the myth. Community-maintained lists of minimal compilers count dozens under 10k LOC, targeting C, Lisp, Forth, and Rust subsets. Forth shines for minimalism: Richard W.M. Jones's JonesForth (2007) implements a 32-bit x86 Forth in under 2k lines of literate assembly, then bootstraps the rest of the language in Forth itself. Why the fair skepticism? Production compilers like V8 (JavaScript) or rustc run to millions of LOC for optimizations, JITs, and cross-compilation. Two papers get you 80%—lexical scoping, basic types, tail calls—but not vectorization, garbage collector tuning, or WebAssembly output.
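Forth's minimalism is easy to demonstrate: the entire front end is whitespace splitting, and colon definitions are the "compiler." A Python sketch, with a primitive set invented for illustration:

```python
# Toy Forth: tokens are whitespace-separated words; ":" ... ";" compiles
# a new word into the dictionary; everything else runs immediately.
def forth(src):
    stack, words = [], {}
    prims = {
        "+":   lambda: stack.append(stack.pop() + stack.pop()),
        "*":   lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
    }
    toks = iter(src.split())
    def run(tok):
        if tok in prims:
            prims[tok]()
        elif tok in words:
            for t in words[tok]:    # expand user-defined words
                run(t)
        else:
            stack.append(int(tok))  # anything else is a number literal
    for tok in toks:
        if tok == ":":              # colon definition: Forth's "compile mode"
            name = next(toks)
            body = []
            for t in toks:
                if t == ";":
                    break
                body.append(t)
            words[name] = body
        else:
            run(tok)
    return stack

print(forth(": square dup * ; 7 square"))  # [49]
```

No grammar, no AST, no code generator—just a dictionary and a stack, which is why real Forths fit in a few hundred lines of assembly.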
Implications for Tech, Finance, Crypto, Security
Build your own when trust is paramount. Finance firms embed DSLs for trading strategies; a custom compiler can validate every input against a closed grammar, preventing injection bugs by construction. Crypto raises the stakes: CompCert, a C compiler proven correct in Coq, eliminates miscompilation as a bug class and is used where certification matters. CakeML, an ML compiler verified end to end in HOL4, produces machine code whose behavior provably matches the source. Project Everest applies the same idea to cryptography: its HACL* library is written and verified in F*, then compiled to C that ships in Firefox and parts of the Linux kernel—with freedom from secret-dependent timing proven rather than hoped for from an untrusted GCC build.
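The DSL-as-input-validation idea can be sketched simply. Assuming a hypothetical rule language over a whitelisted schema (`FIELDS` and the token grammar below are invented for illustration), a tiny checker rejects anything outside the grammar before it reaches an execution engine:

```python
# Whitelist front end for a hypothetical trading-rule DSL: only known
# fields, keywords, comparison operators, and numeric literals may
# appear, so arbitrary code can never reach the evaluator.
import re

FIELDS = {"price", "volume", "spread"}   # hypothetical schema
KEYWORDS = {"and", "or"}
TOKEN = re.compile(r"\s*(\d+\.?\d*|[a-z]+|[<>]=?|==)\s*")

def check(rule):
    pos = 0
    while pos < len(rule):
        m = TOKEN.match(rule, pos)
        if not m:
            raise ValueError(f"illegal token at {pos}: {rule[pos:]!r}")
        tok = m.group(1)
        if tok.isalpha() and tok not in FIELDS | KEYWORDS:
            raise ValueError(f"unknown identifier: {tok}")
        pos = m.end()
    return True

check("price > 100 and volume >= 5")   # accepted
# check("price > 100; drop table")     # raises ValueError at the ";"
```

The point is architectural: instead of sanitizing strings after the fact, the compiler's front end defines the entire space of legal inputs, and everything else fails to parse.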
In blockchain, Ethereum's ewasm effort demanded custom WebAssembly backends; Substrate (Polkadot) builds on Rust but benefits from lightweight alternatives. Security angle: Ken Thompson's "Reflections on Trusting Trust" (1984) showed how a compromised compiler can invisibly backdoor everything it builds, including its own successors—a warning Schneier has echoed. That alone justifies rolling your own for air-gapped firmware or custom RISC-V secure enclaves. Cost? A solo dev prototypes in weeks; teams iterating on domain-specific optimizations can see multi-fold speedups on specialized code.
Bottom line: the 2008 post oversold—two papers spark insight, not mastery. Pair them with flex/bison or a hand-rolled parser, an LLVM backend for free codegen, and a conformance suite such as the GCC C torture tests. The result: an empowering tool for anyone auditing codegen or prototyping languages. Skip it if you need full C++20; dive in for control. HN was right on demystification, wrong on sufficiency.