A lone developer dropped C89cc.sh, a 10,000-line POSIX shell script that compiles a subset of C89 into self-contained x86-64 ELF64 executables. No external tools required: just run it with any compliant sh. It hit Hacker News yesterday, racking up 200+ points and dozens of comments in hours. This isn’t vaporware. The script itself runs wherever a POSIX sh does (Linux, macOS, even WSL), and the ELF64 binaries it produces run natively on x86-64 Linux.
Why build this? Traditional compilers like GCC or Clang rely on massive toolchains (assemblers, linkers, libraries) that bloat installs and introduce trust issues. C89cc.sh strips all of that down to one file, portable across Unix-like systems. Download it from GitHub (tkrajina/c89cc.sh), run chmod +x c89cc.sh, then ./c89cc.sh hello.c -o hello, and you get an 8KB “hello world” binary. No makefiles, no config.h fiddling.
Under the Hood
The script parses C89 syntax using shell pattern matching and state machines—no flex or yacc. It handles basics: integers, structs, functions, loops, conditionals. Outputs x86-64 assembly, then assembles and links ELF64 directly. ELF headers? Hand-crafted in shell variables. Relocations? Solved with arithmetic. The author, Tomas Krajina, spent years iterating; version 0.9.9 supports 80% of K&R C89 tests from the POSIX suite.
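To make the “hand-crafted ELF headers” idea concrete, here is a minimal sketch of how a shell script could write an ELF64 file using nothing but printf and arithmetic expansion. It is not taken from c89cc.sh: the function names (emit_byte, emit_le, emit_elf), the 0x400000 load address, and the 12-byte exit(42) payload are all assumptions chosen for illustration. The entry point is computed from the header sizes, which is the simplest case of resolving an address with arithmetic.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not code from c89cc.sh.

# Emit one byte (0-255) by building an octal escape for printf.
emit_byte() {
    printf "$(printf '\\%03o' "$1")"
}

# Emit VALUE as NBYTES little-endian bytes: emit_le VALUE NBYTES
# (POSIX sh has no local variables, hence the le_ prefix.)
emit_le() {
    le_v=$1
    le_n=$2
    while [ "$le_n" -gt 0 ]; do
        emit_byte $(( le_v & 255 ))
        le_v=$(( le_v >> 8 ))
        le_n=$(( le_n - 1 ))
    done
}

BASE=4194304        # assumed load address, 0x400000
EHSIZE=64           # size of the ELF64 file header
PHSIZE=56           # size of one ELF64 program header
CODESIZE=12         # bytes of machine code emitted at the end
ENTRY=$(( BASE + EHSIZE + PHSIZE ))          # code starts right after the headers
FILESIZE=$(( EHSIZE + PHSIZE + CODESIZE ))

emit_elf() {
    # e_ident: magic, ELFCLASS64, little-endian, version 1, OS ABI 0, padding
    printf '\177ELF\002\001\001\000'
    emit_le 0 8
    emit_le 2 2             # e_type = ET_EXEC
    emit_le 62 2            # e_machine = EM_X86_64
    emit_le 1 4             # e_version
    emit_le "$ENTRY" 8      # e_entry
    emit_le "$EHSIZE" 8     # e_phoff: program header follows the file header
    emit_le 0 8             # e_shoff: no section headers
    emit_le 0 4             # e_flags
    emit_le "$EHSIZE" 2     # e_ehsize
    emit_le "$PHSIZE" 2     # e_phentsize
    emit_le 1 2             # e_phnum
    emit_le 0 2             # e_shentsize
    emit_le 0 2             # e_shnum
    emit_le 0 2             # e_shstrndx

    # Program header: a single PT_LOAD segment mapping the whole file
    emit_le 1 4             # p_type = PT_LOAD
    emit_le 5 4             # p_flags = PF_R | PF_X
    emit_le 0 8             # p_offset
    emit_le "$BASE" 8       # p_vaddr
    emit_le "$BASE" 8       # p_paddr
    emit_le "$FILESIZE" 8   # p_filesz
    emit_le "$FILESIZE" 8   # p_memsz
    emit_le 4096 8          # p_align

    # Machine code: mov edi, 42 ; mov eax, 60 (exit) ; syscall
    printf '\277\052\000\000\000\270\074\000\000\000\017\005'
}

# Usage on x86-64 Linux: emit_elf > tiny; chmod +x tiny; ./tiny; echo $?
```

Every field goes through the same emit_le helper, so the header layout reads top to bottom like the ELF specification. A real compiler would also need to patch addresses that are only known after code generation, which this sketch avoids by fixing the code size up front.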
Here’s a minimal example:
#include <stdio.h>
int main() {
    printf("Hello, Njalla.\n");
    return 0;
}
Compile with:
$ ./c89cc.sh hello.c -o hello
$ ./hello
Hello, Njalla.
$ ls -l hello
-rwxr-xr-x 1 user user 8472 Oct 10 14:30 hello
It embeds printf stubs—no libc linkage. Full libc? Not yet; that’s future work.
Speed and Scale
Expect slowness. Compiling a 100-line program takes 2-5 seconds on a modern CPU. A 1,000-line sieve of Eratosthenes? 30+ seconds, outputs a 50KB binary running in 0.1s. Compare to GCC: near-instant compile, similar runtime. Benchmarks from HN: Fibonacci(35) computes in 1ms post-compile, matching clang -O0.
Limits bite hard. No floats, doubles, or long doubles. No threads, volatile, or inline asm. Pointer arithmetic is 64-bit only. Errors? Shell echoes like “syntax error at line 42: expected ‘;’”. Debug with -v for verbose parsing dumps. It targets x86-64 Linux ELF64 only; no Windows PE or ARM.
Skeptical take: This shines for proofs-of-concept, not daily drivers. GCC’s maturity crushes it on features and speed. But auditability? The whole compiler is one file you can grep. No opaque binaries or generated build scripts where a backdoor could hide, which is how the 2024 XZ Utils incident happened.
Why This Matters
In security and crypto circles, trust-minimized tools rule. C89cc.sh delivers a verifiable compiler you can eyeball in vim. Run it air-gapped: boot a minimal Linux, copy the script over on removable media, compile firmware. For devs in restricted envs (IoT, embedded, or behind corporate firewalls) it sidesteps the need for an IT-approved compiler install.
Educationally, it demystifies compilation. Shell forces explicit steps: lexing via case, parsing with nested loops, codegen as echo "mov rax, 42". HN commenters praise it as “shell scripting art”; others nitpick missing unions. Fork it: add ARM64 ELF? Port to dash? The repo invites contributions.
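As a rough illustration of “lexing via case” and “codegen as echo”, the sketch below tokenizes a line of C-like text with case patterns and parameter expansion, then turns a single return statement into assembly text. It is a guess at the general technique, not the author’s code: tokenize, gen_return, and the NUM/ID/PUNCT token names are hypothetical, and it handles only integers, identifiers, and single-character punctuation.

```shell
#!/bin/sh
# Sketch only: hypothetical names, not code from c89cc.sh.

# Split one line of C-like source into tokens, one "KIND text" pair per line.
# Uses global variables because POSIX sh has no local.
tokenize() {
    rest=$1
    while [ -n "$rest" ]; do
        case $rest in
            [0-9]*)                 # integer literal: keep the leading digits
                tok=${rest%%[!0-9]*}
                printf 'NUM %s\n' "$tok" ;;
            [A-Za-z_]*)             # identifier or keyword
                tok=${rest%%[!A-Za-z0-9_]*}
                printf 'ID %s\n' "$tok" ;;
            [[:space:]]*)           # skip one whitespace character
                tok=${rest%"${rest#?}"} ;;
            *)                      # anything else: one punctuation character
                tok=${rest%"${rest#?}"}
                printf 'PUNCT %s\n' "$tok" ;;
        esac
        rest=${rest#"$tok"}         # drop the consumed token from the input
    done
}

# Emit assembly text for a single "return <int>;" statement.
gen_return() {
    tokenize "$1" | while read -r kind text; do
        case "$kind $text" in
            "ID return") ;;                          # keyword: nothing to emit
            "NUM "*)     echo "    mov rax, $text" ;;
            "PUNCT ;")   echo "    ret" ;;
            *)           echo "error: unexpected '$text'" >&2 ;;
        esac
    done
}

# Usage: gen_return 'return 42;'
```

Calling gen_return 'return 42;' prints a mov rax, 42 line followed by ret. The quoted "$tok" in the final expansion matters: without the quotes, a token such as * would be treated as a pattern rather than a literal character.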
Broader context: We’re in a minimalism renaissance. Projects like TinyCC (a C compiler of roughly 100KB) and Cosmopolitan libc echo this. C89cc.sh pushes shell’s limits, showing that POSIX sh alone can carry a working compiler sans Python/Rust bloat. For finance/tech ops, imagine scripting crypto primitives: compile ECDSA verifiers on-the-fly, no deps.
Risks? Shell’s pitfalls (word splitting, globbing) lurk, but Krajina sanitizes inputs. Production? Stick to clang. Experiment? Grab it now. In a world of 100MB Docker images, a single-file compiler you can read end to end reminds us: simplicity scales.
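For readers who have not been bitten by them, the two pitfalls named above look roughly like this. The sketch shows the usual defences (quoting, and set -f around deliberate splitting); it does not claim to reproduce how c89cc.sh sanitizes its input, and count_args and split_tokens are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not code from c89cc.sh.

# Report how many arguments were received, to make splitting visible.
count_args() { echo "$#"; }

name='my file *.c'

# Unquoted: split on spaces, and *.c may expand against the current directory,
# so the count depends on which files happen to exist.
unquoted=$(count_args $name)

# Quoted: always exactly one argument, whatever the string contains.
quoted=$(count_args "$name")

# When splitting is actually wanted (for example, a list of tokens), turn off
# pathname expansion first so a literal * from C source stays a literal *.
split_tokens() {
    set -f              # disable globbing
    set -- $1           # deliberate word splitting on IFS
    set +f              # restore globbing
    echo "$#"
}

# Usage: split_tokens 'int * p ;'
```

With globbing disabled, split_tokens 'int * p ;' reports four words regardless of the directory it runs in, which is the property a shell-based parser needs when C source is full of asterisks.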