A new Dockerfile and Rust tool aim to deliver the most complete Git history of the Linux kernel to date. The setup clones the official kernel.org history repository, then fixes its flaws: it swaps deprecated grafts for modern git replace refs and adds over a dozen missing tags. The result? A repo tracing back to Linus Torvalds’ first release, v0.01 of September 17, 1991, without the hacks that plague existing options.
Standard Linux kernel Git repos, like the mainline at git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, start at 2.6.12-rc2 in April 2005. That leaves nearly 14 years and thousands of commits missing. The official full-history repo at git.kernel.org/pub/scm/linux/kernel/git/history/history.git reconstructs the pre-2005 era from CVS, BitKeeper exports, and manual merges. But it relies on git grafts, a mechanism deprecated since Git 2.18 (2018) that fakes parent links in a local file instead of recording them in real objects. That breaks tools like git bisect across eras, and the repo is also missing some tags.
Key Fixes in This Build
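This solution starts by cloning the kernel.org history repo, already 5.2 GB compressed and expanding to over 20 GB. It then runs a Rust binary (rename-tags) to create git replace refs. These point to synthetic commits that properly chain the old history, preserving object integrity. Unlike a grafts file, which lives only in the local .git/info directory, replace refs are ordinary refs under refs/replace/ that Git’s object lookup honors and that can be pushed and fetched. One catch: a plain git clone does not fetch refs/replace/ by default, so they have to be requested explicitly.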
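To make the grafts-to-replace-refs step concrete, here is a minimal sketch, not the tool’s actual code. It assumes the legacy grafts file uses Git’s documented line format (a commit ID followed by its intended parent IDs, with `#` comments) and builds the argument list for `git replace --graft`. The function name `graft_line_to_replace_args` is hypothetical.

```rust
/// Turn one line of a legacy `.git/info/grafts` file into the argument list
/// for `git replace --graft <commit> [<parent>...]`.
/// Returns None for blank lines and `#` comments.
/// (Hypothetical helper for illustration; not taken from rename-tags.)
fn graft_line_to_replace_args(line: &str) -> Option<Vec<String>> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The first field is the commit to rewrite; the rest are its new parents.
    let mut args = vec!["replace".to_string(), "--graft".to_string()];
    args.extend(line.split_whitespace().map(String::from));
    Some(args)
}
```

The resulting vector can be passed to `std::process::Command::new("git").args(&args)`. Git 2.18 and later also ship `git replace --convert-graft-file`, which converts a whole grafts file in one step.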
The Rust code also adds missing tags, like v0.10 (1991), v0.99 (1992), and other early releases up to 1.0 (1994). A StackOverflow comparison shows that alternatives like shallow clones or manual patches fall short. This tool claims superiority by automating the lot in a reproducible Docker container.
Here’s the core Rust logic from rename-tags/src/main.rs. It wraps Bash for Git ops with strict error checking: bash_empty fails noisily on a non-zero exit or on any stdout or stderr output, while bash_get_out allows stdout and returns it:
// Run `cmd` under bash and require success with no output at all:
// a non-zero exit status or anything on stdout or stderr is a failure.
fn bash_empty(cmd: &str) {
    let output = std::process::Command::new("bash")
        .arg("-c")
        .arg(format!("set -e; set -o pipefail; {cmd}"))
        .output()
        .unwrap();
    if !output.status.success() || !output.stdout.is_empty() || !output.stderr.is_empty() {
        eprintln!("bash_empty failed. command: [{cmd}]");
        eprintln!("stdout: [{}]", String::from_utf8(output.stdout).unwrap_or("(not UTF-8)".to_string()));
        eprintln!("stderr: [{}]", String::from_utf8(output.stderr).unwrap_or("(not UTF-8)".to_string()));
        panic!("bash_empty failed");
    }
}

// Run `cmd` under bash and return its stdout.
// A non-zero exit status or anything on stderr is a failure.
fn bash_get_out(cmd: &str) -> String {
    let output = std::process::Command::new("bash")
        .arg("-c")
        .arg(format!("set -e; set -o pipefail; {cmd}"))
        .output()
        .unwrap();
    if !output.status.success() || !output.stderr.is_empty() {
        eprintln!("bash_get_out failed. command: [{cmd}]");
        eprintln!("stdout: [{}]", String::from_utf8(output.stdout).unwrap_or("(not UTF-8)".to_string()));
        eprintln!("stderr: [{}]", String::from_utf8(output.stderr).unwrap_or("(not UTF-8)".to_string()));
        panic!("bash_get_out failed");
    }
    String::from_utf8(output.stdout).expect("Not UTF-8")
}

// Like shell `$(cmd)`: return stdout with its final newline removed.
// Panics if the output does not end with a newline.
fn dollar(cmd: &str) -> String {
    bash_get_out(cmd)
        .strip_suffix('\n')
        .expect("output did not end with a newline")
        .to_string()
}
The full binary likely iterates over tag lists (e.g., via git tag --list), computes SHA-1s for the replacement commits, and tags them. That is an inference from the helpers above: the main loop and the complete Dockerfile are not reproduced here, so replicating the build requires the source repo, which is presumably public.
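As a rough illustration of what such a loop might look like, the sketch below compares the output of `git tag --list` against a table of wanted tags and produces one `git tag` command per missing entry. The function name `plan_missing_tags` and the tag-to-commit table are hypothetical; the real binary may work quite differently.

```rust
use std::collections::HashSet;

/// Given the text printed by `git tag --list` and a table of
/// (tag name, commit ID) pairs, return the `git tag` commands for
/// every wanted tag that is not present yet.
/// (Hypothetical helper for illustration; not taken from rename-tags.)
fn plan_missing_tags(existing: &str, wanted: &[(&str, &str)]) -> Vec<String> {
    // One tag per line in `git tag --list` output.
    let have: HashSet<&str> = existing.lines().map(str::trim).collect();
    wanted
        .iter()
        .filter(|(tag, _)| !have.contains(tag))
        // `git tag <name> <commit>` creates a lightweight tag.
        .map(|(tag, commit)| format!("git tag {tag} {commit}"))
        .collect()
}
```

Each returned string could be handed to a strict wrapper such as bash_empty, since `git tag <name> <commit>` prints nothing on success. Separating the planning from the execution also makes the list easy to review before anything is written to the repo.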
Why This Matters—and Skeptical Caveats
Full history unlocks real analysis. Security researchers can, in principle, bisect a bug back to 1991 and trace long-lived flaws to early releases such as v0.12. Historians can map contributor evolution: 1.0 had 100+ patches; today’s mainline exceeds 1 million commits from 15,000+ developers. Legal teams can verify licensing, since the earliest releases predate the kernel’s switch to the GPL with v0.12. Tools like git blame or git log --follow can trace code across 30+ years.
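The implications reach finance, crypto, and security work: kernel devs can audit supply-chain risks in ancient drivers still lurking in enterprise forks, and operators of Linux-based blockchain nodes may want verifiable history for compliance audits. But some skepticism is in order. Does it add all missing tags? Kernel.org lists 1,200+ tags, so diff the tag list after the build. The size reportedly balloons to 30+ GB, which needs beefy storage. And git replace changes what Git shows without changing the underlying objects: running git --no-replace-objects reveals the original parents. Share the result with git bundle, or make the replacements permanent with a history-rewriting filter, so that clones without the replace refs are not misleading.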
Build it yourself: spin up Docker and mount a volume for the repo. Test with git log --graph --oneline v0.01..v6.10 (assuming the modern mainline tags are present in the same repo); with the replace refs in place you should see unbroken chains. It beats manual hacks. If you’re digging into kernel forensics, this is your baseline. Fork, audit, improve.
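The unbroken-chain check can also be scripted. The sketch below assumes the output of `git rev-list --parents <tag>`, where each line holds a commit ID followed by its parent IDs, and lists the commits that have no parents. The function name `root_commits` is hypothetical. If the replace refs are working, the commit that v2.6.12-rc2 points to should not show up as a root, though mainline may legitimately contain more than one root commit.

```rust
/// Return the commits that have no parents, given the text printed by
/// `git rev-list --parents <rev>` (one "commit parent..." line per commit).
/// (Hypothetical helper for illustration; not part of the described tool.)
fn root_commits(rev_list: &str) -> Vec<&str> {
    rev_list
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            // Skip blank lines: no first field means nothing to report.
            let commit = fields.next()?;
            // A line with only the commit ID has no parents, so it is a root.
            if fields.next().is_none() {
                Some(commit)
            } else {
                None
            }
        })
        .collect()
}
```

Running the same check with `git --no-replace-objects rev-list --parents <tag>` and comparing the two root lists is one way to see exactly which links the replace refs supply.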