How NASA built Artemis II’s fault-tolerant computer

NASA's Orion spacecraft for Artemis II relies on a fault-tolerant computer system engineered to survive the harsh radiation of space.

Originally targeted for no earlier than September 2025 and since delayed, Artemis II will send four astronauts (Reid Wiseman, Victor Glover, Christina Koch, and Jeremy Hansen) on a roughly 10-day flight around the Moon. The flight computers must operate without interruption for the entire mission, from launch to splashdown, handling navigation, life support, and abort systems. A single unhandled failure could doom the crew.

The core is five identical flight computers, each built by Lockheed Martin and powered by BAE Systems’ RAD750 processors. These PowerPC 750 derivatives run at up to 200 MHz and are rated for total ionizing doses of up to 1 Mrad(Si). The architecture dates to the 1990s, but it has a long flight record. Each computer packs 256 MB of ECC SDRAM, 64 MB of flash, and interfaces for MIL-STD-1553 buses. A single RAD750 costs around $200,000. That is pricey, but commercial chips are far more prone to radiation-induced bit flips.

Fault Tolerance in Action

Redundancy rules here. Four computers run as primaries in two dual-string pairs, with a fifth as a hot backup. They run in lockstep, executing the same code cycle by cycle. Cross-strapping links them: each computer’s outputs are fed to the others for majority voting. When one output disagrees with the majority, most likely because of a single-event upset (SEU) caused by radiation, the outlier is sidelined. The system detects faults in under 100 milliseconds and reconfigures without halting.
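The voting step can be illustrated with a minimal Python sketch. It is not Orion flight code: the function name `majority_vote` and the idea of comparing whole output values are assumptions made for illustration, and a real voter would run in hardware or in tightly time-bounded software.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> tuple[Optional[Hashable], list[int]]:
    """Return (winning value, indices of channels that disagreed with it).

    If no value is held by a strict majority of channels, the winner is None
    and every channel is reported as suspect.
    """
    if not outputs:
        raise ValueError("need at least one channel output")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: the voter cannot tell which channel is right.
        return None, list(range(len(outputs)))
    outliers = [i for i, out in enumerate(outputs) if out != value]
    return value, outliers
```

Calling `majority_vote([42, 42, 43])` yields the agreed value 42 and flags channel 2 as the outlier. A two-against-two split returns no winner, which is one reason voting schemes tend to favor an odd number of channels or add a tie-breaking backup.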

Memory adds protection. ECC corrects single-bit errors, and scrubbing periodically reads memory and rewrites corrected data so that single-bit errors do not accumulate into uncorrectable ones. The processors include built-in monitors for lockstep divergence, parity checks on registers, and memory protection units. Testing reportedly covered 10 billion hours of simulated faults, with SEU injection at CERN and irradiation under cobalt-60 sources. As for real-world proof, the design inherits from Orion’s uncrewed EFT-1 flight in 2014 and Artemis I in 2022, both of which flew through the Van Allen belts.
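To make ECC and scrubbing concrete, here is a small Python sketch that uses a Hamming(7,4) code as a stand-in. This is an assumption for illustration only: ECC memory typically uses wider codes that also detect double-bit errors, and the function names below are hypothetical.

```python
def hamming74_encode(data: int) -> int:
    """Encode a 4-bit value into a 7-bit Hamming codeword (bit pos-1 holds position pos)."""
    if not 0 <= data < 16:
        raise ValueError("data must fit in 4 bits")
    d = [(data >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    bits = [0] * 8  # index 0 unused; code positions are 1..7
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 4, 5, 6, 7
    return sum(bits[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_correct(word: int) -> tuple[int, int]:
    """Return (decoded 4-bit data, corrected position or 0 if the word was clean)."""
    bits = [0] + [(word >> (pos - 1)) & 1 for pos in range(1, 8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome] ^= 1  # a single flipped bit sits at the syndrome position
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, syndrome


def scrub(memory: list[int]) -> int:
    """Rewrite every word with its corrected form; return the number of repairs."""
    repaired = 0
    for addr, word in enumerate(memory):
        data, pos = hamming74_correct(word)
        if pos:
            memory[addr] = hamming74_encode(data)
            repaired += 1
    return repaired
```

Running `scrub` on a schedule repairs each word while it still holds at most one flipped bit. A code like this one cannot correct two flips in the same word, which is why the interval between scrub passes matters.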

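This beats simple duplication. Two copies can reveal that something went wrong, but not which copy is at fault; a third copy lets a voter identify and outvote the bad one. Triple modular redundancy (TMR) at the processor level catches transient faults, which account for roughly 90% of faults in space, without a full reboot. Permanent failures trigger lane swaps. The VxWorks-based software partitions critical functions: guidance runs on one lane, comms on another.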
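One common way to separate transient faults from permanent ones is to count consecutive voting mismatches per lane. The sketch below assumes that approach; the class name `LaneMonitor`, the status strings, and the threshold of three are hypothetical and are not taken from Orion’s design.

```python
from dataclasses import dataclass


@dataclass
class LaneMonitor:
    """Track consecutive voting mismatches for one lane (hypothetical helper)."""

    threshold: int = 3  # assumed value: mismatches in a row before a lane swap
    consecutive: int = 0
    swapped_out: bool = False

    def record(self, agreed_with_majority: bool) -> str:
        """Update the lane state after one voting cycle and return its status."""
        if self.swapped_out:
            return "swapped_out"
        if agreed_with_majority:
            self.consecutive = 0  # an isolated upset clears once the lane recovers
            return "healthy"
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            self.swapped_out = True  # persistent disagreement is treated as permanent
            return "swap_lane"
        return "transient"
```

A lane that disagrees once and then recovers stays in service, while a lane that disagrees three cycles in a row is swapped out and stays out until something outside this sketch restores it.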

Why This Matters Beyond the Moon

NASA’s approach prioritizes determinism over speed. The RAD750 lags modern CPUs by decades, with no multi-core and no SIMD, but predictability matters more than raw throughput in life-critical operations. SpaceX’s Starship offers a contrast: it reportedly relies on radiation-tolerant ARM chips with software mitigation, cutting costs by a claimed 100x while accepting more risk. NASA’s conservatism delayed Orion by years and pushed costs past $20B, yet it sets the bar for carrying humans into deep space.

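Implications ripple to other high-stakes fields. High-frequency trading firms mirror this with triple-redundant feeds and FPGA lockstep to avoid runaway failures like Knight Capital’s $440M loss in 45 minutes in 2012. Crypto validators use Byzantine fault tolerance, which resembles TMR but assumes faulty nodes may actively lie, so consensus holds only while fewer than one third of the nodes are faulty. Security enclaves such as Intel SGX pursue tamper resistance by a different route, relying on hardware isolation and attestation rather than voting.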
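The one-third figure follows from the classical Byzantine fault tolerance bound: a system of n nodes tolerates f arbitrarily faulty nodes only if n ≥ 3f + 1. The short Python sketch below works through that arithmetic; the function names are illustrative.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. faults tolerable with n nodes."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes_for(f: int) -> int:
    """Smallest cluster size that tolerates f Byzantine faults."""
    if f < 0:
        raise ValueError("fault count cannot be negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Matching votes needed so that any two quorums share an honest node."""
    if f < 0:
        raise ValueError("fault count cannot be negative")
    return 2 * f + 1
```

With 100 validators the bound gives 33 tolerable faults, just under one third. It also shows where the analogy to TMR stops: three modules can outvote one that simply produces a wrong answer, but surviving one node that sends conflicting answers to different peers takes four.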

A skeptic might call this overkill for low Earth orbit, but it is essential for lunar trips. Radiation doses reach 1-2 krad on Artemis II, enough to flip bits hourly. The lesson: unhardened off-the-shelf parts will not cut it, so invest in redundancy early. As later Artemis missions aim for a lunar landing, this system’s scalability to Mars (300x the radiation) will test its limits. Failures there aren’t recoverable.

In sum, NASA’s feat isn’t flashy AI or quantum—it’s boring, robust engineering ensuring humans return alive. That reliability blueprint applies wherever downtime kills: markets, blockchains, or beyond.
