Rewriting Every Syscall in a Linux Binary at Load Time

Trail of Bits engineers recently demonstrated a technique to rewrite every syscall instruction in a Linux ELF binary at load time. They patch the binary in memory before execution begins, replacing native syscall opcodes (0x0f 0x05) with jumps to custom hooks. This allows full interception of kernel calls without source code access or recompilation.

Why does this matter? Syscalls are the sole gateway between userland processes and the kernel: file I/O, network access, and process creation all funnel through them. Intercepting them enables runtime policies that static analysis can't touch. Traditional tools like seccomp filter syscalls up front, but a seccomp-BPF filter sees only raw register values, so it can't dereference pointer arguments (such as file paths) without extra overhead. Ptrace-based tracers like strace work, but they introduce massive slowdowns (10-100x). This method promises lower overhead by baking hooks directly into the code.

How It Works

The core is a custom dynamic loader hook using LD_PRELOAD and ELF parsing. On load:

Parse the ELF sections to identify executable code segments (.text, etc.).
Make pages writable via mprotect.
Scan for syscall instructions. Modern x86-64 uses syscall; older code might use int $0x80.
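Replace each one with a 5-byte jump: compute the hook address and craft a jmp rel32. Because syscall itself is only 2 bytes (0x0f 0x05), the jump also overwrites neighboring instructions, which must then be replayed in a trampoline.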
Restore pages to read/execute.
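The load-time steps above can be sketched in C roughly as follows. This is a minimal illustration under our own assumptions, not the Trail of Bits code: all function names are hypothetical, it walks program headers with glibc's dl_iterate_phdr instead of parsing section headers (section headers are not guaranteed to be mapped at run time), it uses a naive byte scan, and it patches a scratch buffer rather than a live binary.

```c
#define _GNU_SOURCE
#include <link.h>      /* dl_iterate_phdr, ElfW, PT_LOAD, PF_X */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>  /* mprotect */
#include <unistd.h>    /* sysconf */

/* Step 1: find executable code. Walk every loaded object (main program and
   shared libraries) and count PT_LOAD segments that are mapped executable. */
static int exec_segment_cb(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    size_t *count = data;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X)) {
            /* Mapped at info->dlpi_addr + ph->p_vaddr for ph->p_memsz bytes;
               a rewriter would scan that range here. */
            (*count)++;
        }
    }
    return 0; /* 0 = continue with the next loaded object */
}

static size_t count_exec_segments(void) {
    size_t n = 0;
    dl_iterate_phdr(exec_segment_cb, &n);
    return n;
}

/* Step 3: naive scan for the two-byte syscall opcode 0x0f 0x05. This also
   matches those bytes inside immediates or displacements, so a real rewriter
   must decode instruction boundaries instead of scanning raw bytes. */
static size_t find_syscalls(const uint8_t *code, size_t len, size_t *out, size_t max) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < len && n < max; i++)
        if (code[i] == 0x0f && code[i + 1] == 0x05)
            out[n++] = i;
    return n;
}

/* Step 4: encode `jmp rel32` (opcode 0xe9). The displacement is relative to
   the end of the 5-byte instruction and must fit in 32 bits, so the hook has
   to sit within +/-2 GiB of the patch site. Returns -1 if it does not. */
static int encode_jmp_rel32(uint8_t out[5], uintptr_t site, uintptr_t target) {
    int64_t rel = (int64_t)target - (int64_t)(site + 5);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;
    int32_t rel32 = (int32_t)rel;
    out[0] = 0xe9;
    memcpy(out + 1, &rel32, sizeof rel32); /* host order; little-endian on x86-64 */
    return 0;
}

/* Steps 2 and 5: make the covering page(s) writable, patch, restore R+X. */
static int patch_code(uint8_t *addr, const uint8_t *bytes, size_t len) {
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(pagesz - 1);
    size_t span = (size_t)((uintptr_t)addr + len - start);
    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(addr, bytes, len);
    return mprotect((void *)start, span, PROT_READ | PROT_EXEC);
}
```

One plausible layout, assumed here rather than taken from the source, places the 5-byte jump over the `mov $N, %eax` that precedes the 2-byte syscall (in the byte sequence `b8 27 00 00 00 0f 05`, the mov is exactly 5 bytes), with the hook replaying that mov. Sequences that don't fit this shape need a different strategy.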

They handle edge cases: position-independent code (PIC), RIP-relative addressing, and multi-byte instruction sequences. For instance, a syscall usually follows mov $N, %eax, where N is the syscall number (e.g., 0 for read and 56 for clone on x86-64). Hooks decode this number, log the arguments, apply policies, and then invoke the real syscall.
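The decode, log, apply-policy, forward sequence could look something like this sketch. The `struct syscall_regs` layout and every function name are hypothetical; the register order in the comment is the standard x86-64 kernel ABI. For simplicity it forwards through glibc's `syscall(2)` wrapper. In a real rewriter that wrapper's own syscall instruction would also have been rewritten, so the hook would need an unpatched syscall site to avoid recursing into itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_* numbers */
#include <unistd.h>       /* syscall(2) */

/* Hypothetical snapshot a trampoline would hand to the hook. On x86-64 the
   syscall number arrives in %rax and the arguments in %rdi, %rsi, %rdx,
   %r10, %r8, %r9, in that order. */
struct syscall_regs {
    long nr;
    long args[6];
};

/* Hypothetical policy: 0 allows the call, a negative errno denies it. */
static long policy_check(const struct syscall_regs *r) {
    return r->nr == SYS_execve ? -EPERM : 0;
}

/* Decode, log, apply policy, then perform the real syscall. Returns the raw
   kernel convention (result, or -errno) that patched code expects in %rax. */
static long hook_dispatch(const struct syscall_regs *r) {
    fprintf(stderr, "syscall %ld(%#lx, %#lx, %#lx, ...)\n", r->nr,
            (unsigned long)r->args[0], (unsigned long)r->args[1],
            (unsigned long)r->args[2]);
    long verdict = policy_check(r);
    if (verdict < 0)
        return verdict;
    long ret = syscall(r->nr, r->args[0], r->args[1], r->args[2],
                       r->args[3], r->args[4], r->args[5]);
    /* glibc's syscall() reports failure as -1 plus errno; convert back. */
    return ret == -1 ? -errno : ret;
}
```

Returning `-errno` instead of glibc's `-1`/`errno` pair matters because the patched code resumes right after where the syscall instruction used to be, and it expects the kernel's raw return convention.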

Proof-of-concept code is open-source on GitHub. Tests on real binaries like nginx showed <5% overhead on syscall-heavy workloads, versus 50%+ for ptrace. It works on glibc-linked apps, bypassing ASLR via pre-mapping.

Security Implications

This shines for containment. Imagine enforcing that a binary's syscalls match a whitelist: no surprise execve spawns. Crypto miners and ransomware often hide syscall chains; rewriting exposes them. Pair it with eBPF for kernel-side validation, and you get a user-kernel syscall firewall.
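A syscall whitelist of this kind maps naturally onto a bitmap indexed by syscall number, which a hook can consult in constant time before forwarding a call. Everything in this sketch is hypothetical: the names, the 1024-entry bound, and especially the syscall set in `build_server_profile`, which is an illustrative guess and not a real nginx profile or anything from the Trail of Bits code.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_* numbers */

/* Illustrative bound; x86-64 syscall numbers are currently well below it. */
#define MAX_SYSCALL 1024

struct allowlist {
    uint64_t bits[MAX_SYSCALL / 64];  /* one bit per syscall number */
};

static void allow(struct allowlist *al, long nr) {
    if (nr >= 0 && nr < MAX_SYSCALL)
        al->bits[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Numbers outside the table, or never allowed, are denied by default. */
static bool is_allowed(const struct allowlist *al, long nr) {
    if (nr < 0 || nr >= MAX_SYSCALL)
        return false;
    return (al->bits[nr / 64] >> (nr % 64)) & 1;
}

/* Hypothetical profile for a static-file server: file and socket I/O but
   no process creation, so execve and clone stay blocked. */
static void build_server_profile(struct allowlist *al) {
    static const long allowed[] = {
        SYS_read,   SYS_write,   SYS_openat,   SYS_close,
        SYS_socket, SYS_accept4, SYS_sendfile, SYS_exit_group,
    };
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        allow(al, allowed[i]);
}
```

Default-deny is the key design choice: a syscall the profile never mentions (execve here) is refused without anyone having to anticipate it.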

Real-world context: Linux containers rely on seccomp; Docker's default profile alone blocks around 44 of the 300+ available syscalls. But escapes happen: CVE-2022-0492 abused the cgroup v1 release_agent mechanism. This approach adds a layer atop namespaces and cgroups. Enterprises could deploy it via custom distros, much like grsecurity's PaX, but applied dynamically.

Skeptical take: compatibility breaks fast. Stripped binaries? Fine. Go binaries, which are typically statically linked and never load LD_PRELOAD libraries? Nope. JIT code (browsers, VMs) is generated after load and evades scanning, and self-modifying code gets corrupted. The technique is amd64-specific; ARM and RISC-V need porting. Attackers will reverse it, and the hooks become new targets. Still, for server binaries, it's potent.

Performance data from their benchmarks:

Baseline nginx requests/sec: 125k
With hooks: 120k (-4%)
strace: 1.2k (-99%)
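The percentages follow directly from the three throughput figures (plain arithmetic on the numbers above, nothing more):

```latex
\begin{aligned}
\frac{120\text{k} - 125\text{k}}{125\text{k}} &= -0.04 = -4\% \\
\frac{1.2\text{k} - 125\text{k}}{125\text{k}} &\approx -0.990 \approx -99\% \\
\frac{125\text{k}}{1.2\text{k}} &\approx 104\times
\end{aligned}
```

So the strace slowdown on this benchmark lands at roughly 104x, at the top end of the 10-100x range quoted for ptrace-based tracers.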

These numbers hold for CPU-bound workloads; I/O-bound ones might differ. The approach scales to microservices: rewrite at container init.

Bigger picture: this mirrors browser content security policies, but for binaries. As zero-trust architectures deepen, expect forks into production tools. Red teams: test evasion now. Blue teams: prototype it for CI/CD scanning. OpenBSD's pledge() approximates this, but the program has to opt in from its own source; Linux catches up dynamically, with no source changes.

Why care beyond nerdery? Supply-chain attacks (SolarWinds, XZ Utils) thrive on trusted binaries. Rewriting neuters persistence. With 3.5B Android devices on Linux kernels, mobile potential looms if ported. Cost: engineering time. Benefit: defensible perimeters. Deploy wisely.
