
System Card: Claude Mythos Preview [pdf]


Anthropic just dropped a system card PDF previewing “Claude Mythos,” their next big AI model leap. This isn’t a full release; it’s a safety evaluation snapshot ahead of launch. Surfaced via Hacker News, the 50-page document details rigorous testing on risks from cyber exploits to bioweapons. Why care? In an AI arms race where models like GPT-4o and Gemini 2.0 push boundaries, Mythos signals Anthropic’s bid to reclaim the safety crown while chasing raw power.

The card leads with benchmarks: Mythos Preview crushes predecessors on key metrics. It hits 89.5% on MMLU (general knowledge), edging out Claude 3.5 Sonnet’s 88.7%. GPQA Diamond, a tough grad-level science test, sees 62.3%—a 5-point jump from Sonnet’s 57.4%. Coding? HumanEval at 93.2%, LiveCodeBench at 78.1%. These aren’t hype; they’re standardized evals showing Mythos handles complex reasoning better, potentially automating more white-collar tasks from software dev to financial modeling.
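To see how those headline numbers stack up at a glance, here is a minimal sketch that tabulates the scores quoted above and computes Mythos Preview's point deltas over Claude 3.5 Sonnet. The figures come from the card summary; the dictionary layout and function names are ours.

```python
# Tabulate the benchmark figures quoted in the card summary and
# compute Mythos Preview's deltas over Claude 3.5 Sonnet.
scores = {
    "MMLU":         {"mythos": 89.5, "sonnet": 88.7},
    "GPQA Diamond": {"mythos": 62.3, "sonnet": 57.4},
}

def delta(bench: str) -> float:
    """Point advantage of Mythos over Sonnet on a benchmark."""
    s = scores[bench]
    return round(s["mythos"] - s["sonnet"], 1)

for bench in scores:
    print(f"{bench}: +{delta(bench)} pts")
```

The GPQA Diamond gap works out to 4.9 points, matching the card's "5-point jump" framing.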

Safety Scores: Strong but Not Bulletproof

Anthropic’s hallmark is constitutional AI—baking ethics into training. Mythos shines here. On the Agentic Safety Leaderboard, it scores 92% resistance to jailbreaks, topping OpenAI’s o1-preview at 87%. Cyber rubric? 9.2/10 for secure coding practices, refusing 98% of malware generation prompts. Biosecurity: refuses 100% of detailed pandemic virus recipes, per expert audits.
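Anthropic hasn't published its harness, but a resistance score like that 92% figure is conceptually simple: run a battery of adversarial prompts, label each response, and report the fraction the model deflected. A toy sketch, with made-up prompts and labels:

```python
# Toy sketch of jailbreak-resistance scoring: label each adversarial
# trial as resisted or not, then report the resisted fraction.
from dataclasses import dataclass

@dataclass
class Trial:
    prompt: str
    resisted: bool  # True if the model refused or deflected the attack

def resistance_rate(trials: list[Trial]) -> float:
    """Fraction of adversarial trials the model resisted."""
    if not trials:
        return 0.0
    return sum(t.resisted for t in trials) / len(trials)

trials = [
    Trial("roleplay-persona exploit", True),
    Trial("base64-encoded payload", True),
    Trial("multi-turn persona drift", False),
    Trial("translation-layer smuggling", True),
]
print(f"resistance: {resistance_rate(trials):.0%}")  # 3 of 4 resisted
```

Real leaderboards weight attack families and use held-out attacks, but the reported metric reduces to this ratio.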

But skepticism kicks in. The card admits gaps—Mythos still leaks PII in 12% of edge cases under adversarial prompts. Persuasion attacks succeed 15% of the time, up from Sonnet’s 11%, hinting at capability-safety tradeoffs. No model is jailbreak-proof; Llama 3.1-405B fell to novel attacks post-release. Anthropic rates Mythos “high risk” for autonomous replication, a nod to self-improving AI scenarios that could spiral in data centers.

Context matters: this preview was tested at the 10^12-parameter scale, trained on 15 trillion tokens including synthetic data. Compute? Estimated 10^26 FLOPs, rivaling frontier runs. Anthropic’s $4B Series E last year funds this; they’re burning cash to hit AGI guardrails first.
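Those figures are self-consistent under the standard back-of-envelope rule for dense-transformer training compute, FLOPs ≈ 6 × parameters × tokens:

```python
# Sanity-check the quoted compute figure with the common "6ND"
# approximation: training FLOPs ~ 6 * parameters * tokens.
params = 1e12    # 10^12 parameters, per the card
tokens = 15e12   # 15 trillion training tokens

flops = 6 * params * tokens
print(f"{flops:.1e} FLOPs")  # ~9e25, i.e. on the order of 10^26
```

Nine times 10^25 rounds up to the card's "10^26 FLOPs" order-of-magnitude estimate.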

Implications for Security, Finance, and Crypto

For security pros, Mythos matters big. It aces red-teaming: it simulates 500+ attack chains, blocking 96% of zero-days via chain-of-thought. But why trust the numbers? Anthropic self-evals; independent verification lags. Recall that their Claude 3 Opus card overstated bio refusals by 8 percentage points versus external audits.
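That audit gap suggests a simple due-diligence check worth automating: compare self-reported safety rates against independent audits and flag anything beyond a tolerance. A sketch with an illustrative threshold (the 2-point tolerance is our assumption, not Anthropic's):

```python
# Flag self-reported safety metrics that exceed independently audited
# values by more than a tolerance (in percentage points).
def audit_gap(self_reported: float, audited: float) -> float:
    """Points by which the vendor's figure exceeds the audited one."""
    return round(self_reported - audited, 1)

def needs_review(self_reported: float, audited: float,
                 tolerance: float = 2.0) -> bool:
    return audit_gap(self_reported, audited) > tolerance

# The Claude 3 Opus bio-refusal case from the text: overstated by 8 points.
print(needs_review(100.0, 92.0))  # True
```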

Finance angle: Mythos previews alpha generation. It forecasts S&P 500 moves with 68% accuracy on 2023-2024 data, beating Sonnet’s 64%. Quant funds salivate: integrate via API and you’re automating HFT edges. The risk? Model collapse from poisoned training data; Anthropic filters 99.9% of web scrapes, but crypto scams slip through.
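A "68% accuracy" claim on index moves is almost certainly directional accuracy: the fraction of days where the predicted sign matched the realized sign. A toy backtest sketch (the series below are made up, not the card's evaluation data):

```python
# Toy directional-accuracy backtest: compare predicted vs. realized
# daily signs (+1 = up day, -1 = down day).
def directional_accuracy(predicted: list[int], realized: list[int]) -> float:
    """Fraction of days where the predicted sign matched the realized one."""
    hits = sum(p == r for p, r in zip(predicted, realized))
    return hits / len(predicted)

pred = [+1, -1, +1, +1, -1]
real = [+1, -1, -1, +1, -1]
print(f"{directional_accuracy(pred, real):.0%}")  # 4 of 5 correct
```

Before wiring any model into a trading pipeline, run exactly this comparison on your own held-out data rather than trusting a vendor's reported number.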

Crypto natives, listen up. Mythos groks blockchain: it solves 85% of the Ethereum smart contract vulns in Damn Vulnerable DeFi and flags reentrancy like a pro auditor. On-chain analysis? It predicts 72% of pump-and-dumps from wallet clusters. But adversarial use looms: generate phishing airdrops at scale? The refusal rate is 97%, yet one slip floods DEXes.
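The classic reentrancy pattern an auditor (human or model) looks for is an external call made before the state update that would block a re-entrant withdrawal. A deliberately naive string-order heuristic illustrates the idea; real analysis works on the AST or bytecode, and the Solidity snippets here are our own examples:

```python
# Naive reentrancy heuristic: flag a function body where an external
# call (.call{value: ...}) appears BEFORE the balance-zeroing write.
def flags_reentrancy(func_body: str) -> bool:
    call_pos = func_body.find(".call{value:")
    write_pos = func_body.find("balances[msg.sender] = 0")
    return call_pos != -1 and (write_pos == -1 or call_pos < write_pos)

vulnerable = """
    (bool ok, ) = msg.sender.call{value: amount}("");
    balances[msg.sender] = 0;
"""
patched = """
    balances[msg.sender] = 0;
    (bool ok, ) = msg.sender.call{value: amount}("");
"""
print(flags_reentrancy(vulnerable), flags_reentrancy(patched))  # True False
```

The patched version follows the checks-effects-interactions pattern: update state first, make the external call last.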

Broader why: AI safety theater or real? Anthropic’s card pushes transparency, contrasting OpenAI’s black-box drops. Post-FTX, regulators eye AI in finance; SEC could mandate such disclosures. Geopolitics? US AI dominance hinges on safe scaling—China’s DeepSeek-V3 lags 10% on evals but trains cheaper.

Bottom line: Mythos Preview tees up Claude 4 territory by Q1 2025. It widens the moat on safe superintelligence, but overreliance invites black swans. Devs, audit your pipelines. Traders, backtest integrations. Security teams, probe those edges now. Anthropic leads, but the race stays brutal.

April 7, 2026 · 3 min · Source: Hacker News
