Why we’re rethinking cache for the AI era

Cloudflare’s network handles 32% automated traffic, with AI bots driving over 10 billion requests per week. This surge forces a redesign of CDN caching systems, as bots devour resources optimized for human users. A joint study with ETH Zurich researchers, published at the 2025 Symposium on Cloud Computing, quantifies the mismatch and sketches fixes. Website operators now balance serving AI for visibility (think e-commerce product pages surfacing in LLM answers, or documentation ingested by models) against protecting performance for human visitors.

Human traffic follows a power law: 80% of requests hit 20% of pages, mostly fresh and popular. Caches thrive here, storing hot items to slash origin fetches. AI bots invert this. They scan sites sequentially, grabbing rare pages: deep docs, images, loosely related articles. Cloudflare data shows bots issue high-volume parallel requests, often exhausting rate limits. Generating a single response via RAG might pull from dozens of sources, uncorrelated and cold.

Cache Pollution Hits Hard

Standard caches use LRU eviction: the least recently used items go first. Bots flood the cache with misses for tail content, and each miss inserts a cold item that displaces a human favorite. Result? Cache hit rates plummet for people. Cloudflare measured this: AI traffic alone drops human hit rates by up to 20% in mixed workloads. Storage bloats too, since bots request full site crawls, not snippets.
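The pollution mechanism is easy to reproduce. Below is a toy simulation, assuming a plain LRU cache, a small set of hot human pages, and a bot that requests each deep URL exactly once; all names, sizes, and ratios are illustrative, not Cloudflare's measurements:

```python
# Toy sketch (illustrative, not Cloudflare's implementation): a sequential
# bot scan evicting popular "human" pages from an LRU cache.
import random
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key):
        """Return True on a hit; insert/refresh the key, evicting the LRU entry."""
        hit = key in self.store
        if hit:
            self.store.move_to_end(key)
        else:
            self.store[key] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # drop least recently used
        return hit

def human_hit_rate(bot_scan, seed=42):
    rng = random.Random(seed)
    cache = LRUCache(capacity=100)
    hot_pages = [f"/page/{i}" for i in range(100)]  # small popular set
    hits = reqs = 0
    bot_cursor = 0
    for _ in range(20_000):
        if bot_scan and rng.random() < 0.5:
            cache.access(f"/deep/doc/{bot_cursor}")  # bot: every URL unique
            bot_cursor += 1
        else:
            reqs += 1
            hits += cache.access(rng.choice(hot_pages))  # human: hot pages
    return hits / reqs

print(f"human-only hit rate: {human_hit_rate(bot_scan=False):.2f}")
print(f"with bot scan:       {human_hit_rate(bot_scan=True):.2f}")
```

With the scan off, the 100 hot pages fit in the 100-slot cache and the human hit rate approaches 1.0; turn it on and every bot miss inserts a never-reused item that pushes a hot page out, so the human hit rate collapses.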

Operators face a bind. Block bots? Lose AI indexing, future SEO, or “pay per crawl” revenue. Serve them? Human latency spikes, origin costs soar. Current tools like Cloudflare’s bot management help throttle, but don’t optimize the cache for both populations. Skeptical note: Cloudflare profits from traffic volume, so expect upsell pitches. Still, the data holds: public traces from Common Crawl and Cloudflare’s own logs confirm bots reach 10x deeper into sites than humans do.

Rethinking Cache for Dual Worlds

The ETH-Cloudflare paper proposes splitting caches: one for humans (popularity-based, LRU), another for AI (scan-tolerant, perhaps FIFO or size-aware). Or, smart eviction: tag each request as bot or human, and prioritize human items. They simulate on real traces: a “dual-policy” cache boosts combined hit rates by 15-30% over vanilla LRU.
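A minimal sketch of that split, under the assumption that requests arrive already tagged by a bot classifier; the pool sizes and the FIFO choice for the bot side are illustrative, not the paper's exact configuration:

```python
# Sketch of a "dual-policy" cache: humans get an LRU pool, bots a FIFO
# pool, so scan traffic can never evict human favorites.
from collections import OrderedDict, deque

class DualPolicyCache:
    def __init__(self, human_capacity, bot_capacity):
        self.human = OrderedDict()             # LRU for popularity-driven humans
        self.human_capacity = human_capacity
        self.bot = deque(maxlen=bot_capacity)  # FIFO ring for scan traffic
        self.bot_set = set()

    def access(self, key, is_bot):
        """Return True on a hit in either pool; on a miss, insert into
        the requester's own pool only."""
        if key in self.human:
            self.human.move_to_end(key)
            return True
        if key in self.bot_set:
            return True
        if is_bot:
            if len(self.bot) == self.bot.maxlen:
                self.bot_set.discard(self.bot[0])  # about to fall off the ring
            self.bot.append(key)
            self.bot_set.add(key)
        else:
            self.human[key] = True
            if len(self.human) > self.human_capacity:
                self.human.popitem(last=False)     # LRU eviction, humans only
        return False

# Same mixed workload as a bot storm: half unique bot URLs, half hot pages.
import random
rng = random.Random(42)
cache = DualPolicyCache(human_capacity=100, bot_capacity=100)
hot = [f"/page/{i}" for i in range(100)]
hits = reqs = 0
cursor = 0
for _ in range(20_000):
    if rng.random() < 0.5:
        cache.access(f"/deep/doc/{cursor}", is_bot=True)
        cursor += 1
    else:
        reqs += 1
        hits += cache.access(rng.choice(hot), is_bot=False)
print(f"human hit rate under bot storm: {hits / reqs:.2f}")
```

Because the pools are isolated, the human hit rate stays near 1.0 under the same bot storm that poisons a shared LRU cache; the cost is statically partitioned storage.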

Key idea: frequency-weighted eviction. Track access patterns per client class. Bots get a low-priority pool; eviction favors human recency. Early Cloudflare tests integrate this into their edge servers, claiming sub-50ms responses even under bot storms. Community directions: open-source traces for benchmarking, and new policies like “bot-bypass”, where AI requests skip the cache and go straight to origin (with allowances).
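One way to express “eviction favors human recency” in a single shared cache (my own illustration of the idea, not the paper's exact policy): tag each cached entry with the class that fetched it, evict the least recently used bot-tagged entry first, and fall back to plain LRU only when no bot entries remain.

```python
# Illustrative class-aware eviction in one shared cache: bot-fetched
# entries are sacrificed before any human-fetched entry.
from collections import OrderedDict

class PriorityEvictCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> is_bot flag, in recency order

    def access(self, key, is_bot):
        if key in self.entries:
            self.entries.move_to_end(key)
            # A human touch clears the low-priority bot tag.
            self.entries[key] = self.entries[key] and is_bot
            return True
        self.entries[key] = is_bot
        if len(self.entries) > self.capacity:
            # Prefer the least recently used bot-tagged entry as victim.
            victim = next((k for k, b in self.entries.items() if b), None)
            if victim is None:
                victim = next(iter(self.entries))  # plain LRU fallback
            del self.entries[victim]
        return False
```

A human re-request of a bot-fetched entry clears its low-priority tag, so content that turns out to matter to people is promoted rather than sacrificed.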

Why this matters: AI scrapers won’t slow. OpenAI’s GPTBot, Anthropic’s ClaudeBot, and hundreds more crawl relentlessly. By 2026, projections hit 50% automated traffic. CDNs ignoring this lose customers to lag. Sites blocking AI risk irrelevance as search shifts to LLMs: Google’s AI Overviews already pull web data. Tune right, and you monetize bots via deals like Perplexity’s publisher payments.

Tradeoffs persist. Dual caches double storage needs; misclassify traffic, and the gains vanish. Privacy angles: bots hoover up personal data unless robots.txt is enforced. Security too: aggressive crawlers probe for vulnerabilities. Operators, audit your logs: Cloudflare’s dashboard shows bot slices. Experiment with robots.txt blocks or paid tiers. The AI era demands caches that don’t pick sides: they serve both, or they fail.
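For the robots.txt route, a minimal starting point that targets the crawlers named above (GPTBot is OpenAI's documented user-agent and ClaudeBot is Anthropic's; note that robots.txt compliance is voluntary, and aggressive scrapers may ignore it):

```txt
# Block common AI crawlers site-wide.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```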

April 2, 2026 · 3 min · 7 views · Source: Cloudflare Blog