
Pretext — Under the Hood

Simon Willison released Pretext, a CLI tool that leverages LLMs to craft realistic pretext messages for social engineering simulations.

Security teams use pretexting to test human defenses: think fake IT support calls or vendor emails designed to extract credentials. Pretext generates these scripts on demand, supporting local models via Ollama or cloud APIs from Anthropic, OpenAI, and others. Install it with pip install pretext, then run pretext "Pretend you're from HR and need to verify an employee's emergency contact." The output is a polished script ready for a red-team exercise.

This matters because pretexting succeeds 70-90% of the time in penetration tests, per Verizon’s 2023 DBIR, where social engineering topped breach vectors. Manual script writing takes hours; Pretext cuts that to seconds. Willison built it after prompting Claude 3.5 Sonnet for phone call pretexts, iterating on prompts for authenticity. The tool shipped March 29, 2024—yes, his blog says 2026, likely a date glitch—under MIT license on GitHub, already pulling 500+ stars.

Under the Hood: Prompt Engineering and Model Routing

Pretext routes requests through LiteLLM, a proxy layer that standardizes calls to 100+ LLM providers. No vendor lock-in: swap PRETEXT_MODEL=claude-3-5-sonnet-20240620 for ollama/llama3.1 and run entirely offline. The core is a roughly 500-token system prompt, refined over dozens of trials, that instructs the model to output structured JSON with four fields: persona, goal, script, contingencies. Example prompt snippet:

SYSTEM_PROMPT = """
You are an expert social engineer. Generate a pretext phone call script.
- Persona: [detailed role]
- Sound natural, use filler words like "um", regional slang.
- Never break character.
Output JSON: {{"persona": "...", "script": "..."}}"""
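The model-routing side is straightforward to sketch. A minimal illustration, assuming the PRETEXT_MODEL environment variable mentioned above and the request shape LiteLLM's completion() expects; build_request and the abbreviated prompt are hypothetical stand-ins, not Pretext's actual code:

```python
import os

# Abbreviated stand-in for the full system prompt shown above.
SYSTEM_PROMPT = "You are an expert social engineer. Output JSON."

def build_request(task, model=None):
    """Assemble keyword arguments for a LiteLLM-style completion() call.

    The model string alone selects the provider: "claude-3-5-sonnet-20240620"
    hits Anthropic's API, while "ollama/llama3.1" routes to a local Ollama
    server. Falls back to the PRETEXT_MODEL environment variable.
    """
    model = model or os.environ.get("PRETEXT_MODEL", "claude-3-5-sonnet-20240620")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task},
        ],
    }
```

From there, something like litellm.completion(**build_request(task)) would return a script regardless of which backend the model string names, which is what makes the offline/cloud swap a one-variable change.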

Willison tested 20+ personas: IRS auditor, CEO assistant, utility rep. Success rate? 85% usable on the first pass, per his notes, beating generic ChatGPT outputs, which often sound robotic. Costs: $0.01-0.05 per script on Claude; free locally if you have 16GB of VRAM for Llama 3.1 70B.

Skeptical take: LLMs hallucinate details—Pretext mitigates with strict JSON mode and few-shot examples, but verify outputs. No built-in safeguards against malicious use; it’s neutral tech. Willison notes this explicitly, positioning it for ethical pentesting only.
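The "verify outputs" step can be made concrete. A hedged sketch of output validation against the four-field schema described earlier; validate_pretext and REQUIRED_FIELDS are illustrative names, not Pretext's API:

```python
import json

# The structured fields the system prompt asks for, per the article.
REQUIRED_FIELDS = {"persona", "goal", "script", "contingencies"}

def validate_pretext(raw):
    """Parse model output and reject anything missing the expected fields."""
    # json.loads raises a ValueError subclass if the model drifted into prose.
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"model output missing fields: {sorted(missing)}")
    return data
```

Even with strict JSON mode enabled, a check like this catches the occasional truncated or off-schema response before it reaches a live exercise.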

Implications for Red Teams and Defenders

Red teams gain speed: Generate 50 variants in minutes, A/B test phrasing. Pair with tools like GoPhish for email campaigns or custom VoIP for calls. In crypto/security ops, simulate wallet recovery scams or exchange support fraud—critical as phishing nabbed $300M in crypto last year (Chainalysis 2024).

Defenders: use Pretext in reverse. Train staff against AI-generated attacks, exposing patterns like manufactured urgency or appeals to authority. Why it scales: one sysadmin can now mimic a nation-state operation. But the risks cut both ways; script kiddies could weaponize it for real fraud. Mitigation? Log usage and audit prompts. Open source means you can fork it and add rate limits or watermarking.
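The logging mitigation amounts to a few lines in practice. A minimal sketch of an append-only usage log in JSON Lines form; log_usage is a hypothetical helper you might add in a fork, not something Pretext ships:

```python
import datetime
import json

def log_usage(path, task, model, user):
    """Append one auditable record per generation request (JSON Lines)."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "task": task,  # the raw prompt, so auditors can spot abuse patterns
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

An append-only file of who asked for what, and when, is enough to make later prompt audits routine rather than forensic.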

Broader view: tools like this lower barriers for good and bad actors alike. LLMs excel at mimicry because they trained on billions of real conversations. Expect copycats: Pretext's GitHub forks already number 10. Finance firms, test your helpdesks now; the human element still figures in 74% of breaches (Verizon). Willison's Datasette ecosystem (500k downloads) proves his tools stick; Pretext could hit 10k users by year-end if momentum holds.

Bottom line: Pretext sharpens security hygiene without hype. Grab it, test your org, but lock it down. In a world of AI-amplified scams, staying ahead means wielding the same tech.

March 30, 2026 · Source: Simon Willison
