Dagster’s I/O managers for DuckDB, Snowflake, BigQuery, and Delta Lake expose a SQL injection vulnerability through unescaped dynamic partition keys. Attackers with permission to add dynamic partitions can inject arbitrary SQL that runs under the I/O manager’s database credentials. Deployments using dynamic partitions face the highest risk; static or time-window partitions remain safe.
The flaw stems from direct string interpolation of partition key values into SQL WHERE clauses, with no sanitization before execution. Dagster, an open-source data orchestrator popular for asset-centric pipelines, relies on these I/O managers to materialize data into warehouses. Dynamic partitions let users define custom slices, such as customer IDs or A/B test variants, beyond rigid time-based ones. A malicious key like ' OR 1=1 -- closes the string literal early and turns the partition filter into a condition that matches every row, so a statement meant for one partition can read, or worse modify, the entire table.
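To make the interpolation problem concrete, here is a minimal sketch of the vulnerable pattern. It is not Dagster's code: the stdlib sqlite3 module stands in for the warehouse, and the events table and the load_partition_unsafe helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in table: sqlite3 replaces the real warehouse purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (partition_key TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("customer_a", "a-1"), ("customer_a", "a-2"), ("customer_b", "b-1")],
)


def load_partition_unsafe(conn: sqlite3.Connection, partition_key: str) -> list[tuple]:
    """Hypothetical loader that interpolates the key straight into the WHERE clause."""
    query = (
        "SELECT partition_key, payload FROM events "
        f"WHERE partition_key = '{partition_key}'"
    )
    return conn.execute(query).fetchall()


# A normal key returns only that partition's rows.
benign_rows = load_partition_unsafe(conn, "customer_a")

# The leading quote closes the string literal, OR 1=1 matches every row,
# and -- comments out the trailing quote left over from the template.
injected_rows = load_partition_unsafe(conn, "' OR 1=1 --")

print(len(benign_rows), len(injected_rows))
```

The second call returns rows from every partition even though the caller only supplied a "key", which is the core of the issue described above.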
Vulnerability Mechanics
The affected packages include dagster-duckdb, dagster-snowflake, dagster-gcp (covering BigQuery), and the Delta Lake integrations. An attacker registers a partition key that embeds a SQL payload. When an asset for that partition is materialized or loaded, the I/O manager executes the payload verbatim against the backend database.
Exploitation requires the Add Dynamic Partitions permission. In Dagster OSS, anyone who can reach the API endpoint can trigger this. In Dagster Cloud (Dagster+), the permission defaults to Editor roles and above. Patched versions handle keys safely using backend-specific methods, such as escaping or parameterized queries. Check the official advisory for the exact pins, for example dagster-duckdb>=0.22.1 or the equivalent for your integration.
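The general shape of such a fix can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: sqlite3 again stands in for the warehouse, and load_partition_safe and quote_sql_literal are hypothetical helpers. Binding the key as a parameter is the preferred option. Doubling single quotes is a fallback that follows standard SQL, but it is not sufficient on backends that also honor backslash escapes inside string literals.

```python
import sqlite3


def load_partition_safe(conn: sqlite3.Connection, partition_key: str) -> list[tuple]:
    """Bind the key as a parameter so it is treated as data, never parsed as SQL."""
    query = "SELECT partition_key, payload FROM events WHERE partition_key = ?"
    return conn.execute(query, (partition_key,)).fetchall()


def quote_sql_literal(value: str) -> str:
    """Fallback when binding is unavailable: double embedded single quotes.

    Assumption: the backend uses standard SQL quoting only. Backends with
    backslash escapes need additional handling.
    """
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (partition_key TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("customer_a", "a-1"), ("customer_b", "b-1")],
)

payload = "' OR 1=1 --"

# With binding, the payload is compared as a plain string and matches nothing.
safe_rows = load_partition_safe(conn, payload)

# With quote doubling, the payload stays inside one string literal.
escaped_rows = conn.execute(
    "SELECT partition_key, payload FROM events "
    f"WHERE partition_key = {quote_sql_literal(payload)}"
).fetchall()

print(safe_rows, escaped_rows)
```

In both cases the injected text never leaves the string literal, so the filter matches no rows instead of every row.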
Dagster disclosed this as a high-severity issue; the CVSS score is likely 8 or higher given the privilege escalation potential. No public exploits exist yet, but the vector is straightforward for insiders or anyone holding a compromised API key.
Real-World Impact
Organizations running dynamic partitions should audit immediately. In Dagster OSS, exposure depends on who can reach the API. Dagster deployments often run in air-gapped or trusted VPCs, where the people creating partitions already hold database roles through service accounts. That limits the blast radius: someone who can add partitions can usually already query the database directly.
Dagster+ changes the calculus. Editors typically edit code and assets, implying table modify rights. But custom RBAC or multi-tenant setups decouple these. Here, a low-priv user escalates to read/write any data the I/O manager touches. Imagine a shared SaaS instance: one tenant’s analyst pivots to exfiltrate competitors’ PII or financials.
Why this matters: Data pipelines centralize crown-jewel assets. SQL injection here isn’t just a leak; it is arbitrary SQL execution on your warehouse with the I/O manager’s privileges. BigQuery or Snowflake credentials often span petabytes across prod and dev. In 2023, serious vulnerabilities were disclosed in comparable data tools such as Apache Superset and Metabase. Dagster’s 10,000+ GitHub stars and enterprise adoption raise the stakes. Multi-tenancy is growing, and Dagster+ pitches it hard, which exposes RBAC gaps. Review who holds Add Dynamic Partitions and revoke it from anyone who lacks equivalent database access.
Viewed skeptically, the risk looks contained. Dagster docs stress trusted environments, and dynamic partitions aren’t the default. Most users stick to cron-like time windows. No exploitation in the wild has been reported, and patches landed fast (within days of triage). Still, it underscores a pattern: data tools prioritize developer experience over input validation. Python’s f-strings tempt developers into building SQL from strings; parameterized queries and ORMs like SQLAlchemy exist to prevent exactly this.
Fix and Forward Defense
Update your Dagster library versions; no agent or daemon restarts are needed. The patch replaces string interpolation with escaped parameters across all I/O managers. If pinned versions block the upgrade, work around the issue by revoking Add Dynamic Partitions or by validating keys client-side (see Dagster’s Gist for SQL regex filters).
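# Version pins are examples; confirm the exact patched releases in the official advisory
pip install --upgrade "dagster-duckdb>=0.22.1" "dagster-snowflake>=0.22.1" "dagster-gcp>=0.22.1"
# Verify with `dagster --version` and `pip show dagster-duckdb`, then test dynamic partition jobs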
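For teams that cannot upgrade right away, the client-side key validation mentioned above might look like the sketch below. The allowlist pattern is an assumed policy, not the regex from Dagster’s Gist, and the function names are hypothetical. An allowlist is generally easier to reason about than trying to blocklist SQL fragments.

```python
import re

# Assumed policy: letters, digits, underscore, dot, and hyphen; 1 to 128 characters.
# Adjust to whatever your real partition keys look like.
SAFE_PARTITION_KEY = re.compile(r"[A-Za-z0-9_.-]{1,128}")


def is_safe_partition_key(key: str) -> bool:
    """Return True only if the whole key matches the allowlist."""
    return SAFE_PARTITION_KEY.fullmatch(key) is not None


def validate_partition_keys(keys: list[str]) -> list[str]:
    """Raise before any key is registered if one of them fails the allowlist."""
    rejected = [key for key in keys if not is_safe_partition_key(key)]
    if rejected:
        raise ValueError(f"Rejected partition keys: {rejected!r}")
    return keys


# Hypothetical usage: run this on keys before passing them to whatever call
# registers dynamic partitions in your own code.
accepted = validate_partition_keys(["customer_42", "variant-b", "2024.06"])
```

This only protects code paths that call the validator. Anyone holding the Add Dynamic Partitions permission can still submit keys through the API directly, so it complements revoking the permission rather than replacing it.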
Beyond patches, harden: Enforce least-privilege database credentials per I/O manager (for example, read-only credentials where an I/O manager only loads data). Audit RBAC weekly via Dagster’s instance API. Segment tenants with isolated instances. Monitor query logs in Snowflake/BigQuery for anomalies like massive dumps.
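As a rough illustration of the query-log monitoring idea, the sketch below scans exported query history for injection-shaped text and unusually large result sets. Everything here is an assumption: the QueryRecord shape, the patterns, and the row threshold are hypothetical, and real Snowflake or BigQuery history exports have their own schemas that you would need to map onto it.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """Hypothetical shape for one row exported from a warehouse query history."""
    user: str
    query_text: str
    rows_produced: int


# Heuristics only: these catch obvious payloads and will miss obfuscated ones.
SUSPICIOUS_PATTERNS = {
    "tautology": re.compile(r"\bOR\s+1\s*=\s*1\b", re.IGNORECASE),
    "stacked statement": re.compile(
        r";\s*(DROP|DELETE|INSERT|UPDATE|GRANT)\b", re.IGNORECASE
    ),
}


def flag_suspicious(records, max_rows=1_000_000):
    """Return (user, reasons) pairs for records that trip any heuristic."""
    flagged = []
    for record in records:
        reasons = [
            name
            for name, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(record.query_text)
        ]
        if record.rows_produced > max_rows:
            reasons.append("row count above threshold")
        if reasons:
            flagged.append((record.user, reasons))
    return flagged


# Made-up sample history, for demonstration only.
history = [
    QueryRecord("svc_dagster", "SELECT * FROM events WHERE partition_key = 'customer_a'", 120),
    QueryRecord("svc_dagster", "SELECT * FROM events WHERE partition_key = '' OR 1=1 --'", 5_000_000),
    QueryRecord("svc_dagster", "SELECT * FROM events WHERE partition_key = 'x'; DROP TABLE events", 0),
]
alerts = flag_suspicious(history)
```

Pattern matching like this is a detection aid, not a control. It is most useful for reviewing what ran under the I/O manager’s credentials before the patch was applied.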
This vuln forces a reckoning on dynamic features in orchestrators. Airflow, Prefect, and Mage all flirt with custom metadata. As data teams scale to hundreds of users, permission boundaries erode. Dagster acted transparently; credit there. But users, own your exposures. Patch today, audit tomorrow.