Attackers can inject arbitrary XML markup into documents generated by @xmldom/xmldom, a widely used Node.js XML DOM library. The flaw stems from improper handling of the CDATA terminator ]]> in CDATA sections. When user input containing this sequence enters a CDATASection node, the library’s XMLSerializer outputs it verbatim. This closes the CDATA section prematurely, so any parser that reads the output interprets the rest of the payload as active XML elements or attributes.
This vulnerability affects versions up to at least 0.9.8, in both the legacy xmldom package and @xmldom/xmldom. Developers often embed untrusted input (user comments, logs, API payloads) inside CDATA sections on the assumption that it stays plain text. That’s a false sense of security here. Downstream XML parsers treat the injected content as structure, potentially bypassing validation or triggering unintended logic.
Attack Vectors
The primary entry point is Document.createCDATASection(data), which accepts any string without checking for ]]> (lines 2216–2221 in lib/dom.js v0.9.8). But the WHATWG DOM spec skips validation in mutation methods, opening wider paths:
- CharacterData.appendData()
- CharacterData.replaceData()
- CharacterData.insertData()
- Direct assignment to .data
- Direct assignment to .textContent
Note: .nodeValue assignments don’t update .data, so the serializer ignores them. Parsing existing XML with CDATA is safe: the SAX parser’s regex stops at the first ]]>, so a parsed CDATA node never contains the terminator. The risk arises only when new documents are created or mutated and then serialized.
A proof-of-concept exploit is straightforward. Create a CDATA section with a payload like safe text]]><evil>admin</evil><![CDATA[more text. Serialize it, and the output becomes:
<![CDATA[safe text]]><evil>admin</evil><![CDATA[more text]]>
The injected <evil> element now lives in the XML, ready for exploitation.
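The breakout can be reproduced without the library. The sketch below is a simplified stand-in for the serializer behavior described above, which wraps the node's data in CDATA delimiters verbatim. The name naiveSerializeCdata is hypothetical and this is not xmldom's actual source.

```javascript
// Hypothetical model of a serializer that emits CDATA data verbatim.
// NOT xmldom's code; it only illustrates the behavior described above.
function naiveSerializeCdata(data) {
  return '<![CDATA[' + data + ']]>';
}

const pocPayload = 'safe text]]><evil>admin</evil><![CDATA[more text';
const pocOutput = naiveSerializeCdata(pocPayload);

// The first "]]>" in the output comes from the payload, not the serializer,
// so everything after it is parsed as markup.
console.log(pocOutput);
// prints: <![CDATA[safe text]]><evil>admin</evil><![CDATA[more text]]>
```

Because the wrapper adds exactly one terminator at the end, any earlier ]]> in the output is attacker-controlled.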
Impact and Why It Matters
Node.js apps handling XML integrations (SOAP APIs, RSS feeds, XML exports, enterprise data exchanges) face the biggest exposure. As of 2023, xmldom was reportedly downloaded over 10 million times weekly at its peak, with forks like @xmldom/xmldom still active despite deprecation warnings. Many legacy systems rely on it for its browser-like DOM API.
Consequences include:
- Business logic bypass: Inject flags like <approved>true</approved> to approve fraudulent transactions or escalate privileges.
- Integrity breaks: Corrupt exported reports or feeds, misleading analytics or automated decisions.
- XXE amplification: While not directly XXE, injected entities could chain into parser flaws elsewhere.
This isn’t theoretical. XML underpins finance (FIXML, FpML), healthcare (HL7 CDA), and e-commerce (ebXML). An attacker with input control, whether via forms, APIs, or file uploads, turns “safe” text fields into markup bombs. XML’s complexity invites these bugs: libraries like xmldom prioritize compatibility over strictness, but the standards demand that ]]> be rejected or escaped (per MDN and the XML spec). To be fair, native browser implementations handle this correctly via validation.
Scan your deps. Snyk rates this HIGH (CVSS likely 7.5+ for injection). Affected code looks like:
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
const doc = new DOMParser().parseFromString('<root/>', 'text/xml');
const cdata = doc.createCDATASection(userInput); // Vulnerable if userInput contains ]]>
doc.documentElement.appendChild(cdata);
const serializer = new XMLSerializer();
const xml = serializer.serializeToString(doc); // Injected markup appears in the output
Root Cause and Fixes
Core issues in lib/dom.js:
- createCDATASection appends raw data (lines 2216–2221).
- Serializer dumps node.data without escapes (lines 2919–2920): + node.data +.
Patched versions (post-0.9.8 in some forks) throw InvalidCharacterError on ]]> in createCDATASection. Mutations like appendData may still slip through, so test thoroughly. Consider migrating to alternatives: fast-xml-parser (faster, safer), xml2js (object-based), or native libxmljs bindings.
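Since mutation methods can reintroduce the terminator after a node is created, one option is to audit the whole tree just before serialization. The following is a minimal sketch, assuming nodes expose the standard DOM properties nodeType, data, firstChild, and nextSibling. The helper names findUnsafeCdataNodes and assertCdataSafe are hypothetical and not part of any library.

```javascript
const CDATA_SECTION_NODE = 4; // standard DOM nodeType value for CDATASection

// Walks a DOM-like tree and collects every CDATA node whose data
// contains the terminator "]]>".
function findUnsafeCdataNodes(root) {
  const unsafe = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === CDATA_SECTION_NODE &&
        typeof node.data === 'string' &&
        node.data.includes(']]>')) {
      unsafe.push(node);
    }
    // Visit children only; siblings of the root itself are out of scope.
    for (let child = node.firstChild; child; child = child.nextSibling) {
      stack.push(child);
    }
  }
  return unsafe;
}

// Hypothetical guard: call this immediately before serializing.
function assertCdataSafe(root) {
  const unsafe = findUnsafeCdataNodes(root);
  if (unsafe.length > 0) {
    throw new Error(
      'Refusing to serialize: ' + unsafe.length + ' CDATA node(s) contain "]]>"'
    );
  }
}
```

A check at serialization time catches the terminator regardless of which path (createCDATASection, appendData, .data assignment) put it there, at the cost of one extra tree walk per document.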
Immediate mitigations:
- Validate/sanitize input before CDATA: reject or replace ]]>.
- Serialize with custom handlers that escape terminators.
- Audit with npm audit or Snyk; pin to fixed versions if available.
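As an illustration of the first mitigation, input can be split so that no single CDATA section ever contains the terminator, instead of being rejected outright. This is a sketch under assumptions: splitCdataSafe and appendSafeCdata are hypothetical helper names, and the second function relies only on the standard DOM calls createCDATASection and appendChild.

```javascript
// Splits text into chunks such that no chunk contains "]]>".
// Concatenating the chunks reproduces the original text exactly.
function splitCdataSafe(text) {
  const parts = String(text).split(']]>');
  return parts.map((part, i) => {
    const prefix = i > 0 ? '>' : '';                     // tail of a split terminator
    const suffix = i < parts.length - 1 ? ']]' : '';     // head of a split terminator
    return prefix + part + suffix;
  });
}

// Appends one CDATASection per chunk, so each node's data is terminator-free.
function appendSafeCdata(doc, parent, text) {
  for (const chunk of splitCdataSafe(text)) {
    parent.appendChild(doc.createCDATASection(chunk));
  }
}

// Example: 'a]]>b' becomes two chunks, 'a]]' and '>b'.
console.log(splitCdataSafe('a]]>b'));
```

Because each chunk is terminator-free, this approach should also work with patched versions that throw on ]]>. A downstream parser is expected to read the adjacent sections as contiguous character data.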
Bottom line: If your app serializes XML with user data in CDATA, assume compromise until patched. This exposes how fragile XML remains in 2024. Treat it like user-generated HTML, and never trust the wrappers.