LangSmith SDK users relying on output redaction face a critical data leakage risk. The redaction features—hideOutputs in JavaScript and hide_outputs in Python—fail to mask streaming token events. When large language models generate output in real-time chunks, each token lands in LangSmith traces as a raw new_token event. These events skip the redaction pipeline entirely, exposing full sensitive content despite redaction settings.
This affects every developer using LangSmith for tracing LLM applications with streaming enabled. LangSmith, built by the LangChain team, tracks runs by logging inputs, outputs, and events. Redaction normally scrubs the inputs and outputs fields before storage. But streaming bypasses this: the SDK appends unredacted tokens directly to the events array. Result? Your PII, API keys, or proprietary data appears in plain text in the trace logs.
Technical Breakdown
Both SDKs suffer the same flaw. In the JavaScript version, check traceable.ts lines 997-1003 and 1044-1050. Here, the code pushes raw kwargs.token values into new_token events during streaming loops. The prepareRunCreateOrUpdateInputs function later ignores the events array, redacting only top-level inputs and outputs.
Python mirrors this in run_helpers.py at lines 1924 and 1996. Streaming callbacks add unfiltered tokens to events, while _hide_run_outputs skips them. No configuration toggles this behavior—it’s baked into the SDK logic.
```javascript
// JS example from traceable.ts (simplified)
if (streaming) {
  for await (const chunk of stream) {
    // Raw token pushed with no redaction applied
    const event = { type: "new_token", data: { token: chunk.token.text } };
    run.events.push(event);
  }
}
// Redaction later: only run.inputs and run.outputs are processed
```
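The Python path mirrors this. Here is a minimal, self-contained reproduction of the logic — toy stand-ins for the run_helpers internals, not the SDK's actual code — showing how redaction scrubs the top-level outputs while the events array keeps raw tokens:

```python
# Toy reproduction of the flaw (stand-in functions, not the SDK's real code):
# redaction touches outputs, but streaming tokens land in events untouched.
def _hide_run_outputs(run: dict) -> dict:
    """Simplified stand-in: masks only the top-level outputs field."""
    redacted = dict(run)
    redacted["outputs"] = {"output": "[REDACTED]"}
    return redacted

def stream_tokens(run: dict, tokens: list[str]) -> None:
    # Mirrors the streaming callback: each raw token is appended to events
    for tok in tokens:
        run["events"].append({"name": "new_token", "kwargs": {"token": tok}})

run = {"inputs": {"prompt": "patient query"}, "outputs": {}, "events": []}
stream_tokens(run, ["SSN:", " 123-", "45-6789"])
run["outputs"] = {"output": "SSN: 123-45-6789"}

stored = _hide_run_outputs(run)
print(stored["outputs"])              # {'output': '[REDACTED]'}
print(stored["events"][0]["kwargs"])  # raw token survives redaction
```

The stored payload looks redacted at a glance — outputs are masked — but every streamed token is still sitting in events in plain text.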
LangSmith processes millions of traces daily across its user base, per public metrics from LangChain’s 2023 reports. Streaming dominates in production chatbots and agents, where low-latency output matters, so this bug hits hardest there.
Why This Matters—and What to Do
Organizations treat LangSmith traces as auditable logs for debugging LLM chains. But unredacted events turn them into liability magnets. Imagine a healthcare app tracing patient queries: streaming responses with medical histories leak via events, violating HIPAA. Finance firms risk SEC scrutiny over exposed trade data. Even non-regulated users expose competitive edges if traces are subpoenaed or hacked.
LangSmith’s growth—backing 100,000+ developers as of mid-2024—amplifies the blast radius. The platform stores traces indefinitely unless users set retention policies (default: 30 days for paid tiers). Events persist in the UI, APIs, and exports, searchable by anyone with project access.
Fair assessment: LangChain moves fast, iterating on SDKs weekly. This isn’t malice; it’s an oversight in a complex streaming path. But it underscores a key risk: tracing tools aren’t bulletproof for sensitive workloads. Always audit what gets logged.
Immediate fixes:
- Disable streaming if redaction is critical, or post-process traces client-side to scrub events.
- Switch redaction to inputs-only if outputs aren’t sensitive—but that’s rare.
- Monitor LangSmith’s changelog; a patch should land via SDK updates (check the v0.1.x series).
- For high-stakes apps, self-host traces with tools like Phoenix or OpenTelemetry, adding custom filters.
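The client-side post-processing option above can be sketched as a small scrubber run over exported trace dicts before they leave your infrastructure. This is a hypothetical helper, not part of the SDK; the event shape assumed here matches the new_token events described earlier:

```python
def scrub_token_events(run: dict, placeholder: str = "[REDACTED]") -> dict:
    """Hypothetical helper: mask token payloads inside new_token events."""
    scrubbed = dict(run)
    scrubbed["events"] = [
        {**ev, "kwargs": {**ev.get("kwargs", {}), "token": placeholder}}
        if ev.get("name") == "new_token"
        else ev
        for ev in run.get("events", [])
    ]
    return scrubbed

run = {
    "outputs": {"output": "[REDACTED]"},  # SDK redaction already applied
    "events": [
        {"name": "new_token", "kwargs": {"token": "secret"}},
        {"name": "end", "kwargs": {}},
    ],
}
clean = scrub_token_events(run)
print(clean["events"][0]["kwargs"]["token"])  # [REDACTED]
```

Run this over trace exports before storage or sharing, so the events array gets the same treatment the SDK already gives outputs.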
Verify your setup with a test run:
```python
# Python test (sketch): `llm` is a streaming-capable model client you supply.
# Note: in the Python SDK, hide_outputs is configured on the Client rather
# than on the decorator.
from langsmith import Client, traceable

client = Client(hide_outputs=True)

@traceable(client=client)
def stream_llm(prompt):
    for token in llm.stream(prompt):  # simulated stream
        yield token

# After a run, check the trace's events for raw tokens
```
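Auditing past traces can follow the same idea: walk your runs and flag any whose events carry raw token text. The run objects below are stand-ins built with SimpleNamespace; in practice you would fetch real runs through the LangSmith client (e.g. its list_runs API), which is assumed rather than shown:

```python
from types import SimpleNamespace

def find_leaky_runs(runs):
    """Yield (run_id, token) pairs for runs whose events hold raw token text."""
    for run in runs:
        for ev in run.events or []:
            token = (ev.get("kwargs") or {}).get("token")
            if token:  # any non-empty token here is unredacted streaming output
                yield run.id, token

# Stand-in runs; in practice these come from the LangSmith client
runs = [
    SimpleNamespace(
        id="run-1",
        events=[{"name": "new_token", "kwargs": {"token": "4111 1111"}}],
    ),
    SimpleNamespace(id="run-2", events=[]),
]
leaks = list(find_leaky_runs(runs))
print(leaks)  # [('run-1', '4111 1111')]
```

Any hit means that trace needs deletion or scrubbing, regardless of what the redaction settings said at the time.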
Bottom line: If you stream with LangSmith and expect redaction, you’re leaking data today. Patch now, audit past traces, and rethink blind trust in black-box tracers. In LLM ops, control your logs or they control you.