Developers waste hours chasing flaky tests or performance regressions that aren't binary pass/fail. Enter git_bayesect, a Python tool that applies Bayesian inference to git bisect. It pinpoints the commit where an event's likelihood shifts—say, a test that flaked 1% of the time now flakes 20%—without your needing to know either rate in advance. You just label observations as "fail" or "pass," and it greedily selects the next commit to test so as to minimize expected posterior entropy.
This matters because standard git bisect assumes a deterministic world: every commit is either good or bad. Real-world codebases deal in stochastic behavior—network timeouts, race conditions, resource exhaustion. Traditional bisect fails here, forcing you to run a test dozens of times per commit by hand or to eyeball logs. git_bayesect models failure rates via Beta-Bernoulli conjugacy, updating posteriors efficiently even when the baseline rate is unknown. In a demo repo, it isolated a jump in failure probability from 0.1 to 0.9 across 100 commits in under 10 test runs.
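The Beta-Bernoulli machinery is simple to state: a Beta(alpha, beta) prior over a failure rate, updated with k failures out of n runs, yields a Beta(alpha + k, beta + n - k) posterior. A minimal sketch of that update (illustrative, not the tool's internals):

```python
from dataclasses import dataclass

@dataclass
class BetaPosterior:
    """Conjugate Beta prior/posterior over a Bernoulli failure rate."""
    alpha: float = 1.0  # pseudo-count of failures
    beta: float = 1.0   # pseudo-count of passes

    def update(self, failures: int, passes: int) -> "BetaPosterior":
        # Beta-Bernoulli conjugacy: posterior is Beta(alpha + k, beta + n - k)
        return BetaPosterior(self.alpha + failures, self.beta + passes)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# Start from a uniform Beta(1, 1) prior and observe 2 failures in 10 runs.
post = BetaPosterior().update(failures=2, passes=8)
print(post.mean())  # posterior mean failure rate: 3/12 = 0.25
```

The update is constant-time per observation, which is what makes exact posteriors cheap enough to recompute after every test run.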
Installation and Basic Usage
Install via pip install git_bayesect or uv tool install git_bayesect. It integrates as git subcommands, storing state in .git/bayesect.json.
Start with boundaries:
git bayesect start --old <known-good-commit> --new HEAD
Observe on current checkout:
git bayesect fail # or pass
Or label remotely:
git bayesect pass --commit <hash>
Status shows each commit's posterior probability of being the change point, plus the posterior entropy. The tool then checks out the commit expected to be most informative to test next. Reset with git bayesect reset, or undo the last observation with git bayesect undo.
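The entropy reported here is the Shannon entropy of the change-point posterior: zero means certainty about which commit flipped the rate. For intuition (illustrative, not the tool's output format):

```python
import math

def entropy_bits(posterior):
    """Shannon entropy (bits) of a distribution over candidate commits."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

# Uniform over 8 candidate commits: maximally uncertain, 3 bits.
print(entropy_bits([1 / 8] * 8))              # 3.0
# Mass concentrated on one commit: nearly certain, ~0.11 bits.
print(entropy_bits([0.99] + [0.01 / 7] * 7))
```

Each informative test run shaves off some fraction of a bit, so watching entropy fall is a direct progress meter for the search.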
Automate with git bayesect run <test-command>, which loops: check out the chosen commit, run the test, label from the exit code (nonzero = fail), and repeat until convergence (entropy below a threshold or a maximum iteration count).
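The loop's contract with your test command is just the exit code. A tiny sketch of that convention (the helper name is mine, not part of the tool):

```python
import subprocess
import sys

def label_from_exit_code(cmd):
    """Mirror the run loop's convention: nonzero exit status means "fail"."""
    result = subprocess.run(cmd)
    return "fail" if result.returncode != 0 else "pass"

print(label_from_exit_code([sys.executable, "-c", "import sys; sys.exit(0)"]))  # pass
print(label_from_exit_code([sys.executable, "-c", "import sys; sys.exit(3)"]))  # fail
```

Anything that can be phrased as an exit code—a pytest invocation, a grep over logs, a timing check—plugs straight in.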
Priors, Automation, and Skeptical Assessment
By default, uniform Beta(1, 1) priors (alpha=1, beta=1) encode ignorance. Tweak them per commit:
git bayesect prior --commit <hash> --alpha 10 --beta 1 # favors "fail" side
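Concretely, a Beta distribution's mean is alpha / (alpha + beta), so under the Beta-Bernoulli reading those weights correspond to prior failure-rate beliefs:

```python
# Prior mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta).
def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

print(beta_mean(1, 1))   # 0.5: uniform prior, total ignorance
print(beta_mean(10, 1))  # ~0.909: strong prior lean toward "fail"
```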
You can also set priors in bulk from filenames or from commit messages/diffs via Python callbacks. For example, to weight suspicious files:
git bayesect priors_from_filenames --filenames-callback "return 10 if any('timeout' in f for f in filenames) else 1"
This biases toward commits touching risky code, accelerating search in large histories. Log commands for reproducibility: git bayesect log.
Under the hood, it discretizes the change point along the path through your commit DAG and computes exact posteriors via conjugate updates—no MCMC slowdowns. Greedy entropy search prunes the space efficiently, often converging in log(N) plus a small constant number of tests, where N is the number of commits.
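A self-contained sketch of that exact computation, reconstructed from the description above (not the tool's source): each candidate change point splits the observations into a "before" and "after" segment, each segment's marginal likelihood under a Beta prior has a closed form (the Beta-Binomial integral), and the posterior over change points is their normalized product.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def segment_log_ml(failures, passes, alpha=1.0, beta=1.0):
    """Log marginal likelihood of a segment under a Beta(alpha, beta) prior:
    the integral of p^k (1-p)^(n-k) over the prior, in closed form."""
    return log_beta(alpha + failures, beta + passes) - log_beta(alpha, beta)

def change_point_posterior(obs):
    """obs[i] = (failures, passes) at commit i, oldest first.
    Candidate c means commits < c share one rate, commits >= c another."""
    n = len(obs)
    log_post = []
    for c in range(1, n):
        before = [sum(x) for x in zip(*obs[:c])]
        after = [sum(x) for x in zip(*obs[c:])]
        log_post.append(segment_log_ml(*before) + segment_log_ml(*after))
    m = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]  # posterior over change points c = 1..n-1

# Five commits: the failure rate clearly jumps between commit 2 and commit 3.
obs = [(0, 5), (1, 4), (0, 5), (4, 1), (5, 0)]
post = change_point_posterior(obs)
print(max(range(len(post)), key=post.__getitem__) + 1)  # most likely c: 3
```

Because both unknown rates are integrated out analytically, no baseline failure rate ever has to be supplied, which matches the "no prior probabilities needed" pitch.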
Fair skepticism: it shines for a single, monotone shift in likelihood (e.g., a test got flakier after some commit). Multiple change points or non-Bernoulli events (like multi-modal distributions) confuse it—posteriors spread out, wasting tests. There's no visualization; you interpret JSON stats. Compared to git bisect run with averaged runs, it's smarter about where to spend runs, but it assumes each test run is independent. In benchmarks from the author's write-up, it used 30% fewer evaluations than naive binary search on probabilistic data.
Why adopt? CI/CD pipelines run thousands of tests daily, and flaky ones burn engineer time. Integrate it into scripts for perf cliffs (runtime under a threshold? pass). Open-source at hauntsaninja/git_bayesect; the demo repo proves it on a synthetic history that tweaks a Python script's random.random() < p failure rate. For security-sensitive repos, priors from static analysis could flag vuln-introducing commits probabilistically.
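For the perf-cliff case, any wrapper that converts a timing into an exit code will do. A hypothetical sketch (the threshold and the suggestion to run it under git bayesect are mine):

```python
import subprocess
import sys
import time

THRESHOLD_S = 0.3  # made-up cutoff; tune it to the regression you're chasing

def perf_gate(cmd, threshold_s=THRESHOLD_S):
    """Run cmd; return 0 ("pass") if it finishes under the threshold, else 1."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    return 0 if elapsed < threshold_s else 1

# Save as perf_gate.py, exit with its return value, and hand the script to
# git bayesect run as the test command.
```

Wall-clock timings are themselves noisy, which is exactly the regime where a probabilistic bisect beats a deterministic one.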
Bottom line: if your bisects involve variance, this cuts debug cycles. Try it on your next flake—worst case, you fall back to vanilla git bisect.