Every public repository, read as a case file: the problem it solves, the approach taken, how it was validated, and what it taught. Shipped repos link to source; planned ones show the intended method, not invented results.
hallucination-hunterACTIVE
AI EVALUATION
PROBLEM
No simple, runnable way to measure whether an LLM answer is actually grounded in its sources, or just plausible.
APPROACH
Pluggable detectors and model backends scored against a labelled dataset, exposed through a CLI and a Python API.
VALIDATION
160-example labelled dataset · 136 tests · green CI · reproducible runs. v0.1.0, MIT.
FINDINGS
Groundedness works as a release gate, not a vibe check — borderline answers surface before they ship.
LESSON
Measure groundedness as a gate. Confidence is not evidence.
AI-assisted builds drift from intent without a spine: no gates, no acceptance criteria, no evidence trail.
APPROACH
A stage-gated lifecycle driven by executable acceptance specs, with reusable templates and checklists per stage.
VALIDATION
Built with parallel AI agents, then put through an independent max-effort audit (graded B+); findings fixed and re-verified. markdownlint + link-check CI.
FINDINGS
Auto-reviewers will confidently 'fix' things that are already correct — adversarial review plus a human pass caught it.
LESSON
Spec-first beats prompt-first, measurably. Let a skeptic try to break it.