TruthGuard
A 120-question confidence calibration benchmark designed to evaluate the metacognitive abilities of Large Language Models. It measures how closely a model's stated confidence matches its actual correctness, exposing overconfidence in AI systems.
120 Questions
3 Difficulty Tiers
License: CC BY-SA 4.0
Question Tiers
Easy, Tricky, Trap
What It Measures
Calibration Error
Measures the gap between stated confidence and actual accuracy
Overconfidence Analysis
Identifies systematic overconfidence patterns in frontier models
Cross-Difficulty Correlation
Tracks confidence-accuracy alignment across Easy, Tricky, and Trap tiers
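The three measurements above can be illustrated with a minimal sketch. It is not TruthGuard's own scoring code, which the page does not describe. It assumes each answer is recorded as a hypothetical (tier, confidence, correct) triple with confidence in [0, 1], uses the common binned expected calibration error (ECE) as a stand-in for "calibration error", and takes mean confidence minus accuracy as the overconfidence gap. The function names and sample rows are invented for illustration.

```python
from collections import defaultdict


def expected_calibration_error(records, n_bins=10):
    """Binned ECE over (confidence, correct) pairs; confidence is in [0, 1]."""
    if not records:
        raise ValueError("records must not be empty")
    bins = defaultdict(list)
    for conf, correct in records:
        # Clamp so that confidence == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    ece = 0.0
    for items in bins.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Weight each bin's gap by the share of answers it holds.
        ece += len(items) / len(records) * abs(avg_conf - accuracy)
    return ece


def overconfidence_gap(records):
    """Mean confidence minus accuracy; positive values mean overconfidence."""
    if not records:
        raise ValueError("records must not be empty")
    mean_conf = sum(c for c, _ in records) / len(records)
    accuracy = sum(1 for _, ok in records if ok) / len(records)
    return mean_conf - accuracy


def per_tier_report(rows):
    """Group (tier, confidence, correct) rows by tier and score each group."""
    by_tier = defaultdict(list)
    for tier, conf, correct in rows:
        by_tier[tier].append((conf, correct))
    return {
        tier: {
            "ece": expected_calibration_error(recs),
            "gap": overconfidence_gap(recs),
        }
        for tier, recs in by_tier.items()
    }


# Hypothetical sample answers, not taken from the TruthGuard dataset.
sample_rows = [
    ("Easy", 0.9, True), ("Easy", 0.9, True),
    ("Tricky", 0.8, True), ("Tricky", 0.8, False),
    ("Trap", 0.9, False), ("Trap", 0.9, False),
]

report = per_tier_report(sample_rows)
for tier, scores in report.items():
    print(f"{tier}: ECE={scores['ece']:.2f}, gap={scores['gap']:+.2f}")
```

In this toy data the stated confidence stays high while accuracy falls from Easy to Trap, so the gap grows with difficulty. That is the kind of pattern a per-tier breakdown is meant to surface.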
TruthGuard contributes to AI safety and alignment research by providing a practical tool for measuring and improving model calibration. The dataset is openly available to the research community.