Benchmarks.

AI evaluation research and LLM testing infrastructure I've published.

Metacognition Track · AGI Cognitive Abilities

TruthGuard

A 120-question confidence calibration benchmark designed to evaluate the metacognitive abilities of Large Language Models. It measures how closely a model's stated confidence matches its actual correctness, exposing overconfidence in AI systems.

Questions: 120

Difficulty Tiers: 3

License: CC BY-SA 4.0

Question Tiers

Easy · Tricky · Hard (Trap)
Python · Pandas · NumPy · Matplotlib · Seaborn · Kaggle

What It Measures

Calibration Error

Measures the gap between stated confidence and actual accuracy
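
The page does not say which calibration metric TruthGuard reports. One common choice is expected calibration error (ECE), sketched below under that assumption. The function name, the equal-width binning, and the input format (confidences in [0, 1] paired with correctness flags) are illustrative and not taken from the benchmark itself.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average of |mean confidence - accuracy| over equal-width bins.

    Hypothetical helper: the binning scheme and input format are assumptions.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Group answers by confidence bin; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to the share of answers it holds.
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece
```

A score of 0 means stated confidence matches observed accuracy in every bin; larger values indicate a wider gap.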

Overconfidence Analysis

Identifies systematic overconfidence patterns in frontier models

Cross-Difficulty Correlation

Tracks confidence-accuracy alignment across the Easy, Tricky, and Hard (Trap) tiers
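
As a rough illustration of the overconfidence and cross-difficulty analyses above, the sketch below computes mean confidence, accuracy, and their difference per tier. The record layout `(tier, confidence, is_correct)` and the function name are assumptions made for this example; the dataset's actual schema is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def overconfidence_by_tier(
    records: Iterable[Tuple[str, float, bool]],
) -> Dict[str, Dict[str, float]]:
    """Per-tier mean confidence, accuracy, and gap (confidence minus accuracy).

    Hypothetical helper: a positive gap suggests overconfidence on that tier.
    """
    # For each tier, track [sum of confidences, number correct, number of answers].
    totals = defaultdict(lambda: [0.0, 0, 0])
    for tier, confidence, is_correct in records:
        entry = totals[tier]
        entry[0] += confidence
        entry[1] += int(is_correct)
        entry[2] += 1

    summary = {}
    for tier, (conf_sum, n_correct, n) in totals.items():
        mean_conf = conf_sum / n
        accuracy = n_correct / n
        summary[tier] = {
            "mean_confidence": mean_conf,
            "accuracy": accuracy,
            "gap": mean_conf - accuracy,
        }
    return summary


# Made-up records, for illustration only (not benchmark results).
example = [
    ("Easy", 0.9, True),
    ("Easy", 0.7, True),
    ("Hard (Trap)", 0.9, False),
    ("Hard (Trap)", 0.9, True),
]
summary = overconfidence_by_tier(example)
```

Comparing the gap across tiers shows whether confidence falls as difficulty rises or stays high while accuracy drops.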

TruthGuard contributes to the AI safety and alignment field by providing a practical tool for improving model calibration. The dataset is openly available to the research community.