Frame Check

Calibration runs exercise each verifier in the Source Network against a seed corpus of claims whose ground truth has been established independently. Precision is the share of positive verdicts that were correct; recall is the share of real positives that were caught. F1 is the harmonic mean of the two. A verifier with a missing API key returns unverifiable on every claim and therefore scores n/a in the table below; that row is committed to make the key-missing code path legible, not because it reflects the verifier's calibrated reliability. See the calibration hub for the run index and the methodology protocol. The underlying per-claim verdicts are published as `raw_verdicts.json` so that the aggregation here is reproducible.

Source Network Calibration Report

Per-provider results (all claims)

| Provider | N | TP | FP | FN | TN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| wikipedia | 6 | 4 | 1 | 0 | 1 | 0.80 | 1.00 | 0.89 |
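
The wikipedia row can be re-derived from its confusion counts. The following is a minimal sketch, not the report's own aggregation code: the function names `prf1` and `fmt` are hypothetical, and rendering a zero denominator as n/a is an assumption about how undefined ratios are displayed.

```python
from typing import Optional, Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (precision, recall, F1); None marks an undefined ratio."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is None or recall is None or (precision + recall) == 0:
        f1 = None
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def fmt(value: Optional[float]) -> str:
    """Format a metric to two decimals; undefined values become 'n/a' (assumed convention)."""
    return "n/a" if value is None else f"{value:.2f}"


# Counts taken from the wikipedia row: TP=4, FP=1, FN=0.
p, r, f = prf1(tp=4, fp=1, fn=0)
print(fmt(p), fmt(r), fmt(f))  # 0.80 1.00 0.89
```

TN does not enter any of the three metrics, which is why the function does not take it as an argument.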

Per-provider results (stale claims excluded)

Claims with `as_of_date` older than 90 days are excluded from the numbers below. Comparing these F1 values against the table above separates 'ground truth has drifted since the corpus was seeded' from 'the verifier genuinely misses'.

| Provider | N | TP | FP | FN | TN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|
| wikipedia | 6 | 4 | 1 | 0 | 1 | 0.80 | 1.00 | 0.89 |
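
The 90-day exclusion could be sketched as below. This is an illustration under stated assumptions, not the pipeline's actual filter: it assumes each claim carries `as_of_date` as an ISO 8601 date string and that age is measured against the date of the calibration run. The function name `exclude_stale`, the claim IDs, and the dates are all made up for the example.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # threshold stated in the report


def exclude_stale(claims, run_date):
    """Keep claims whose as_of_date is no more than 90 days before run_date.

    Assumes as_of_date is an ISO 8601 string such as "2024-05-20".
    """
    cutoff = run_date - STALE_AFTER
    return [c for c in claims if date.fromisoformat(c["as_of_date"]) >= cutoff]


# Hypothetical claims, not taken from the seed corpus.
claims = [
    {"id": "example-a", "as_of_date": "2024-05-20"},
    {"id": "example-b", "as_of_date": "2024-01-02"},
]
fresh = exclude_stale(claims, run_date=date(2024, 6, 1))
print([c["id"] for c in fresh])  # ['example-a']
```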

Claim-level detail

| ID | Provider | Expected | Observed | Match | Best source |
|---|---|---|---|---|---|
| wiki-001 | wikipedia | verified | verified | yes | Wolfram Alpha |
| wiki-002 | wikipedia | verified | verified | yes | Wikipedia |
| wiki-003 | wikipedia | verified | verified | yes | Wolfram Alpha |
| wiki-004 | wikipedia | contradicted | verified | no | Wikipedia |
| wiki-005 | wikipedia | contradicted | unverifiable | no | |
| wiki-006 | wikipedia | verified | close | yes | Wikipedia |
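
The claim-level rows can be reconciled with the per-provider counts under one reading of the verdicts. The sketch below assumes that an expected value of verified is the positive class, and that observed verdicts of verified or close count as positive while everything else (including unverifiable) counts as negative. That mapping is inferred from the published counts, not taken from the methodology protocol, and the function name `confusion` is hypothetical.

```python
# Assumed mapping, inferred from the summary counts rather than documented.
POSITIVE_EXPECTED = {"verified"}
POSITIVE_OBSERVED = {"verified", "close"}


def confusion(rows):
    """Tally TP/FP/FN/TN from (expected, observed) verdict pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for expected, observed in rows:
        exp_pos = expected in POSITIVE_EXPECTED
        obs_pos = observed in POSITIVE_OBSERVED
        if exp_pos and obs_pos:
            counts["TP"] += 1
        elif obs_pos:
            counts["FP"] += 1
        elif exp_pos:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# (expected, observed) for wiki-001 through wiki-006, in order.
rows = [
    ("verified", "verified"),
    ("verified", "verified"),
    ("verified", "verified"),
    ("contradicted", "verified"),
    ("contradicted", "unverifiable"),
    ("verified", "close"),
]
print(confusion(rows))  # {'TP': 4, 'FP': 1, 'FN': 0, 'TN': 1}
```

Under this reading, wiki-005 is tallied as a true negative even though its Match column says no: the verifier did not return a positive verdict for a claim that was not expected to verify, but it also did not return the exact expected verdict.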