Frame Check

Source artifact: validation/decision_readiness/results/2026-06-01-7a6e2f294c9e/aggregate.md. The file is regenerated when the validation corpus is re-aggregated; this page reflects the most recent run on the deploy at site-build time. Verify findings against the underlying documents at the validation corpus. CC-BY-4.0.

Decision-readiness corpus aggregate findings

Descriptive of corpus state, not inference about populations.

Every claim below names its sample size inline. With small N,

individual outliers may be sampling noise rather than signal.

Aggregate findings become inference-grade only when N grows

substantially across genres and questions.

Peer-comparison findings

Aggregated across N = 12 pairwise peer comparisons in 2 peer groups. Each comparison is non-directional; counts below represent the number of pairs in which the two peers measurably differed on a given dimension.

Per-dimension divergence rate

Of 12 pairs, three outcomes per dimension: peers measurably differ, peers agree (signals comparable and match), or non-comparable (data missing on one or both sides; e.g., evidence and robustness when corpus profiles were computed without Source Network).

Dimension Differ Agree Non-comparable Rate (of comparable)
coverage 10 2 0 83% of 12
calibration 12 0 0 100% of 12
evidence 3 9 0 25% of 12
robustness 0 0 12 n/a
counterfactual 10 2 0 83% of 12

Named library entries per dimension

When a dimension diverges in this aggregate, the named patterns in the Frame Vocabulary Standard describe the structural failure modes that produce the divergence. Click through to read the canonical entry for each frame.

Per-peer-group breakdown

Counts within each peer group. A group with N members yields C(N, 2) pairs.

bitcoin_retirement_question (6 pairs):

startup_offer_question (6 pairs):

Per-LLM participation in differing pairs

When two peers differ on a dimension, BOTH peers participated. These counts are descriptive of HOW OFTEN each LLM is in a pair where some divergence was measured. Cross-LLM behavioral claims require larger N.

Outlier identification: per peer group, the member whose signal value is most different from the group median on a given dimension is identified as the outlier on that dimension. Counts below report how many groups each LLM was the outlier in (out of the groups it appeared in).

LLM Groups Cov Cal Evi Rob CF
claude 2 1 of 2 0 of 2 n/a n/a 2 of 2
gemini 2 1 of 2 1 of 2 n/a n/a 0 of 2
grok 2 0 of 2 1 of 2 n/a n/a 0 of 2
openai 2 0 of 2 0 of 2 n/a n/a 0 of 2

Legend: Cov = Coverage, Cal = Calibration, Evi = Evidence, Rob = Robustness, CF = Counterfactual. Cell shows outlier-count of comparable-group-count (n/a when the dimension was non-comparable in every group containing this LLM).

Cross-question outlier consistency

When the same LLM is identified as the outlier on the same dimension across multiple distinct peer groups (different questions), that consistency is a stronger structural signal than any single-group outlier identification. The findings below report every (LLM, dimension) cell where the LLM is the outlier in EVERY comparable group it appears in.

These findings still inherit the corpus-level N caveat (current peer groups: 2). Cross-question consistency is more compelling at 2 groups than at 1, more compelling at 5 than at 2. The threshold for 'corpus-level finding' rises with the question-diversity of the peer groups.

Transformation-diff findings

Aggregated across N = 1 transformation pair. Each pair is a directional source -> derived comparison.

Small sample warning. N = 1 is below the threshold (3) at which per-dimension rates become interpretive. Rates below should be read as raw counts, not as corpus-level findings. The aggregate becomes informative as more transformation pairs are added (see `CORPUS_GENRE_GAPS.md`).

Per-dimension movement rate

Of 1 pairs, the count of pairs where the transformation measurably moved a dimension:

Dimension Pairs moved Rate
coverage 1 of 1 1/1 (small N)
calibration 1 of 1 1/1 (small N)
evidence 1 of 1 1/1 (small N)
robustness 0 of 1 0/1 (small N)
counterfactual 0 of 1 0/1 (small N)

Per-transformation-kind breakdown

Honest limits

Citation

Lucic, L. (2026). Frame Check decision-readiness aggregate findings, corpus revision `7a6e2f294c9e`, computed 2026-06-01. https://frame.clarethium.com/corpus/decision-readiness/