Frame Check

Operational guide for Phase 2 decision-readiness raters. The methodology page documents the framework; this page documents HOW to rate one document against the five anchors. Source artifact at validation/decision_readiness/rater_guide.md. See also: the Phase 2 invitation for who this guide is addressed to, the rating-quality contrast for what good ratings look like, and the validation corpus for documents you would rate against. Licensed CC-BY-4.0.

Rater guide: decision-readiness profile

You are rating a document on five dimensions of decision support.

Each dimension is rated on a 1-5 scale. Read the methodology page at /corpus/decision-readiness/ for the underlying framework; this guide gives you the operational definitions and anchor descriptions you use while rating.

**Do not read Frame Check's computed profile for the document before you rate.** The profile lives in `corpus/{doc_id}/profile.json`; your ratings must be blind to it for the validation to be informative.

How to rate one document

1. Open `corpus/{doc_id}/document.md` and read the document in full.
2. For each of the five dimensions below, score 1-5 against the anchor descriptions.
3. Write notes per dimension explaining your reasoning. The notes are how the validation effort learns where the profile and raters diverge.
4. Save your ratings as `ratings/{doc_id}/{your_rater_id}.yaml` using the template in `rating_template.yaml`.
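Step 4 defers the file layout to `rating_template.yaml`, which this page does not reproduce. As a minimal sketch only, the Python below renders one rating file for the path the guide specifies, using hypothetical field names (`coverage_of_perspectives`, `score`, `notes`, and so on) in place of the template's real ones; `time_spent_minutes` and the `null` sentinel come from this guide. It writes YAML by hand so it needs only the standard library.

```python
from typing import Dict, Optional

# Hypothetical dimension keys; the real field names are whatever
# rating_template.yaml defines.
DIMENSIONS = (
    "coverage_of_perspectives",
    "claim_calibration",
    "evidence_backing",
    "robustness",
    "counterfactual_thinking",
)


def rating_path(doc_id: str, rater_id: str) -> str:
    """Location the guide specifies for one rater's file on one document."""
    return f"ratings/{doc_id}/{rater_id}.yaml"


def validate_score(name: str, score: Optional[int]) -> None:
    """Accept an integer 1-5, or None for a dimension that does not apply."""
    if score is None:
        return
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"{name}: expected an integer 1-5 or None, got {score!r}")


def yaml_quote(text: str) -> str:
    """Quote free-text notes as a YAML double-quoted scalar."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def render_rating(
    scores: Dict[str, Optional[int]],
    notes: Dict[str, str],
    time_spent_minutes: int,
) -> str:
    """Render a minimal rating file; a None score is written as YAML null."""
    lines = []
    for name in DIMENSIONS:
        score = scores[name]
        validate_score(name, score)
        lines.append(f"{name}:")
        lines.append(f"  score: {'null' if score is None else score}")
        lines.append(f"  notes: {yaml_quote(notes.get(name, ''))}")
    lines.append(f"time_spent_minutes: {time_spent_minutes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_rating(
        {
            "coverage_of_perspectives": 4,
            "claim_calibration": 3,
            "evidence_backing": 2,
            "robustness": None,  # purely interpretive: nothing to spot-check
            "counterfactual_thinking": 3,
        },
        {"robustness": "No checkable numerical or factual claims."},
        22,
    )
    print(rating_path("doc-001", "rater-a"))
    print(text)
```

Validating scores before writing means an out-of-range value fails loudly on your side instead of reaching the harness.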

If a dimension does not apply to a document (e.g., a poetry excerpt has no numerical claims to source-verify), use the sentinel value `null` rather than guessing. The harness handles nulls correctly; guessing pollutes the correlation.
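To see why a guess does more damage than `null`, here is a sketch that assumes the harness compares rater and profile scores with a Pearson correlation and drops any document where either side is null (pairwise deletion). Neither choice is confirmed by this guide; the harness may use a rank correlation or another statistic. The scores are illustrative, not validation data.

```python
import math
from typing import List, Optional


def pearson_pairwise(xs: List[Optional[float]], ys: List[Optional[float]]) -> float:
    """Pearson correlation over documents where both scores are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative scores on one dimension for five documents.
profile = [1, 2, 3, 4, 5]
rater_null = [1, 2, 3, 4, None]  # fifth document: dimension does not apply
rater_guess = [1, 2, 3, 4, 1]    # same rater, guessing on the fifth instead

print(round(pearson_pairwise(profile, rater_null), 2))   # 1.0
print(round(pearson_pairwise(profile, rater_guess), 2))  # 0.24
```

One guessed score on a single document drops the correlation from 1.0 to about 0.24, while the `null` simply removes that document from the comparison.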

---

Dimension 1: Coverage of perspectives

What you are judging: Does the document address the perspectives that matter for the kind of decision someone might make based on it?

Five general analytical perspectives anchor the dimension: causes (why), risks (what could go wrong), stakeholders (who is affected), trends (what is changing), and uncertainty (what is unknown).

Anchor descriptions:

5: … with substantive content per perspective. A reader leaves with a multi-faceted picture suitable for considering most relevant factors before deciding.

4: … content. The missing perspective is named or its absence is not load-bearing.

3: … content, OR addresses all 5 but with very uneven depth (one or two perspectives carry most of the analysis; others are token mentions).

2: … content. A reader would need additional sources to consider the decision adequately.

1: … effectively none in substantive depth.

Common confusions: Length is not a proxy for coverage. A short document can cover all five perspectives concisely; a long document can dwell on one perspective. Coverage is about whether the perspectives are substantively present, not how many words are devoted to them.
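Because coverage turns on substantive presence rather than length, it can help to record a judgment per perspective while reading. The helper below is a hypothetical note-taking aid, not part of Frame Check: the three depth labels are this sketch's own shorthand for the anchors' "substantive content" and "token mentions", and it deliberately produces no score. You still pick the score from the anchor descriptions.

```python
from enum import Enum
from typing import Dict, List

# The five perspectives named in the guide.
PERSPECTIVES = ("causes", "risks", "stakeholders", "trends", "uncertainty")


class Depth(Enum):
    SUBSTANTIVE = "substantive"  # real analytical content
    TOKEN = "token"              # mentioned but not developed
    ABSENT = "absent"


def coverage_summary(judgments: Dict[str, Depth]) -> Dict[str, List[str]]:
    """Group a rater's per-perspective judgments; word counts play no part."""
    missing = [p for p in PERSPECTIVES if p not in judgments]
    if missing:
        raise ValueError(f"no judgment recorded for: {', '.join(missing)}")
    return {
        depth.value: [p for p in PERSPECTIVES if judgments[p] is depth]
        for depth in Depth
    }


summary = coverage_summary({
    "causes": Depth.SUBSTANTIVE,
    "risks": Depth.SUBSTANTIVE,
    "stakeholders": Depth.TOKEN,
    "trends": Depth.TOKEN,
    "uncertainty": Depth.ABSENT,
})
print(summary)
```

Forcing a judgment for every perspective makes it harder to overlook one that a long document never actually addresses.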

---

Dimension 2: Claim calibration

What you are judging: Are the claims in the document hedged appropriately given their epistemic status? Statements about uncertain matters that are stated as facts are mis-calibrated upward; statements about well-established facts that are unnecessarily hedged are mis-calibrated downward.

Anchor descriptions:

5: … stated cleanly; claims about uncertain matters carry hedges appropriate to the uncertainty (e.g., "may," "approximately," "in some cases," "if X, then Y").

4: … one or two over-confident assertions about uncertain matters, or a handful of unnecessarily hedged established facts.

3: … other half mis-calibrates in either direction.

2: … are stated as facts. Predictions are stated without conditional language. The reader cannot tell from the prose which claims are well-supported and which are speculative.

1: … with the same confident voice regardless of underlying certainty. The Confidence Imbalance pattern is unmistakable.

Common confusions: Genre matters. A grant proposal SHOULD be hedged (forward-looking, uncertain). A historical analysis should NOT be heavily hedged on established events. Use your judgment about what calibration is APPROPRIATE for the genre, not a fixed hedge ratio.

---

Dimension 3: Evidence backing

What you are judging: Are the document's claims supported by sources, or are they floating assertions? Numerical claims and factual claims need different things: numerical claims need a source; factual claims need either a source or established common knowledge.

Anchor descriptions:

5: … Factual claims are either sourced or clearly common knowledge. The reader can trace any specific claim to its origin.

4: … handful of asserted numbers without attribution. Factual backing is otherwise solid.

3: … the others are stated without attribution. The reader has to trust the author on a meaningful share of the content.

2: … without attribution. The document is interpretation-heavy with thin evidentiary backing.

1: … The document asserts numbers without showing where they come from. The reader cannot verify any specific claim.

Common confusions: "Evidence backing" is about the document's internal sourcing discipline, not about whether the sources themselves are correct. A document that cites an unreliable source still rates higher on this dimension than one that asserts the same number with no source. Source CORRECTNESS is captured by the Robustness dimension when Frame Check verifies against authoritative providers.
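The sourcing rule for this dimension can be stated as a small predicate. This is a sketch with a hypothetical `Claim` record; the `common_knowledge` flag stands for your own judgment as a rater, and `sourced` is true even when the cited source is unreliable, since source correctness belongs to Robustness. The example claims are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Claim:
    text: str
    numerical: bool                 # a specific number vs. another factual claim
    sourced: bool                   # the document attributes it to a source
    common_knowledge: bool = False  # rater's judgment; only counts for non-numerical claims


def is_backed(claim: Claim) -> bool:
    """Numerical claims need a source; factual claims need a source or common knowledge."""
    if claim.numerical:
        return claim.sourced
    return claim.sourced or claim.common_knowledge


def unbacked(claims: List[Claim]) -> List[str]:
    """Claims the reader would have to take on the author's word."""
    return [c.text for c in claims if not is_backed(c)]


claims = [
    Claim("Revenue grew 12% last year", numerical=True, sourced=False),
    Claim("Wait times fell 30%, per the agency's annual report", numerical=True, sourced=True),
    Claim("Paris is the capital of France", numerical=False, sourced=False, common_knowledge=True),
]
print(unbacked(claims))
```

Note the asymmetry: common knowledge can back a factual claim but never a specific number.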

---

Dimension 4: Robustness

What you are judging: Does the document hold up under scrutiny? Specifically: do its claims survive checking against external sources? Does the internal logic hold together?

Anchor descriptions:

5: … external sources hold up. Internal logic is consistent. No obvious load-bearing claim turns out to be wrong.

4: … errors (rounding, dated figures) but no load-bearing claim fails. Internal logic is consistent.

3: … claim-source mismatches, including at least one that bears on the document's main argument. Internal logic has at least one questionable link.

2: … Internal logic has gaps that affect the conclusion.

1: … external sources. The document's main argument relies on unsupportable assertions.

Common confusions: Robustness assumes you spot-check at least a few of the document's specific numerical or factual claims. If you cannot spot-check at all (the document is purely interpretive with no checkable claims), use `null` for this dimension.
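The `null` rule for Robustness amounts to: no spot-checks, no score. A minimal sketch with a hypothetical `SpotCheck` record for logging what you checked; it returns `None` (written as `null`) when nothing was checkable and otherwise only counts failures, including load-bearing ones. It does not assess internal logic, and the score itself still comes from the anchors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpotCheck:
    claim: str
    holds_up: bool      # survived checking against an external source
    load_bearing: bool  # the main argument depends on it


def spot_check_summary(checks: List[SpotCheck]) -> Optional[dict]:
    """Summarize spot-checks; None means write null for Robustness."""
    if not checks:
        return None  # purely interpretive document: nothing could be checked
    failures = [c for c in checks if not c.holds_up]
    return {
        "checked": len(checks),
        "failed": len(failures),
        "load_bearing_failures": sum(c.load_bearing for c in failures),
    }
```

Separating load-bearing failures from the rest mirrors the anchors, which treat minor errors very differently from a failed claim the main argument rests on.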

---

Dimension 5: Counterfactual thinking

What you are judging: Does the document name what would falsify it? Does it consider alternative explanations or opposing views? Does it engage with how it might be wrong?

Anchor descriptions:

5: … its main claims. Alternative interpretations are considered and addressed. Limitations are acknowledged in proportion to the confidence of the conclusions.

4: … one alternative interpretation is considered. The reader sees the author has thought about how the analysis could be wrong.

3: … is generic. Alternatives are mentioned but not engaged with. The author signals counterfactual thinking without practicing it.

2: … alternatives. The document presents one interpretation as if it were the only one.

1: … (corresponds to the named pattern FVS-007: Failure Framing absent in the Frame Vocabulary Standard). The document makes confident claims with no engagement with how it might be wrong, what would change the conclusion, or what alternative interpretations exist.

Common confusions: This is the dimension most affected by genre. Editorials and op-eds are expected to be one-sided; a strong op-ed scoring 1 here is not necessarily a bad op-ed. Use the genre context (in `corpus/{doc_id}/metadata.yaml`) to calibrate your rating.

---

Time budget

A rating session takes 15-30 minutes per document for an experienced rater. Expect it to take longer for the first few documents while you calibrate against the anchors; it gets faster once you have done 3-5.

Record `time_spent_minutes` in your rating file so the validation effort can publish realistic time estimates for future raters.
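For the time estimates, one plausible approach (an assumption, not the validation effort's published method) is to take the median of `time_spent_minutes` after skipping each rater's first few calibration documents. The `calibration_docs=3` default is a hypothetical choice at the low end of the 3-5 range mentioned above, and the sample times are illustrative.

```python
from statistics import median
from typing import List


def typical_minutes(times_in_order: List[float], calibration_docs: int = 3) -> float:
    """Median time per document after skipping a rater's first calibration ratings."""
    steady = times_in_order[calibration_docs:]
    if not steady:
        raise ValueError("no ratings past the calibration phase")
    return median(steady)


# Illustrative time_spent_minutes values for one rater, in rating order.
print(typical_minutes([45, 38, 33, 25, 20, 28, 22]))  # 23.5
```

The median keeps one unusually long session (a dense document, an interruption) from inflating the published estimate.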

When in doubt

… (calibration biases tend to be upward; conservative rating helps the validation)

… informative

The validation harness reports cases where Frame Check and expert raters diverge sharply. These are the most useful documents for methodology revision; your notes are what make the divergence interpretable.
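On the harness side (raters never see the profile while rating), flagging a sharp divergence could look like the sketch below. It assumes the profile can be read as one 1-5 score per dimension, which this guide does not state about `profile.json`, and the two-point `threshold` is a hypothetical cutoff. Dimensions where either side is `null` are skipped rather than compared.

```python
from typing import Dict, List, Optional, Tuple


def sharp_divergences(
    profile: Dict[str, Optional[int]],
    rater: Dict[str, Optional[int]],
    threshold: int = 2,
) -> List[Tuple[str, int, int]]:
    """Dimensions where profile and rater differ by >= threshold points; nulls are skipped."""
    flagged = []
    for dim, p in profile.items():
        r = rater.get(dim)
        if p is None or r is None:
            continue  # not applicable on one side: nothing to compare
        if abs(p - r) >= threshold:
            flagged.append((dim, p, r))
    return flagged


profile = {"claim_calibration": 4, "evidence_backing": 2, "robustness": 3}
rater = {"claim_calibration": 2, "evidence_backing": 2, "robustness": None}
print(sharp_divergences(profile, rater))  # [('claim_calibration', 4, 2)]
```

Each flagged tuple is only a pointer; the rater's notes for that dimension are what explain why the two scores disagree.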