You are rating a document on five dimensions of decision support.
Each dimension is rated on a 1-5 scale. Read the methodology page
at /corpus/decision-readiness/ for the underlying framework; this
guide gives you the operational definitions and anchor descriptions
you use while rating.
**Do not read Frame Check's computed profile for the document
before you rate.** The profile lives in `corpus/{doc_id}/profile.json`;
your ratings must be blind to it for the validation to be
informative.
1. Open `corpus/{doc_id}/document.md` and read the document in full.
2. For each of the five dimensions below, score 1-5 against the
anchor descriptions.
3. Write notes per dimension explaining your reasoning. The notes
are how the validation effort learns where the profile and
raters diverge.
4. Save your ratings as `ratings/{doc_id}/{your_rater_id}.yaml`
using the template in `rating_template.yaml`.
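
Before saving, a rater could sanity-check the file contents with something like the sketch below. It is only an illustration: the dimension keys and the `score`/`notes` field names are hypothetical stand-ins, and the real structure is whatever `rating_template.yaml` defines. Only the save path comes from step 4.

```python
from pathlib import Path

# Hypothetical dimension keys; use the names in rating_template.yaml instead.
DIMENSIONS = ("coverage", "calibration", "evidence_backing",
              "robustness", "counterfactual")


def rating_path(doc_id: str, rater_id: str) -> Path:
    """Build the save location named in step 4."""
    return Path("ratings") / doc_id / f"{rater_id}.yaml"


def validate_rating(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    for dim in DIMENSIONS:
        entry = record.get(dim)
        if entry is None:
            problems.append(f"{dim}: missing")
            continue
        score = entry.get("score")
        # None stands for YAML null, used when a dimension does not apply.
        if score is not None and (type(score) is not int or not 1 <= score <= 5):
            problems.append(f"{dim}: score must be an integer 1-5 or null")
        # Step 3 asks for notes on every dimension, so empty notes are flagged.
        if not str(entry.get("notes", "")).strip():
            problems.append(f"{dim}: notes are empty")
    return problems
```

For example, `validate_rating` returns an empty list for a record in which every dimension has an integer score from 1 to 5 (or `None`) and non-empty notes.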
If a dimension does not apply to a document (e.g., a poetry
excerpt has no numerical claims to source-verify), use the
sentinel value `null` rather than guessing. The harness handles
nulls correctly; guessing pollutes the correlation.
---
What you are judging: Does the document address the perspectives
that matter for the kind of decision someone might make based on it?
Five general analytical perspectives anchor the dimension: causes
(why), risks (what could go wrong), stakeholders (who is affected),
trends (what is changing), uncertainty (what is unknown).
Anchor descriptions:
with substantive content per perspective. A reader leaves with
a multi-faceted picture suitable for considering most relevant
factors before deciding.
content. The missing perspective is named or its absence is not
load-bearing.
content, OR addresses all 5 but with very uneven depth (one or
two perspectives carry most of the analysis; others are token
mentions).
content. A reader would need additional sources to consider the
decision adequately.
effectively none in substantive depth.
Common confusions: Length is not a proxy for coverage. A short
document can cover all five perspectives concisely; a long
document can dwell on one perspective. Coverage is about whether
the perspectives are substantively present, not how many words
are devoted to them.
---
What you are judging: Are the claims in the document hedged
appropriately given their epistemic status? Statements about
uncertain matters that are stated as facts are mis-calibrated
upward; statements about well-established facts that are
unnecessarily hedged are mis-calibrated downward.
Anchor descriptions:
stated cleanly; claims about uncertain matters carry hedges
appropriate to the uncertainty (e.g., "may," "approximately,"
"in some cases," "if X, then Y").
one or two over-confident assertions about uncertain matters,
or a handful of unnecessarily hedged established facts.
other half mis-calibrates in either direction.
are stated as facts. Predictions are stated without
conditional language. The reader cannot tell from the prose
which claims are well-supported and which are speculative.
with the same confident voice regardless of underlying
certainty. The Confidence Imbalance pattern is unmistakable.
Common confusions: Genre matters. A grant proposal SHOULD be
hedged (forward-looking, uncertain). A historical analysis
should NOT be heavily hedged on established events. Use your
judgment about what calibration is APPROPRIATE for the genre,
not a fixed hedge ratio.
---
What you are judging: Are the document's claims supported by
sources, or are they floating assertions? Numerical claims and
factual claims need different things: numerical claims need a
source; factual claims need either a source or established
common knowledge.
Anchor descriptions:
Factual claims are either sourced or clearly common knowledge.
The reader can trace any specific claim to its origin.
handful of asserted numbers without attribution. Factual
backing is otherwise solid.
the others are stated without attribution. The reader has to
trust the author on a meaningful share of the content.
without attribution. The document is interpretation-heavy
with thin evidentiary backing.
The document asserts numbers without showing where they come
from. The reader cannot verify any specific claim.
Common confusions: "Evidence backing" is about the document's
internal sourcing discipline, not about whether the sources
themselves are correct. A document that cites an unreliable
source still rates higher on this dimension than one that
asserts the same number with no source. Source CORRECTNESS is
captured by the Robustness dimension when Frame Check verifies
against authoritative providers.
---
What you are judging: Does the document hold up under scrutiny?
Specifically: do its claims survive checking against external
sources? Does the internal logic hold together?
Anchor descriptions:
external sources hold up. Internal logic is consistent. No
obvious load-bearing claim turns out to be wrong.
errors (rounding, dated figures) but no load-bearing claim
fails. Internal logic is consistent.
claim-source mismatches, including at least one that bears
on the document's main argument. Internal logic has at least
one questionable link.
Internal logic has gaps that affect the conclusion.
external sources. The document's main argument relies on
unsupportable assertions.
Common confusions: Robustness assumes you spot-check at least
a few of the document's specific numerical or factual claims. If
you cannot spot-check at all (the document is purely
interpretive with no checkable claims), use `null` for this
dimension.
---
What you are judging: Does the document name what would
falsify it? Does it consider alternative explanations or
opposing views? Does it engage with how it might be wrong?
Anchor descriptions:
its main claims. Alternative interpretations are considered and
addressed. Limitations are acknowledged in proportion to the
confidence of the conclusions.
one alternative interpretation is considered. The reader sees
the author has thought about how the analysis could be wrong.
is generic. Alternatives are mentioned but not engaged with.
The author signals counterfactual thinking without practicing
it.
alternatives. The document presents one interpretation as if
it were the only one.
(corresponds to the named pattern FVS-007: Failure Framing absent
in the Frame Vocabulary Standard).
The document makes confident claims with no engagement with how
it might be wrong, what would change the conclusion, or what
alternative interpretations exist.
Common confusions: This is the dimension most affected by
genre. Editorials and op-eds are expected to be one-sided; a
strong op-ed scoring 1 here is not necessarily a bad op-ed. Use
the genre context (in `corpus/{doc_id}/metadata.yaml`) to
calibrate your rating.
---
A rating session takes 15-30 minutes per document for an
experienced rater. Expect the first few documents to take longer
while you calibrate against the anchors, and later ones to go
faster once you have done 3-5.
Record `time_spent_minutes` in your rating file so the validation
effort can publish realistic time estimates for future raters.
(calibration biases tend to be upward; conservative rating
helps the validation)
informative
The validation harness reports cases where Frame Check and
expert raters diverge sharply. These are the most useful
documents for methodology revision; your notes are what makes
the divergence interpretable.
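
As a rough illustration of what "diverge sharply" could mean in practice, the sketch below flags dimensions where a rater's score and the profile's score differ by at least a threshold. The function name, the threshold of 2 points, and the flat score dictionaries are all hypothetical; the harness's real criterion and the layout of `profile.json` are not described in this guide.

```python
def sharp_divergences(rater, profile, threshold=2):
    """Return dimensions whose two scores differ by at least `threshold`."""
    flagged = {}
    for dim, rater_score in rater.items():
        profile_score = profile.get(dim)
        if rater_score is None or profile_score is None:
            continue  # null ratings are skipped, not treated as zero
        if abs(rater_score - profile_score) >= threshold:
            flagged[dim] = (rater_score, profile_score)
    return flagged


# Hypothetical scores for a single document.
rater = {"coverage": 4, "calibration": 2, "robustness": None}
profile = {"coverage": 4, "calibration": 5, "robustness": 1}
print(sharp_divergences(rater, profile))
```

A flagged dimension only says that the scores disagree; the per-dimension notes are what explain why.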