# RAILS — Reading Alignment Index for AI Literacy Systems
## Version 1.1 (Draft) | April 2026

### Purpose
RAILS is a purpose-built evaluation instrument for assessing whether AI-generated literacy instructional content aligns with the scientific consensus on how children learn to read. To the best of our knowledge, based on a systematic search of existing instruments, it is the first designed specifically to evaluate single AI-generated outputs against Science of Reading indicators.

### What RAILS Evaluates
Single AI-generated outputs: lesson plans, instructional recommendations, activity designs, assessment suggestions, and curriculum advice produced by AI tools in response to teacher prompts.

### What RAILS Does NOT Evaluate
Full curricula, scope and sequence documents, year-long programs, student outcomes, the mechanism behind AI outputs (training data vs. system prompts vs. fine-tuning), or the AI tool's privacy, usability, or technical features.

### Scope of Claims
RAILS documents what AI tools produce. It does not make causal claims about why tools produce non-aligned content, whether teachers adopt AI recommendations, or whether AI-generated content is better or worse than teacher-generated content.

### Research Foundation
RAILS draws from the same interdisciplinary evidence base underpinning every major literacy evaluation framework globally (TRL CEGs, EdReports, FCRR/REL Rubric, NCTQ, UK DfE SSP Criteria, AITSL). Its indicators are grounded in:
- National Reading Panel (NICHD, 2000)
- Simple View of Reading (Gough & Tunmer, 1986)
- Scarborough's Reading Rope (Scarborough, 2001)
- Ehri's Phases of Word Reading Development (Ehri, 2005, 2014, 2020)
- Self-Teaching Hypothesis (Share, 1995)
- Castles, Rastle & Nation (2018) — "Ending the Reading Wars"
- Moats, L.C. (2020) — "Teaching Reading Is Rocket Science"
- Seidenberg, M. (2017) — "Language at the Speed of Sight"
- Graham & Harris (2005) — Self-Regulated Strategy Development
- Beck, McKeown & Kucan (2002, 2013) — Tiered Vocabulary Framework

### Indicator Tiers

**Strong Consensus** — Converging evidence from multiple meta-analyses and syntheses. No serious scientific dispute:
WR-1, WR-2, WR-3, WR-4, WR-5, WR-6, W-2, AS-1, AS-2

AS-1 and AS-2 are Strong Consensus because the evidence against MSV-based miscue analysis and leveled-text placement is well-established (Moats 2000/2020, Kilpatrick 2015, NRP 2000). The critique of running records concerns the MSV error analysis framework specifically, not listening to students read. The critique of leveled text systems concerns using text predictability rather than decodability as the basis for placement.

**Emerging Consensus** — Supported by substantial research but with some ongoing scholarly debate:
LC-1, LC-2, RC-1, W-1

Findings from Emerging Consensus indicators are reported with appropriate hedging.

### Scoring Scale (4-Point Severity)

**For presence-based indicators** (WR-1, WR-2, WR-3, WR-4, WR-6, LC-1, RC-1, W-1, W-2, AS-1, AS-2):
- **0 — Absent:** The practice is not present in the output
- **1 — Peripheral:** Mentioned briefly, as one option among several
- **2 — Central:** A primary instructional recommendation
- **3 — Dominant:** The main or sole approach; no SoR-aligned alternatives offered

**For absence-based indicators** (WR-5, LC-2):
- **0 — Present:** The aligned practice is explicitly and adequately addressed
- **1 — Vague:** The practice is mentioned but without depth or clarity
- **2 — Missing:** The aligned practice is absent from the output
- **3 — Missing + Replaced:** The aligned practice is absent AND the space is filled by a non-aligned alternative

### Run Scoring Decision Rule
Score all 3 runs independently. Report the **median** severity per indicator. Also report the **range** to document non-determinism. Flag any indicator where the range is ≥ 2.
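The decision rule above can be sketched in Python. This is an illustrative helper, not part of the instrument itself; the indicator IDs and scores in the example are made up.

```python
from statistics import median

def summarize_runs(scores_by_indicator):
    """Apply the run-scoring decision rule.

    scores_by_indicator: dict mapping indicator ID -> list of 3 run scores.
    Returns, per indicator: the median severity (the reported score),
    the range (to document non-determinism), and a flag when range >= 2.
    """
    summary = {}
    for indicator, scores in scores_by_indicator.items():
        rng = max(scores) - min(scores)
        summary[indicator] = {
            "median": median(scores),
            "range": rng,
            "flag_nondeterminism": rng >= 2,  # per the decision rule
        }
    return summary

# Illustrative scores from three runs of the same prompt:
runs = {"WR-1": [3, 3, 2], "WR-4": [0, 2, 3], "AS-1": [1, 1, 1]}
summary = summarize_runs(runs)
```

Here WR-4 would be reported as median 2 with range 3, and flagged: the tool produced fully aligned output on one run and a dominant non-aligned recommendation on another.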

### Scoring Rules
- Score ONLY what is explicitly stated in the output. Do not infer.
- If the practice is implied but not stated, score 0 and note "implied but not explicit" in the evidence column.
- Exception: If both raters independently identify the same implied practice, it may be scored as 1 (Peripheral) with documentation.
- When uncertain, score 0 with a one-sentence explanation. An unsupported non-zero score is far more damaging to the instrument's credibility than a conservative zero.

### Metrics
- **Red Flag Rate** = (outputs with ≥1 indicator scored ≥2) / (total outputs)
- **Critical Flag Rate** = the same rate, computed over Critical indicators only
- **Severity Index** = mean severity score across all flagged indicators
- **Self-Correction Rate** = (outputs where the tool warns against a non-aligned practice) / (total outputs)
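A minimal sketch of these metrics in Python. One assumption is made explicit here: "flagged indicators" in the Severity Index is read as indicators scored ≥2, matching the Red Flag threshold. The data shapes and example values are illustrative.

```python
# Critical indicators as listed in the document.
CRITICAL = {"WR-1", "WR-2", "WR-4", "AS-1", "AS-2"}

def rails_metrics(outputs):
    """Compute the four RAILS metrics over a corpus of scored outputs.

    outputs: list of dicts, each with
      - "scores": dict of indicator ID -> median severity (0-3)
      - "self_corrects": bool, tool warned against a non-aligned practice
    """
    n = len(outputs)
    red = sum(any(s >= 2 for s in o["scores"].values()) for o in outputs)
    crit = sum(
        any(s >= 2 for ind, s in o["scores"].items() if ind in CRITICAL)
        for o in outputs
    )
    # Assumption: "flagged" = severity >= 2 (same threshold as Red Flag Rate).
    flagged = [s for o in outputs for s in o["scores"].values() if s >= 2]
    return {
        "red_flag_rate": red / n,
        "critical_flag_rate": crit / n,
        "severity_index": sum(flagged) / len(flagged) if flagged else 0.0,
        "self_correction_rate": sum(o["self_corrects"] for o in outputs) / n,
    }
```

If the instrument intends "flagged" to mean severity ≥1, only the threshold in the `flagged` comprehension changes.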

---

## STRAND 1: WORD RECOGNITION [Strong Consensus]
*Simple View: Decoding side. Scarborough's Rope: Phonological Awareness, Decoding, Sight Recognition*

### WR-1: Three-Cueing / MSV [CRITICAL]
**Red flag:** The output recommends using meaning (M), structure (S), and/or visual (V) cues, picture clues, context clues, or "reading around the word" as strategies for identifying unfamiliar words.
**Anchor examples:**
- Score 3: "When students encounter an unfamiliar word, they should look at the picture, think about what makes sense, and check the first letter."
- Score 2: "Students can use a combination of strategies including phonics, context clues, and picture support."
- Score 1: "While phonics is primary, students may occasionally check if a word makes sense in context."
- Score 0: "Students should sound out unfamiliar words using their knowledge of letter-sound correspondences."
**Research:**
- Stanovich, K.E. (1980). Toward an interactive-compensatory model. *RRQ*, 16(1), 32-71.
- Share, D.L. (1995). Phonological recoding and self-teaching. *Cognition*, 55(2), 151-218.
- Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars. *PSPI*, 19(1), 5-51.
- Seidenberg, M. (2017). *Language at the Speed of Sight*. Chapter 10.

### WR-2: Whole-Word Memorization of High-Frequency Words [CRITICAL]
**Red flag:** Teaching "sight words" through rote memorization without attention to letter-sound structure.
**Anchor examples:**
- Score 3: "Use flashcards and repetition to memorize these 50 sight words."
- Score 2: "Have students practice sight words through rainbow writing and word walls."
- Score 1: "Most words are taught through phonics, but some irregular words need to be memorized."
- Score 0: "Teach high-frequency words by drawing attention to the regular and irregular letter-sound parts."
**Research:**
- Ehri, L.C. (2014). Orthographic mapping. *SSR*, 18(1), 5-21.
- Ehri, L.C. (2005). Learning to read words. *SSR*, 9(2), 167-188.
- Miles, K.P. & Ehri, L.C. (2019). Orthographic mapping facilitates sight word memory. In *Reading Development and Difficulties*. Springer.

### WR-3: Incidental or Embedded Phonics
**Red flag:** Phonics taught as opportunistic mini-lessons rather than systematic, explicit instruction.
**Anchor examples:**
- Score 3: "As you read together, point out interesting letter patterns when they come up naturally."
- Score 2: "Include a brief phonics mini-lesson at the start, then focus on the text."
- Score 1: "Phonics is taught systematically, with occasional reinforcement during shared reading."
- Score 0: "The lesson includes explicit, structured phonics instruction with a clear teaching sequence: model, guided practice, independent application."
**Research:**
- NRP (NICHD, 2000). Chapter 2: Alphabetics.
- Ehri et al. (2001). Systematic phonics instruction helps students learn to read. *RER*, 71(3), 393-447.

### WR-4: Leveled/Predictable Texts Over Decodable Texts [CRITICAL]
**Red flag:** Recommending leveled readers (A-Z), predictable texts, or "reading level" placement for beginning readers instead of decodable texts matched to taught phonics patterns.
**Anchor examples:**
- Score 3: "Select leveled readers at the student's guided reading level (e.g., Level D)."
- Score 2: "Use a mix of leveled readers and decodable texts."
- Score 1: "Primarily use decodable texts, supplemented with some leveled texts for engagement."
- Score 0: "Use decodable texts that match the phonics patterns students have been taught."
**Research:**
- Mesmer, H.A. (2001). Decodable text: A review. *RRI*, 40(2), 121-141.
- Juel, C. & Roper-Schneider, D. (1985). Influence of basal readers. *RRQ*, 20(2), 134-152.
- Price-Mohr, R. & Price, C. (2020). Decodable vs non-decodable texts. *ECEJ*, 48, 39-47.

### WR-5: Phoneme Awareness Not Taught or Conflated with Phonics
**Red flag:** Not addressing phoneme-level awareness as a distinct oral/auditory skill, or conflating it with letter-based phonics.
**Anchor examples:**
- Score 3: The output addresses letter-sound instruction but phoneme awareness is entirely absent — no blending, segmenting, or any oral/auditory skill work is mentioned or implied.
- Score 2: Phoneme awareness is absent as a distinct skill. The output may reference "phonics and phonemic awareness" but treats them as interchangeable or only addresses the print-based side.
- Score 1: Phoneme awareness is mentioned but vaguely or briefly, without specific oral/auditory activities. The output acknowledges it exists but doesn't develop it as a separate instructional strand.
- Score 0: "Teach phoneme awareness (blending, segmenting sounds) orally before connecting to print."
**Research:**
- NRP (NICHD, 2000). Chapter 2: Phonemic Awareness.
- Kilpatrick, D.A. (2015). *Essentials of Assessing, Preventing, and Overcoming Reading Difficulties*. Chapters 4-5.

### WR-6: Fluency as Speed Only, or Round-Robin Reading
**Red flag:** Defining fluency primarily as reading rate, recommending round-robin/popcorn reading, or omitting accuracy and prosody.
**Anchor examples:**
- Score 3: "Have each student read a page aloud while others follow along" (round-robin).
- Score 2: "Focus on increasing words-per-minute through timed readings."
- Score 1: "Practice fluency through repeated reading, focusing on speed and expression."
- Score 0: "Build fluency through repeated reading of connected text, emphasizing accuracy, automaticity, and prosody."
**Research:**
- NRP (NICHD, 2000). Chapter 3: Fluency.
- Kuhn, M.R. & Stahl, S.A. (2003). Fluency: A review. *JEP*, 95(1), 3-21.
- Therrien, W.J. (2004). Fluency and comprehension gains. *RASE*, 25(4), 252-261.

---

## STRAND 2: LANGUAGE COMPREHENSION [Emerging Consensus]
*Simple View: Language Comprehension side. Scarborough's Rope: Background Knowledge, Vocabulary, Language Structures.*

### LC-1: Vocabulary Through Context Clues Only
**Red flag:** "Use context clues" as the primary vocabulary strategy without explicit instruction.
**Anchor examples:**
- Score 3: "Teach students to figure out unknown words by reading the sentences around them."
- Score 2: "Use context clues and discuss word meanings when students encounter them."
- Score 1: "Teach vocabulary explicitly, and also show students how context can confirm meaning."
- Score 0: "Provide explicit vocabulary instruction with student-friendly definitions, multiple exposures, and morphological analysis."
**Research:**
- Beck, I.L., McKeown, M.G., & Kucan, L. (2013). *Bringing Words to Life* (2nd ed.). Guilford.
- NRP (NICHD, 2000). Chapter 4: Vocabulary.
- Bowers, P.N., Kirby, J.R., & Deacon, S.H. (2010). Morphological instruction. *RER*, 80(2), 144-179.

### LC-2: No Background/Domain Knowledge Building
**Red flag:** Skills-only comprehension approach without building content knowledge.
**Anchor examples:**
- Score 3: "Focus on practicing the strategy of making predictions across multiple unrelated texts."
- Score 2: "Read a variety of texts and practice comprehension strategies."
- Score 1: "Build knowledge through a text set on a topic, and teach strategies as tools for understanding."
- Score 0: "Use a coherent sequence of knowledge-building texts on a topic, developing vocabulary and background knowledge."
**Research:**
- Willingham, D.T. (2006). How knowledge helps. *American Educator*, 30(1), 30-37.
- Hirsch, E.D. (2003). Reading comprehension requires knowledge. *American Educator*, 27(1), 10-29.
- Cabell, S.Q. & Hwang, H. (2020). Building content knowledge. *RRQ*, 55(S1), S99-S107.

---

## STRAND 3: READING COMPREHENSION [Emerging Consensus]

### RC-1: Comprehension Strategies as Ends in Themselves
**Red flag:** Strategies (predicting, visualizing, making connections) taught as standalone skills rather than tools applied to rich texts.
**Anchor examples:**
- Score 3: "This week we practice predicting. Students make predictions before, during, and after reading."
- Score 2: "Teach comprehension strategies and apply them to a text."
- Score 1: "Use strategies as temporary scaffolds while building knowledge from the text."
- Score 0: "Build comprehension through knowledge-rich texts, using strategies as tools for understanding specific content."
**Research:**
- Willingham, D.T. (2006). Usefulness of brief instruction in reading comprehension strategies. *AE*, 30(4), 39-50.
- Duke, N.K. et al. (2011). Essential elements of fostering reading comprehension. In *What Research Has to Say* (4th ed.). IRA.

---

## STRAND 4: WRITING [Mixed — see indicator tiers]

### W-1: No Explicit Writing Instruction [Emerging Consensus]
**Red flag:** Assigning writing without explicit instruction in sentence/paragraph structure, or relying entirely on process writing.
**Anchor examples:**
- Score 3: "Have students brainstorm, write a draft, and share with a partner."
- Score 2: "Students write a persuasive paragraph using the writing process."
- Score 1: "Model a topic sentence, then guide students through constructing supporting details."
- Score 0: "Use explicit instruction: model the target structure, practice collaboratively, then release to independent writing."
**Research:**
- Graham, S. & Harris, K.R. (2005). Writing better. Brookes.
- Graham, S. et al. (2012). Writing instruction meta-analysis. *JEP*, 104(4), 879-896.

### W-2: Spelling Not Connected to Phonics [Strong Consensus]
**Red flag:** Spelling taught through memorization (rainbow writing, copying, word sorts by visual pattern) without connecting to phoneme-grapheme correspondences.
**Anchor examples:**
- Score 3: "Students practice spelling words by writing them five times each."
- Score 2: "Use word sorts and spelling tests with a weekly word list."
- Score 1: "Connect spelling to phonics patterns, with some additional practice through word sorts."
- Score 0: "Teach spelling through phoneme-grapheme mapping, connecting encoding to the phonics scope and sequence."
**Research:**
- Moats, L.C. (2005/2020). *Speech to Print*. Brookes.
- NRP (NICHD, 2000). Chapter 2: Alphabetics (encoding).

---

## STRAND 5: ASSESSMENT [Strong Consensus]

### AS-1: MSV Miscue Analysis / Running Records [CRITICAL]
**Red flag:** Running records with MSV/miscue analysis as primary assessment, or analyzing errors by whether substituted words "make sense."
**Anchor examples:**
- Score 3: "Use running records to analyze whether the student's miscues are meaning-based, structural, or visual."
- Score 2: "Running records and phonics assessments together."
- Score 1: "Use running records for fluency, but assess decoding separately with a phonics screener."
- Score 0: "Use evidence-based screening tools (DIBELS, Acadience) to assess foundational skills."
**Research:**
- Moats, L.C. (2000/2020). *Teaching Reading Is Rocket Science*. AFT.
- Kilpatrick, D.A. (2015). Chapters 2, 6.

### AS-2: Guided Reading Levels (A-Z) as Primary Metric [CRITICAL]
**Red flag:** Using F&P levels, DRA levels, or similar leveled-text systems as primary placement or progress metric.
**Anchor examples:**
- Score 3: "Assess students and place them at their guided reading level (e.g., Level J)."
- Score 2: "Use guided reading levels alongside other assessments."
- Score 1: "Guided reading levels provide some information, but use phonics screeners for placement."
- Score 0: "Use skill-based assessment data (phonics, phoneme awareness, fluency) to inform instruction."
**Research:**
- Hiebert, E.H. & Pearson, P.D. (2010). Examination of text complexity measures. TextProject.
- Cunningham, J.W. et al. (2005). Validity of quantitative text tools. *RWQ*, 21, 349-369.

---

## SUMMARY

| ID | Strand | Indicator | Critical? | Tier |
|----|--------|-----------|-----------|------|
| WR-1 | Word Recognition | Three-cueing / MSV | YES | Strong |
| WR-2 | Word Recognition | Whole-word memorization | YES | Strong |
| WR-3 | Word Recognition | Incidental phonics | No | Strong |
| WR-4 | Word Recognition | Leveled over decodable | YES | Strong |
| WR-5 | Word Recognition | PA missing/conflated | No | Strong |
| WR-6 | Word Recognition | Fluency = speed / round-robin | No | Strong |
| LC-1 | Language Comp | Context clues only | No | Emerging |
| LC-2 | Language Comp | No knowledge building | No | Emerging |
| RC-1 | Reading Comp | Strategies as ends | No | Emerging |
| W-1 | Writing | No explicit instruction | No | Emerging |
| W-2 | Writing | Spelling not connected to phonics | No | Strong |
| AS-1 | Assessment | MSV / running records | YES | Strong |
| AS-2 | Assessment | Guided reading levels | YES | Strong |

**Total indicators:** 13
**Critical indicators:** 5 (WR-1, WR-2, WR-4, AS-1, AS-2)
**Strong Consensus:** 9 indicators
**Emerging Consensus:** 4 indicators
**Presence-based indicators:** 11
**Absence-based indicators:** 2 (WR-5, LC-2)
**WR indicators scored K-3 only** (N/A for grades 4-5)
**Total unique citations:** 28+

---

## Relationship to Existing Frameworks

RAILS was developed independently. Its indicators are grounded in the same evidence base as TRL CEGs, EdReports v2.0, FCRR/REL Rubric, NCTQ Reading Foundations, UK DfE SSP Criteria, and AITSL Reading Instruction Evidence Guide. It is purpose-built for a unit of analysis (single AI-generated outputs) that none of these frameworks addresses.

---

*RAILS v1.1 (Draft) — Joshua Durey / Still Human — April 2026*
*Pre-registration: [OSF link TBD]*
*Expert review status: Pending (Phase 0)*
*License: Creative Commons BY 4.0 (for maximum adoption and citation)*
