The Evidence Problem — What the studies actually say about Amira AI Tutor

I spent two weeks reading every study Amira Learning cites to justify the millions of dollars states are spending on their AI reading platform.

Here’s what I found.

Amira’s marketing claims “Strong (Tier 1)” ESSA evidence. The actual rating from Evidence for ESSA, the independent review maintained by Johns Hopkins University, is “Moderate (Tier 2).” That’s not an interpretation. It’s a fact you can verify in sixty seconds at evidenceforessa.org.

But it gets worse.

The only RCT tested software from 2001

The one randomized controlled trial in Amira’s evidence base tested software built in 2001 at Carnegie Mellon. Amira Learning was founded in 2018. The product has been fundamentally redesigned in the 17 years between that software and the company’s founding. And the comparison group in that trial? Sustained Silent Reading. Kids sitting quietly with books. No feedback, no tutor, no adult interaction at all. Any interactive tool would beat that.

What the current-product studies actually show

The large-scale studies of the actual, current Amira product tell a different story.

In a Texas study covering 15,424 students, kindergartners showed a modest effect size of +0.26. First graders showed +0.06, which means moving a student from the 50th percentile to roughly the 52nd. Functionally zero.
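If you want to check the percentile arithmetic yourself, here is a minimal sketch. It assumes the reported effect sizes are standardized mean differences and that scores are roughly normally distributed, which is the usual convention for this kind of conversion but is my assumption, not something the studies spell out. The function name is mine, chosen for illustration.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size: float, start_percentile: float = 50.0) -> float:
    """Percentile rank of a student who starts at start_percentile and
    gains effect_size standard deviations.

    Assumes normally distributed scores and a standardized effect size.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(start_z + effect_size)


# Effect sizes quoted above for the Texas study.
for label, d in [("Texas kindergarten", 0.26), ("Texas first grade", 0.06)]:
    print(f"{label}: effect size {d:+.2f} -> percentile {percentile_after_gain(d):.1f}")
```

Under those assumptions, +0.06 lands a little above the 52nd percentile and +0.26 lands near the 60th.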

In Louisiana, 79,084 students were studied. For fourth graders who used Amira, the effect size was 0.03. Third graders, 0.05. Only 5 to 19 percent of students even met the recommended usage levels.

Marketed ESSA tier: Strong (Tier 1)

Actual ESSA tier (Johns Hopkins): Moderate (Tier 2)

Marketed effect size range: 0.64 – 0.70

Current-product effect sizes: 0.03 – 0.26

Only RCT, software vintage: 2001

Every current study is vendor-funded

And every one of these current-product studies was funded by Amira. The Louisiana study, which Amira calls an “Independent Third-Party Study” on their website, states on its own cover page that “Amira Learning contracted with Instructure” to conduct it.

No peer-reviewed study of the current Amira product exists. The What Works Clearinghouse has no entry for Amira. And the company markets effect sizes of 0.64 to 0.70 pulled from the 2001 software while the real-world numbers sit between 0.03 and 0.26.

What this means

I’m not saying the product doesn’t work at all. I’m saying the distance between the marketing claims and the documented evidence is not a rounding error. It’s a pattern.

And states are spending tens of millions of taxpayer dollars based on that marketing.
