How it's built
Every course is written by one AI, then torn apart by another
PurrLearn is a gamified learning app, but the interesting part is the machine behind it. Consumer "AI courses" are usually one model prompted once — which is exactly how you get confident, wrong answers. So the courses here go through an adversarial pipeline instead. This page is the honest version of how it works, including where it falls short.
The pipeline
- 1. Sourced knowledge, not vibes
Each skill is collected as a store of "knowledge atoms" — small, self-contained facts, each carrying a real source (textbook, paper, official docs). A course can only be drafted from atoms that already exist. No atom, no claim.
- 2. A producer drafts the lesson
One agent turns the atoms into a lesson: 2–3 teaching steps plus about 10 questions, with objectives mapped to Bloom's taxonomy levels and every hard claim traceable back to a source. It writes each lesson in both English and Chinese.
- 3. An independent reviewer tries to destroy it
A second, separate agent reviews the draft — explicitly prompted to assume the author fabricated things and to hunt for wrong quiz answers, overclaims, and misattributed citations. If it finds anything high-severity, the lesson is held, not shipped.
- 4. A completeness gate decides if the course is real
Even after every lesson passes review, a third agent judges the course as a whole: does it have at least 50 questions across at least 5 core skills, a coherent beginner arc, and no glaring gaps? A course reaches the catalog only if it clears this bar; otherwise it stays unlisted.
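The mechanical parts of these four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PurrLearn's actual code: the names `Atom`, `Finding`, `Lesson`, `missing_atoms`, `is_held`, and `passes_numeric_floor` are hypothetical, the severity scale is assumed, and each lesson is assumed to cover exactly one skill. The judgment calls (coherent beginner arc, glaring gaps) belong to the agents and are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    """A small, self-contained fact with its source (hypothetical shape)."""
    atom_id: str
    fact: str
    source: str  # e.g. a textbook, paper, or official docs


@dataclass
class Finding:
    """One reviewer finding; the severity labels are an assumed scale."""
    severity: str  # "low", "medium", or "high"
    note: str


@dataclass
class Lesson:
    skill: str                 # assumption: one skill per lesson
    atom_ids: list[str]        # atoms the draft claims to rely on
    question_count: int
    findings: list[Finding] = field(default_factory=list)


def missing_atoms(lesson: Lesson, store: dict[str, Atom]) -> list[str]:
    """'No atom, no claim': list cited atom ids that are absent or unsourced."""
    return [a for a in lesson.atom_ids if a not in store or not store[a].source]


def is_held(lesson: Lesson) -> bool:
    """A lesson is held back if the reviewer raised any high-severity finding."""
    return any(f.severity == "high" for f in lesson.findings)


def passes_numeric_floor(lessons: list[Lesson],
                         min_questions: int = 50,
                         min_skills: int = 5) -> bool:
    """Numeric half of the completeness gate, counting only lessons not held."""
    shipped = [lesson for lesson in lessons if not is_held(lesson)]
    total_questions = sum(lesson.question_count for lesson in shipped)
    skills = {lesson.skill for lesson in shipped}
    return total_questions >= min_questions and len(skills) >= min_skills
```

In this sketch a course with four passing lessons of 10 questions each fails the floor on both counts, and adding a fifth lesson that the reviewer has held does not change that.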
Real bugs it actually caught
- Correct answers were silently piling up in option A across the entire question bank — a model bias that would let anyone "always pick A" and score well. Fixed with a deterministic per-question shuffle.
- A citation that flatly contradicted the study it was citing (an emotion-and-sharing claim pointed at a paper that said the opposite). Caught, reworded, source removed.
- A lesson that stated a design principle was a "late addition" when it wasn't — a fabricated historical detail, flagged and corrected.
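A deterministic per-question shuffle, like the fix for the "always pick A" bias above, could look roughly like this. It is a minimal sketch, not the app's real implementation: the function name `shuffle_options` and the idea of seeding from a question id are assumptions. The seed comes from a SHA-256 hash rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would not give the same order on every run.

```python
import hashlib
import random


def shuffle_options(question_id: str, options: list[str], correct_index: int):
    """Return (shuffled_options, new_correct_index), stable for a given question_id."""
    # Derive a stable integer seed from the question id (hypothetical id scheme).
    digest = hashlib.sha256(question_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")

    # A private Random instance keeps the shuffle independent of global state.
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)

    shuffled = [options[i] for i in order]
    # The correct answer moves to wherever its original index landed.
    return shuffled, order.index(correct_index)
```

Calling `shuffle_options("q-001", ["w", "x", "y", "z"], 0)` twice returns the same order both times, so a learner sees a consistent question while the correct answer no longer sits in option A by default. The exact permutation may differ between Python versions, since only the underlying `random()` sequence is documented as reproducible.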
Where this doesn't (yet) replace a human
Adversarial self-review catches fabrications and broken quiz logic well. It's weaker at taste, pedagogy, and knowing which 20% of a field actually matters to a beginner; that still needs a human subject-matter expert. The completeness gate is a heuristic, not ground truth. The pipeline as a whole is a way to ship faster with a floor on quality, not a claim of perfection.