AI homework gains may hide a two-year learning cost

A study of more than 26,000 students in central China found that AI use improved homework speed and grades while exam scores fell. The biggest concern is timing: the full entrance-exam gap took about two years to appear, which means short studies may miss the damage.

AI use appears to improve homework output while weakening students' real learning and exam performance over time.

AI can make schoolwork look better in the short term. In a large study from central China, students who used AI completed assignments faster and earned stronger homework scores. But the same students later performed worse on closed-book exams, and the largest gap on entrance exams took about two years to become visible.

The finding does not mean AI is always harmful in education. The study points to a sharper distinction: students who used AI while still spending serious time on homework did not show the same exam losses. The risk appears when AI turns homework from practice into outsourcing.

What the study tracked

The researchers analyzed 30 months of panel data from more than 26,000 students in grades 7 through 12 in a county with over one million residents. The dataset covered monthly exams, homework scores and completion times, and high-stakes entrance exams for high school and college.

AI adoption rose quickly during the study period. Self-reported use moved from near zero to about 80 percent, with a major increase around the releases of DeepSeek V2.5 in September 2024 and DeepSeek R1 in January 2025. The most commonly used tools were Doubao, DeepSeek, ChatGLM, Ernie Bot, and Qwen.

The study exploited the staggered timing of adoption: students began using AI at different times. Researchers compared each student’s performance before and after first use, then contrasted that change with the change over the same period among students who had not yet started using AI. The timing of first use came from self-reported data, and the causal claim depends on the assumption that both groups would otherwise have followed similar learning paths.
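
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general before-and-after logic, not the researchers' actual model. The student labels, months, scores, and the function name did_estimate are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data, NOT figures from the study.
# first_use: month of first AI use; None means the student has not started yet.
first_use = {"a": 3, "b": None}
# scores[student][month] = closed-book exam score
scores = {
    "a": {1: 70, 2: 72, 3: 66, 4: 64},
    "b": {1: 68, 2: 70, 3: 71, 4: 70},
}

def did_estimate(scores, first_use, adopter, comparison):
    """Adopter's change after first use, minus the change over the same
    months for a comparison student who has not yet used AI."""
    start = first_use[adopter]

    def change(student):
        before = mean(v for m, v in scores[student].items() if m < start)
        after = mean(v for m, v in scores[student].items() if m >= start)
        return after - before

    return change(adopter) - change(comparison)

effect = did_estimate(scores, first_use, adopter="a", comparison="b")
print(effect)
```

Subtracting the comparison student's change is what removes trends that would have affected everyone, such as a harder exam in a given month. The result is only meaningful if the two students would otherwise have moved in parallel, which is the assumption the study itself flags.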

Homework improved while exams weakened

Six months after students first used AI, homework scores were up by 18 percent. At the same time, the average time spent on an assignment fell from 64 to 45 minutes, a drop of roughly 30 percent.

That looked efficient on paper. But closed-book monthly exam scores fell by 20 percent over the same period. In other words, the work students submitted improved, while their ability to reproduce or apply knowledge without AI declined.

The delayed effect on entrance exams is especially important. Regular exam scores weakened within half a year, but the full impact on high-stakes entrance exams took about two years to appear. The decline ranged from 18 to 24 percent.

That timing matters for schools and researchers. A short evaluation could easily capture the visible benefit of faster homework and higher assignment grades, while missing the slower loss in retained knowledge. The study argues that the long-term learning cost only becomes clear after enough students have used AI for long enough.

Outsourcing changed the meaning of homework

Among students with more than five months of AI use, about 81 percent finished homework in under 50 minutes. That was faster than even the quickest students who did not use AI. These fast finishers had high homework grades but weak exam results.

The researchers interpret that pattern as a sign that many students were outsourcing their work to AI. The signal is not any one metric by itself. It is the combination of unusually short completion times, strong homework marks, and poor closed-book performance.
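
A minimal sketch of how such a combined signal could be checked follows. The 50-minute cutoff echoes the figure reported above. The score thresholds, the StudentRecord type, and the function name are hypothetical assumptions for illustration and do not come from the study.

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    homework_minutes: float  # average time per assignment
    homework_score: float    # average homework mark, 0-100
    exam_score: float        # average closed-book exam mark, 0-100

def looks_like_outsourcing(record, max_minutes=50.0,
                           min_homework=85.0, max_exam=60.0):
    """Flag a record only when all three signals occur together.

    The score thresholds are illustrative assumptions, not study values.
    """
    return (
        record.homework_minutes < max_minutes
        and record.homework_score >= min_homework
        and record.exam_score <= max_exam
    )

fast_and_weak = StudentRecord(homework_minutes=35, homework_score=92, exam_score=55)
fast_and_strong = StudentRecord(homework_minutes=35, homework_score=92, exam_score=88)
print(looks_like_outsourcing(fast_and_weak))
print(looks_like_outsourcing(fast_and_strong))
```

Requiring all three conditions at once avoids flagging students who are simply quick and capable, since they would still do well on closed-book exams.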

For schools, this creates a measurement problem. Homework grades become less reliable when AI can produce polished answers without requiring students to do the thinking. Among AI users with above-average homework scores, higher homework grades actually predicted worse exam results.

But the study also found a more constructive pattern. AI users who spent about as much time on homework as classmates who did not use AI performed just as well on exams and earned better homework grades. This group was not simply made up of stronger students at the start, according to the study. The implication is clear: AI use is less damaging when it supports effort instead of replacing it.

Who was hit hardest

The decline was not evenly spread across subjects or student groups. Social science subjects such as politics and geography saw an average decline of 27 percent. STEM subjects fell 22 percent, English 17 percent, and Chinese 9 percent.

That pattern is notable because many earlier experiments focused on math, programming, and foreign languages. The study suggests that the learning risk is not limited to technical or language tasks.

The effects also differed by age, gender, prior performance, and amount of use:

  • Younger students in lower secondary school lost more than older students, with declines of 24 versus 17 percent.
  • Boys were hit harder than girls, with declines of 21.6 versus 18.4 percent, which the study mainly attributes to heavier AI use among boys.
  • Top performers were hit harder than weaker students, with declines of 24 percent versus 16 percent in the bottom third.
  • Students using AI for up to one hour per week lost about 5 percent, while those using it five hours or more lost 30 percent.

This dose-response pattern strengthens the study’s central concern. More AI use was associated with larger learning losses, especially when the tool appears to have substituted for independent work.

Why the warning took time to show up

The estimated learning penalty shrank from about 25 percent in early 2023 to 16 percent by June 2025. The same narrowing appeared among a fixed group of early adopters, suggesting some adaptation by students and teachers. Still, the losses did not disappear.

The study also explains why the reaction from schools may have been muted. A teacher often sees only one subject, where a 20 percent grade drop may not immediately stand out as an AI problem. At the county level, the aggregate decline did not reach roughly 10 percent until June 2025, because many students had not used AI long enough for the damage to build.
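
The dilution at the county level is a matter of weighted averages, which a short sketch can make concrete. The group shares and per-group effects below are hypothetical numbers chosen for illustration, not estimates from the study, and aggregate_effect is an assumed helper name.

```python
def aggregate_effect(groups):
    """Population-weighted average of per-group effects.

    groups: list of (share_of_students, effect) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * effect for share, effect in groups)

# Hypothetical mixes: (share, effect) for non-users, recent users, long-term users.
early = [(0.6, 0.0), (0.3, -0.05), (0.1, -0.20)]
late = [(0.2, 0.0), (0.3, -0.05), (0.5, -0.20)]

print(round(aggregate_effect(early), 3))
print(round(aggregate_effect(late), 3))
```

With identical per-student effects in both scenarios, the overall average grows only as more students move into the long-term group, which is why a county-wide figure can lag well behind what individual long-term users experience.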

Students may also misread what is happening. Independent learning often feels slower and more difficult than receiving an answer quickly. The study says students can mistake that mental effort for a sign that they are learning poorly, even though the effort is part of learning itself.

The proposed countermeasures are practical steps rather than outright bans. The study suggests giving students credible information about the long-term costs of outsourcing, putting more weight on in-person exams, and tracking completion time instead of relying mainly on homework grades.

Other research points in the same direction. An Anthropic study found that participants learning programming skills with AI help scored 17 percent worse on follow-up knowledge tests than a control group, without saving real time. Those who copied AI answers performed worse, while those who used AI to understand tasks did not show the same decline. A Swiss Business School study found a negative link between AI use and critical thinking. A UC Berkeley study analyzing more than 500,000 grades found that top A grades rose in writing- and programming-heavy courses after ChatGPT launched, with the effect concentrated in unsupervised homework rather than proctored exams.

The lesson for education is not that AI must stay out of classrooms. It is that schools need to separate AI-supported learning from AI-substituted work. If grading continues to reward polished homework without checking whether students can perform without assistance, the real cost may arrive too late to correct easily.