Stanford Study Finds AI Beats Law Professors at Answering Legal Questions

By Abdul Wasay | 2 months ago

A new Stanford Law School study has delivered a striking result. Law professors overwhelmingly preferred AI-generated answers to student legal questions over answers written by fellow professors. The finding could reshape how legal education is delivered.

Stanford Law School Professor Julian Nyarko led the research. The study, titled “Law Professors Prefer AI Over Peer Answers,” involved 16 law professors across US law schools. It tested whether large language models could work as effective tutors for contract law courses.

The numbers are clear. In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than human-written ones. AI won 75% of head-to-head matchups. Even more telling, professors flagged AI responses as pedagogically harmful only 3.5% of the time. Peer-written answers drew that same warning 12% of the time.

“We designed this study to be as rigorous as possible because the stakes are so high,” Nyarko explained. “Legal education is about training future lawyers to think critically, argue persuasively, and navigate ethical complexities. Our study makes important steps towards finding out whether AI could support that mission.”

The study stands out because legal reasoning rarely offers clean right-or-wrong answers. Most previous AI evaluations focused on subjects with clear solutions. Law demands judgment, nuanced reasoning, and the ability to navigate ambiguity. Nyarko said the team chose law precisely for that reason. He admitted the team was surprised by how large the gap turned out to be.

The researchers designed the test carefully. Participants created 40 realistic contract law questions, wrote their own answers, then evaluated responses without knowing the source. The AI systems performed on par with the best human instructor in the study. The team also calibrated AI answers to match human responses in length and structure, then used multiple evaluation methods to ensure validity.
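For readers curious how a blind head-to-head evaluation of this kind can be tallied, here is a minimal Python sketch. It is not the researchers' code. The names `Answer`, `blind_pair`, and `ai_win_rate` are hypothetical, and the ratings at the bottom are made-up toy values, not data from the study.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    source: str  # "ai" or "human"; never shown to the rater
    text: str


def blind_pair(ai, human, rng):
    """Return the two answers in random order so position does not reveal the source."""
    pair = [ai, human]
    rng.shuffle(pair)
    return pair[0], pair[1]


def ai_win_rate(preferred_sources):
    """Share of head-to-head comparisons in which the rater preferred the AI answer."""
    if not preferred_sources:
        raise ValueError("no comparisons recorded")
    ai_wins = sum(1 for source in preferred_sources if source == "ai")
    return ai_wins / len(preferred_sources)


# Toy illustration only: five invented comparisons, not results from the study.
rng = random.Random(0)
first, second = blind_pair(
    Answer("ai", "placeholder AI answer"),
    Answer("human", "placeholder professor answer"),
    rng,
)
toy_preferences = ["ai", "human", "ai", "ai", "human"]
print(ai_win_rate(toy_preferences))  # 0.6
```

The rater sees only the answer text in shuffled order. The source label is consulted after the rating is recorded, when the win rate is computed.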

Yale Law School professor and co-author Sarath Sanga framed the core question simply. He said two opposing legal arguments can both be strong, so the team wanted to know whether AI could meet the professional standard lawyers use to judge each other. The answer, in this case, was yes. The paper drew co-authors from Yale, NYU, and the University of Chicago, among others.

The team tested several AI systems, including commercial tutoring systems and Google’s NotebookLM. Performance varied, but professors still often preferred the AI answers even when context limits affected them.

Nyarko cautioned against overreaching. He stressed the team is not advocating wholesale adoption of AI tutors. However, he said blanket skepticism is equally unwarranted. He argued the conversation should move from whether AI can give high-quality answers to how schools can deploy it responsibly.

Nyarko is currently a professor at Stanford Law School, the faculty director at liftlab, and a senior fellow at Stanford’s Human-Centered Artificial Intelligence (HAI).