Writing Team · Nov 10, 2025 · 4 min read
Computer scientist Randy Goebel has been running a competition for over a decade that exposes AI's most fundamental weakness in legal reasoning: it can retrieve statutes, analyze precedents, and summarize cases—but it can't make the yes/no judgment call. Did the defendant break the law, or not? That's where AI collapses. New research published in Computer Law & Security Review explains why that gap makes deploying AI in courtrooms both inevitable and terrifying.
Goebel, from the University of Alberta, and his colleagues in Japan use real cases from the Japanese bar exam to test AI systems. The task is simple: retrieve relevant statutes, apply them to the facts, and decide guilt or innocence. AI consistently fails the last part. Not because it lacks information—it has access to every legal precedent ever written. But because legal reasoning isn't retrieval. It's inference, judgment, and contextual logic. And large language models don't reason. They pattern-match.
The paper outlines three types of reasoning AI must possess to "think" like legal professionals: case-based reasoning (examining precedents), rule-based reasoning (applying written laws), and abductive reasoning (constructing plausible explanations from incomplete information). AI can handle the first two. The third? Completely beyond current capabilities. And that's the one that matters most.
Here's the type of problem AI can't solve: A man is found holding a knife. The victim has a stab wound. Did the man stab the victim, or did a gust of wind blow the knife into his hand? A human lawyer looks at context—motive, opportunity, physical evidence, witness credibility—and constructs a plausible narrative that either incriminates or exonerates.
AI doesn't do that. It pattern-matches against similar cases. If 87% of cases where someone held a knife near a stabbing resulted in convictions, the AI will predict conviction. But that's not reasoning—that's statistical correlation disguised as judgment. And in a legal system where liberty depends on nuance, correlation isn't good enough.
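To make that distinction concrete, here's a deliberately naive sketch in Python. The data, feature names, and "model" are hypothetical, invented for illustration; this is not Goebel's benchmark or any real system. It predicts a verdict purely from how often similar feature patterns led to convictions in past cases, which is exactly the correlation-without-judgment trap described above.

from collections import Counter

# Hypothetical past cases: (observed features, verdict). Purely illustrative.
past_cases = [
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "alibi_confirmed"}, "acquitted"),
    ({"near_victim"}, "acquitted"),
]

def predict(features):
    # Count verdicts from past cases that share at least one feature,
    # then return the most frequent one. No motive, opportunity, or
    # credibility is weighed anywhere; the function only counts.
    votes = Counter(
        verdict for case_features, verdict in past_cases
        if case_features & features
    )
    return votes.most_common(1)[0][0] if votes else "unknown"

print(predict({"held_knife", "near_victim"}))  # "convicted", by frequency alone

The output looks like a judgment, but nothing in that function could ever entertain the gust-of-wind explanation.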
Goebel's assessment is brutal: "Modern large language models don't reason. They're like your friend who has read every page of Encyclopedia Britannica, who has an opinion on everything but knows nothing about how the logic fits together."
This is the core problem with deploying LLMs in high-stakes decision-making. They're encyclopedic but not logical. They can summarize precedent, draft motions, and flag relevant statutes. But they can't construct the chain of inference that connects evidence to conclusion. And without that, they're dangerous.
The other catastrophic failure mode: hallucinations. LLMs don't just fail to reason—they confidently invent facts. A lawyer using an AI tool to draft a brief might find that the AI cited cases that don't exist, statutes that were never passed, or precedents that were overturned decades ago. This isn't hypothetical. It's already happened. Multiple times. In actual court filings.
In one high-profile case, a lawyer submitted a brief with AI-generated case citations. The judge discovered the cases were fabricated. The lawyer faced sanctions. The client's case was jeopardized. And the legal profession learned a hard lesson: generic LLMs applied to legal work are career-ending risks.
Goebel's paper emphasizes this: generic LLMs are "at best unreliable and, at worst, potentially career-ending for lawyers." The challenge for AI scientists is to develop a reasoning framework that works in conjunction with LLMs and keeps them focused on accuracy and contextual relevance. Not replacing human judgment—augmenting it with tools that don't invent facts.
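What a "tool that doesn't invent facts" might look like in its simplest form: a citation guardrail that refuses to let a draft through until every cited case is found in a trusted, human-maintained index. The sketch below is a simplification under stated assumptions (the index contents and function names are hypothetical, and a production system would query a real legal database), but it shows the shape of the idea.

# Trusted index of citations, maintained by humans. Entries here are sample data.
TRUSTED_CITATIONS = {
    "R. v. Jordan, 2016 SCC 27",
    "Smith v. Jones, 2009 ABCA 112",  # hypothetical entry for illustration
}

def verify_citations(draft_citations):
    # Split an LLM draft's citations into verified and unverifiable.
    verified = [c for c in draft_citations if c in TRUSTED_CITATIONS]
    suspect = [c for c in draft_citations if c not in TRUSTED_CITATIONS]
    return verified, suspect

draft = ["R. v. Jordan, 2016 SCC 27", "Doe v. Acme, 1987 FAKE 1"]
ok, flagged = verify_citations(draft)
if flagged:
    # Block the filing and route the draft to a human reviewer instead.
    print("Human review required; unverifiable citations:", flagged)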
Goebel's work is motivated by a real crisis: Canada's R. v. Jordan decision, which shortened the time prosecutors have to bring cases to trial. The result? Cases as severe as sexual assault and fraud are being thrown out of court because the system can't process them fast enough.
This is the paradox: the legal system desperately needs efficiency tools, but the tools available are fundamentally unreliable. Judges and lawyers face crowded dockets, narrow time windows, and enormous pressure to deliver justice quickly. AI promises to help. But if the AI can't reason, can't verify facts, and can't construct legally sound arguments, it doesn't help—it just introduces new failure modes.
Goebel's mandate is clear: "The passion and the value to society is to improve judicial decision-making." Not replace judges. Not automate verdicts. But provide tools that help overwhelmed legal professionals process information faster and more accurately. The problem is we're not there yet. And deploying tools prematurely creates risks that outweigh benefits.
Goebel's paper outlines what's required for AI to be ethically and effectively deployed in legal contexts, broken down by the three types of reasoning:

Case-based reasoning (examining precedents): AI can do this. Retrieve precedents, identify patterns, suggest similar cases. This is retrieval and pattern-matching, and LLMs are good at it.

Rule-based reasoning (applying written laws): AI can partially do this. It can apply written laws to facts, but it struggles with edge cases, ambiguity, and conflicting statutes.

Abductive reasoning (constructing explanations): AI can't do this. Constructing plausible explanations from incomplete information requires logical inference, contextual judgment, and the ability to weigh competing narratives. LLMs don't have this capability. They fake it by pattern-matching, which works until it catastrophically doesn't; the sketch after this list makes the gap concrete.
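Here is a minimal Python sketch of that gap. The statute, facts, and function are hypothetical simplifications, not from the paper: rule-based reasoning can be written down as explicit logic, while the abductive step of filling in a missing premise from incomplete evidence cannot.

def rule_based(facts):
    # A written rule, radically simplified: liability requires both act and intent.
    return facts.get("committed_act") and facts.get("had_intent")

facts = {"committed_act": True, "had_intent": None}  # intent is unknown

# The rule engine can only say "no conclusion" when a premise is missing.
# Supplying "had_intent" is the abductive step: constructing and weighing
# competing explanations (motive, opportunity, credibility) from incomplete
# evidence. That is the step the paper argues current LLMs cannot perform.
print(rule_based(facts))  # None -> the rule alone decides nothing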
The solution isn't a single "godlike" LLM that can render perfect judicial decisions. Goebel is explicit: claims that such a tool is imminent are "absurd." Every judge he's spoken to acknowledges there is no such thing as perfect judgment. The question is whether current technologies provide more value than harm.
Goebel foresees many separate AI tools for different legal tasks—document review, statute retrieval, precedent analysis, contract drafting—rather than one system that does everything. That's the realistic path forward. Not AI replacing lawyers, but AI handling specific, well-defined tasks where failure modes are manageable.
The legal profession is a canary in the coal mine for high-stakes AI deployment. If AI can't be trusted to reason about guilt or innocence, can it be trusted to approve loans, diagnose diseases, or make hiring decisions? The reasoning challenges are the same. The stakes are just as high.
The lesson from Goebel's research applies everywhere: AI is excellent at retrieval and pattern-matching, but terrible at reasoning and judgment. If your use case requires inference, context, or logical construction of arguments, generic LLMs will fail. And if failure means lawsuits, regulatory penalties, or harm to people, you need guardrails.
The framework Goebel proposes—specialized tools, human oversight, reasoning layers on top of LLMs—is the path forward for any industry trying to deploy AI responsibly. Not "let the AI decide," but "let the AI assist, and let humans verify."
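One way to picture that framework in code, under heavy assumptions (the tool names and dataclass are invented for illustration, not a real product): every narrow tool returns an output that is flagged for mandatory human review, so the system can assist without ever deciding.

from dataclasses import dataclass

@dataclass
class ToolOutput:
    task: str
    content: str
    needs_human_review: bool = True  # never defaults to auto-approve

def retrieve_statutes(query: str) -> ToolOutput:
    # Placeholder for a retrieval tool; retrieval is where LLMs are strong.
    return ToolOutput("statute_retrieval", f"statutes matching: {query}")

def draft_motion(summary: str) -> ToolOutput:
    # Placeholder for a drafting tool; drafts are suggestions, not filings.
    return ToolOutput("drafting", f"draft based on: {summary}")

for output in (retrieve_statutes("trial delay"), draft_motion("stay application")):
    assert output.needs_human_review  # judgment stays with the human
    print(output.task, "->", output.content)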
For marketers, this is a reminder: AI-generated content, recommendations, and strategies need human judgment. The AI can draft, suggest, and retrieve. But it can't reason about whether the strategy is sound, whether the messaging resonates, or whether the campaign aligns with brand values. That's still your job.
Want to build AI workflows that leverage automation without outsourcing judgment? Let's talk. Because the companies that win won't just adopt AI tools—they'll understand where those tools fail and build systems that account for it.