Writing Team · Nov 10, 2025 · 4 min read
Computer scientist Randy Goebel has been running a competition for over a decade that exposes AI's most fundamental weakness in legal reasoning: it can retrieve statutes, analyze precedents, and summarize cases—but it can't make the yes/no judgment call. Did the defendant break the law, or not? That's where AI collapses. New research published in Computer Law & Security Review explains why that gap makes deploying AI in courtrooms both inevitable and terrifying.
Goebel, from the University of Alberta, and his colleagues in Japan use real cases from the Japanese bar exam to test AI systems. The task is simple: retrieve relevant statutes, apply them to the facts, and decide guilt or innocence. AI consistently fails the last part. Not because it lacks information—it has access to every legal precedent ever written. But because legal reasoning isn't retrieval. It's inference, judgment, and contextual logic. And large language models don't reason. They pattern-match.
The paper outlines three types of reasoning AI must possess to "think" like legal professionals: case-based reasoning (examining precedents), rule-based reasoning (applying written laws), and abductive reasoning (constructing plausible explanations from incomplete information). AI can handle the first two. The third? Completely beyond current capabilities. And that's the one that matters most.
Here's the type of problem AI can't solve: A man is found holding a knife. The victim has a stab wound. Did the man stab the victim, or did a gust of wind blow the knife into his hand? A human lawyer looks at context—motive, opportunity, physical evidence, witness credibility—and constructs a plausible narrative that either incriminates or exonerates.
AI doesn't do that. It pattern-matches against similar cases. If 87% of cases where someone held a knife near a stabbing resulted in convictions, the AI will predict conviction. But that's not reasoning—that's statistical correlation disguised as judgment. And in a legal system where liberty depends on nuance, correlation isn't good enough.
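To make that distinction concrete, here's a deliberately naive sketch in Python. The data, feature names, and "model" are hypothetical, invented for illustration; this is not Goebel's benchmark or any real system. It predicts a verdict purely from how often similar feature patterns led to convictions in past cases, which is exactly the correlation-without-judgment trap described above.

from collections import Counter

# Hypothetical past cases: (observed features, verdict). Purely illustrative.
past_cases = [
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "near_victim"}, "convicted"),
    ({"held_knife", "alibi_confirmed"}, "acquitted"),
    ({"near_victim"}, "acquitted"),
]

def predict(features):
    # Count verdicts from past cases that share at least one feature,
    # then return the most frequent one. No motive, opportunity, or
    # credibility is weighed anywhere; the function only counts.
    votes = Counter(
        verdict for case_features, verdict in past_cases
        if case_features & features
    )
    return votes.most_common(1)[0][0] if votes else "unknown"

print(predict({"held_knife", "near_victim"}))  # "convicted", by frequency alone

The output looks like a judgment, but nothing in that function could ever entertain the gust-of-wind explanation.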
Goebel's assessment is brutal: "Modern large language models don't reason. They're like your friend who has read every page of Encyclopedia Britannica, who has an opinion on everything but knows nothing about how the logic fits together."
This is the core problem with deploying LLMs in high-stakes decision-making. They're encyclopedic but not logical. They can summarize precedent, draft motions, and flag relevant statutes. But they can't construct the chain of inference that connects evidence to conclusion. And without that, they're dangerous.
The other catastrophic failure mode: hallucinations. LLMs don't just fail to reason—they confidently invent facts. A lawyer using an AI tool to draft a brief might find that the AI cited cases that don't exist, statutes that were never passed, or precedents that were overturned decades ago. This isn't hypothetical. It's already happened. Multiple times. In actual court filings.
In one high-profile case, a lawyer submitted a brief with AI-generated case citations. The judge discovered the cases were fabricated. The lawyer faced sanctions. The client's case was jeopardized. And the legal profession learned a hard lesson: generic LLMs applied to legal work are career-ending risks.
Goebel's paper emphasizes this: generic LLMs are "at best unreliable and, at worst, potentially career-ending for lawyers." The challenge for AI scientists is to develop a reasoning framework that works in conjunction with LLMs and keeps them focused on accuracy and contextual relevance. Not replacing human judgment—augmenting it with tools that don't invent facts.
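What a "tool that doesn't invent facts" might look like in its simplest form: a citation guardrail that refuses to let a draft through until every cited case is found in a trusted, human-maintained index. The sketch below is a simplification under stated assumptions (the index contents and function names are hypothetical, and a production system would query a real legal database), but it shows the shape of the idea.

# Trusted index of citations, maintained by humans. Entries here are sample data.
TRUSTED_CITATIONS = {
    "R. v. Jordan, 2016 SCC 27",
    "Smith v. Jones, 2009 ABCA 112",  # hypothetical entry for illustration
}

def verify_citations(draft_citations):
    # Split an LLM draft's citations into verified and unverifiable.
    verified = [c for c in draft_citations if c in TRUSTED_CITATIONS]
    suspect = [c for c in draft_citations if c not in TRUSTED_CITATIONS]
    return verified, suspect

draft = ["R. v. Jordan, 2016 SCC 27", "Doe v. Acme, 1987 FAKE 1"]
ok, flagged = verify_citations(draft)
if flagged:
    # Block the filing and route the draft to a human reviewer instead.
    print("Human review required; unverifiable citations:", flagged)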
Goebel's work is motivated by a real crisis: Canada's R. v. Jordan decision, which shortened the time prosecutors have to bring cases to trial. The result? Cases as severe as sexual assault and fraud are being thrown out of court because the system can't process them fast enough.
This is the paradox: the legal system desperately needs efficiency tools, but the tools available are fundamentally unreliable. Judges and lawyers face crowded dockets, narrow time windows, and enormous pressure to deliver justice quickly. AI promises to help. But if the AI can't reason, can't verify facts, and can't construct legally sound arguments, it doesn't help—it just introduces new failure modes.
Goebel's mandate is clear: "The passion and the value to society is to improve judicial decision-making." Not replace judges. Not automate verdicts. But provide tools that help overwhelmed legal professionals process information faster and more accurately. The problem is we're not there yet. And deploying tools prematurely creates risks that outweigh benefits.
Goebel's paper outlines what's required for AI to be ethically and effectively deployed in legal contexts, broken down by the three types of reasoning:

Case-based reasoning (examining precedents): AI can do this. Retrieve precedents, identify patterns, suggest similar cases. This is retrieval and pattern-matching, and LLMs are good at it.

Rule-based reasoning (applying written laws): AI can partially do this. It can apply written laws to facts, but it struggles with edge cases, ambiguity, and conflicting statutes.

Abductive reasoning (constructing explanations): AI can't do this. Constructing plausible explanations from incomplete information requires logical inference, contextual judgment, and the ability to weigh competing narratives. LLMs don't have this capability. They fake it by pattern-matching, which works until it catastrophically doesn't; the sketch after this list makes the gap concrete.
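Here is a minimal Python sketch of that gap. The statute, facts, and function are hypothetical simplifications, not from the paper: rule-based reasoning can be written down as explicit logic, while the abductive step of filling in a missing premise from incomplete evidence cannot.

def rule_based(facts):
    # A written rule, radically simplified: liability requires both act and intent.
    return facts.get("committed_act") and facts.get("had_intent")

facts = {"committed_act": True, "had_intent": None}  # intent is unknown

# The rule engine can only say "no conclusion" when a premise is missing.
# Supplying "had_intent" is the abductive step: constructing and weighing
# competing explanations (motive, opportunity, credibility) from incomplete
# evidence. That is the step the paper argues current LLMs cannot perform.
print(rule_based(facts))  # None -> the rule alone decides nothing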
The solution isn't a single "godlike" LLM that can render perfect judicial decisions. Goebel is explicit: claims that such a tool is imminent are "absurd." Every judge he's spoken to acknowledges there is no such thing as perfect judgment. The question is whether current technologies provide more value than harm.
Goebel foresees many separate AI tools for different legal tasks—document review, statute retrieval, precedent analysis, contract drafting—rather than one system that does everything. That's the realistic path forward. Not AI replacing lawyers, but AI handling specific, well-defined tasks where failure modes are manageable.
The legal profession is a canary in the coal mine for high-stakes AI deployment. If AI can't be trusted to reason about guilt or innocence, can it be trusted to approve loans, diagnose diseases, or make hiring decisions? The reasoning challenges are the same. The stakes are just as high.
The lesson from Goebel's research applies everywhere: AI is excellent at retrieval and pattern-matching, but terrible at reasoning and judgment. If your use case requires inference, context, or logical construction of arguments, generic LLMs will fail. And if failure means lawsuits, regulatory penalties, or harm to people, you need guardrails.
The framework Goebel proposes—specialized tools, human oversight, reasoning layers on top of LLMs—is the path forward for any industry trying to deploy AI responsibly. Not "let the AI decide," but "let the AI assist, and let humans verify."
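One way to picture that framework in code, under heavy assumptions (the tool names and dataclass are invented for illustration, not a real product): every narrow tool returns an output that is flagged for mandatory human review, so the system can assist without ever deciding.

from dataclasses import dataclass

@dataclass
class ToolOutput:
    task: str
    content: str
    needs_human_review: bool = True  # never defaults to auto-approve

def retrieve_statutes(query: str) -> ToolOutput:
    # Placeholder for a retrieval tool; retrieval is where LLMs are strong.
    return ToolOutput("statute_retrieval", f"statutes matching: {query}")

def draft_motion(summary: str) -> ToolOutput:
    # Placeholder for a drafting tool; drafts are suggestions, not filings.
    return ToolOutput("drafting", f"draft based on: {summary}")

for output in (retrieve_statutes("trial delay"), draft_motion("stay application")):
    assert output.needs_human_review  # judgment stays with the human
    print(output.task, "->", output.content)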
For marketers, this is a reminder: AI-generated content, recommendations, and strategies need human judgment. The AI can draft, suggest, and retrieve. But it can't reason about whether the strategy is sound, whether the messaging resonates, or whether the campaign aligns with brand values. That's still your job.
Want to build AI workflows that leverage automation without outsourcing judgment? Let's talk. Because the companies that win won't just adopt AI tools—they'll understand where those tools fail and build systems that account for it.