What Actually Works: Enterprise AI Agents in Production
3 min read
Joy Youell
May 7, 2026
The AI Agent Conference in New York spent two days putting enterprise practitioners in the same room and asking them to be honest about what's working and what isn't. A panel featuring David Shim of Read AI, Daniel Vassilev of Relevance AI, Bryan Tsao of Jasper, and Scott Keane of Invictus Growth Partners took that charge seriously. Their session was titled "What Actually Works" and they meant it.
The panel's opening framing set the tone: "Everybody is building agents. The question is what actually works. Production changes everything."
The panel kept returning to measurement — and kept acknowledging how unsolved it is. "How do I really measure it? How do I know it's doing the right thing? That's the elephant in the room."
This is the enterprise AI problem that doesn't get enough attention in technical discussions. You can deploy an agent. You can watch it run. Knowing whether it's doing the right thing consistently, at scale, across the full range of inputs it will encounter in production — that's a different and much harder question. Enterprise AI adoption stalls precisely here. Without measurable outcomes tied to business objectives, organizations can't justify continued investment, can't identify what to improve, and can't demonstrate ROI to leadership.
The panel made a conceptual distinction that's worth carrying into any enterprise AI conversation. Agents are not traditional software applications. "This is a workflow, not traditional software. Traditional monitoring assumptions break."
The difference matters because agents are probabilistic, adaptive, contextual, and continuously evolving. Deterministic testing — the kind that works for conventional software — is insufficient for systems that behave differently based on context, learn from feedback, and operate across dynamic multi-step workflows. The entire quality assurance and monitoring infrastructure has to be redesigned around this reality.
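To make that concrete, here is a minimal sketch of the difference, assuming a hypothetical run_agent call and a stand-in scoring function (neither refers to any real library): a deterministic exact-match test breaks the moment the agent rephrases a correct answer, while a distributional check over repeated runs tolerates variation but still enforces a quality floor.

```python
import statistics

def run_agent(prompt: str) -> str:
    """Stand-in for a real agent call; a production agent can return
    different (but equally acceptable) text on different runs."""
    return "Refund issued per policy section 4.2."

def score_response(response: str) -> float:
    """Stand-in rubric scorer returning a quality score between 0 and 1."""
    return 1.0 if "policy" in response.lower() else 0.0

# Deterministic test: brittle for a probabilistic system, because any
# acceptable rewording of a correct answer fails the exact-match assertion.
assert run_agent("Can I get a refund?") == "Refund issued per policy section 4.2."

# Distributional test: run the same input repeatedly and require that the
# average rubric score clears a threshold instead of matching exact text.
scores = [score_response(run_agent("Can I get a refund?")) for _ in range(20)]
assert statistics.mean(scores) >= 0.9
```

The point of the second test is not the specific threshold; it is that quality is asserted over a distribution of behavior rather than a single expected string.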
One of the most practically important points from the session: as agent deployments grow, human review becomes operationally impossible. "Human labeling does not scale. Interactions become long and complex. You need automated evaluation."
This is a constraint most organizations hit later than they should. Early-stage deployments feel manageable with human review. Then volume grows, interactions get more complex, and the review burden compounds faster than headcount can absorb. The organizations that build machine-assisted monitoring from the beginning — before they need it — are in a fundamentally different position than the ones retrofitting it after the fact.
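As a rough illustration of what machine-assisted monitoring can look like, the sketch below assumes a hypothetical judge function standing in for whatever automated evaluator an organization uses (a rubric scorer or judge model). It samples production interactions and routes only the low scorers to human reviewers, so the review burden stays bounded as volume grows. All names here are invented for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Interaction:
    transcript: str
    task: str

def judge(interaction: Interaction) -> float:
    """Placeholder for an automated evaluator (rubric scorer or judge model);
    returns a quality score between 0 and 1."""
    return 0.8  # fixed value so the sketch runs end to end

def triage(interactions: list[Interaction],
           sample_rate: float = 0.1,
           review_threshold: float = 0.6) -> list[Interaction]:
    """Automatically score a sample of production interactions and return
    only the low scorers for human review, keeping reviewer load bounded."""
    sampled = [i for i in interactions if random.random() < sample_rate]
    return [i for i in sampled if judge(i) < review_threshold]

if __name__ == "__main__":
    traffic = [Interaction(f"transcript {n}", "support") for n in range(1000)]
    flagged = triage(traffic)
    print(f"{len(flagged)} of {len(traffic)} interactions routed to human review")
```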

The panel pushed back hard on the idea of a universal evaluation framework for enterprise AI. Success metrics are entirely dependent on business context, and treating them as transferable across use cases is a mistake.
Healthcare AI optimizes for safety and accuracy. A wrong answer in a clinical context is a patient safety event. Sales and marketing AI optimizes for persuasion, engagement, and conversion. An overly cautious answer in a sales context is a missed opportunity. "There is no universal metric. Success depends on the business objective."
This sounds obvious stated plainly, but the implication is significant: every enterprise AI deployment needs its own evaluation framework designed around what that specific deployment is supposed to accomplish. That's organizational work, not just technical work.
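One minimal way to express that, with illustrative metric names and weights rather than any standard, is a per-deployment evaluation profile: the clinical deployment weights safety and accuracy, the sales deployment weights conversion and engagement, and no score is shared between them.

```python
# Illustrative metric names and weights for two deployments; these are
# examples, not a standard, and each weight set sums to 1.0.
EVAL_PROFILES = {
    "clinical_triage": {"safety": 0.6, "accuracy": 0.35, "latency": 0.05},
    "sales_outreach":  {"conversion": 0.5, "engagement": 0.3, "tone": 0.2},
}

def weighted_score(deployment: str, metric_scores: dict[str, float]) -> float:
    """Combine per-metric scores (each 0-1) with the weights defined for this
    specific deployment; there is no shared 'AI quality' number across them."""
    weights = EVAL_PROFILES[deployment]
    return sum(weights[m] * metric_scores.get(m, 0.0) for m in weights)

print(weighted_score("clinical_triage",
                     {"safety": 0.99, "accuracy": 0.92, "latency": 0.70}))
print(weighted_score("sales_outreach",
                     {"conversion": 0.40, "engagement": 0.80, "tone": 0.90}))
```

Defining the profile is the organizational work the panel was pointing at; the code that consumes it is the easy part.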
The panel was consistent on this: building an agent in isolation is manageable. Connecting it to enterprise systems is where things get genuinely difficult. "Everything seems easy at first. Then you connect enterprise systems."
APIs integrate, workflows chain together, edge cases multiply, and complexity compounds exponentially. The organizations that survive this phase have centralized orchestration, unified governance, and shared infrastructure — a single platform perspective on agent management rather than a collection of independently deployed agents that interact unpredictably.
"You need one platform to manage this. Centralization becomes critical."
One of the more interesting conceptual turns in the session was the panel's comparison of agent management to employee management. KPIs. Guardrails. Performance monitoring. Evaluation frameworks. Accountability structures.
"You evaluate agents similarly to employees. You define KPIs. You create guardrails."
This isn't just a useful metaphor — it's a practical framework for organizations trying to figure out how to govern AI deployments at scale. The management infrastructure organizations have built for human workers maps reasonably well onto what agents need: clear objectives, performance standards, feedback mechanisms, and defined boundaries. The teams that borrow from that playbook will build more governable AI systems than the ones treating agents as pure technical infrastructure.
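Carrying the metaphor into code, a hypothetical "charter" for an agent might look like the sketch below: an objective, KPI targets, and guardrails defined up front, with a periodic review that compares observed metrics against targets the way a manager reviews an employee against agreed goals. All names and thresholds are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCharter:
    """Hypothetical 'job description' for an agent: an objective, KPI targets,
    and hard guardrails, mirroring how a human role would be defined."""
    name: str
    objective: str
    kpi_targets: dict[str, float] = field(default_factory=dict)
    guardrails: list[str] = field(default_factory=list)

def performance_review(charter: AgentCharter,
                       observed: dict[str, float]) -> dict[str, bool]:
    """Compare observed KPI values against targets, the way a manager reviews
    an employee against agreed goals."""
    return {kpi: observed.get(kpi, 0.0) >= target
            for kpi, target in charter.kpi_targets.items()}

support_agent = AgentCharter(
    name="support_agent",
    objective="Resolve tier-1 tickets without human escalation",
    kpi_targets={"resolution_rate": 0.85, "csat": 0.90},
    guardrails=["never issue refunds over $500", "escalate legal topics"],
)
print(performance_review(support_agent, {"resolution_rate": 0.88, "csat": 0.86}))
```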
The panel closed on a point that came up across multiple sessions at the conference: governance retrofitted after deployment is harder and more expensive than governance built in from the start. "Organizations need operational readiness. Governance matters from day one."
Executive sponsorship, process maturity, operational discipline — these aren't soft requirements that can be addressed later. They're prerequisites for the kind of durable, scalable enterprise AI deployment that actually delivers on its promise. The organizations building that foundation now are the ones who will still be running production systems in two years.
This session was presented at the AI Agent Conference 2026 in New York. Panelists represented Read AI, Relevance AI, Jasper, and Invictus Growth Partners.