What Actually Works: Enterprise AI Agents in Production
3 min read
Joy Youell
May 7, 2026
The AI Agent Conference in New York spent two days putting enterprise practitioners in the same room and asking them to be honest about what's working and what isn't. A panel featuring David Shim of Read AI, Daniel Vassilev of Relevance AI, Bryan Tsao of Jasper, and Scott Keane of Invictus Growth Partners took that charge seriously. Their session was titled "What Actually Works" and they meant it.
The panel's opening framing set the tone: "Everybody is building agents. The question is what actually works. Production changes everything."
The panel kept returning to measurement — and kept acknowledging how unsolved it is. "How do I really measure it? How do I know it's doing the right thing? That's the elephant in the room."
This is the enterprise AI problem that doesn't get enough attention in technical discussions. You can deploy an agent. You can watch it run. Knowing whether it's doing the right thing consistently, at scale, across the full range of inputs it will encounter in production — that's a different and much harder question. Enterprise AI adoption stalls precisely here. Without measurable outcomes tied to business objectives, organizations can't justify continued investment, can't identify what to improve, and can't demonstrate ROI to leadership.
The panel made a conceptual distinction that's worth carrying into any enterprise AI conversation. Agents are not traditional software applications. "This is a workflow, not traditional software. Traditional monitoring assumptions break."
The difference matters because agents are probabilistic, adaptive, contextual, and continuously evolving. Deterministic testing — the kind that works for conventional software — is insufficient for systems that behave differently based on context, learn from feedback, and operate across dynamic multi-step workflows. The entire quality assurance and monitoring infrastructure has to be redesigned around this reality.
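To make that concrete, here is a minimal sketch of the difference, assuming a hypothetical run_agent call and a stand-in scoring function (neither refers to any real library): a deterministic exact-match test breaks the moment the agent rephrases a correct answer, while a distributional check over repeated runs tolerates variation but still enforces a quality floor.

```python
import statistics

def run_agent(prompt: str) -> str:
    """Stand-in for a real agent call; a production agent can return
    different (but equally acceptable) text on different runs."""
    return "Refund issued per policy section 4.2."

def score_response(response: str) -> float:
    """Stand-in rubric scorer returning a quality score between 0 and 1."""
    return 1.0 if "policy" in response.lower() else 0.0

# Deterministic test: brittle for a probabilistic system, because any
# acceptable rewording of a correct answer fails the exact-match assertion.
assert run_agent("Can I get a refund?") == "Refund issued per policy section 4.2."

# Distributional test: run the same input repeatedly and require that the
# average rubric score clears a threshold instead of matching exact text.
scores = [score_response(run_agent("Can I get a refund?")) for _ in range(20)]
assert statistics.mean(scores) >= 0.9
```

The point of the second test is not the specific threshold; it is that quality is asserted over a distribution of behavior rather than a single expected string.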
One of the most practically important points from the session: as agent deployments grow, human review becomes operationally impossible. "Human labeling does not scale. Interactions become long and complex. You need automated evaluation."
This is a constraint most organizations hit later than they should. Early-stage deployments feel manageable with human review. Then volume grows, interactions get more complex, and the review burden compounds faster than headcount can absorb. The organizations that build machine-assisted monitoring from the beginning — before they need it — are in a fundamentally different position than the ones retrofitting it after the fact.
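As a rough illustration of what machine-assisted monitoring can look like, the sketch below assumes a hypothetical judge function standing in for whatever automated evaluator an organization uses (a rubric scorer or judge model). It samples production interactions and routes only the low scorers to human reviewers, so the review burden stays bounded as volume grows. All names here are invented for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Interaction:
    transcript: str
    task: str

def judge(interaction: Interaction) -> float:
    """Placeholder for an automated evaluator (rubric scorer or judge model);
    returns a quality score between 0 and 1."""
    return 0.8  # fixed value so the sketch runs end to end

def triage(interactions: list[Interaction],
           sample_rate: float = 0.1,
           review_threshold: float = 0.6) -> list[Interaction]:
    """Automatically score a sample of production interactions and return
    only the low scorers for human review, keeping reviewer load bounded."""
    sampled = [i for i in interactions if random.random() < sample_rate]
    return [i for i in sampled if judge(i) < review_threshold]

if __name__ == "__main__":
    traffic = [Interaction(f"transcript {n}", "support") for n in range(1000)]
    flagged = triage(traffic)
    print(f"{len(flagged)} of {len(traffic)} interactions routed to human review")
```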

The panel pushed back hard on the idea of a universal evaluation framework for enterprise AI. Success metrics are entirely dependent on business context, and treating them as transferable across use cases is a mistake.
Healthcare AI optimizes for safety and accuracy. A wrong answer in a clinical context is a patient safety event. Sales and marketing AI optimizes for persuasion, engagement, and conversion. An overly cautious answer in a sales context is a missed opportunity. "There is no universal metric. Success depends on the business objective."
This sounds obvious stated plainly, but the implication is significant: every enterprise AI deployment needs its own evaluation framework designed around what that specific deployment is supposed to accomplish. That's organizational work, not just technical work.
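One minimal way to express that, with illustrative metric names and weights rather than any standard, is a per-deployment evaluation profile: the clinical deployment weights safety and accuracy, the sales deployment weights conversion and engagement, and no score is shared between them.

```python
# Illustrative metric names and weights for two deployments; these are
# examples, not a standard, and each weight set sums to 1.0.
EVAL_PROFILES = {
    "clinical_triage": {"safety": 0.6, "accuracy": 0.35, "latency": 0.05},
    "sales_outreach":  {"conversion": 0.5, "engagement": 0.3, "tone": 0.2},
}

def weighted_score(deployment: str, metric_scores: dict[str, float]) -> float:
    """Combine per-metric scores (each 0-1) with the weights defined for this
    specific deployment; there is no shared 'AI quality' number across them."""
    weights = EVAL_PROFILES[deployment]
    return sum(weights[m] * metric_scores.get(m, 0.0) for m in weights)

print(weighted_score("clinical_triage",
                     {"safety": 0.99, "accuracy": 0.92, "latency": 0.70}))
print(weighted_score("sales_outreach",
                     {"conversion": 0.40, "engagement": 0.80, "tone": 0.90}))
```

Defining the profile is the organizational work the panel was pointing at; the code that consumes it is the easy part.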
The panel was consistent on this: building an agent in isolation is manageable. Connecting it to enterprise systems is where things get genuinely difficult. "Everything seems easy at first. Then you connect enterprise systems."
APIs integrate, workflows chain together, edge cases multiply, and complexity compounds exponentially. The organizations that survive this phase have centralized orchestration, unified governance, and shared infrastructure — a single platform perspective on agent management rather than a collection of independently deployed agents that interact unpredictably.
"You need one platform to manage this. Centralization becomes critical."
One of the more interesting conceptual turns in the session was the panel's comparison of agent management to employee management. KPIs. Guardrails. Performance monitoring. Evaluation frameworks. Accountability structures.
"You evaluate agents similarly to employees. You define KPIs. You create guardrails."
This isn't just a useful metaphor — it's a practical framework for organizations trying to figure out how to govern AI deployments at scale. The management infrastructure organizations have built for human workers maps reasonably well onto what agents need: clear objectives, performance standards, feedback mechanisms, and defined boundaries. The teams that borrow from that playbook will build more governable AI systems than the ones treating agents as pure technical infrastructure.
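Carrying the metaphor into code, a hypothetical "charter" for an agent might look like the sketch below: an objective, KPI targets, and guardrails defined up front, with a periodic review that compares observed metrics against targets the way a manager reviews an employee against agreed goals. All names and thresholds are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCharter:
    """Hypothetical 'job description' for an agent: an objective, KPI targets,
    and hard guardrails, mirroring how a human role would be defined."""
    name: str
    objective: str
    kpi_targets: dict[str, float] = field(default_factory=dict)
    guardrails: list[str] = field(default_factory=list)

def performance_review(charter: AgentCharter,
                       observed: dict[str, float]) -> dict[str, bool]:
    """Compare observed KPI values against targets, the way a manager reviews
    an employee against agreed goals."""
    return {kpi: observed.get(kpi, 0.0) >= target
            for kpi, target in charter.kpi_targets.items()}

support_agent = AgentCharter(
    name="support_agent",
    objective="Resolve tier-1 tickets without human escalation",
    kpi_targets={"resolution_rate": 0.85, "csat": 0.90},
    guardrails=["never issue refunds over $500", "escalate legal topics"],
)
print(performance_review(support_agent, {"resolution_rate": 0.88, "csat": 0.86}))
```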
The panel closed on a point that came up across multiple sessions at the conference: governance retrofitted after deployment is harder and more expensive than governance built in from the start. "Organizations need operational readiness. Governance matters from day one."
Executive sponsorship, process maturity, operational discipline — these aren't soft requirements that can be addressed later. They're prerequisites for the kind of durable, scalable enterprise AI deployment that actually delivers on its promise. The organizations building that foundation now are the ones who will still be running production systems in two years.
This session was presented at the AI Agent Conference 2026 in New York. Panelists represented Read AI, Relevance AI, Jasper, and Invictus Growth Partners.