Something shifted last year. I felt it in almost every conversation I had with clinical AI vendors and the health systems working with them.
Over the last few years, the main question was “Does this actually work?” Radiologists, CMOs, and CIOs sat with AI vendors and asked for proof. “Show me the sensitivity numbers.” “Show me the peer-reviewed study.” “Show me a reference site I can call.”
By the end of 2025, that question was mostly answered. The tools work. Detection rates improved, and the gaps that once caused problems were closing. More studies appeared, skeptics grew quieter, and health systems running pilots saw real results.
That should be a moment to celebrate. It is. But in my day-to-day work with clinical AI vendors and the teams that deploy them, I'm seeing a new, harder question come up.
The pilot worked. Now what?
A pilot aims to answer one question: Does this technology bring clinical value? It has a set scope, a focused team, a well-defined timeline, and someone overseeing it from start to finish. When it works, everyone is pleased. Leadership signs off, the vendor celebrates, and the clinical champion gets recognition.
Then comes the request to scale. And that's where the conversations I'm having start to get honest.
Health system leaders often tell me the same thing: "We know the tool works. We just don't know how to run it as a program." Moving from a controlled pilot to real operations reveals gaps that pilots tend to hide.
The IT team that was focused on the pilot now has several other priorities. Integration that was simple at one site becomes complex across three hospitals with different PACS setups. The vendor’s dashboard was clear when one person managed it, but now no one is sure who should monitor it.
I've seen great clinical AI tools lose internal momentum not because they stopped working, but because the infrastructure around them couldn't keep pace.
From my experience working with vendors, the algorithm is usually the easiest part of scaling.
The real challenge is everything else. Security reviews start over with each new model. Integration work gets repeated because there’s no shared deployment path. Monitoring lives in a vendor portal that only a few people can access, and no one checks it regularly.
Every new clinical AI tool a health system adds creates a new surface area to manage. And because most of those tools were bought and deployed independently, they don't share infrastructure. They don't share data standards. They definitely don't share a common evidence layer.
The result is a collection of good tools that don’t form a unified suite. It’s like hiring great people and never giving them a way to work together. The talent is there, but the system is missing.
The other thing I hear often, this time from clinicians, is about trust. It’s not a general distrust of AI, but frustration that they can’t see how the tools are performing on their own patients.
During a pilot, someone is generating that visibility. There's a report, a presentation, and a moment when results are shared. In production, that often disappears. Clinicians are using tools they believe in, but they're doing so more on faith than on evidence.
This is more important than it seems. When clinicians see an unexpected result, or a flag they suspect is wrong, they need context. They want to know whether it’s a one-off or a pattern, and they need to see performance on their own patients, not just data from a vendor’s study.
Without that visibility, confidence erodes gradually. And once it erodes, it's slow to rebuild.
I’m not saying this transition is easy. Moving from pilot to program is genuinely hard work. But the health systems that do it well have a few things in common.
They stop treating each new model as a separate project. Instead of rebuilding integration, security, and monitoring every time, they build a reusable process that new tools plug into, so the hard work is done once.

They separate vendor claims from their own evidence. Instead of relying on vendor reports, they build their own performance measurement on their own patient data. That changes renewal conversations, and it changes what clinicians are willing to trust.

They give leadership a portfolio view. Not tool-by-tool dashboards, but a clear picture of what's running, how it's performing, where it's drifting, and what the return looks like across the organization. That visibility is what turns a collection of pilots into something a CIO can defend in a board meeting.
None of this is about replacing the clinical AI tools that are working. The algorithms are delivering. The vendors have done their part. What's missing in most health systems is the governance layer on top of those tools.
I’m excited by the quality of clinical AI available now. Working closely with vendors, I’ve seen what these tools can do. The gains in detection rates and workflow simplification are real, and the clinical value is clear.
What I think about now is how health systems capture that value at scale, not just in a controlled pilot with a champion in the room, but across every site, every service line, and every patient who deserves the benefit of these tools.
The algorithms are ready. The real question for 2026 is governance.
That's the conversation I'm most interested in having. If you're working through this, I'd genuinely like to hear where you're stuck.
Sam Knapp is AI Partnerships Manager at Ferrum Health, where he works with clinical AI vendors and health systems to build governed, scalable AI programs.