What is 'agentic AI' exactly?
Agentic AI refers to AI systems that do not merely generate output in response to a single prompt, but actively execute tasks across multiple steps — retrieving information, making decisions, performing actions in other systems, and self-correcting based on outcomes. Unlike a chatbot that answers a question, an AI agent can 'do' a piece of work.
In healthcare, that might mean an agent that reviews intake questionnaires, retrieves relevant information from an EHR, drafts a proposed care plan, and routes it to the clinician for approval. Or an agent that schedules a callback, updates the care record, and sends an SMS reminder to the client — all from a single instruction issued by a professional.
Technological advances since 2023 have made agentic AI possible at a scale that was previously unthinkable. At the same time, the larger an AI system's action space, the greater the potential consequences of an error. That is why four principles are non-negotiable in healthcare.
1. Human always in the loop
Clinical decisions are made by humans. AI proposes, contextualizes, and drafts initial versions. Approving, adjusting, and deciding remain the work of human professionals. This is not a design choice that can later be disabled for the sake of 'efficiency'; it is an architectural principle.
In practice, this means: AI may draft a letter, but the clinician sends it. AI may produce a scheduling proposal, but the planner approves it. AI may raise a risk flag, but the responsible care professional acts — or decides not to act, with documented justification. The AI is an instrument; the professional is the actor.
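The split between "AI proposes" and "the professional acts" can be enforced in code rather than left to convention. The following is a minimal sketch, not a reference implementation: the names `Proposal`, `review`, and `execute` are hypothetical, and the rule that a rejection must carry a justification is an assumption extrapolated from the risk-flag example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """An AI-drafted action awaiting human review (hypothetical structure)."""
    action: str                          # e.g. "send_letter"
    draft: str                           # content produced by the AI
    decision: Decision = Decision.PENDING
    decided_by: Optional[str] = None     # the human who took the decision
    justification: Optional[str] = None


def review(proposal: Proposal, reviewer: str, approve: bool,
           justification: Optional[str] = None) -> Proposal:
    """Record the human decision. Assumption: a rejection needs a justification."""
    if not approve and not justification:
        raise ValueError("a rejection needs a documented justification")
    proposal.decision = Decision.APPROVED if approve else Decision.REJECTED
    proposal.decided_by = reviewer
    proposal.justification = justification
    return proposal


def execute(proposal: Proposal, effect: Callable[[str], None]) -> bool:
    """Run the side effect only if a named human has approved the proposal."""
    if proposal.decision is not Decision.APPROVED or not proposal.decided_by:
        return False
    effect(proposal.draft)
    return True
```

Because `execute` is the only path to the side effect, an unreviewed draft cannot be sent by accident. Disabling the approval step would require changing the code, not flipping a setting.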
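Under the European AI Act, many AI systems used in healthcare fall into the high-risk category, for example software that qualifies as a medical device requiring third-party conformity assessment, or systems used for emergency patient triage. High-risk systems carry explicit requirements for human oversight (Article 14). 'Human in the loop' is therefore not only an ethical choice but, for these systems, a statutory requirement. A high-risk AI system that autonomously implements medical decisions without any possibility of human intervention does not meet those requirements.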
This principle has implications for how AI is deployed. It works best in workflows that already include a human approval step (the clinician reviews the letter before sending it). It works less well where 'full automation' was the expectation. Many AI projects in healthcare fail not because of the technology, but because of incorrect assumptions about what AI is permitted to do autonomously.
2. Explainability over performance
A model that cannot explain its reasoning has no place in care. Simpler but more transparent almost always beats spectacular but opaque in a healthcare setting. That may sound like a conservative choice. In reality, it is a choice that delivers greater long-term value.
Unexplainable AI is unusable for healthcare professionals. The obligation to inform patients, including meaningful information about the logic behind automated decisions (GDPR art. 13–15), the restrictions on solely automated decision-making (art. 22), and professional disciplinary accountability for clinical decisions all make transparency not a feature but a prerequisite.
Explainability does not always mean technically 'explainable AI' in the academic sense. In practice it means: the AI shows which data it used, which rules it applied, and — for data-driven models — which factors carried the most weight. A retrieval-augmented model that cites source information is often more transparent and more useful than a black-box classifier that outputs only a score.
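One way to make this concrete is to treat provenance as part of the output itself, so that a result without sources or rules is never shown. The sketch below assumes a hypothetical `ExplainedOutput` structure. The field names and the example factor weights are illustrative and not taken from any particular model or library.

```python
from dataclasses import dataclass, field


@dataclass
class ExplainedOutput:
    """An AI output bundled with its provenance (hypothetical structure)."""
    text: str
    sources: list[str] = field(default_factory=list)          # data that was used
    rules: list[str] = field(default_factory=list)            # rules that were applied
    factors: dict[str, float] = field(default_factory=dict)   # factor weights, if any


def is_presentable(out: ExplainedOutput) -> bool:
    """Only show output that names at least one source or one applied rule."""
    return bool(out.sources or out.rules)


def top_factors(out: ExplainedOutput, n: int = 3) -> list[tuple[str, float]]:
    """Return the n factors with the largest absolute weight, largest first."""
    ranked = sorted(out.factors.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]


flag = ExplainedOutput(
    text="Elevated fall risk",
    sources=["intake questionnaire", "medication list"],
    factors={"age": 0.2, "recent_falls": 0.6, "medication_count": -0.3},
)
```

Here `top_factors(flag, 2)` yields `recent_falls` followed by `medication_count`, which is the kind of short, readable justification a professional can check against their own judgment.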
In practice, we find that explainability accelerates adoption. Professionals use AI more frequently when they can follow its reasoning — and they apply its outputs more appropriately. A 'smarter' model that is unusable delivers less value than a simpler model that is used every day.
3. Data stays within the walls
Healthcare data does not leave the EU. Private deployments or models that can run on-premise are the only viable options for clinical data processing. This is not a preference but a framework requirement. GDPR, NEN 7510, and the EHDS regulation (which entered into force in March 2025 and applies in phases) impose strict requirements on data location and transfer, encryption, access management, and purpose limitation for health data.
In practice, this means: models are run in EU cloud regions or on-premise; data is not used to train models that are also served to other customers; and APIs to AI providers outside the EU are used only for data that has been explicitly pseudonymized or de-identified, with a completed DPIA and a legal basis in place.
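These conditions can be written down as an explicit routing check that runs before any clinical payload is sent to a model endpoint. This is a sketch under stated assumptions: `Endpoint`, `Payload`, and `may_send` are hypothetical names, every payload is assumed to contain health data, and the boolean flags stand in for checks that in reality involve legal and organizational review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A model endpoint as seen by the policy (hypothetical structure)."""
    name: str
    in_eu: bool
    trains_on_customer_data: bool


@dataclass(frozen=True)
class Payload:
    """Facts about a health-data payload that the policy needs to know."""
    deidentified: bool = False      # pseudonymized or de-identified
    dpia_completed: bool = False
    legal_basis: bool = False


def may_send(payload: Payload, endpoint: Endpoint) -> tuple[bool, str]:
    """Decide whether a health-data payload may go to this endpoint, with a reason."""
    if endpoint.trains_on_customer_data:
        return False, "endpoint trains shared models on customer data"
    if endpoint.in_eu:
        return True, "EU-hosted endpoint"
    if payload.deidentified and payload.dpia_completed and payload.legal_basis:
        return True, "non-EU endpoint, de-identified data with DPIA and legal basis"
    return False, "non-EU endpoint without de-identification, DPIA, and legal basis"
```

Returning the reason alongside the verdict keeps the decision explainable and gives the audit log something concrete to record.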
This does not exclude all modern AI — but it does determine which AI architectures are fit for purpose. Hosted SaaS models that log all prompts are rarely compliant for clinical data. Models that can be deployed in your own environment — or through partners that guarantee EU residency and data isolation — are the workable path. This is something AI vendors either understand or do not; when in doubt, the answer is 'they do not.'
4. Log the AI itself
Every AI action is audit-logged. What did the AI propose, what was approved, what was rejected? This is how you will be able to explain what happened two years from now. It is also how you learn what works and what does not — because without a log, you have no empirical basis for improvement.
The audit log of a healthcare AI system must contain at minimum: the timestamp, the actor (which user, which AI version), the input, the generated output, the human decision taken, and the data accesses used to produce the output. The log should be stored immutably, retained long enough to meet retention requirements, and kept separate from the clinical data itself so that access to it can be controlled independently.
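The minimum fields listed above could be captured along these lines. The sketch uses a SHA-256 hash chain, a technique chosen here for illustration and not one the text prescribes. A hash chain makes tampering detectable but does not by itself make storage immutable, so write-once storage would still be needed. The function names and the in-memory list are hypothetical. A production log would more likely store references to the input and output than the clinical text itself, in line with keeping the log separate from clinical data.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def append_entry(log: list, *, user: str, ai_version: str, input_text: str,
                 output_text: str, human_decision: str, data_accessed: list) -> dict:
    """Append one audit record, chained to the previous record by its hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "ai_version": ai_version,
        "input": input_text,
        "output": output_text,
        "human_decision": human_decision,
        "data_accessed": data_accessed,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

If someone later changes a recorded decision from "rejected" to "approved", `verify` returns `False`, which is the property an after-the-fact review depends on.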
This principle has a second effect that is often underestimated: it enables continuous evaluation. When you know which AI suggestions have been rejected, you can measure the signal value of what the AI produces. Which types of proposals are frequently adopted? Which are not? That is where the real product learning loop resides — and it is impossible without an audit log.
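The adoption question can be answered directly from the log. The sketch below assumes each log entry carries a `proposal_type` field in addition to the recorded human decision. That field is a hypothetical addition for illustration, and decisions other than "approved" or "rejected" are simply skipped.

```python
from collections import defaultdict


def adoption_rates(entries: list) -> dict:
    """Share of approved proposals per proposal type, from audit log entries."""
    counts = defaultdict(lambda: [0, 0])  # type -> [approved, total]
    for entry in entries:
        decision = entry["human_decision"]
        if decision not in ("approved", "rejected"):
            continue  # e.g. still pending; not part of the rate
        tally = counts[entry["proposal_type"]]
        tally[1] += 1
        if decision == "approved":
            tally[0] += 1
    return {kind: approved / total for kind, (approved, total) in counts.items()}


sample = [
    {"proposal_type": "letter", "human_decision": "approved"},
    {"proposal_type": "letter", "human_decision": "approved"},
    {"proposal_type": "letter", "human_decision": "rejected"},
    {"proposal_type": "risk_flag", "human_decision": "rejected"},
    {"proposal_type": "risk_flag", "human_decision": "pending"},
]
rates = adoption_rates(sample)
```

A low rate for one proposal type is a signal to look at that type first, whether the cause is the model, the prompt, or the workflow around it.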
For regulators (in the Netherlands, the Dutch Data Protection Authority, AP, and the Health and Youth Care Inspectorate, IGJ) and for the patient, audit logging makes the use of AI explainable after the fact. For the organization, it makes the use of AI improvable over time. Both are preconditions for scale, and they apply from the very first production deployment, not only once you have grown large.
What this means in practice
These four principles may sound strict. In practice, they function more as a filter than a barrier. They rule out a number of AI applications (fully autonomous decision-making systems, opaque models, hosted non-EU AI processing clinical data, AI without logging) and point you toward the applications that do work and deliver sustainable value.
In concrete terms, we see three types of use cases that operate well within these principles. First: documentation assistance — automated draft reports, letters, and referrals that the professional validates. Second: workflow orchestration — automated routing, scheduling, and reminders, with human approval at critical junctures. Third: alerting — risk flags, early warnings, and anomaly detection, where the system only draws attention but does not intervene.
What we do not do: deploy AI for diagnostic conclusions without clinical validation, or use AI for automated triage decisions without a human fallback. These principles are not intended to keep AI small. They are intended to allow AI to scale without breaking down at the first incident.


