Agent-Led Healthcare Cloud Migration

Why the standard playbook of "migrate first, govern later" creates a window of exposure no health system can afford, and what an agent-orchestrated cutover to the Databricks Intelligence Platform looks like in practice.

1. The Migration Compliance Gap

Most healthcare CIOs we talk to decided to modernize their EHR analytics systems eighteen to twenty-four months ago. The real challenge now is carrying out a multi-petabyte migration of data from Epic, Cerner, or Meditech, along with all related ETL processes and claims warehouses, without creating governance gaps that could raise concerns with the Office for Civil Rights, internal auditors, or state attorneys general.

The unease is understandable. In a typical migration, data is transferred first, access controls are set up next, audit logging follows, and policy attestations come last. Each step is handled by a separate team, runs on its own schedule, and is tracked in its own documentation system. The gap between moving the first Delta table to the new lakehouse and achieving complete governance is rarely a matter of days; in most cases we've observed, it takes months.

This interval is where the exposure sits. During this period, every PHI record passing through staging, each intermediate dataset produced by ETL conversions, and any query run by engineering teams to validate row counts may constitute a HIPAA event that the platform is not yet able to attest to. The migration is not complete until governance is complete. Treating the two as strictly sequential is the primary compliance risk in healthcare cloud migration today.

2. Where Traditional Migrations Break

Healthcare projects that approach migration solely as a technological task tend to experience three recurring failure modes. These issues are obvious in hindsight, yet they are frequently overlooked during the planning stages.

Schema Drift During ETL Conversion

EHR-derived data carries clinical context in column names, code value sets, and source-specific data types, which often do not map cleanly from tools like Informatica, SSIS, or DataStage into PySpark. Engineers rewriting these pipelines may flatten or rename fields for convenience, producing a schema that looks reasonable on review. The problems emerge months later, when downstream care management tools or regulatory reports cannot trace their values back to the original data. At that stage, lineage must be reconstructed manually in spreadsheets, which leaves gaps in the audit trail.

PHI Sprawl in Staging Zones

Every reliable migration plan includes a staging area for incoming source data before it’s transformed, along with intermediate zones for reconciliation. In practice, these zones often hold copies of PHI that aren't being actively monitored. They’re created with broad permissions to give the migration team full access for troubleshooting purposes. Once the original ticket is closed after cutover, these zones are usually forgotten. One large project can leave behind many orphaned datasets, each posing a risk for potential breach notifications.

The Audit Trail Gap

Legacy governance tools, whether custom permission frameworks on Teradata, metadata catalogs integrated with Hadoop, or row-level security managed through stored procedures, rarely carry over to a new platform without modification. On Databricks, replacement controls must be configured, tested, and validated against every legacy use case. If this work is deferred until after the production cutover, there is a period during which the new platform operates with permissions and audit mechanisms that have not been fully verified. Such gaps are precisely what auditors are trained to identify.

3. Agent-Led Migration

The term "agent-led migration" is currently popular in vendor marketing, so it's important to clarify its meaning. In the Mastech delivery model, "agent-led" describes a particular five-stage conversion process where custom-built AI agents take on the deterministic, repetitive tasks that used to occupy 70–80% of a migration engineer’s workload. The steps in this pipeline are as follows:

Stage 1: Source Analyzer

A deterministic agent systematically analyses and reverse-engineers the source environment. It records each object, assesses row counts and volumetric data, maps dependencies within stored procedures and ETL processes, and generates an organized inventory of actual assets, distinguishing them from those reported in source documentation. In healthcare system projects, discrepancies between documented and existing data are frequently substantial.
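The comparison between documented and actual assets can be pictured as a set difference. The following is a minimal sketch, not the Source Analyzer itself; the function name, the `InventoryGap` type, and every object name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryGap:
    undocumented: frozenset  # found in the source system, absent from the docs
    missing: frozenset       # listed in the docs, not found in the source system


def reconcile_inventory(documented, discovered):
    """Compare documented object names against names found by profiling."""
    # Case-insensitive comparison is an assumption; some platforms are case-sensitive.
    documented = {name.lower() for name in documented}
    discovered = {name.lower() for name in discovered}
    return InventoryGap(
        undocumented=frozenset(discovered - documented),
        missing=frozenset(documented - discovered),
    )


# Hypothetical object names, for illustration only.
gap = reconcile_inventory(
    documented=["ehr.PATIENT", "ehr.ENCOUNTER", "dw.CLAIMS_2019"],
    discovered=["ehr.patient", "ehr.encounter", "stg.tmp_patient_copy"],
)
print(sorted(gap.undocumented))  # ['stg.tmp_patient_copy']
print(sorted(gap.missing))       # ['dw.claims_2019']
```

In a real engagement the `discovered` side would come from system catalog queries and ETL metadata rather than a hand-written list.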

Stage 2: Source to Target Conversion

LLM-based agents convert legacy code into PySpark or Databricks SQL aligned to the medallion architecture. Informatica mappings become Delta Live Tables pipelines. T-SQL becomes Databricks SQL. SAS DATA steps and PROC SQL become PySpark with functional validation. Stored procedures with embedded clinical business rules are translated with the rules preserved as testable units, not dissolved into the procedural body. Automated conversion routinely reaches 75%-85% of the codebase, with the remainder flagged for engineer review.

Stage 3: Alignment to the Well-Architected Framework

Converted code isn't simply transliterated one-to-one on syntactic rules; it is rearchitected to fit Databricks platform standards and well-architected design principles, including Unity Catalog naming conventions, medallion zone separation, cluster sizing practices, and Photon-compatible query development. Most in-house conversion tools fall short at this stage: they generate code that runs on Databricks but was not designed with Databricks-specific requirements in mind.

Stage 4: Business Parity Check

An agent reviews complete result sets, comparing converted outputs with the originals rather than just checking samples. Any differences found are sorted by their most probable cause, such as data type changes, variations in handling null values, shifts in time zones, or actual logic errors. Instead of manually running comparisons, the engineer examining the parity report receives an organized analysis of any failures.
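To make the idea of sorting differences by probable cause concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the agent's actual logic: rows are assumed to be dictionaries keyed by primary key, timestamps are assumed to be naive `datetime` values, and the cause labels and heuristics (for example, treating a whole-half-hour offset of up to 14 hours as a time zone shift) are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def classify_difference(legacy, converted):
    """Guess the most probable cause of a mismatch between two cell values."""
    if legacy == converted:
        return None
    # Null handling: exactly one side is None (e.g. NULL became '' or 0).
    if (legacy is None) != (converted is None):
        return "null_handling"
    # Time zone shift: timestamps differ by a multiple of 30 minutes, up to 14 hours.
    if isinstance(legacy, datetime) and isinstance(converted, datetime):
        delta = abs(converted - legacy)
        if delta <= timedelta(hours=14) and delta % timedelta(minutes=30) == timedelta(0):
            return "timezone_shift"
        return "logic_error"
    # Data type change: values agree once compared as numbers or as text.
    if type(legacy) is not type(converted):
        try:
            if float(legacy) == float(converted):
                return "data_type_change"
        except (TypeError, ValueError):
            pass
        if str(legacy).strip() == str(converted).strip():
            return "data_type_change"
    return "logic_error"


def parity_report(legacy_rows, converted_rows):
    """Compare complete result sets keyed by primary key, not a sample."""
    report = Counter()
    for key in legacy_rows.keys() | converted_rows.keys():
        old, new = legacy_rows.get(key), converted_rows.get(key)
        if old is None or new is None:
            report["missing_row"] += 1
            continue
        for column in old.keys() | new.keys():
            cause = classify_difference(old.get(column), new.get(column))
            if cause:
                report[cause] += 1
    return report


# Hypothetical rows, for illustration only.
legacy = {
    1: {"mrn": "00123", "admit": datetime(2024, 1, 5, 8, 0), "los": 3, "unit": None},
    2: {"mrn": "00456", "admit": datetime(2024, 1, 6, 9, 30), "los": 5, "unit": "ICU"},
}
converted = {
    1: {"mrn": "00123", "admit": datetime(2024, 1, 5, 13, 0), "los": "3", "unit": ""},
    2: {"mrn": "00456", "admit": datetime(2024, 1, 6, 9, 30), "los": 6, "unit": "ICU"},
}
report = parity_report(legacy, converted)
print(dict(report))
```

At multi-petabyte scale the same comparison would run as a distributed join rather than an in-memory loop; the sketch only shows the classification step that turns raw mismatches into a triaged report.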

Stage 5: Reinforcement Validation

Outputs are calibrated against established human review patterns from prior engagements, applying security and clinical reasoning standards to any content involving PHI. The agent learns from the corrections made during the current engagement and improves its future suggestions within the same project.

It is worth saying clearly what agent-led does not mean. The agents do not make clinical or compliance decisions; those judgments remain with human professionals. They do not approve schema changes that affect downstream clinical applications without human sign-off, and they do not deploy to production without engineering review. What they do is automate manual conversion and reconciliation work, which leaves more room in the schedule for human judgment. The human-in-the-loop checkpoints are defined as part of the engagement, and the agents log every action they take inside MLflow so the entire conversion history is auditable after the fact.

4. Governance Built In, Not Bolted On

The reason this matters for healthcare specifically is that the same agentic framework that accelerates the migration also stands up the governance controls in parallel. The Databricks Intelligence Platform provides four native capabilities that, when configured correctly during migration rather than after it, close the governance gap before it ever opens.

Unity Catalog

As the source data is profiled in Stage 1 of the conversion pipeline, columns containing PHI identifiers are tagged automatically. As the target schemas are created in Stage 3, Unity Catalog policies are generated in parallel. Row filters restrict patient cohorts by clinical unit or geography. Column masks redact the eighteen Safe Harbor identifiers based on the requesting user’s group. Dynamic views provide de-identified cohorts for research without copying data. These policies are configured once and apply uniformly whether the consumer is a clinician with a notebook, a population health dashboard, a BI tool, or a downstream AI agent operating under a service principal. There is no parallel permissions model for AI workloads and no shadow copy of data for agent consumption.
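One way to picture policies being generated in parallel with target schemas is a small generator that turns a list of tagged PHI columns into column-mask statements. This is a sketch under assumptions, not the tooling described above: the table, schema, and group names are hypothetical, every masked column is assumed to be a STRING, and the emitted SQL follows the Unity Catalog column mask pattern (`ALTER TABLE ... ALTER COLUMN ... SET MASK`) as best understood, so it should be checked against current Databricks documentation before use.

```python
def mask_statements(table, phi_columns, reader_group, mask_schema):
    """Build column-mask DDL for every tagged PHI column of one table.

    Assumes all masked columns are STRING and that names come from a
    trusted inventory (no identifier escaping is attempted here).
    """
    statements = []
    for column in phi_columns:
        fn = f"{mask_schema}.mask_{column}"
        # Mask function: members of reader_group see the value, others see '***'.
        statements.append(
            f"CREATE OR REPLACE FUNCTION {fn}(val STRING) "
            f"RETURN CASE WHEN is_account_group_member('{reader_group}') "
            f"THEN val ELSE '***' END"
        )
        # Attach the mask to the column.
        statements.append(f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {fn}")
    return statements


# Hypothetical names, for illustration only.
ddl = mask_statements(
    table="main.silver.patient",
    phi_columns=["ssn", "phone"],
    reader_group="phi_readers",
    mask_schema="main.governance",
)
for statement in ddl:
    print(statement + ";")
```

Generating the statements from the Stage 1 tag inventory, rather than writing them by hand after cutover, is what keeps policy creation in step with schema creation.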

AI Gateway

Where the migration involves any model-based workload, whether that is a fine-tuned model in the workspace, an external endpoint such as Anthropic Claude or Azure OpenAI, or a Databricks-hosted foundation model, the AI Gateway sits between the calling application and the model. Inbound policies detect Safe Harbor identifiers in prompts and block or redact them before they reach the model. Outbound policies apply the same detection to generated responses before they return to the caller. The Gateway is configured during migration rather than after model deployment, which means PHI exposure through an AI pathway is closed before any model is consumed in production.
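The inbound and outbound screening pattern can be sketched in a few lines of Python. This is not how the AI Gateway is implemented or configured; it is a simplified stand-in that uses regular expressions for three identifier types (the patterns, labels, and the `guarded_call` helper are all hypothetical), whereas production detection would need to cover all Safe Harbor identifiers and handle far more formats.

```python
import re

# Illustrative patterns only: three of the eighteen identifier types, US formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text):
    """Replace detected identifiers with a label and report which types were found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


def guarded_call(prompt, model):
    """Screen the prompt on the way in and the response on the way out."""
    clean_prompt, inbound = redact(prompt)
    response = model(clean_prompt)
    clean_response, outbound = redact(response)
    return clean_response, {"inbound": inbound, "outbound": outbound}


def fake_model(prompt):
    # Stand-in for a model endpoint; it echoes the prompt and leaks a phone number.
    return f"Summary of: {prompt} Callback 555-010-1234."


response, audit = guarded_call(
    "Summarize visit for patient, SSN 123-45-6789, reach at jane.doe@example.org",
    fake_model,
)
print(response)
print(audit)  # {'inbound': ['SSN', 'EMAIL'], 'outbound': ['PHONE']}
```

The point of the sketch is the placement of the two checks: the model never sees the raw identifiers, and the caller never receives identifiers the model produces.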

MLflow and Lakehouse Monitoring

Every conversion action, every parity check result, every agent invocation, and every data access event is captured in MLflow and the system audit tables. When an internal audit team requests evidence of what was migrated, what was masked, and who accessed what during the cutover window, the response is a query, not a forensic project. After migration completes, the same observability infrastructure continues to support model monitoring, drift detection, and access pattern analysis without additional instrumentation.
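The "response is a query" idea can be illustrated with a small filter over access events. The event shape below (`user`, `table`, `time`) is a hypothetical simplification and does not mirror the schema of the Databricks audit tables; on the platform the same question would be a SQL query against those tables.

```python
from datetime import datetime


def accesses_in_window(events, tables, start, end):
    """Return sorted (user, table, time) tuples for the given tables in [start, end)."""
    return sorted(
        (e["user"], e["table"], e["time"])
        for e in events
        if e["table"] in tables and start <= e["time"] < end
    )


# Hypothetical audit events, for illustration only.
events = [
    {"user": "etl_svc", "table": "silver.patient", "time": datetime(2024, 3, 2, 1, 15)},
    {"user": "a.rivera", "table": "silver.patient", "time": datetime(2024, 3, 2, 9, 40)},
    {"user": "a.rivera", "table": "silver.claims", "time": datetime(2024, 3, 2, 9, 45)},
    {"user": "j.chen", "table": "silver.patient", "time": datetime(2024, 3, 9, 8, 0)},
]
cutover_week = accesses_in_window(
    events, {"silver.patient"}, datetime(2024, 3, 1), datetime(2024, 3, 8)
)
for user, table, time in cutover_week:
    print(user, table, time.isoformat())
```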

AgentBricks

The agents that ran the migration share a runtime architecture with the agents that will operate care management, prior authorization, and clinical decision support workflows after cutover. AgentBricks enforces Unity Catalog permissions, AI Gateway content controls, and MLflow logging by default for any agent built within it. The governance properties that were configured during migration extend into agentic production workloads without rework.

5. The 6-Month Delivery Model

The industry average for an enterprise EHR migration of this complexity is twelve to eighteen months. The six-month timeline is achievable for scopes of 100 to 200 core tables with their associated ETL footprint, and it depends on a wave-based factory model rather than a monolithic cutover.

Weeks 1 to 3: Discovery and Assessment

The Stage 1 Source Analyzer runs across the legacy platform to generate a structured inventory. In parallel, capacity planning, DBU forecasting, and TCO modelling are conducted. The Unity Catalog target schema is developed from this inventory, and the PHI tagging strategy is reviewed with the compliance team. Governance baselines are formally established before any data is migrated.

Weeks 4 to 12: Wave-based Conversion

Tables and pipelines are organized into waves consisting of twenty to thirty objects per wave. Each wave progresses through all five stages of the conversion pipeline concurrently with the discovery phase of subsequent waves. Engineers systematically review agent outputs, validate business parity reports, and authorize promotion to the integration environment. Unity Catalog policies are implemented in alignment with each wave, rather than being deferred for a final security assessment.
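Wave planning can be sketched as a dependency-ordered batching problem. The following is one possible approach, not the delivery model's actual planner: it assumes the dependency map comes from the discovery inventory, uses Python's standard `graphlib` (3.9+) to order objects so that nothing is scheduled before what it depends on, and then cuts the ordered list into fixed-size waves. The object names and the small wave size in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies, wave_size=25):
    """Order objects so dependencies come first, then cut the order into waves.

    dependencies maps each object to the set of objects it depends on.
    """
    ordered = list(TopologicalSorter(dependencies).static_order())
    return [ordered[i:i + wave_size] for i in range(0, len(ordered), wave_size)]


# Hypothetical objects, for illustration only.
deps = {
    "gold.readmission_report": {"silver.encounter", "silver.patient"},
    "silver.encounter": {"bronze.encounter_raw"},
    "silver.patient": {"bronze.patient_raw"},
    "bronze.encounter_raw": set(),
    "bronze.patient_raw": set(),
}
waves = plan_waves(deps, wave_size=2)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

A real plan would also weigh business criticality and clinical owner availability; the sketch only guarantees that an object never lands in an earlier wave than something it depends on.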

Weeks 13 to 18: Validation and Parallel Run

The Validation Kit automates row count checks, schema validation, and query result comparisons across the migrated system. During the parallel run period, outputs from legacy and Databricks environments are compared, and discrepancies are resolved before cutover.
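The row count and schema checks can be pictured as follows. This is a minimal sketch rather than the Validation Kit: the table summaries are hypothetical dictionaries, and data types are assumed to have already been normalized to a common vocabulary so that legacy and target types are directly comparable.

```python
def validate_table(legacy, target):
    """Return human-readable findings; an empty list means the table passes."""
    findings = []
    if legacy["row_count"] != target["row_count"]:
        findings.append(f"row count {legacy['row_count']} != {target['row_count']}")
    for column, dtype in legacy["schema"].items():
        if column not in target["schema"]:
            findings.append(f"column {column} missing in target")
        elif target["schema"][column] != dtype:
            findings.append(f"column {column}: {dtype} became {target['schema'][column]}")
    for column in sorted(target["schema"].keys() - legacy["schema"].keys()):
        findings.append(f"column {column} added in target")
    return findings


# Hypothetical table summaries, for illustration only.
legacy_summary = {
    "row_count": 1000,
    "schema": {"mrn": "string", "admit_ts": "timestamp", "los_days": "int"},
}
target_summary = {
    "row_count": 998,
    "schema": {"mrn": "string", "admit_ts": "timestamp", "los_days": "bigint"},
}
findings = validate_table(legacy_summary, target_summary)
for finding in findings:
    print(finding)
```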

Weeks 19 to 24: Cutover and Stabilization

Production cutover occurs either in waves or during a single window, based on operational needs of the healthcare system. The Lakehouse Monitoring framework is active immediately post-cutover, enabling drift and anomaly detection from day one. The legacy environment remains read-only for the regulatory retention period.

Engagements often extend beyond this timeline for predictable reasons: late discovery of an undocumented dependency on a legacy reporting tool, missing schema change approval from clinical application owners, or delays in identity provider integration with Unity Catalog. These are not technical migration issues, and they can be identified in week one with a proper discovery approach.

6. The HIPAA-Ready Lakehouse

The destination is not just a migrated estate; it is a governed lakehouse where essential health system workloads run on a single auditable data layer. Upon completion, four production patterns are available immediately.

  • Secure EHR analytics - Population health teams, finance departments, and clinical leadership access governed data views with sensitive information masked by default. De-identified cohorts are created as needed for research purposes, and lineage information provides accurate traceability from each report to the original source records.

  • Agentic care management - Care managers retain clinical judgment while agents handle outreach prioritization, communication drafting, gap-in-care identification, and visit summary preparation. Every action is bounded by Unity Catalog permissions, screened by the AI Gateway, and recorded for compliance review.

  • Clinical AI observability - Lakehouse Monitoring tracks model performance, data quality, and behavioral drift for all models impacting clinical decisions. When any monitored property exceeds its threshold, an alert is triggered, so the behavior of the AI ecosystem can be evidenced to regulators or clinical boards rather than simply explained.

  • Regulatory reporting - Evidence for CMS interoperability rules, federal algorithmic reporting requirements, and state-level AI and biometric transparency laws becomes a parameterized query rather than a multi-week compilation exercise. Reports that historically required several weeks of manual work can run on demand from repeatable workflows.

7. Three Readiness Indicators

The suitability of agent-led migration for a particular health system is determined primarily by an assessment of three key readiness indicators, rather than by platform preference.

Governance maturity - How long does it currently take to determine who accessed records for patients in a given cohort within a specific date range? If the answer is longer than a few hours, use the migration as an opportunity to fix this rather than carrying the limitation over to the new platform.

Legacy platform composition - Migrating from just a Teradata system is quite different from handling a migration that includes Hadoop, Informatica, SAS, and Oracle together. The agent-led model delivers the best results when dealing with diverse source platforms, since manual conversion becomes far less practical and cost-effective in these situations.

Agentic AI readiness - When care management, prior authorization, or clinical decision support agents are planned within the next eighteen months, establishing the governance framework during migration is significantly more cost-effective than implementing it afterwards. The platform architecture choices made at this stage determine what agentic workloads will be able to do later.

Next Steps

The Mastech Governance Baseline Assessment is a four-to-six-week, fixed-scope engagement that examines the current AI governance framework against the four core requirements covered in this article and produces a prioritized remediation plan specific to the target Databricks environment. The assessment covers Unity Catalog configuration, AI Gateway readiness, observability coverage across MLflow and Lakehouse Monitoring, and the maturity of existing agentic AI controls. For health systems where the migration itself has not yet been scoped, the assessment also produces the discovery output that feeds directly into a wave-based migration plan.

The autonomous enterprise in healthcare is no longer a future-state concept. The governance framework that supports it can be built in parallel with the migration that enables it, and the six-month timeline is achievable when the two are treated as a single engagement rather than two sequential ones.

Siddharth Jothimani

Siddharth Jothimani is an Enterprise Data & AI professional with deep expertise in architecting scalable cloud data platforms, modern analytics solutions, and enterprise AI ecosystems. He has strong experience in driving end-to-end data modernization initiatives using the Databricks Platform, with expertise spanning scalable data engineering, unified governance, real-time analytics, AI/ML enablement, cloud migration, and the development of AI-ready Lakehouse architectures that enable business-driven innovation. Driven by continuous learning and innovation, he focuses on enabling organizations to build data platforms in Databricks that are scalable, governed, and aligned to business growth.