From cfd320887a03856361c497eea21805c68af58b34 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Wed, 29 Jan 2025 14:00:21 -0500
Subject: [PATCH 01/14] Added pipeline migration docs - migrate pipeline command

---
 docs/ucx/docs/reference/commands/index.mdx | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/ucx/docs/reference/commands/index.mdx b/docs/ucx/docs/reference/commands/index.mdx
index 9e38daa27f..71827e179b 100644
--- a/docs/ucx/docs/reference/commands/index.mdx
+++ b/docs/ucx/docs/reference/commands/index.mdx
@@ -660,6 +660,13 @@ It takes a `WorkspaceClient` object and `from` and `to` parameters as parameters
 the `TableMove` class. This command is useful for developers and administrators who want to create an alias for a
 table. It can also be used to debug issues related to table aliasing.
 
+### `migrate-dlt-pipelines`
+
+```text
+$ databricks labs ucx migrate-dlt-pipelines [--include-pipeline-ids <pipeline ids>] [--exclude-pipeline-ids <pipeline ids>]
+```
+
 ## Utility commands
 
 ### `logs`

From c5e568c0830ccb90be5adf756e4eefb64408bd27 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Wed, 29 Jan 2025 14:00:44 -0500
Subject: [PATCH 02/14] Added description, known issues and limitations

---
 docs/ucx/docs/reference/commands/index.mdx | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/ucx/docs/reference/commands/index.mdx b/docs/ucx/docs/reference/commands/index.mdx
index 71827e179b..86c7b6faf7 100644
--- a/docs/ucx/docs/reference/commands/index.mdx
+++ b/docs/ucx/docs/reference/commands/index.mdx
@@ -660,6 +660,30 @@ It takes a `WorkspaceClient` object and `from` and `to` parameters as parameters
 the `TableMove` class. This command is useful for developers and administrators who want to create an alias for a
 table. It can also be used to debug issues related to table aliasing.
 
+## Pipeline migration commands
+
+These commands are for the [pipeline migration process](/docs/process/#pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
+The pipeline migration process is an automated process that migrates Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
+One prerequisite for the pipeline migration process is that the DLT clone API is enabled for the workspace.
+
+Known issues and limitations:
+- Only clones from HMS to UC are supported.
+- Pipelines may only be cloned into the same workspace.
+- HMS pipelines must currently be publishing tables to some target schema.
+- Only the following streaming sources are supported:
+  - Delta
+  - Autoloader
+    - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
+  - Kafka where "kafka.group.id" is not set
+  - Kinesis where "consumerMode" is not "efo"
+- Maintenance is automatically paused (for both pipelines) while migration is in progress
+- If an Autoloader source specifies an explicit cloudFiles.schemaLocation, mergeSchema needs to be set to true for the HMS original and UC clone to operate concurrently.
+- Pipelines that publish tables to custom schemas are not supported.
+- On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.
+- All existing limitations of using DLT on UC.
+- Existing UC limitations.
+  - If tables in the HMS pipeline specify storage locations (using the "path" parameter in Python or the LOCATION clause in SQL), the configuration "pipelines.migration.ignoreExplicitPath" can be set to "true" to ignore the parameter in the cloned pipeline.
+
 ### `migrate-dlt-pipelines`
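For orientation, a concrete invocation of the command documented in the patches above might look as follows; the pipeline IDs are invented for illustration and are not part of this change:

```text
$ databricks labs ucx migrate-dlt-pipelines --include-pipeline-ids 1111aaaa-2222-bbbb-3333-cccc4444dddd,5555eeee-6666-ffff-7777-8888aaaa9999
```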
From d4c5f744191d75504d0b27010bc1bd439e605db3 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Wed, 29 Jan 2025 17:46:59 -0500
Subject: [PATCH 03/14] Fix links move description to process

---
 docs/ucx/docs/process/index.mdx            | 28 +++++++++++++++++++++-
 docs/ucx/docs/reference/commands/index.mdx | 23 +-----------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index d8d0f62e81..5c9b1cf5aa 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -11,7 +11,8 @@ On a high level, the steps in migration process are:
 4. [data reconciliation](/docs/reference/workflows#post-migration-data-reconciliation-workflow)
 5. [code migration](#code-migration)
 6. [final details](#final-details)
-
+5. [code migration](/docs/reference/commands#code-migration-commands)
+6. [pipeline migration](/docs/process#pipeline-migration-process)
 The migration process can be schematic visualized as:
 
 ```mermaid
@@ -288,6 +289,7 @@ databricks labs ucx revert-migrated-tables --schema X --table Y [--delete-manage
 The [`revert-migrated-tables` command](/docs/reference/commands#revert-migrated-tables) drops the Unity Catalog table or view and reset
 the `upgraded_to` property on the source object. Use this command to allow for migrating a table or view again.
 
+
 ## Code Migration
 
 After you're done with the [table migration](#table-migration-process) and
@@ -311,3 +313,27 @@ After investigating the code linter advices, code can be migrated. We recommend
 Once you're done with the [code migration](#code-migration), you can run the:
 
 - [`cluster-remap` command](/docs/reference/commands#cluster-remap) to remap the clusters to be UC compatible.
+
+
+## Pipeline Migration Process
+
+The pipeline migration process is an automated process that migrates Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
+One prerequisite for the pipeline migration process is that the DLT clone API is enabled for the workspace.
+
+Known issues and limitations:
+- Only clones from HMS to UC are supported.
+- Pipelines may only be cloned into the same workspace.
+- HMS pipelines must currently be publishing tables to some target schema.
+- Only the following streaming sources are supported:
+  - Delta
+  - Autoloader
+    - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
+  - Kafka where "kafka.group.id" is not set
+  - Kinesis where "consumerMode" is not "efo"
+- Maintenance is automatically paused (for both pipelines) while migration is in progress
+- If an Autoloader source specifies an explicit cloudFiles.schemaLocation, mergeSchema needs to be set to true for the HMS original and UC clone to operate concurrently.
+- Pipelines that publish tables to custom schemas are not supported.
+- On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.
+- All existing limitations of using DLT on UC.
+- Existing UC limitations.
+  - If tables in the HMS pipeline specify storage locations (using the "path" parameter in Python or the LOCATION clause in SQL), the configuration "pipelines.migration.ignoreExplicitPath" can be set to "true" to ignore the parameter in the cloned pipeline.
diff --git a/docs/ucx/docs/reference/commands/index.mdx b/docs/ucx/docs/reference/commands/index.mdx
index 86c7b6faf7..f5cce43e4e 100644
--- a/docs/ucx/docs/reference/commands/index.mdx
+++ b/docs/ucx/docs/reference/commands/index.mdx
@@ -662,28 +662,7 @@ It can also be used to debug issues related to table aliasing.
 
 ## Pipeline migration commands
 
-These commands are for the [pipeline migration process](/docs/process/#pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
-The pipeline migration process is an automated process that migrates Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
-One prerequisite for the pipeline migration process is that the DLT clone API is enabled for the workspace.
-
-Known issues and limitations:
-- Only clones from HMS to UC are supported.
-- Pipelines may only be cloned into the same workspace.
-- HMS pipelines must currently be publishing tables to some target schema.
-- Only the following streaming sources are supported:
-  - Delta
-  - Autoloader
-    - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
-  - Kafka where "kafka.group.id" is not set
-  - Kinesis where "consumerMode" is not "efo"
-- Maintenance is automatically paused (for both pipelines) while migration is in progress
-- If an Autoloader source specifies an explicit cloudFiles.schemaLocation, mergeSchema needs to be set to true for the HMS original and UC clone to operate concurrently.
-- Pipelines that publish tables to custom schemas are not supported.
-- On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.
-- All existing limitations of using DLT on UC.
-- Existing UC limitations.
-  - If tables in the HMS pipeline specify storage locations (using the "path" parameter in Python or the LOCATION clause in SQL), the configuration "pipelines.migration.ignoreExplicitPath" can be set to "true" to ignore the parameter in the cloned pipeline.
-
+These commands are for [pipeline migration](/docs/process#pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
 
 ### `migrate-dlt-pipelines`
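The "pipelines.migration.ignoreExplicitPath" limitation above concerns table definitions like the following sketch, written as it might appear in a DLT pipeline notebook (`spark` is provided by the pipeline runtime; the table name and paths are hypothetical):

```python
import dlt


@dlt.table(
    name="raw_events",
    # An explicit HMS-era storage location; setting the pipeline configuration
    # "pipelines.migration.ignoreExplicitPath" to "true" makes the clone ignore it.
    path="dbfs:/mnt/landing/raw_events",
)
def raw_events():
    # Hypothetical source read; any supported batch or streaming source works here.
    return spark.read.format("delta").load("dbfs:/mnt/landing/events_source")
```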
From e9c09fea5b055a223cd49cd5abf1170a276e24ef Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Wed, 29 Jan 2025 18:09:33 -0500
Subject: [PATCH 04/14] Add prerequisite blockquote and change subheading font

---
 docs/ucx/docs/process/index.mdx | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 5c9b1cf5aa..d89122cdc4 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -13,6 +13,7 @@ On a high level, the steps in migration process are:
 6. [final details](#final-details)
 5. [code migration](/docs/reference/commands#code-migration-commands)
 6. [pipeline migration](/docs/process#pipeline-migration-process)
+
 The migration process can be schematic visualized as:
 
 ```mermaid
@@ -317,10 +318,14 @@ Once you're done with the [code migration](#code-migration), you can run the:
 
 ## Pipeline Migration Process
 
+> You are required to complete the [assessment workflow](/docs/reference/workflows#assessment-workflow) before starting the pipeline migration workflow.
+
 The pipeline migration process is an automated process that migrates Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
-One prerequisite for the pipeline migration process is that the DLT clone API is enabled for the workspace.
 
-Known issues and limitations:
+Upon the first update, the cloned pipeline will copy over all the data and checkpoints, and then run normally thereafter. After the cloned pipeline reaches 'RUNNING', both the original and the cloned pipeline can run independently.
+
+
+### Known issues and limitations:
 - Only clones from HMS to UC are supported.
 - Pipelines may only be cloned into the same workspace.
 - HMS pipelines must currently be publishing tables to some target schema.
From 3901d3a06198b592b0ad0dfef9730d45b091682c Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Wed, 29 Jan 2025 18:26:14 -0500
Subject: [PATCH 05/14] Add external links

---
 docs/ucx/docs/process/index.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index d89122cdc4..889ddd8f52 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -333,12 +333,12 @@ Upon the first update, the cloned pipeline will copy over all the data and check
   - Delta
   - Autoloader
     - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
-  - Kafka where "kafka.group.id" is not set
-  - Kinesis where "consumerMode" is not "efo"
+  - Kafka where `kafka.group.id` is not set
+  - Kinesis where `consumerMode` is not "efo"
 - Maintenance is automatically paused (for both pipelines) while migration is in progress
 - If an Autoloader source specifies an explicit cloudFiles.schemaLocation, mergeSchema needs to be set to true for the HMS original and UC clone to operate concurrently.
 - Pipelines that publish tables to custom schemas are not supported.
 - On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.
-- All existing limitations of using DLT on UC.
-- Existing UC limitations.
+- [All existing limitations](https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations) of using DLT on UC.
+- [Existing UC limitations](https://docs.databricks.com/en/data-governance/unity-catalog/index.html#limitations)
   - If tables in the HMS pipeline specify storage locations (using the "path" parameter in Python or the LOCATION clause in SQL), the configuration "pipelines.migration.ignoreExplicitPath" can be set to "true" to ignore the parameter in the cloned pipeline.

From 7c3db76e28284511feb687800ae0fa4acc7ffef8 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 13:02:12 -0500
Subject: [PATCH 06/14] Highlight keywords

---
 docs/ucx/docs/process/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 889ddd8f52..8ec6be2f5d 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -336,7 +336,7 @@ Upon the first update, the cloned pipeline will copy over all the data and check
   - Kafka where `kafka.group.id` is not set
   - Kinesis where `consumerMode` is not "efo"
 - Maintenance is automatically paused (for both pipelines) while migration is in progress
-- If an Autoloader source specifies an explicit cloudFiles.schemaLocation, mergeSchema needs to be set to true for the HMS original and UC clone to operate concurrently.
+- If an Autoloader source specifies an explicit `cloudFiles.schemaLocation`, `mergeSchema` needs to be set to true for the HMS original and UC clone to operate concurrently.
 - Pipelines that publish tables to custom schemas are not supported.
 - On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.
 - [All existing limitations](https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations) of using DLT on UC.
From 3b469875431cbd7ed359f9b89f63754669561fa9 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 13:08:17 -0500
Subject: [PATCH 07/14] Autoloader link documentation

---
 docs/ucx/docs/process/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 8ec6be2f5d..cad8b47ae1 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -331,7 +331,7 @@ Upon the first update, the cloned pipeline will copy over all the data and check
 - HMS pipelines must currently be publishing tables to some target schema.
 - Only the following streaming sources are supported:
   - Delta
-  - Autoloader
+  - [Autoloader](https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/index.html)
     - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
   - Kafka where `kafka.group.id` is not set
   - Kinesis where `consumerMode` is not "efo"
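To make the two Autoloader caveats above concrete, here is a hedged sketch of a DLT Autoloader source that sets an explicit schema location together with `mergeSchema`, plus a backfill interval to recover files whose notification events were dropped. `spark` comes from the pipeline runtime; the format, paths, and interval are assumptions, not part of this change:

```python
import dlt


@dlt.table
def autoloader_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Explicit schema location: per the limitation above, mergeSchema must be
        # "true" for the HMS original and the UC clone to run concurrently.
        .option("cloudFiles.schemaLocation", "dbfs:/mnt/schemas/autoloader_events")
        .option("mergeSchema", "true")
        # With file notification events, a periodic backfill picks up any files
        # whose notifications were dropped, e.g. after the HMS original was
        # accidentally restarted.
        .option("cloudFiles.backfillInterval", "1 day")
        .load("dbfs:/mnt/landing/events")
    )
```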
From ca1468995e2d2cf3a2fefeadd2a536e98bbee5e8 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 13:10:57 -0500
Subject: [PATCH 08/14] DLT maintenance link documentation

---
 docs/ucx/docs/process/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index cad8b47ae1..ce75cbb1c1 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -335,7 +335,7 @@ Upon the first update, the cloned pipeline will copy over all the data and check
     - If your pipeline uses Autoloader with file notification events, do not run the original HMS pipeline after cloning as this will cause some file notification events to be dropped from the UC clone. If the HMS original was started accidentally, missed files can be backfilled by using the `cloudFiles.backfillInterval` option in Autoloader.
   - Kafka where `kafka.group.id` is not set
   - Kinesis where `consumerMode` is not "efo"
-- Maintenance is automatically paused (for both pipelines) while migration is in progress
+- [Maintenance](https://docs.databricks.com/en/delta-live-tables/index.html#maintenance-tasks-performed-by-delta-live-tables) is automatically paused (for both pipelines) while migration is in progress
 - If an Autoloader source specifies an explicit `cloudFiles.schemaLocation`, `mergeSchema` needs to be set to true for the HMS original and UC clone to operate concurrently.
 - Pipelines that publish tables to custom schemas are not supported.
 - On tables cloned to UC, time travel queries are undefined when querying by timestamp to versions originally written on HMS. Time travel queries by version will work correctly, as will time travel queries by timestamp to versions written on UC.

From e0bfe0f625c1f3b16f1ad06f1eb99054a7a3bb51 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 13:12:09 -0500
Subject: [PATCH 09/14] Wording

---
 docs/ucx/docs/process/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index ce75cbb1c1..cf1ba5e2d4 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -327,7 +327,7 @@ Upon the first update, the cloned pipeline will copy over all the data and check
 
 ### Known issues and limitations:
 - Only clones from HMS to UC are supported.
-- Pipelines may only be cloned into the same workspace.
+- Pipelines may only be cloned within the same workspace.
 - HMS pipelines must currently be publishing tables to some target schema.
 - Only the following streaming sources are supported:
   - Delta
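The Kafka and Kinesis bullets above describe source options rather than code in this PR. As a hedged sketch (broker, topic, stream name, and region are all invented, and `spark` is assumed to come from the pipeline runtime), a clone-compatible pair of sources would look like:

```python
# Kafka: supported for cloning only when "kafka.group.id" is left unset.
kafka_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events")  # note: no "kafka.group.id" option is set
    .load()
)

# Kinesis: supported for cloning only when "consumerMode" is not "efo".
kinesis_events = (
    spark.readStream.format("kinesis")
    .option("streamName", "events")
    .option("region", "us-east-1")
    .option("consumerMode", "polling")  # anything but "efo"
    .load()
)
```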
From 2e07073bf20b2c6ca1d845c911c5f44c9a1d7a04 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 13:30:40 -0500
Subject: [PATCH 10/14] Add an example

---
 docs/ucx/docs/process/index.mdx | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index cf1ba5e2d4..f7325ae0c3 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -320,10 +320,12 @@ Once you're done with the [code migration](#code-migration), you can run the:
 
 > You are required to complete the [assessment workflow](/docs/reference/workflows#assessment-workflow) before starting the pipeline migration workflow.
 
-The pipeline migration process is an automated process that migrates Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
+The pipeline migration process is an automated process that clones the Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
 
 Upon the first update, the cloned pipeline will copy over all the data and checkpoints, and then run normally thereafter. After the cloned pipeline reaches 'RUNNING', both the original and the cloned pipeline can run independently.
 
+#### Example:
+If the existing HMS DLT pipeline is called "dlt_pipeline", it will be stopped and renamed to "dlt_pipeline [OLD]", and the new cloned pipeline will be named "dlt_pipeline".
 ### Known issues and limitations:
 - Only clones from HMS to UC are supported.
 - Pipelines may only be cloned within the same workspace.

From 18ec3e8d936106455e3d9376630d94b9bfbe1e6a Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 14:42:59 -0500
Subject: [PATCH 11/14] Added more usage notes

---
 docs/ucx/docs/process/index.mdx | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index f7325ae0c3..5f76ee9240 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -344,3 +344,12 @@
 - [All existing limitations](https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations) of using DLT on UC.
 - [Existing UC limitations](https://docs.databricks.com/en/data-governance/unity-catalog/index.html#limitations)
   - If tables in the HMS pipeline specify storage locations (using the "path" parameter in Python or the LOCATION clause in SQL), the configuration "pipelines.migration.ignoreExplicitPath" can be set to "true" to ignore the parameter in the cloned pipeline.
+
+
+### Considerations
+- Do not edit the notebooks that define the pipeline during cloning.
+- The original pipeline should not be running when requesting the clone.
+- When a clone is requested, DLT will automatically start an update to migrate the existing data and metadata for Streaming Tables, allowing them to pick up where the original pipeline left off.
+- It is expected that the update metrics do not include the migrated data.
+- Make sure all name-based references in the HMS pipeline are fully qualified, e.g. `hive_metastore.schema.table`.
+- After the UC clone reaches RUNNING, both the original pipeline and the cloned pipeline may run independently.

From 0aba4e0a9d22ae958ef60b8b05703b8e003a5c30 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Thu, 30 Jan 2025 14:44:49 -0500
Subject: [PATCH 12/14] Fixed wording

---
 docs/ucx/docs/process/index.mdx | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 5f76ee9240..9993a6c807 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -320,8 +320,7 @@ Once you're done with the [code migration](#code-migration), you can run the:
 > You are required to complete the [assessment workflow](/docs/reference/workflows#assessment-workflow) before starting the pipeline migration workflow.
 
-The pipeline migration process is an automated process that clones the Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
-
+The pipeline migration process is a workflow that clones the Hive Metastore Delta Live Table (DLT) pipelines to the Unity Catalog.
 Upon the first update, the cloned pipeline will copy over all the data and checkpoints, and then run normally thereafter. After the cloned pipeline reaches 'RUNNING', both the original and the cloned pipeline can run independently.
 
 #### Example:
 If the existing HMS DLT pipeline is called "dlt_pipeline", it will be stopped and renamed to "dlt_pipeline [OLD]", and the new cloned pipeline will be named "dlt_pipeline".
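The fully-qualified-names consideration added above is about reads like the following minimal sketch (schema and table names invented; `spark` provided by the pipeline runtime):

```python
# Ambiguous: resolves against the current default catalog and schema, which
# changes once the pipeline is cloned from HMS to UC.
orders = spark.read.table("orders")

# Fully qualified: keeps pointing at the HMS table even after cloning.
orders = spark.read.table("hive_metastore.sales.orders")
```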
From 08f37ab85124fe9cf56d12d18facba785ed4cd8f Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Fri, 31 Jan 2025 08:59:43 -0500
Subject: [PATCH 13/14] Moving things around

---
 docs/ucx/docs/process/index.mdx            | 19 ++++++++++---------
 docs/ucx/docs/reference/commands/index.mdx |  2 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 9993a6c807..96f15a3774 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -10,9 +10,9 @@ On a high level, the steps in migration process are:
 3. [table migration](/docs/process/#table-migration-process)
 4. [data reconciliation](/docs/reference/workflows#post-migration-data-reconciliation-workflow)
 5. [code migration](#code-migration)
-6. [final details](#final-details)
-5. [code migration](/docs/reference/commands#code-migration-commands)
-6. [pipeline migration](/docs/process#pipeline-migration-process)
+6. [code migration](/docs/reference/commands#code-migration-commands)
+7. [delta live table pipeline migration](/docs/process#delta-live-table-pipeline-migration-process)
+8. [final details](#final-details)
 
 The migration process can be schematic visualized as:
 
@@ -310,13 +310,8 @@ After investigating the code linter advices, code can be migrated. We recommend
 - Use the [`migrate-` commands`](/docs/reference/commands#code-migration-commands) to migrate resources.
 - Set the [default catalog](https://docs.databricks.com/en/catalogs/default.html) to Unity Catalog.
 
-## Final details
-
-Once you're done with the [code migration](#code-migration), you can run the:
-- [`cluster-remap` command](/docs/reference/commands#cluster-remap) to remap the clusters to be UC compatible.
-
-## Pipeline Migration Process
+## Delta Live Table Pipeline Migration Process
 
 > You are required to complete the [assessment workflow](/docs/reference/workflows#assessment-workflow) before starting the pipeline migration workflow.
 
@@ -352,3 +347,9 @@
 - It is expected that the update metrics do not include the migrated data.
 - Make sure all name-based references in the HMS pipeline are fully qualified, e.g. `hive_metastore.schema.table`.
 - After the UC clone reaches RUNNING, both the original pipeline and the cloned pipeline may run independently.
+
+
+## Final details
+
+Once you're done with the [code migration](#code-migration), you can run the:
+- [`cluster-remap` command](/docs/reference/commands#cluster-remap) to remap the clusters to be UC compatible.
diff --git a/docs/ucx/docs/reference/commands/index.mdx b/docs/ucx/docs/reference/commands/index.mdx
index f5cce43e4e..431192f3bf 100644
--- a/docs/ucx/docs/reference/commands/index.mdx
+++ b/docs/ucx/docs/reference/commands/index.mdx
@@ -662,7 +662,7 @@ It can also be used to debug issues related to table aliasing.
 
 ## Pipeline migration commands
 
-These commands are for [pipeline migration](/docs/process#pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
+These commands are for [pipeline migration](/docs/process#delta-live-table-pipeline-migration-process) and require the [assessment workflow](/docs/reference/workflows#assessment-workflow) to be completed.
 
 ### `migrate-dlt-pipelines`
 

From 5e801da4753f710a1af77e2c41aaf5640ebe9af9 Mon Sep 17 00:00:00 2001
From: pritishpai
Date: Fri, 31 Jan 2025 09:01:23 -0500
Subject: [PATCH 14/14] Remove duplicate code migration

---
 docs/ucx/docs/process/index.mdx | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/ucx/docs/process/index.mdx b/docs/ucx/docs/process/index.mdx
index 96f15a3774..5933392d93 100644
--- a/docs/ucx/docs/process/index.mdx
+++ b/docs/ucx/docs/process/index.mdx
@@ -9,7 +9,6 @@ On a high level, the steps in migration process are:
 2. [group migration](/docs/reference/workflows#group-migration-workflow)
 3. [table migration](/docs/process/#table-migration-process)
 4. [data reconciliation](/docs/reference/workflows#post-migration-data-reconciliation-workflow)
-5. [code migration](#code-migration)
 6. [code migration](/docs/reference/commands#code-migration-commands)
 7. [delta live table pipeline migration](/docs/process#delta-live-table-pipeline-migration-process)
 8. [final details](#final-details)