adding Docker packaging to support non-linux/macOS environments #274

Merged: 1 commit, Jul 26, 2023
43 changes: 43 additions & 0 deletions docker/README.md
@@ -0,0 +1,43 @@
# OS agnostic runtime for Databricks Migrate
[Databricks Migrate](https://github.com/databrickslabs/migrate) is a tool from [Databricks Labs](https://github.com/databrickslabs) that migrates objects from one Databricks workspace to another. The tool requires a Unix-like environment (Linux or macOS). The packaging in this directory uses Docker to run the Migrate tool on other operating systems (e.g. Windows).

## Local Environment Prerequisites
The files in this directory have one local dependency:
* [Docker Desktop](https://docs.docker.com/desktop/install/windows-install/)

## Getting Started
_Note: For the following steps, the legacy workspace that is being migrated `from` is aliased as `oldWS`. The new workspace that is being migrated `to` is aliased as `newWS`._

1. Open the `.\databricks\.databrickscfg` file in this directory. It contains one profile for `oldWS` and one for `newWS`. Replace each `host` value with the URL of the corresponding workspace, in the format `https://<workspace-host>`.
2. For each workspace, [generate an access token](https://docs.databricks.com/dev-tools/auth.html#personal-access-tokens-for-users) for a user that has admin privileges.
3. Copy each generated token into `.\databricks\.databrickscfg`, replacing the `token` value of the corresponding profile.
4. Save the changes to `.\databricks\.databrickscfg`.
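Before building the image, it can help to confirm that the template placeholders were actually replaced. Below is a minimal sketch using Python's standard-library `configparser`. It assumes the profile names `oldWS`/`newWS` and the `https://<workspace-host>` / `dapi...` placeholders from the template file. The script name `check_cfg.py` and the function `check_databrickscfg` are hypothetical.

```python
import configparser
import sys

# Profile aliases used throughout this README (assumed to be required).
REQUIRED_PROFILES = ("oldWS", "newWS")


def check_databrickscfg(text: str) -> list[str]:
    """Return a list of problems found in the contents of a .databrickscfg file."""
    # interpolation=None so a '%' in a token is not treated as a substitution
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for profile in REQUIRED_PROFILES:
        if not parser.has_section(profile):
            problems.append(f"missing profile [{profile}]")
            continue
        host = parser.get(profile, "host", fallback="")
        token = parser.get(profile, "token", fallback="")
        # the template ships with host = https://<workspace-host>
        if not host.startswith("https://") or "<workspace-host>" in host:
            problems.append(f"[{profile}] host is not a workspace URL")
        # the template ships with token = dapi...
        if not token or token == "dapi...":
            problems.append(f"[{profile}] token is missing or still the placeholder")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_cfg.py .\databricks\.databrickscfg
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = check_databrickscfg(f.read())
    print("\n".join(issues) or "ok")
    sys.exit(1 if issues else 0)
```

An empty result means both profiles have a non-placeholder `host` and `token`. It does not check that the token is valid for the workspace.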


## Environment Setup
With the Docker daemon running, navigate to this directory in your local shell and build the image:
```
docker build -t databricks-migrate .
```
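Once the image is built, start a container with the following command. It mounts `.\databricks\.databrickscfg` as `/root/.databrickscfg` and the `.\databricks` directory at `/databricks`. It then opens an interactive bash shell in `/opt/migrate` inside the container, where the tool is ready for use. Depending on your Docker version, the host paths passed to `-v` may need to be absolute rather than relative.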
```
docker run -it --name databricks-migrate -v .\databricks\.databrickscfg:/root/.databrickscfg -v .\databricks:/databricks databricks-migrate
```
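To avoid retyping the start-up command, it can be wrapped in a small launcher. Below is a minimal Python sketch, assuming it is run from this directory. It resolves both bind-mount paths to absolute paths, so it does not rely on Docker accepting relative `-v` paths. The function name `docker_run_args` is hypothetical; the flags, image tag, and container paths are the ones used in the command above.

```python
import shutil
import subprocess
from pathlib import Path

IMAGE = "databricks-migrate"  # image tag from the `docker build` step


def docker_run_args(docker_dir: Path) -> list[str]:
    """Return the README's `docker run` command with absolute bind-mount paths.

    docker_dir is the directory containing the dockerfile and the databricks/ folder.
    """
    dbx_dir = docker_dir.resolve() / "databricks"
    cfg_file = dbx_dir / ".databrickscfg"
    return [
        "docker", "run", "-it",
        "--name", IMAGE,
        "-v", f"{cfg_file}:/root/.databrickscfg",  # workspace profiles
        "-v", f"{dbx_dir}:/databricks",  # session output persists here
        IMAGE,
    ]


if __name__ == "__main__" and shutil.which("docker"):
    # Run from this directory, like the manual command; Docker prints its own errors.
    subprocess.run(docker_run_args(Path.cwd()), check=False)
```

Because the command keeps `--name databricks-migrate`, a second run fails while the stopped container still exists. In that case, `docker start -ai databricks-migrate` reattaches to the existing container, and `docker rm databricks-migrate` removes it.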

## Usage
For CDWNG workspaces, the following commands were used to migrate specific Databricks objects. Run them from `/opt/migrate` inside the container.

Export objects from `oldWS`:
```
python migration_pipeline.py --azure --profile oldWS --export-pipeline --set-export-dir $SESSIONS_DIR --notebook-format SOURCE --keep-tasks users groups workspace_item_log workspace_acls notebooks clusters instance_pools jobs
```
Import objects to `newWS`:

> Note: Set the `--session` parameter to the session generated by the export run above. Here it is read from the environment variable `$EXPORT_SESSION`, which must be set before running the command.

```
python migration_pipeline.py --azure --profile newWS --import-pipeline --set-export-dir $SESSIONS_DIR --notebook-format SOURCE --session $EXPORT_SESSION --keep-tasks users groups workspace_item_log workspace_acls notebooks clusters instance_pools jobs
```
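One way to populate `$EXPORT_SESSION` is to pick the newest session directory after the export finishes. Below is a minimal Python sketch. It assumes that each export run writes its output to a subdirectory of `$SESSIONS_DIR` named after its session, and it treats the most recently modified subdirectory as the latest export. This is a heuristic, not behavior documented by the Migrate tool. The helper `latest_session` and the filename `latest_session.py` are hypothetical.

```python
import os
from pathlib import Path


def latest_session(sessions_dir: str) -> str:
    """Return the name of the most recently modified subdirectory of sessions_dir."""
    # Assumption: one subdirectory per export session (hypothetical layout).
    candidates = [p for p in Path(sessions_dir).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no session directories under {sessions_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime).name


if __name__ == "__main__" and "SESSIONS_DIR" in os.environ:
    # SESSIONS_DIR is set in the image to /databricks/migrate-sessions.
    print(latest_session(os.environ["SESSIONS_DIR"]))
```

For example, if the file is saved as `.\databricks\latest_session.py`, it is visible in the container at `/databricks/latest_session.py`, and the bash shell can capture its output with `export EXPORT_SESSION=$(python /databricks/latest_session.py)` right after the export completes.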

## Output and Logging
The `.\databricks` directory in this repo is bind-mounted to `/databricks` in the running container. The image sets `SESSIONS_DIR` to `/databricks/migrate-sessions`, so the pipeline output written with `--set-export-dir $SESSIONS_DIR` (see Usage above) appears under `.\databricks\migrate-sessions` on the local host and persists after the container exits.
9 changes: 9 additions & 0 deletions docker/databricks/.databrickscfg
@@ -0,0 +1,9 @@
[oldWS]
host = https://<workspace-host>
token = dapi...
jobs-api-version = 2.0

[newWS]
host = https://<workspace-host>
token = dapi...
jobs-api-version = 2.0
21 changes: 21 additions & 0 deletions docker/dockerfile
@@ -0,0 +1,21 @@
FROM python:3

# the host's databricks/ directory is bind-mounted to DBX_DIR on container startup (see README)
ENV DBX_DIR=/databricks
ENV SESSIONS_DIR=${DBX_DIR}/migrate-sessions
ENV INSTALL_DIR=/opt

# note: a host directory bind-mounted at DBX_DIR hides this build-time directory
RUN mkdir -p ${SESSIONS_DIR}

# install databricks-cli; profiles come from the mounted /root/.databrickscfg
RUN pip install --upgrade pip
RUN pip install --upgrade databricks-cli

# install and init the migrate utility
RUN cd ${INSTALL_DIR} \
&& git clone https://github.com/databrickslabs/migrate.git \
&& cd migrate \
&& python setup.py install

WORKDIR ${INSTALL_DIR}/migrate
ENTRYPOINT [ "/bin/bash" ]