Azure Cosmos DB Desktop Data Migration Tool

To access the archived version of the tool, navigate to the Archive branch.


Overview

The Azure Cosmos DB Desktop Data Migration Tool is an open-source project containing a command-line application that provides import and export functionality for Azure Cosmos DB.

Quick Installation

To use the tool, download the latest zip file for your platform (win-x64, mac-x64, or linux-x64) from Releases and extract all files to your desired install location. To begin a data transfer operation, first populate the migrationsettings.json file with appropriate settings for your data source and sink (see detailed instructions below or review examples), and then run the application from a command line: dmt.exe on Windows or dmt on other platforms.

Extension documentation

Multiple extensions are provided in this repository. Find the documentation for the usage and configuration of each using the links provided:

  1. Azure Cosmos DB

  2. Azure Table API

  3. JSON

  4. MongoDB

  5. SQL Server

  6. Parquet

  7. CSV

  8. File Storage

  9. Azure Blob Storage

  10. AWS S3

  11. Azure Cognitive Search

Architecture

The Azure Cosmos DB Desktop Data Migration Tool is a lightweight executable that leverages the Managed Extensibility Framework (MEF). MEF enables decoupled implementation of the core project and its extensions. The core application is a command-line executable responsible for composing the required extensions at runtime by automatically loading them from the Extensions folder of the application. An Extension is a class library that includes the implementation of a System as a Source and (optionally) Sink for data transfer. The core application project does not contain direct references to any extension implementation. Instead, these projects share a common interface.

An Extensions folder holds multiple extension implementations. The application loads extensions from the Extensions folder and executes functionality based on an interface implementation.
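
The composition idea described above can be pictured with a small sketch. The snippet below is only a Python analogy, not the tool's C#/MEF code: the `EXPORTS` registry, the `export` decorator, and the `MemorySource` and `CountingSink` classes are hypothetical names invented for illustration. It shows a core function that knows only the shared interfaces and resolves concrete implementations by display name at run time.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class DataSource(ABC):
    """Shared interface: the core depends on this, never on a concrete class."""
    display_name: str = ""

    @abstractmethod
    def read(self) -> Iterable[dict]:
        ...


class DataSink(ABC):
    """Shared interface for anything that can receive data items."""
    display_name: str = ""

    @abstractmethod
    def write(self, items: Iterable[dict]) -> int:
        ...


# Hypothetical registry standing in for the catalog a plug-in framework builds.
EXPORTS: Dict[type, Dict[str, type]] = {DataSource: {}, DataSink: {}}


def export(interface: type):
    """Class decorator that registers an implementation under its display name."""
    def register(cls: type) -> type:
        EXPORTS[interface][cls.display_name] = cls
        return cls
    return register


@export(DataSource)
class MemorySource(DataSource):
    display_name = "Memory"

    def read(self) -> Iterable[dict]:
        return [{"id": "1"}, {"id": "2"}]


@export(DataSink)
class CountingSink(DataSink):
    display_name = "Counter"

    def write(self, items: Iterable[dict]) -> int:
        # Count the items instead of storing them anywhere.
        return sum(1 for _ in items)


def run_transfer(source_name: str, sink_name: str) -> int:
    """The 'core': looks up implementations by name and wires them together."""
    source = EXPORTS[DataSource][source_name]()
    sink = EXPORTS[DataSink][sink_name]()
    return sink.write(source.read())


print(run_transfer("Memory", "Counter"))  # prints 2
```

Note that `run_transfer` never imports `MemorySource` or `CountingSink` directly. This mirrors the statement above that the core project holds no direct references to extension implementations.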

Project Structure

The Cosmos DB Data Migration Tool core project is a C# command-line executable. The core application serves as the composition container for the required Source and Sink extensions. Therefore, the application user needs to put only the desired Extension class library assembly into the Extensions folder before running the application. In addition, the core project has a unit test project to exercise the application's behavior, whereas extension projects contain concrete integration tests that rely on external systems.

The project consists of a core application executable project as well as a shared interface project. Each extension consists of an extension project plus an integration test project. A core unit test project exercises the core application functionality.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Getting Started

Clone the source code repository

  1. From a command prompt, execute the following command in an empty working folder that will house the source code.

    git clone https://github.com/AzureCosmosDB/data-migration-desktop-tool.git

Build the solution

  1. Using Visual Studio 2022, open CosmosDbDataMigrationTool.sln.

  2. Build the project using the keyboard shortcut Ctrl+Shift+B (Cmd+Shift+B on a Mac). This builds all current extension projects as well as the command-line Core application. The assemblies built by the extension projects are written to the Extensions folder of the Core application build, so all extension options are available when the application is run.

Tutorial: JSON to Cosmos DB migration

This tutorial outlines how to use the Azure Cosmos DB Desktop Data Migration Tool to move JSON data to Azure Cosmos DB. This tutorial uses the Azure Cosmos DB Emulator.

Tutorial Software prerequisites

  1. Visual Studio 2022
  2. .NET 6.0 SDK
  3. Azure Cosmos DB Emulator or Azure Cosmos DB resource.

Task 1: Provision a sample database and container using the Azure Cosmos DB Emulator as the destination (sink)

  1. Launch the Azure Cosmos DB emulator application and open https://localhost:8081/_explorer/index.html in a browser.

  2. Select the Explorer option from the left menu. Then choose the New Database link found beneath the Common Tasks heading.

    The Azure Cosmos DB emulator screen displays with Explorer selected from the left menu and the New Database link highlighted beneath the Common Tasks heading.

  3. On the New Database blade, enter datamigration in the Database id field, then select OK.

    The New Database blade displays with datamigration entered in the Database id field and the OK button is highlighted.

  4. If the datamigration database doesn't appear in the list of databases, select the Refresh icon.

    The datamigration database displays with the refresh button highlighted.

  5. Expand the ellipsis menu next to the datamigration database and select New Container.

    The ellipsis menu of the datamigration database is expanded with the New Container item highlighted.

  6. On the New Container blade, enter btcdata in the Container id field, and /id in the Partition key field. Select the OK button.

    The New Container blade displays with btcdata entered in the Container id field and /id entered in the Partition key field. The OK button is highlighted.

    Note: When using the Cosmos DB Data Migration Tool, the container doesn't have to exist beforehand; it will be created automatically using the partition key specified in the sink configuration.

Task 2: Prepare JSON source documents

  1. Locate the docs/resources/sample-data.zip file. Extract the files to any desired folder. These files serve as the JSON data that is to be migrated to Cosmos DB.

Task 3: Set up the data migration configuration

  1. Each extension contains a README document that outlines configuration for the data migration. In this case, locate the configuration for JSON (Source) and Cosmos DB (Sink).

  2. In the Visual Studio Solution Explorer, expand the Cosmos.DataTransfer.Core project, and open migrationsettings.json. This file provides an example outline of the settings file structure. Using the documentation linked above, configure the SourceSettings and SinkSettings sections. Ensure the FilePath setting points to the location where the sample data was extracted. The ConnectionString setting can be found on the Cosmos DB Emulator Quickstart screen as the Primary Connection String. Save the file.

    Note: The alternate terms Target and Destination can be used in place of Sink in configuration files and command-line parameters. For example, "Target" and "TargetSettings" would also be valid in the example below.

    {
        "Source": "JSON",
        "Sink": "Cosmos-nosql",
        "SourceSettings": {
            "FilePath": "C:\\btcdata\\simple_json.json"
        },
        "SinkSettings": {
            "ConnectionString": "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDj...",
            "Database": "datamigration",
            "Container": "btcdata",
            "PartitionKeyPath": "/id",
            "RecreateContainer": false,
            "IncludeMetadataFields": false
        }
    }

    The Cosmos DB Emulator Quickstart screen displays with the Primary Connection String value highlighted.

  3. Ensure the Cosmos.DataTransfer.Core project is set as the startup project then press F5 to run the application.

  4. The application then performs the data migration. After a few moments, the process reports either "Data transfer complete." or "Data transfer failed."

Note: The Source and Sink properties should match the DisplayName set in the code for the extensions.
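
Before running the tool, it can help to confirm that the settings file parses and to see which extension names it requests. A common mistake is a Windows path with unescaped backslashes. The following Python sketch is not part of the tool; `read_extension_names` is a hypothetical helper. It assumes the Sink, Target, and Destination aliases described in the note above, and it conservatively requires exactly one of them with the exact spelling used in the example, which may be stricter than the tool itself.

```python
import json
from typing import Tuple

# Aliases for the sink key, as described in the tutorial note.
SINK_KEYS = ("Sink", "Target", "Destination")


def read_extension_names(settings_text: str) -> Tuple[str, str]:
    """Return (source name, sink name) or raise ValueError with a readable reason."""
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError as exc:
        # Typical cause: "C:\data\file.json" instead of "C:\\data\\file.json".
        raise ValueError(f"settings are not valid JSON: {exc}") from exc

    if not isinstance(settings, dict):
        raise ValueError("top-level value must be a JSON object")

    source = settings.get("Source")
    if not isinstance(source, str) or not source:
        raise ValueError('"Source" must be a non-empty string')

    present = [key for key in SINK_KEYS if key in settings]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SINK_KEYS}, found {present}")

    sink = settings[present[0]]
    if not isinstance(sink, str) or not sink:
        raise ValueError(f'"{present[0]}" must be a non-empty string')

    return source, sink


example = '{"Source": "JSON", "Sink": "Cosmos-nosql"}'
print(read_extension_names(example))  # prints ('JSON', 'Cosmos-nosql')
```

The returned names can then be compared by eye against the DisplayName values listed in each extension's documentation.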

Using the command line

  1. Download the latest release, or ensure the project is built.

  2. The Extensions folder contains the plug-ins available for use in the migration. Each extension is located in a folder with the name of the data source. For example, the Cosmos DB extension is located in the folder Cosmos. Before running the application, you can open the Extensions folder and remove any folders for the extensions that are not required for the migration.

  3. In the root of the build folder, locate the migrationsettings.json file and update its settings as documented in the Extension documentation. Example file (similar to the tutorial above):

    {
        "Source": "JSON",
        "Sink": "Cosmos-nosql",
        "SourceSettings": {
            "FilePath": "C:\\btcdata\\simple_json.json"
        },
        "SinkSettings": {
            "ConnectionString": "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDj...",
            "Database": "datamigration",
            "Container": "btcdata",
            "PartitionKeyPath": "/id",
            "RecreateContainer": false,
            "IncludeMetadataFields": false
        }
    }

Note: migrationsettings.json can also be configured to execute multiple data transfer operations with a single run command. To do this, include an Operations property consisting of an array of objects that include SourceSettings and SinkSettings properties using the same format as those shown above for single operations. Additional details and examples can be found in this blog post.
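
As a rough illustration of the Operations layout described in the note above, the following Python sketch generates such a file. It is an assumption-laden example rather than a reference: it assumes that Source and Sink stay at the top level, it repeats the full sink settings inside every operation, and the second file name and container (`other.json`, `otherdata`) are hypothetical. `build_multi_operation_settings` is an illustrative helper, not part of the tool.

```python
import json
from typing import Dict, List, Tuple


def build_multi_operation_settings(
    source: str,
    sink: str,
    sink_common: Dict[str, object],
    jobs: List[Tuple[str, str]],
) -> Dict[str, object]:
    """Build one settings document with an Operations entry per (file, container) pair."""
    operations = []
    for file_path, container in jobs:
        operations.append({
            "SourceSettings": {"FilePath": file_path},
            # Copy the shared sink settings so each operation is self-contained.
            "SinkSettings": {**sink_common, "Container": container},
        })
    return {"Source": source, "Sink": sink, "Operations": operations}


settings = build_multi_operation_settings(
    source="JSON",
    sink="Cosmos-nosql",
    sink_common={
        "ConnectionString": "AccountEndpoint=https://localhost:8081/;AccountKey=C2y6yDj...",
        "Database": "datamigration",
        "PartitionKeyPath": "/id",
    },
    jobs=[
        ("C:\\btcdata\\simple_json.json", "btcdata"),
        ("C:\\btcdata\\other.json", "otherdata"),  # hypothetical second job
    ],
)

# json.dumps escapes the backslashes in Windows paths automatically.
print(json.dumps(settings, indent=4))
```

Generating the file this way avoids hand-escaping paths and keeps the per-operation blocks consistent with each other.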

  4. Execute the program using the following command:

    Using Windows

    dmt.exe

    Note: Use the --settings option with a file path to specify a different settings file (overriding the default migrationsettings.json file). This makes it easier to automate running different migration jobs in a loop.

    Using macOS

    ./dmt

    Note: Before you run the tool on macOS, you'll need to follow Apple's instructions on how to Open a Mac app from an unidentified developer.
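
The --settings note above mentions running different migration jobs in a loop. One possible way to do that is sketched below in Python. The `jobs` folder name and both helper functions are hypothetical, the option is passed as `--settings <path>`, and the sketch assumes (without confirmation from this document) that the tool returns a non-zero exit code when a transfer fails.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(executable: str, settings_dir: Path) -> List[List[str]]:
    """Return one argument list per *.json file, sorted for reproducible order."""
    return [
        [executable, "--settings", str(path)]
        for path in sorted(settings_dir.glob("*.json"))
    ]


def run_all(executable: str, settings_dir: Path) -> int:
    """Run every job in turn and return how many exited with a non-zero code."""
    failures = 0
    for command in build_commands(executable, settings_dir):
        print("Running:", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:  # assumption: non-zero means the job failed
            failures += 1
    return failures


# Example usage (left commented out so importing this file runs nothing):
# failed = run_all("./dmt", Path("jobs"))        # macOS or Linux
# failed = run_all("dmt.exe", Path("jobs"))      # Windows
```

Running the jobs sequentially keeps the console output of each transfer separate, which makes a failed job easier to identify.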

Creating Extensions

  1. Decide what type of extension you want to create. There are 3 different types of extensions and each of those can be implemented to read data, write data, or both.

    1. DataSource/DataSink extension: Appropriate for data sources which include both a native data format and storage. Most databases fall under this category and generally your extension will be written using an SDK specific to that type of database. For example, SQL Server uses data structured as tables and is accessed through drivers that handle underlying communication with the database.
    2. Binary File Storage extension: Only concerned with the storage of binary files and is agnostic to the specific file format. Examples include files on local disk or cloud blob storage providers. This type of extension can be used by any File Format extension.
    3. File Format extension: Handles translating data for a specific binary file format but is agnostic to storage. Examples include JSON or Parquet. This type of extension can be combined with any Binary File Storage extension to create multiple DataSource/DataSink extensions.
  2. Add a new folder in the Extensions folder with the name of your extension.

  3. Create the extension project and an accompanying test project.

    • The naming convention for extension projects is Cosmos.DataTransfer.<Name>Extension.
    • Extension projects should target .NET 6 and use the Console Application project type. A Program.cs file must be included in order to build the console project. The Console Application type is required so that the build output includes the referenced NuGet packages.

    Binary File Storage extensions are used only in combination with other extensions, so they should be placed in a .NET 6 Class Library without the additional debugging configuration described below.

  4. Add the new projects to the CosmosDbDataMigrationTool solution.

  5. To facilitate local debugging, the extension build output, along with any dependencies, needs to be copied into the Core\Cosmos.DataTransfer.Core\bin\Debug\net6.0\Extensions folder. To have the project copy these files automatically, make the following changes.

    • Add a Publish Profile to Folder named LocalDebugFolder with a Target Location of ..\..\..\Core\Cosmos.DataTransfer.Core\bin\Debug\net6.0\Extensions
    • To publish every time the project builds, edit the .csproj file to add a new post-build step:
    <Target Name="PublishDebug" AfterTargets="Build" Condition=" '$(Configuration)' == 'Debug' ">
       <Exec Command="dotnet publish --no-build -p:PublishProfile=LocalDebugFolder" />
    </Target>
  6. Add references to the System.ComponentModel.Composition NuGet package and the Cosmos.DataTransfer.Interfaces project.

  7. Extensions can implement either IDataSourceExtension to read data or IDataSinkExtension to write data. Classes implementing these interfaces should include a class-level System.ComponentModel.Composition.ExportAttribute with the implemented interface type as a parameter. This allows the plug-in to be picked up by the main application.

    • Binary File Storage extensions implement the IComposableDataSource or IComposableDataSink interfaces. To be used with different file formats, the projects containing the formatters should reference the extension's project and add new CompositeSourceExtension or CompositeSinkExtension referencing the storage and formatter extensions.
    • File Format extensions implement the IFormattedDataReader or IFormattedDataWriter interfaces. In order to be usable each should also declare one or more CompositeSourceExtension or CompositeSinkExtension to define available storage locations for the format. This will require adding references to Storage extension projects and adding a declaration for each file format/storage combination. Example:
      [Export(typeof(IDataSinkExtension))]
      public class JsonAzureBlobSink : CompositeSinkExtension<AzureBlobDataSink, JsonFormatWriter>
      {
          public override string DisplayName => "JSON-AzureBlob";
      }
  8. Settings needed by the extension can be retrieved from any standard .NET configuration source in the main application by using the IConfiguration instance passed into the ReadAsync and WriteAsync methods. Settings under SourceSettings/SinkSettings will be included, as well as any settings included in JSON files specified by the SourceSettingsPath/SinkSettingsPath.
  9. Implement your extension to read and/or write using the generic IDataItem interface, which exposes object properties as a list of key-value pairs. Depending on the specific structure of the data storage type being implemented, you can choose to support nested objects and arrays or only flat top-level properties.

    Binary File Storage extensions are concerned only with generic storage, so they work with Stream instances representing whole files rather than with individual IDataItem instances.