Various fixes #52

Merged (6 commits) on May 6, 2023
8 changes: 4 additions & 4 deletions .github/workflows/dotnet-build-test.yml
@@ -1,4 +1,4 @@
name: .NET
name: Build and Test all .NET projects

on:
  push:
@@ -8,9 +8,9 @@ on:

jobs:
  build:

    name: Build and test .NET projects
    runs-on: ubuntu-latest

    container: mcr.microsoft.com/dotnet/sdk:6.0
    steps:
    - uses: actions/checkout@v3
    - name: Setup .NET
@@ -28,4 +28,4 @@ jobs:
      uses: actions/upload-artifact@v3
      with:
        name: debug-build
        path: /home/runner/work/azure-documentdb-datamigrationtool/azure-documentdb-datamigrationtool/Core/Cosmos.DataTransfer.Core/bin/Debug/net6.0 #path/to/artifact/ # or path/to/artifact
        path: /home/runner/work/data-migration-desktop-tool/data-migration-desktop-tool/Core/Cosmos.DataTransfer.Core/bin/Debug/net6.0 #path/to/artifact/ # or path/to/artifact
7 changes: 4 additions & 3 deletions .github/workflows/validate.yml
@@ -1,8 +1,9 @@
name: Validate all .NET projects
on:
  pull_request:
    branches:
      - main
  # pull_request:
  #   branches:
  #     - main
  workflow_dispatch:
jobs:
  build-test:
    name: Build and test .NET projects
123 changes: 123 additions & 0 deletions ExampleConfigs.md
@@ -0,0 +1,123 @@
# Example `migrationsettings.json` Files

## JSON to Cosmos-NoSQL
```json
{
  "Source": "json",
  "Sink": "cosmos-nosql",
  "SourceSettings": {
    "FilePath": "https://mytestfiles.local/sales-data.json"
  },
  "SinkSettings": {
    "ConnectionString": "AccountEndpoint=https://...",
    "Database": "myDb",
    "Container": "myContainer",
    "PartitionKeyPath": "/id",
    "RecreateContainer": true,
    "WriteMode": "Insert",
    "CreatedContainerMaxThroughput": 5000,
    "IsServerlessAccount": false
  }
}
```

## Cosmos-NoSQL to JSON
```json
{
  "Source": "Cosmos-NoSql",
  "Sink": "JSON",
  "SourceSettings":
  {
    "ConnectionString": "AccountEndpoint=https://...",
    "Database": "cosmicworks",
    "Container": "customers",
    "IncludeMetadataFields": true
  },
  "SinkSettings":
  {
    "FilePath": "c:\\data\\cosmicworks\\customers.json",
    "Indented": true
  }
}
```

## MongoDB to Cosmos-NoSQL
```json
{
  "Source": "mongodb",
  "Sink": "cosmos-nosql",
  "SourceSettings": {
    "ConnectionString": "mongodb://...",
    "DatabaseName": "sales",
    "Collection": "person"
  },
  "SinkSettings": {
    "ConnectionString": "AccountEndpoint=https://...",
    "Database": "users",
    "Container": "migrated",
    "PartitionKeyPath": "/id",
    "ConnectionMode": "Direct",
    "WriteMode": "UpsertStream",
    "CreatedContainerMaxThroughput": 8000,
    "UseAutoscaleForCreatedContainer": false
  }
}
```

## SqlServer to AzureTableAPI
```json
{
  "Source": "SqlServer",
  "Sink": "AzureTableApi",
  "SourceSettings": {
    "ConnectionString": "Server=...",
    "QueryText": "SELECT Id, Date, Amount FROM dbo.Payments WHERE Status = 'open'"
  },
  "SinkSettings": {
    "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=...",
    "Table": "payments",
    "RowKeyFieldName": "Id"
  }
}
```

## Cosmos-NoSQL to SqlServer
```json
{
  "Source": "cosmos-nosql",
  "Sink": "sqlserver",
  "SourceSettings":
  {
    "ConnectionString": "AccountEndpoint=https://...",
    "Database": "operations",
    "Container": "alerts",
    "PartitionKeyValue": "jan",
    "Query": "SELECT a.name, a.description, a.count, a.id, a.isSet FROM a"
  },
  "SinkSettings":
  {
    "ConnectionString": "Server=...",
    "TableName": "Import",
    "ColumnMappings": [
      {
        "ColumnName": "Name"
      },
      {
        "ColumnName": "Description"
      },
      {
        "ColumnName": "Count",
        "SourceFieldName": "number"
      },
      {
        "ColumnName": "Id"
      },
      {
        "ColumnName": "IsSet",
        "AllowNull": false,
        "DefaultValue": false
      }
    ]
  }
}
```
@@ -4,6 +4,7 @@
using System.Globalization;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;
using Cosmos.DataTransfer.Interfaces;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Configuration;
@@ -33,8 +34,8 @@ public async Task WriteAsync(IAsyncEnumerable<IDataItem> dataItems, IConfigurati

var entryAssembly = Assembly.GetEntryAssembly();
bool isShardedImport = false;
string sourceName = dataSource.DisplayName;
string sinkName = DisplayName;
string sourceName = StripSpecialChars(dataSource.DisplayName);
string sinkName = StripSpecialChars(DisplayName);
string userAgentString = string.Format(CultureInfo.InvariantCulture, "{0}-{1}-{2}-{3}{4}",
entryAssembly == null ? "dtr" : entryAssembly.GetName().Name,
Assembly.GetExecutingAssembly().GetName().Version,
@@ -119,6 +120,11 @@ void ReportCount(int i)
logger.LogInformation("Added {AddedCount} total records in {TotalSeconds}s", addedCount, $"{timer.ElapsedMilliseconds / 1000.0:F2}");
}

private static string StripSpecialChars(string displayName)
{
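// Remove all non-word characters from the display name so it can be safely embedded in the user agent string built above.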
return Regex.Replace(displayName, "[^\\w]", "", RegexOptions.Compiled);
}

private static AsyncRetryPolicy GetRetryPolicy(int maxRetryCount, int initialRetryDuration)
{
int retryDelayBaseMs = initialRetryDuration / 2;
2 changes: 1 addition & 1 deletion Extensions/Cosmos/README.md
@@ -26,7 +26,7 @@ Source supports an optional `IncludeMetadataFields` parameter (`false` by defaul
}
```

Sink requires an additional `PartitionKeyPath` parameter which is used when creating the container if it does not exist. It also supports an optional `RecreateContainer` parameter (`false` by default) to delete and then recreate the container to ensure only newly imported data is present. The optional `BatchSize` parameter (100 by default) sets the number of items to accumulate before inserting. The optional `WriteMode` parameter specifies the type of data write to use: `InsertStream`, `Insert`, `UpsertStream`, or `Upsert`. The `IsServerlessAccount` parameter specifies whether the target account uses Serverless instead of Provisioned throughput, which affects the way containers are created. Additional parameters allow changing the behavior of the Cosmos client appropriate to your environment.
Sink requires an additional `PartitionKeyPath` parameter, which is used when creating the container if it does not exist. It also supports an optional `RecreateContainer` parameter (`false` by default) to delete and then recreate the container so that only newly imported data is present. The optional `BatchSize` parameter (100 by default) sets the number of items to accumulate before inserting. `ConnectionMode` can be set to either `Gateway` (default) or `Direct` to control how the client connects to the Cosmos DB service. When a container is created as part of the transfer operation, `CreatedContainerMaxThroughput` (in RUs) and `UseAutoscaleForCreatedContainer` provide the initial throughput settings that will be in effect while executing the transfer. The optional `WriteMode` parameter specifies the type of data write to use: `InsertStream`, `Insert`, `UpsertStream`, or `Upsert`. The `IsServerlessAccount` parameter specifies whether the target account uses serverless instead of provisioned throughput, which affects the way containers are created. Additional parameters allow tuning the behavior of the Cosmos client to your environment.
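As a quick illustration of how these options combine, a sink settings object might look like the following sketch (the endpoint, database, and container values are placeholders, and the option values are arbitrary examples rather than recommendations):

```json
{
  "ConnectionString": "AccountEndpoint=https://...",
  "Database": "myDb",
  "Container": "myContainer",
  "PartitionKeyPath": "/id",
  "RecreateContainer": false,
  "BatchSize": 100,
  "ConnectionMode": "Gateway",
  "WriteMode": "InsertStream",
  "CreatedContainerMaxThroughput": 5000,
  "UseAutoscaleForCreatedContainer": true,
  "IsServerlessAccount": false
}
```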

### Sink

@@ -36,11 +36,12 @@ public async Task WriteAsync(IAsyncEnumerable<IDataItem> dataItems, IConfigurati
bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(dbColumn.ColumnName, dbColumn.ColumnName));
}

var dataTable = new DataTable();
dataTable.Columns.AddRange(dataColumns.Values.ToArray());

var batches = dataItems.Buffer(settings.BatchSize);
await foreach (var batch in batches.WithCancellation(cancellationToken))
{
var dataTable = new DataTable();
dataTable.Columns.AddRange(dataColumns.Values.ToArray());
foreach (var item in batch)
{
var fieldNames = item.GetFieldNames().ToList();
@@ -77,6 +78,7 @@ public async Task WriteAsync(IAsyncEnumerable<IDataItem> dataItems, IConfigurati
dataTable.Rows.Add(row);
}
await bulkCopy.WriteToServerAsync(dataTable, cancellationToken);
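// Clear the accumulated rows so the same DataTable (and its column definitions) can be reused for the next batch.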
dataTable.Clear();
}

await transaction.CommitAsync(cancellationToken);
@@ -91,4 +93,4 @@ public async Task WriteAsync(IAsyncEnumerable<IDataItem> dataItems, IConfigurati
await connection.CloseAsync();
}
}
}
}
4 changes: 2 additions & 2 deletions README.md
@@ -26,7 +26,7 @@ The Azure Cosmos DB Desktop Data Migration Tool is an open-source project contai

## Quick Installation

To use the tool, download the latest zip file for your platform (win-x64, mac-x64, or linux-x64) from [Releases](https://github.com/AzureCosmosDB/data-migration-desktop-tool/releases) and extract all files to your desired install location. To begin a data transfer operation, first populate the `migrationsettings.json` file with appropriate settings for your data source and sink (see [detailed instructions](#using-the-command-line) below), and then run the application from a command line: `dmt.exe` on Windows or `dmt` on other platforms.
To use the tool, download the latest zip file for your platform (win-x64, mac-x64, or linux-x64) from [Releases](https://github.com/AzureCosmosDB/data-migration-desktop-tool/releases) and extract all files to your desired install location. To begin a data transfer operation, first populate the `migrationsettings.json` file with appropriate settings for your data source and sink (see [detailed instructions](#using-the-command-line) below or [review examples](ExampleConfigs.md)), and then run the application from a command line: `dmt.exe` on Windows or `dmt` on other platforms.
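As a minimal sketch of the overall shape of that file (the extension names and values below are placeholders; see [ExampleConfigs.md](ExampleConfigs.md) for complete, realistic samples):

```json
{
  "Source": "json",
  "Sink": "cosmos-nosql",
  "SourceSettings": {
    "FilePath": "C:\\data\\input.json"
  },
  "SinkSettings": {
    "ConnectionString": "AccountEndpoint=https://...",
    "Database": "myDb",
    "Container": "myContainer",
    "PartitionKeyPath": "/id"
  }
}
```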

## Extension documentation

@@ -177,7 +177,7 @@ This tutorial outlines how to use the Azure Cosmos DB Desktop Data Migration Too
}
}
```
> **Note**: **migrationsettings.json** can also be configured to execute multiple data transfer operations with a single run command. To do this, include an `Operations` property consisting of an array of objects that include `SourceSettings` and `SinkSettings` properties using the same format as those shown above for single operations.
> **Note**: **migrationsettings.json** can also be configured to execute multiple data transfer operations with a single run command. To do this, include an `Operations` property consisting of an array of objects that include `SourceSettings` and `SinkSettings` properties using the same format as those shown above for single operations. Additional details and examples can be found in [this blog post](https://codemindinterface.com/2023/03/cosmos-tool-operations/).
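For instance, a two-operation settings file might look like the following sketch, which keeps the top-level `Source` and `Sink` selections and lists per-operation settings (the file paths and container names are illustrative only):

```json
{
  "Source": "json",
  "Sink": "cosmos-nosql",
  "Operations": [
    {
      "SourceSettings": {
        "FilePath": "C:\\data\\customers.json"
      },
      "SinkSettings": {
        "ConnectionString": "AccountEndpoint=https://...",
        "Database": "myDb",
        "Container": "customers",
        "PartitionKeyPath": "/id"
      }
    },
    {
      "SourceSettings": {
        "FilePath": "C:\\data\\orders.json"
      },
      "SinkSettings": {
        "ConnectionString": "AccountEndpoint=https://...",
        "Database": "myDb",
        "Container": "orders",
        "PartitionKeyPath": "/id"
      }
    }
  ]
}
```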

4. Execute the program using the following command:
