
Store in parquet format #20

Merged 8 commits on May 31, 2022

28 changes: 15 additions & 13 deletions .assets/Setup.md
@@ -21,13 +21,14 @@ Install the extension into BC using the code given in the [businessCentral](/bus

### Step 4. Enter the BC settings
Let us take a look at the settings shown in the sample screenshot below:
- **a)** The container name (defaulted to `business-central`) inside the storage account where the data shall be exported as block blobs. The export process creates this location if it does not already exist. Please ensure that the name conforms to the requirements outlined at [Naming and Referencing Containers, Blobs, and Metadata - Azure Storage | Microsoft Docs](https://docs.microsoft.com/en-us/rest/api/storageservices/Naming-and-Referencing-Containers--Blobs--and-Metadata).
- **b)** The tenant id in which the app registration created above resides (refer to **b)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **c)** The name of the storage account that you created in [Step 2](/.assets/Setup.md#step-2-configure-an-azure-data-lake-gen2).
- **d)** The Application (client) ID from the app registration (refer to **a)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **e)** The client credential key you had defined (refer to **c)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **f)** The size of the individual data payload that constitutes a single REST API upload operation to the data lake. A bigger size means fewer uploads but may consume more memory on the BC side. Note that each upload creates a new block within the blob in the data lake, and the size of such blocks is constrained as described at [Put Block (REST API) - Azure Storage | Microsoft Docs](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks).
- **g)** The flag to enable or disable operational telemetry from this extension. It is set to True by default.
- **Container** The container name inside the storage account where the data shall be exported as block blobs. The export process creates this location if it does not already exist. Please ensure that the name conforms to the requirements outlined at [Naming and Referencing Containers, Blobs, and Metadata - Azure Storage | Microsoft Docs](https://docs.microsoft.com/en-us/rest/api/storageservices/Naming-and-Referencing-Containers--Blobs--and-Metadata).
- **Tenant ID** The tenant id in which the app registration created above resides (refer to **b)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **Account name** The name of the storage account that you created in [Step 2](/.assets/Setup.md#step-2-configure-an-azure-data-lake-gen2).
- **Client ID** The Application (client) ID from the app registration (refer to **a)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **Client secret** The client credential key you had defined (refer to **c)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal)).
- **Max payload size (MiBs)** The size of the individual data payload that constitutes a single REST API upload operation to the data lake. A bigger size means fewer uploads but may consume more memory on the BC side. Note that each upload creates a new block within the blob in the data lake, and the size of such blocks is constrained as described at [Put Block (REST API) - Azure Storage | Microsoft Docs](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks).
- **CDM data format** The format in which the exported data is stored on the data lake. The recommended format is Parquet, which is better at handling special characters in BC text fields.
- **Emit telemetry** The flag to enable or disable operational telemetry from this extension. It is set to True by default.

![The Export to Azure Data Lake Storage page](/.assets/bcAdlsePage.png)
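These settings are persisted in the extension's `ADLSE Setup` table and read back at the start of each export run, as the code diffs further below show. A minimal sketch of that read, assuming only what appears in this pull request (the codeunit number, name, and procedure are illustrative, not part of the repository):

```al
// Illustrative sketch only: reads back the settings captured on the page above.
// The record "ADLSE Setup", its singleton key 0, and the two fields used here
// all appear in this PR's diffs; everything else is hypothetical.
codeunit 50100 "ADLSE Read Setup Sketch"
{
    procedure ReadSettings()
    var
        ADLSESetup: Record "ADLSE Setup";
        EmitTelemetry: Boolean;
        CdmFormat: Enum "ADLSE CDM Format";
    begin
        ADLSESetup.Get(0); // the setup is a singleton record keyed on 0
        EmitTelemetry := ADLSESetup."Emit telemetry";
        CdmFormat := ADLSESetup.DataFormat; // Csv or Parquet; Parquet is the default
    end;
}
```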

@@ -73,14 +73,15 @@ This is the step that would create the analytics pipelines in the above workspace
| Sequence # | Name & Url | Tab | Menu to invoke under the `+` sign |
| ---------- | ---- | --- | ----------------------------------|
|1|[`data_dataset`](/synapse/dataset/data_dataset.json)|`Data`|`Integration dataset`|
|2|[`deltasManifest_dataset`](/synapse/dataset/deltasManifest_dataset.json)|`Data`|`Integration dataset`|
|2|[`dataManifest_dataset`](/synapse/dataset/dataManifest_dataset.json)|`Data`|`Integration dataset`|
|3|[`deltas_dataset`](/synapse/dataset/deltas_dataset.json)|`Data`|`Integration dataset`|
|4|[`stagingManifest_dataset`](/synapse/dataset/stagingManifest_dataset.json)|`Data`|`Integration dataset`|
|4|[`deltasManifest_dataset`](/synapse/dataset/deltasManifest_dataset.json)|`Data`|`Integration dataset`|
|5|[`staging_dataset`](/synapse/dataset/staging_dataset.json)|`Data`|`Integration dataset`|
|6|[`Consolidation_flow`](/synapse/dataflow/Consolidation_flow.json)|`Develop`|`Data flow`|
|7|[`Consolidation_OneEntity`](/synapse/pipeline/Consolidation_OneEntity.json)|`Integrate`|`Pipeline`|
|8|[`Consolidation_CheckForDeltas`](/synapse/pipeline/Consolidation_CheckForDeltas.json)|`Integrate`|`Pipeline`|
|9|[`Consolidation_AllEntities`](/synapse/pipeline/Consolidation_AllEntities.json)|`Integrate`|`Pipeline`|
|6|[`stagingManifest_dataset`](/synapse/dataset/stagingManifest_dataset.json)|`Data`|`Integration dataset`|
|7|[`Consolidation_flow`](/synapse/dataflow/Consolidation_flow.json)|`Develop`|`Data flow`|
|8|[`Consolidation_OneEntity`](/synapse/pipeline/Consolidation_OneEntity.json)|`Integrate`|`Pipeline`|
|9|[`Consolidation_CheckForDeltas`](/synapse/pipeline/Consolidation_CheckForDeltas.json)|`Integrate`|`Pipeline`|
|10|[`Consolidation_AllEntities`](/synapse/pipeline/Consolidation_AllEntities.json)|`Integrate`|`Pipeline`|

6. On the toolbar at the top of **Synapse Studio**, you may now click **Validate all** and, if there are no errors, click **Publish all**.

Binary file modified .assets/bcAdlsePage.png
7 changes: 5 additions & 2 deletions README.md
@@ -23,9 +23,12 @@ More details:

## Latest notable changes

Date | Changes
Pull request | Changes
--------------- | ---
3rd May, 2022 | The [Consolidation_CheckForDeltas](/synapse/pipeline/Consolidation_CheckForDeltas.json) pipeline now contains a fail activity that is triggered when no directory is found in `/deltas/` for an entity listed in the `deltas.manifest.cdm.json`. This may occur when no new deltas have been exported since the last execution of the consolidation pipeline. Other parallel pipeline runs are not affected.
[20](/pull/20) | Data on the lake can now be stored in the [Parquet](https://docs.microsoft.com/en-us/azure/data-factory/format-parquet) format, improving its fidelity to the original data in Business Central.
[16](/pull/16) | The [Consolidation_CheckForDeltas](/synapse/pipeline/Consolidation_CheckForDeltas.json) pipeline now contains a fail activity that is triggered when no directory is found in `/deltas/` for an entity listed in the `deltas.manifest.cdm.json`. This may occur when no new deltas have been exported since the last execution of the consolidation pipeline. Other parallel pipeline runs are not affected.
[14](/pull/14) | It is now possible to select all fields in a table for export. Fields that are not allowed to be exported, such as flow fields, are not selected.
[13](/pull/13) | A template is inserted in the `OnAfterOnDatabaseDelete` procedure so that deletions of archive table records are not synchronized to the data lake. This lets selected tables in the data lake keep records that are removed from the BC database for house-keeping purposes, which is especially relevant for ledger entry tables (sketched below).
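The actual template in the repository differs in its details, but the idea behind that last change can be sketched as follows. The `GlobalTriggerManagement` codeunit and its `OnAfterOnDatabaseDelete` event are standard Business Central system objects; the codeunit number and the tables listed are assumptions for illustration only:

```al
// Hedged sketch: skip tracking deletions of selected tables so the lake keeps
// records that BC removes for house-keeping. Not the repository's actual code.
codeunit 50101 "ADLSE Delete Template Sketch"
{
    [EventSubscriber(ObjectType::Codeunit, Codeunit::GlobalTriggerManagement, 'OnAfterOnDatabaseDelete', '', false, false)]
    local procedure OnAfterOnDatabaseDelete(RecRef: RecordRef)
    begin
        // Exiting early for the listed tables means their deletions are not
        // recorded for export, so the data lake keeps those records even after
        // BC removes them (e.g. ledger entries deleted by date compression).
        if RecRef.Number in [Database::"G/L Entry", Database::"Sales Header Archive"] then
            exit;
        // ... deletions of all other tables would be tracked for export here ...
    end;
}
```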

## Contributing

Expand Down
2 changes: 1 addition & 1 deletion businessCentral/app.json
@@ -4,7 +4,7 @@
"publisher": "The bc2adls team, Microsoft Denmark",
"brief": "Sync data from Business Central to the Azure storage",
"description": "Exports data in chosen tables to the Azure Data Lake and keeps it in sync by incremental updates. Before you use this tool, please read the SUPPORT.md file at https://github.com/microsoft/bc2adls.",
"version": "1.0.0.5",
"version": "1.0.1.1",
"privacyStatement": "https://go.microsoft.com/fwlink/?LinkId=724009",
"EULA": "https://go.microsoft.com/fwlink/?linkid=2009120",
"help": "https://go.microsoft.com/fwlink/?LinkId=724011",
35 changes: 23 additions & 12 deletions businessCentral/src/ADLSECDMUtil.Codeunit.al
@@ -33,7 +33,7 @@ codeunit 82566 "ADLSE CDM Util" // Refer Common Data Model https://docs.microsof
Content.Add('definitions', Definitions);
end;

procedure UpdateDefaultManifestContent(ExistingContent: JsonObject; TableID: Integer; Folder: Text) Content: JsonObject
procedure UpdateDefaultManifestContent(ExistingContent: JsonObject; TableID: Integer; Folder: Text; ADLSECdmFormat: Enum "ADLSE CDM Format") Content: JsonObject
var
ADLSEUtil: Codeunit "ADLSE Util";
Entities: JsonArray;
@@ -64,17 +64,28 @@ codeunit 82566 "ADLSE CDM Util" // Refer Common Data Model https://docs.microsof

DataPartitionPattern.Add('name', EntityName);
DataPartitionPattern.Add('rootLocation', StrSubstNo('%1/%2/', Folder, EntityName));
DataPartitionPattern.Add('globPattern', '*.csv');

ExhibitsTrait.Add('traitReference', 'is.partition.format.CSV');
AddNameValue(ExhibitsTraitArgs, 'columnHeaders', 'true');
AddNameValue(ExhibitsTraitArgs, 'delimiter', ',');
AddNameValue(ExhibitsTraitArgs, 'escape', '\');
AddNameValue(ExhibitsTraitArgs, 'encoding', 'utf-8');
AddNameValue(ExhibitsTraitArgs, 'quote', '"');
ExhibitsTrait.Add('arguments', ExhibitsTraitArgs);
ExhibitsTraits.Add(ExhibitsTrait);
DataPartitionPattern.Add('exhibitsTraits', ExhibitsTraits);
case ADLSECdmFormat of
"ADLSE CDM Format"::Csv:
begin
DataPartitionPattern.Add('globPattern', '*.csv');
ExhibitsTrait.Add('traitReference', 'is.partition.format.CSV');
AddNameValue(ExhibitsTraitArgs, 'columnHeaders', 'true');
AddNameValue(ExhibitsTraitArgs, 'delimiter', ',');
AddNameValue(ExhibitsTraitArgs, 'escape', '\');
AddNameValue(ExhibitsTraitArgs, 'encoding', 'utf-8');
AddNameValue(ExhibitsTraitArgs, 'quote', '"');
ExhibitsTrait.Add('arguments', ExhibitsTraitArgs);
ExhibitsTraits.Add(ExhibitsTrait);
DataPartitionPattern.Add('exhibitsTraits', ExhibitsTraits);
end;
ADLSECdmFormat::Parquet:
begin
DataPartitionPattern.Add('globPattern', '*.parquet');
ExhibitsTrait.Add('traitReference', 'is.partition.format.parquet');
ExhibitsTraits.Add(ExhibitsTrait);
DataPartitionPattern.Add('exhibitsTraits', ExhibitsTraits);
end;
end;

DataPartitionPatterns.Add(DataPartitionPattern);
Entity.Add('dataPartitionPatterns', DataPartitionPatterns);
21 changes: 21 additions & 0 deletions businessCentral/src/ADLSECdmFormat.Enum.al
@@ -0,0 +1,21 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License. See LICENSE in the project root for license information.

/// <summary>
/// The formats in which data is stored on the data lake
/// </summary>
enum 82562 "ADLSE CDM Format"
{
Access = Internal;
Extensible = false;

value(0; Csv)
{
Caption = 'CSV';
}

value(1; Parquet)
{
Caption = 'Parquet';
}
}
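Because the enum is marked `Internal` and not `Extensible`, the list of formats stays closed to this extension. A value would be referenced as in this hypothetical snippet (the codeunit is illustrative only):

```al
// Hypothetical snippet showing how the new enum is referenced in AL code;
// this codeunit is not part of the repository.
codeunit 50102 "ADLSE Format Sketch"
{
    procedure DefaultFormat(): Enum "ADLSE CDM Format"
    begin
        // Parquet is also the InitValue of the setup table's DataFormat field.
        exit("ADLSE CDM Format"::Parquet);
    end;
}
```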
12 changes: 6 additions & 6 deletions businessCentral/src/ADLSECommunication.Codeunit.al
@@ -66,7 +66,7 @@ codeunit 82562 "ADLSE Communication"
EmitTelemetry := EmitTelemetryValue;
end;

procedure CheckEntity(var EntityJsonNeedsUpdate: Boolean; var ManifestJsonsNeedsUpdate: Boolean)
procedure CheckEntity(CdmDataFormat: Enum "ADLSE CDM Format"; var EntityJsonNeedsUpdate: Boolean; var ManifestJsonsNeedsUpdate: Boolean)
var
ADLSECdmUtil: Codeunit "ADLSE CDM Util";
ADLSEGen2Util: Codeunit "ADLSE Gen 2 Util";
@@ -85,7 +85,7 @@

// check manifest. Assume that if the data manifest needs a change, the delta manifest will also need to be updated
OldJson := ADLSEGen2Util.GetBlobContent(GetBaseUrl() + StrSubstNo(CorpusJsonPathTxt, DataCdmManifestNameTxt), ADLSECredentials, BlobExists);
NewJson := ADLSECdmUtil.UpdateDefaultManifestContent(OldJson, TableID, 'data');
NewJson := ADLSECdmUtil.UpdateDefaultManifestContent(OldJson, TableID, 'data', CdmDataFormat);
ManifestJsonsNeedsUpdate := JsonsDifferent(OldJson, NewJson);
end;

@@ -242,13 +242,13 @@
// Expected that multiple sessions that export data from different tables will be competing for writing to manifest. Semaphore applied.
ADLSEExecute.AcquireLockonADLSESetup(ADLSESetup);

UpdateManifest(GetBaseUrl() + StrSubstNo(CorpusJsonPathTxt, DataCdmManifestNameTxt), 'data');
UpdateManifest(GetBaseUrl() + StrSubstNo(CorpusJsonPathTxt, DeltaCdmManifestNameTxt), 'deltas');
UpdateManifest(GetBaseUrl() + StrSubstNo(CorpusJsonPathTxt, DataCdmManifestNameTxt), 'data', ADLSESetup.DataFormat);
UpdateManifest(GetBaseUrl() + StrSubstNo(CorpusJsonPathTxt, DeltaCdmManifestNameTxt), 'deltas', "ADLSE CDM Format"::Csv);
Commit(); // to release the lock above
end;
end;

local procedure UpdateManifest(BlobPath: Text; Folder: Text)
local procedure UpdateManifest(BlobPath: Text; Folder: Text; ADLSECdmFormat: Enum "ADLSE CDM Format")
var
ADLSECdmUtil: Codeunit "ADLSE CDM Util";
ADLSEGen2Util: Codeunit "ADLSE Gen 2 Util";
Expand All @@ -259,7 +259,7 @@ codeunit 82562 "ADLSE Communication"
LeaseID := ADLSEGen2Util.AcquireLease(BlobPath, ADLSECredentials, BlobExists);
if BlobExists then
ManifestJson := ADLSEGen2Util.GetBlobContent(BlobPath, ADLSECredentials, BlobExists);
ManifestJson := ADLSECdmUtil.UpdateDefaultManifestContent(ManifestJson, TableID, Folder);
ManifestJson := ADLSECdmUtil.UpdateDefaultManifestContent(ManifestJson, TableID, Folder, ADLSECdmFormat);
ADLSEGen2Util.CreateOrUpdateJsonBlob(BlobPath, ADLSECredentials, LeaseID, ManifestJson);
ADLSEGen2Util.ReleaseBlob(BlobPath, ADLSECredentials, LeaseID);
end;
5 changes: 4 additions & 1 deletion businessCentral/src/ADLSEExecute.Codeunit.al
@@ -21,7 +21,9 @@ codeunit 82561 "ADLSE Execute"
EntityJsonNeedsUpdate: Boolean;
ManifestJsonsNeedsUpdate: Boolean;
begin
ADLSESetup.Get(0);
EmitTelemetry := ADLSESetup."Emit telemetry";
CDMDataFormat := ADLSESetup.DataFormat;

// Register session started
ADLSECurrentSession.Start(Rec."Table ID");
@@ -102,6 +104,7 @@
TimestampAscendingSortViewTxt: Label 'Sorting(Timestamp) Order(Ascending)', Locked = true;
InsufficientReadPermErr: Label 'You do not have sufficient permissions to read from the table.';
EmitTelemetry: Boolean;
CDMDataFormat: Enum "ADLSE CDM Format";

[TryFunction]
local procedure TryExportTableData(TableID: Integer; var ADLSECommunication: Codeunit "ADLSE Communication";
@@ -115,7 +118,7 @@

// first export the upserts
ADLSECommunication.Init(TableID, FieldIdList, UpdatedLastTimeStamp, EmitTelemetry);
ADLSECommunication.CheckEntity(EntityJsonNeedsUpdate, ManifestJsonsNeedsUpdate);
ADLSECommunication.CheckEntity(CDMDataFormat, EntityJsonNeedsUpdate, ManifestJsonsNeedsUpdate);
ExportTableUpdates(TableID, FieldIdList, ADLSECommunication, UpdatedLastTimeStamp);

// then export the deletes
9 changes: 8 additions & 1 deletion businessCentral/src/ADLSESetup.Page.al
@@ -62,7 +62,7 @@ page 82560 "ADLSE Setup"
ADLSECredentials.SetClientID(ClientID);
end;
}
field("Client Secret"; ClientSecret)
field("Client secret"; ClientSecret)
{
ApplicationArea = All;
ExtendedDatatype = Masked;
@@ -83,11 +83,18 @@
Tooltip = 'Specifies the maximum size of the upload for each block of data in MiBs. A large value will reduce the number of iterations to upload the data but may interfere with the performance of other processes running on this environment.';
}

field("CDM data format"; Rec.DataFormat)
{
ApplicationArea = All;
ToolTip = 'Specifies the format of the CDM folder to store the exported data. The Parquet format is recommended for storing the data with the best fidelity.';
}

field("Emit telemetry"; Rec."Emit telemetry")
{
ApplicationArea = All;
Tooltip = 'Specifies if operational telemetry will be emitted to this extension publisher''s telemetry pipeline. You will have to configure a telemetry account for this extension first.';
}

}
}
part(Tables; "ADLSE Setup Tables")
6 changes: 6 additions & 0 deletions businessCentral/src/ADLSESetup.Table.al
@@ -40,6 +40,12 @@
MinValue = 1;
}

field(4; DataFormat; Enum "ADLSE CDM Format")
{
Caption = 'CDM data format';
InitValue = Parquet;
}

field(10; Running; Boolean)
{
Caption = 'Exporting data';
2 changes: 2 additions & 0 deletions businessCentral/src/ADLSEUtil.Codeunit.al
@@ -212,6 +212,8 @@ codeunit 82564 "ADLSE Util"

local procedure ConvertStringToText(Val: Text): Text
begin
Val := Val.Replace('\', '\\'); // escape the escape character
Val := Val.Replace('"', '\"'); // escape the quote character
exit(StrSubstNo('"%1"', Val));
end;
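The order of the two `Replace` calls above matters: escaping the backslash first ensures that the backslash placed in front of a quote is not itself re-escaped. A self-contained sketch of the behavior (the codeunit is illustrative, not part of the repository):

```al
// Illustrative sketch of the escaping order used in ConvertStringToText.
codeunit 50103 "ADLSE Escape Sketch"
{
    procedure Demo()
    var
        Val: Text;
    begin
        Val := 'He said "hi" \o/';
        Val := Val.Replace('\', '\\'); // first: \  becomes \\
        Val := Val.Replace('"', '\"'); // then:  "  becomes \"
        // Val is now: He said \"hi\" \\o/
        // Swapping the two calls would first turn " into \" and then corrupt
        // that into \\" when the newly added backslash is escaped again.
        Message('"%1"', Val);
    end;
}
```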
