This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

Commit

Error during export should save last exported timestamp (#109)
* First draft to handle (1) saving the last timestamps on error, (2) timeouts on large data uploads to the lake

* Adding changelog

* Refining the changelog

* Missed saving the record after setting the Error field.

---------

Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
DuttaSoumya and Soumya Dutta authored Jun 6, 2023
1 parent 91046aa commit 7eed56e
Showing 10 changed files with 106 additions and 36 deletions.
1 change: 1 addition & 0 deletions .assets/Changelog.md
@@ -4,6 +4,7 @@ This page lists all pull requests that made significant changes to bc2adls. To s

Pull request | Changes
--------------- | ---
[109](https://github.com/microsoft/bc2adls/pull/109) | Two issues addressed: (1) Exporting large amounts of data to the lake caused a query timeout when sorting the records as per the row version. Sorting is important as it helps ensure that subsequent exports restart from records which were **not** exported during the last run, but this sorting over a large number of records caused the timeouts. A new flag called **Skip row version sorting** has therefore been introduced on the setup page that allows unsorted records to be uploaded to the lake. This is to be used only as a temporary measure and should be disabled once the data has been uploaded to the lake. See [documentation for the field](/.assets/Setup.md). (2) An error occurring late in the export process used to force the system to start from the first record when the export was invoked again. The system has therefore been made more robust by saving the last timestamps even in case of errors. This helps subsequent exports to "catch up" from the time the last export reached the lake.
[79](https://github.com/microsoft/bc2adls/pull/79) | The step to clean up tracked deleted records has been removed from the export process to make exports more efficient. This clean-up step can instead be performed either by clicking the action **Clear tracked deleted records** on the main setup page, or by invoking the new codeunit **ADLSE Clear Tracked Deletions** through a low-frequency custom job queue entry.
[78](https://github.com/microsoft/bc2adls/pull/78) | Upgrading to new versions may lead the export configuration to enter an incorrect state, say, if a field that was being exported gets obsoleted in the new version. This fix prevents such an occurrence by raising an error during the upgrade process. If corrective actions, say, disabling such fields, are not taken after multiple upgrade attempts, the bc2adls extension is uninstalled and the upgrade is forced. A subsequent re-install of the extension will then disable such tables from being exported, so that the user can react to the change in schema later on.
[56](https://github.com/microsoft/bc2adls/pull/56) | The table ADLSE Run has now been added to the retention policy so that the logs for the executions can be cleared periodically, thus taking up less space in the database.
7 changes: 6 additions & 1 deletion .assets/FAQs.md
@@ -19,4 +19,9 @@ Normally, all the tables that are setup for export will export at the same time.
Incremental exports create files in the `deltas` folder in the lake container. Each such file has a `Modified` field that indicates the time when it was last updated, in other words, when the export process finished with that file. Each export process for an entity in a company logs its execution on the [`ADLSE Run`](https://github.com/microsoft/bc2adls/blob/main/businessCentral/src/ADLSERun.Table.al) table using the `Started` and `Ended` fields. Thus you may tally the value in the `Modified` field of the file against these fields and determine which run resulted in the creation of that file. You may also use telemetry to determine which run created which file.

### What should I do when a field I was exporting has been made obsolete?
Table fields that are already obsoleted cannot be configured for export, but the system does allow you to add fields that are pending obsoletion. If you are already using such a field, you will get errors when upgrading to newer versions of the application where the field has been removed. It is recommended that you read the obsoletion documentation to determine whether other fields will hold the information from the new version onwards, and then enable those fields instead. Of course, you will also have to disable the obsoleted field from the export. Such a change alters the schema of the export, thus changing the entity JSONs on the data lake. We advise you to archive the "older" data and, if possible, create pipelines to correctly map the older data to the new schema.

### I need help because my export job is timing out!
Timeout issues have been seen to occur at two possible places in the solution, both of them typically during the initial export of records:
1. The query that fetches the records before the export to the lake may time out if it exceeds the defined [operation limits](/business-central/dev-itpro/administration/operational-limits-online). This may happen when bc2adls attempts to sort a large set of records as per the row version. You may _suspend_ the sorting temporarily using the field `Skip row version sorting` on the setup page.
1. Chunks of data (or [blocks](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#:~:text=Block%20blobs), in data lake parlance) are added to a lake file during the export. Adding too many such large chunks may cause timeout issues in the form of an error message like `Could not commit blocks to <redacted>. OperationTimedOutOperation could not be completed within the specified time.` The bc2adls app uses the default timeouts, but you may add an [additional timeout URL parameter](https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list?tabs=azure-ad#:~:text=timeout) by suffixing the URL call in the procedure [`CommitAllBlocksOnDataBlob`](https://github.com/microsoft/bc2adls/blob/main/businessCentral/src/ADLSEGen2Util.Codeunit.al#:~:text=CommitAllBlocksOnDataBlob) with `?timeout=XX`, where XX is the number of seconds after which the timeout expires; a sketch of this follows below. This issue typically happens when you are pushing a large payload to the server. Also consider reducing the number in the field [Max payload size (MiBs)](https://github.com/microsoft/bc2adls/blob/main/.assets/Setup.md#:~:text=Max%20payload%20size%20(MiBs)).
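
For illustration, the following is a minimal AL sketch of how such a timeout suffix could be appended to the blob endpoint URL. The codeunit number, object and procedure names, and the 300-second value in the usage note are assumptions for this sketch and are not part of bc2adls.

```al
codeunit 50140 "Blob Url Timeout Sketch"
{
    // Illustrative only: append the optional ?timeout=XX query parameter described in
    // the Put Block List documentation to a blob endpoint URL before the call is made.
    procedure AddTimeoutParameter(Url: Text; TimeoutSeconds: Integer): Text
    begin
        if Url.Contains('?') then
            exit(Url + '&timeout=' + Format(TimeoutSeconds));
        exit(Url + '?timeout=' + Format(TimeoutSeconds));
    end;
}
```

A call such as `AddTimeoutParameter(Url, 300)` would, for example, give the Put Block List operation up to 300 seconds before the service aborts it.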
3 changes: 2 additions & 1 deletion .assets/Setup.md
@@ -28,8 +28,9 @@ Let us take a look at the settings show in the sample screenshot below,
- **Client secret** The client credential key you had defined (refer to **c)** in the picture at [Step 1](/.assets/Setup.md#step-1-create-an-azure-service-principal))
- **Max payload size (MiBs)** The size of the individual data payload that constitutes a single REST API upload operation to the data lake. A bigger size means fewer uploads but might consume too much memory on the BC side. Note that each upload creates a new block within the blob in the data lake. The size of such blocks is constrained as described at [Put Block (REST API) - Azure Storage | Microsoft Docs](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks).
- **CDM data format** The format in which the exported data is stored on the data lake. Recommended format is Parquet, which is better at handling special characters in the BC text fields. Note that the `deltas` folder will always store files in the CSV format but the consolidated `data` folder will store files in the configured format.
- **Multi-company export** The flag to allow exporting data from multiple companies at the same time. You should enable this only after the export schema is finalized, in other words, ensure that at least one export for a company has been successful with all the desired tables and the desired fields in those tables. We recommend that the JSON files are manually checked in the outbound container before enabling this flag. Changes to the export schema (adding or removing tables as well as changing the field set to be exported) are not allowed as long as this flag is checked.
- **Skip row version sorting** Allows the records to be exported as they are fetched through SQL. This can be useful to avoid query timeouts when there is a large amount of records to be exported to the lake from a table, say, during the first export. The records are usually sorted ascending on their row version so that in case of a failure, the next export can restart by exporting only those records that have a row version higher than that of the last exported one; this helps incremental updates reach the lake in the same order that the updates were made (see the illustrative sketch after this list). Enabling this check may therefore cause a subsequent export job to re-send records that had already been exported to the lake, leading to performance degradation on the next run. It is recommended to use this cautiously for only a few tables (while disabling export for all other tables), and to disable this check once all the data has been transferred to the lake.
- **Emit telemetry** The flag to enable or disable operational telemetry from this extension. It is set to True by default.
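
As a side note to the **Skip row version sorting** field above, the following is a minimal sketch of how an export can resume from the last exported row version; the object name, procedure signature and view text are assumptions for illustration and not the actual bc2adls implementation.

```al
codeunit 50141 "Export Resume Filter Sketch"
{
    // Illustrative only: restrict an export to records changed after the last exported
    // row version, optionally skipping the ascending sort on the row version.
    procedure ApplyResumeFilter(var RecRef: RecordRef; LastExportedRowVersion: BigInteger; SkipRowVersionSorting: Boolean)
    var
        TimestampField: FieldRef;
    begin
        if not SkipRowVersionSorting then
            // Assumed view text; bc2adls holds its sorting expression in a locked label.
            RecRef.SetView('Sorting(Timestamp) Order(Ascending)');
        TimestampField := RecRef.Field(0); // field 0 is the SQL row version (timestamp) column
        TimestampField.SetFilter('>%1', LastExportedRowVersion); // only rows changed since the last export
    end;
}
```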

![The Export to Azure Data Lake Storage page](/.assets/bcAdlsePage.png)

Binary file modified .assets/bcAdlsePage.png
2 changes: 1 addition & 1 deletion businessCentral/app.json
@@ -4,7 +4,7 @@
"publisher": "The bc2adls team, Microsoft Denmark",
"brief": "Sync data from Business Central to the Azure storage",
"description": "Exports data in chosen tables to the Azure Data Lake and keeps it in sync by incremental updates. Before you use this tool, please read the SUPPORT.md file at https://github.com/microsoft/bc2adls.",
"version": "1.3.12.5",
"version": "1.3.13.0",
"privacyStatement": "https://go.microsoft.com/fwlink/?LinkId=724009",
"EULA": "https://go.microsoft.com/fwlink/?linkid=2009120",
"help": "https://go.microsoft.com/fwlink/?LinkId=724011",
7 changes: 7 additions & 0 deletions businessCentral/src/ADLSECommunication.Codeunit.al
@@ -56,6 +56,8 @@ codeunit 82562 "ADLSE Communication"
var
ADLSESetup: Record "ADLSE Setup";
ADLSEUtil: Codeunit "ADLSE Util";
ADLSEExecution: Codeunit "ADLSE Execution";
CustomDimensions: Dictionary of [Text, Text];
begin
TableID := TableIDValue;
FieldIdList := FieldIdListValue;
@@ -67,6 +69,11 @@ codeunit 82562 "ADLSE Communication"
ADLSESetup.GetSingleton();
MaxSizeOfPayloadMiB := ADLSESetup.MaxPayloadSizeMiB;
EmitTelemetry := EmitTelemetryValue;
if EmitTelemetry then begin
CustomDimensions.Add('Entity', EntityName);
CustomDimensions.Add('Last flushed time stamp', Format(LastFlushedTimeStampValue));
ADLSEExecution.Log('ADLSE-041', 'Initialized ADLSE Communication to write to the lake.', Verbosity::Verbose, CustomDimensions);
end;
end;

procedure CheckEntity(CdmDataFormat: Enum "ADLSE CDM Format"; var EntityJsonNeedsUpdate: Boolean; var ManifestJsonsNeedsUpdate: Boolean)
39 changes: 24 additions & 15 deletions businessCentral/src/ADLSEExecute.Codeunit.al
@@ -22,6 +22,7 @@ codeunit 82561 "ADLSE Execute"
OldDeletedLastEntryNo: BigInteger;
EntityJsonNeedsUpdate: Boolean;
ManifestJsonsNeedsUpdate: Boolean;
ExportSuccess: Boolean;
begin
ADLSESetup.GetSingleton();
EmitTelemetry := ADLSESetup."Emit telemetry";
@@ -52,10 +53,10 @@ codeunit 82561 "ADLSE Execute"
// Perform the export
OldUpdatedLastTimestamp := UpdatedLastTimestamp;
OldDeletedLastEntryNo := DeletedLastEntryNo;
if not TryExportTableData(Rec."Table ID", ADLSECommunication, UpdatedLastTimestamp, DeletedLastEntryNo, EntityJsonNeedsUpdate, ManifestJsonsNeedsUpdate) then begin
SetStateFinished(Rec, TableCaption);
exit;
end;
ExportSuccess := TryExportTableData(Rec."Table ID", ADLSECommunication, UpdatedLastTimestamp, DeletedLastEntryNo, EntityJsonNeedsUpdate, ManifestJsonsNeedsUpdate);
if not ExportSuccess then
ADLSERun.RegisterErrorInProcess(Rec."Table ID", EmitTelemetry, TableCaption);

if EmitTelemetry then begin
Clear(CustomDimensions);
CustomDimensions.Add('Entity', TableCaption);
@@ -96,7 +97,10 @@ codeunit 82561 "ADLSE Execute"
// Finalize
SetStateFinished(Rec, TableCaption);
if EmitTelemetry then
ADLSEExecution.Log('ADLSE-005', 'Export completed without error', Verbosity::Normal, CustomDimensions);
if ExportSuccess then
ADLSEExecution.Log('ADLSE-005', 'Export completed without error', Verbosity::Normal, CustomDimensions)
else
ADLSEExecution.Log('ADLSE-040', 'Export completed with errors', Verbosity::Warning, CustomDimensions);
end;

var
@@ -132,20 +136,22 @@ codeunit 82561 "ADLSE Execute"
Rec: RecordRef;
TimeStampField: FieldRef;
begin
SetFilterForUpdates(TableID, UpdatedLastTimeStamp, Rec, TimeStampField);
SetFilterForUpdates(TableID, UpdatedLastTimeStamp, false, Rec, TimeStampField);
exit(ADLSESeekData.RecordsExist(Rec));
end;

local procedure SetFilterForUpdates(TableID: Integer; UpdatedLastTimeStamp: BigInteger; var Rec: RecordRef; var TimeStampField: FieldRef)
local procedure SetFilterForUpdates(TableID: Integer; UpdatedLastTimeStamp: BigInteger; SkipTimestampSorting: Boolean; var Rec: RecordRef; var TimeStampField: FieldRef)
begin
Rec.Open(TableID);
Rec.SetView(TimestampAscendingSortViewTxt);
if not SkipTimestampSorting then
Rec.SetView(TimestampAscendingSortViewTxt);
TimeStampField := Rec.Field(0); // 0 is the TimeStamp field
TimeStampField.SetFilter('>%1', UpdatedLastTimestamp);
end;

local procedure ExportTableUpdates(TableID: Integer; FieldIdList: List of [Integer]; ADLSECommunication: Codeunit "ADLSE Communication"; var UpdatedLastTimeStamp: BigInteger)
var
ADLSESetup: Record "ADLSE Setup";
ADLSESeekData: Report "ADLSE Seek Data";
ADLSEExecution: Codeunit "ADLSE Execution";
Rec: RecordRef;
@@ -158,7 +164,8 @@ codeunit 82561 "ADLSE Execute"
FieldId: Integer;
SystemCreatedAt: DateTime;
begin
SetFilterForUpdates(TableID, UpdatedLastTimeStamp, Rec, TimeStampField);
ADLSESetup.GetSingleton();
SetFilterForUpdates(TableID, UpdatedLastTimeStamp, ADLSESetup."Skip Timestamp Sorting On Recs", Rec, TimeStampField);

foreach FieldId in FieldIdList do
Rec.AddLoadFields(FieldID);
@@ -182,15 +189,17 @@ codeunit 82561 "ADLSE Execute"
if SystemCreatedAt = 0DT then
Field.Value(CreateDateTime(DMY2Date(1, 1, 1900), 0T));

if ADLSECommunication.TryCollectAndSendRecord(Rec, TimeStampField.Value(), FlushedTimeStamp) then
UpdatedLastTimeStamp := FlushedTimeStamp
else
if ADLSECommunication.TryCollectAndSendRecord(Rec, TimeStampField.Value(), FlushedTimeStamp) then begin
if UpdatedLastTimeStamp < FlushedTimeStamp then // sample the highest timestamp, to cater to the eventuality that the records do not appear sorted per timestamp
UpdatedLastTimeStamp := FlushedTimeStamp;
end else
Error('%1%2', GetLastErrorText(), GetLastErrorCallStack());
until Rec.Next() = 0;

if ADLSECommunication.TryFinish(FlushedTimeStamp) then
UpdatedLastTimeStamp := FlushedTimeStamp
else
if ADLSECommunication.TryFinish(FlushedTimeStamp) then begin
if UpdatedLastTimeStamp < FlushedTimeStamp then // sample the highest timestamp, to cater to the eventuality that the records do not appear sorted per timestamp
UpdatedLastTimeStamp := FlushedTimeStamp
end else
Error('%1%2', GetLastErrorText(), GetLastErrorCallStack());
end;
if EmitTelemetry then