This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

Error during export should save last exported timestamp #109

Merged
8 commits merged on Jun 6, 2023


DuttaSoumya (Contributor):

This PR addresses two issues:

  1. Exporting large amounts of data to the lake caused a query timeout while sorting the records by row version. Sorting matters because it ensures that subsequent exports resume from the records that were not exported during the last run, but sorting a very large number of records is what caused the timeout. A new setting, "Skip row version sorting", is therefore introduced that allows records to be uploaded to the lake unsorted. It is intended only as a temporary measure and should be disabled once the data has been uploaded to the lake.

  2. An error late in the export process forced the system to start again from the first record the next time the export was invoked. The export has been made more robust by saving the last exported timestamps even when an error occurs, so that subsequent exports can "catch up" from the point the last export reached in the lake.
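Both fixes revolve around a resume point (watermark), and their interaction can be sketched in Python. This is a minimal, hypothetical illustration, not the code merged in this PR. `Record`, `ExportState`, `export_incremental`, and the `skip_row_version_sorting` parameter are assumed names, with the latter standing in for the "Skip row version sorting" setting. The sketch assumes that each record carries an increasing row version and that the highest exported row version is stored as the "last exported timestamp". It also assumes that an unsorted run only advances the watermark on full success, because a partial unsorted upload leaves no safe point to resume from.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    row_version: int  # assumed: increases with every change to the row
    payload: str


@dataclass
class ExportState:
    # Hypothetical watermark: highest row version known to be in the lake.
    last_exported_row_version: int = 0


def export_incremental(
    records: List[Record],
    state: ExportState,
    upload: Callable[[Record], None],
    skip_row_version_sorting: bool = False,
) -> None:
    """Upload records newer than the watermark; keep progress if an upload fails."""
    pending = [r for r in records if r.row_version > state.last_exported_row_version]
    if not skip_row_version_sorting:
        # Ascending order makes every uploaded prefix a safe resume point.
        pending.sort(key=lambda r: r.row_version)

    last_uploaded = state.last_exported_row_version
    try:
        for record in pending:
            upload(record)
            last_uploaded = max(last_uploaded, record.row_version)
    except Exception:
        if not skip_row_version_sorting:
            # Sorted run: every record up to last_uploaded is in the lake,
            # so save it before re-raising and let the next run catch up.
            state.last_exported_row_version = last_uploaded
        # Unsorted run: lower row versions may still be missing, so the
        # watermark is left unchanged and the next run starts over.
        raise
    # Success: all pending records are in the lake.
    state.last_exported_row_version = last_uploaded
```

With sorting enabled, a run that fails on row version 3 after uploading 1 and 2 saves a watermark of 2, so the next run uploads only row version 3. With sorting skipped, the same failure keeps the old watermark, which is why the flag is best used for a one-off full export and then turned off.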

Soumya Dutta added 3 commits April 12, 2023 11:02
RonKoppelaar (Contributor):

I've tested this PR in a customer environment, and everything works fine. With the setting "Skip row version sorting" enabled, we were able to export 45.000K G/L Entries. After the full export completed, an incremental update could be run with "Skip row version sorting" set to False. From my side, this PR can be approved and merged into main.

@DuttaSoumya DuttaSoumya merged commit 7eed56e into main Jun 6, 2023
@DuttaSoumya DuttaSoumya deleted the ErrorDuringExportShouldSaveLastExportedTimestamp branch June 6, 2023 08:45