Feature/spline 684 data retention 2 #1113

dk1844 · 2022-10-18T14:35:07Z

This PR brings back unmerged PR #762.

The original service wrapper has been largely reused (and brought up to date with the latest develop). The internals of the Foxx service, where the DB pruning actually happens, have been taken from https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce (#684 (comment)).

Naively tested.

Based on measurements of the implementation options for Stage 1 and Stage 2, Option A has been selected for both stages.


arangodb-foxx-services/src/main/routes/admin-router.ts

admin/src/main/scala/za/co/absa/spline/arango/DataRetentionManager.scala

wajda · 2022-10-24T14:10:08Z

arangodb-foxx-services/src/main/services/prune-database.ts

+
+import {aql, db} from '@arangodb'
+
+export function pruneBefore(timestamp) {


Was there any specific reason why you preferred breaking one complex AQL query into a series of smaller queries?
Although such an approach is probably better from the ArangoDB memory perspective, I am worried about transferring all those intermediate results (IDs) from the AQL engine to V8 and back, which I'm sure do not share any memory. It could also result in a less optimal AQL execution plan, as the AQL optimizer cannot see into the Foxx function.

I would suggest trying to combine some queries into bigger blocks.

dk1844 · 2022-11-14

This is mainly because most of the code comes from the https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce script, adapted only where our needs required it.

dk1844 · 2022-11-15

I thought about it some more. For example, the loop over the collections to purge in stage 2 could be done in AQL, but in that case we would have to forgo the logging (I don't know of a way to write to logs from AQL directly).

wajda · 2022-11-16

I would give it a try, as my logic tells me that it would be the more correct approach. Logging is not important here IMO.

dk1844 · 2022-11-22

Actually, I tend to think that logging here is crucial. For large prunes, the individual parts can take minutes or tens of minutes, and without logging one is blind to what is happening in the DB: it could run for hours without any sign of the current state.

Here, I would strive to keep it as is for now.
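To illustrate the trade-off discussed in this thread, here is a minimal sketch of per-step timing around separately executed pruning steps. It is not the code from this PR: `timedStep`, the `Log` type and the stand-in step bodies are hypothetical names introduced only for illustration, and the real steps would run AQL queries instead of returning fixed values.

```typescript
// Hypothetical helper (not part of the PR): runs one pruning step
// synchronously and reports how long it took through the given log function.
type Log = (message: string) => void

function timedStep<T>(label: string, step: () => T, log: Log = console.log): T {
    const t0 = Date.now()
    const result = step()
    const t1 = Date.now()
    log(`${label}, ${t1 - t0} ms`)
    return result
}

// Collect the log lines in memory so the sketch runs without a database.
const logLines: string[] = []
const collectLog: Log = message => { logLines.push(message) }

// Stand-in for a stage that would query the IDs of the plans to purge.
const planIds = timedStep('Stage 1: collected plan IDs', () => ['plan/1', 'plan/2'], collectLog)

// Stand-in for a stage that would delete the documents referencing those plans.
timedStep(`Stage 2: purged ${planIds.length} plans`, () => undefined, collectLog)
```

Each step produces one log line as soon as it finishes, which is the visibility that would be lost if all steps were merged into a single AQL query.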

arangodb-foxx-services/src/main/services/prune-database.ts


wajda

LGTM, except for a minor logging comment.

wajda · 2022-11-15T14:16:58Z

arangodb-foxx-services/src/main/services/prune-database.ts

+ `).toArray()
+
+ const t1 = Date.now()
+ console.log(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)


Let's use the Logger helper object instead of console for logging.

import * as Logger from '../utils/logger'
...
Logger.info(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)

dk1844 · 2022-11-16

Thanks, redone as suggested. I have also test-run it with this logging and confirmed that the log entries show up correctly in the ArangoDB LOG UI.

admin/src/main/scala/za/co/absa/spline/admin/DateTimeUtils.scala

admin/src/main/scala/za/co/absa/spline/admin/commands.scala

admin/src/main/scala/za/co/absa/spline/arango/AutoClosingArangoManagerProxy.scala

sonarcloud · 2022-11-21T15:24:26Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
4 Code Smells

No Coverage information
0.0% Duplication

cerveada · 2022-11-22T15:13:57Z

admin/src/main/scala/za/co/absa/spline/admin/DateTimeUtils.scala

+ val ZonedDateTimeRegexp(ldt, tzOffset, tzId) = s
+ val maybeTzIds = Seq(tzId, tzOffset).map(Option.apply)
+
+ require(!maybeTzIds.forall(_.isDefined), "Either timezone ID or offset should be specified, not both")


ZonedDateTime.parse allows both an offset and a zone name at the same time. I think it's a good idea to accept all inputs that are valid for ZonedDateTime.parse.

The issue with allowing only a geographical zone ID is that it is ambiguous: it does not always identify a unique instant.

2022-10-30T02:30:00[Europe/Prague] can mean either 2022-10-30T02:30+02:00[Europe/Prague] or 2022-10-30T02:30+01:00[Europe/Prague].

This is the day and hour of the change from summer time to winter time:

val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00[Europe/Prague]")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00[Europe/Prague]
// end: java.time.ZonedDateTime = 2022-10-30T02:30+01:00[Europe/Prague]

Another interesting behaviour (not so relevant for us though) is this:

val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00
// end: java.time.ZonedDateTime = 2022-10-30T03:30+02:00

When no geographical zone ID is provided, the ZonedDateTime stays at the same offset.

We agreed to solve this as part of another ticket, #1139.

dk1844 added 6 commits October 17, 2022 14:29

#684 data retention - original code from PR#762 - brought up to date with current develop - TODO test

a3fcaca

#684 data retention - update to reflect logic from https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce - naively tested with test data (multiple lineages at different times - purge with time between - correct outcome - older purged, newer kept)

6451f9c

Merge branch 'develop' into feature/spline-684-data-retention-2

14cc8a0

Conflicts: admin/src/main/scala/za/co/absa/spline/arango/ArangoManager.scala, arangodb-foxx-services/src/main/routes/index.ts

Merge branch 'develop' into feature/spline-684-data-retention-2

3555bd8

#684 minor updates

c5e7a36

#684 Sonar/Codacy code quality updates

e366788

wajda reviewed Oct 24, 2022


dk1844 added 7 commits October 25, 2022 13:56

Merge branch 'develop' into feature/spline-684-data-retention-2

f97c309

#684 prune db time measurment logging added

26cfa2c

Merge branch 'develop' into feature/spline-684-data-retention-2

9ea591b

Merge remote-tracking branch 'origin/feature/spline-684-data-retention-2' into feature/spline-684-data-retention-2

f79f318

db prune - stage1 option A, stage 2 option A applied

74c0236

review suggestions applied 1

70345e5

#684 dataSourceKeysInAffectsDependsUseArray in one query + toArray fix

35d97d0

dk1844 marked this pull request as ready for review November 15, 2022 12:42

dk1844 requested a review from cerveada as a code owner November 15, 2022 12:42

wajda reviewed Nov 15, 2022


#684 PR review - logging update

1bfb6be

cerveada reviewed Nov 16, 2022

admin/src/main/scala/za/co/absa/spline/admin/DateTimeUtils.scala (resolved)

cerveada reviewed Nov 16, 2022

admin/src/main/scala/za/co/absa/spline/admin/commands.scala (outdated, resolved)

cerveada reviewed Nov 16, 2022

admin/src/main/scala/za/co/absa/spline/arango/AutoClosingArangoManagerProxy.scala (outdated, resolved)

wajda previously approved these changes Nov 16, 2022

dk1844 added 2 commits November 21, 2022 15:55

Merge branch 'develop' into feature/spline-684-data-retention-2

1f9c46e

#684 indentation fix

fb72253

dk1844 dismissed wajda’s stale review via fb72253 November 21, 2022 15:23

dk1844 requested review from wajda and cerveada November 22, 2022 14:40

cerveada reviewed Nov 22, 2022


cerveada approved these changes Nov 23, 2022


dk1844 merged commit 7ea4c49 into develop Nov 23, 2022

dk1844 deleted the feature/spline-684-data-retention-2 branch November 23, 2022 10:46



