Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/spline 684 data retention 2 #1113

Merged
merged 16 commits into from
Nov 23, 2022
Merged

Conversation

dk1844
Copy link
Contributor

@dk1844 dk1844 commented Oct 18, 2022

This PR brings back unmerged PR #762.

The original service wrapper has been generally used (while brought up to date with latest develop). The internals of the foxx service where the db-prune actually happens has been taken from https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce (#684 (comment)).

Naively tested.

Based on the measuring of the implementation options for Stage 1 and Stage 2, Option A for Stage 1 and Option A for Stage 2 have been selected for the implementation.

…b.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce - naively tested with test data (multiple lineages at different times - purge with time between - correct outcome - older purged, newer kept)
% Conflicts:
%	admin/src/main/scala/za/co/absa/spline/arango/ArangoManager.scala
%	arangodb-foxx-services/src/main/routes/index.ts
arangodb-foxx-services/src/main/routes/admin-router.ts Outdated Show resolved Hide resolved
arangodb-foxx-services/src/main/routes/admin-router.ts Outdated Show resolved Hide resolved

import {aql, db} from '@arangodb'

export function pruneBefore(timestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there any specific reason why you preferred breaking one complex AQL query into a series of smaller queries?
Although such approach is probably better from the ArangoDB memory perspective, but I am worried about transferring all those intermediate results (IDs) from AQL engine to V8 and back, which I'm sure are not sharing any memory. Also, it could result in less optimal AQL execution plan as the AQL optimizer does not see into the Foxx function.

I would suggest to try to combine some queries into bigger blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maily because the most of the code comes from https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce script - adapted only where needed for our needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it some more, for example in case of the looping for the collections to purge in stage 2, that can be done in AQL -- but in that case we would have to forgo the logging (I don't know a way to write to logs from AQL directly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would give it a try as my logic tells me that it would be more correct approach. Logging is not important here IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I tend to think that logging here is crucial. For large pruning, the individual parts can take minutes/tens of minutes and without logging one is blind on what is happening in the DB -- it could run for hours without any sign of the current state.

Here, I would strive to keep it as is for now.

arangodb-foxx-services/src/main/services/prune-database.ts Outdated Show resolved Hide resolved
arangodb-foxx-services/src/main/services/prune-database.ts Outdated Show resolved Hide resolved
arangodb-foxx-services/src/main/services/prune-database.ts Outdated Show resolved Hide resolved
arangodb-foxx-services/src/main/services/prune-database.ts Outdated Show resolved Hide resolved
@dk1844 dk1844 marked this pull request as ready for review November 15, 2022 12:42
Copy link
Contributor

@wajda wajda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, excepts from a minor logging comment

`).toArray()

const t1 = Date.now()
console.log(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use Logger helper object instead of console for logging.

    import * as Logger from '../utils/logger'
    ...
    Logger.info(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, redone as suggested. I have test-ran it with this logging, too, and witnessed it correctly shows log entries on the ArangoDB LOG UI.

wajda
wajda previously approved these changes Nov 16, 2022
@sonarcloud
Copy link

sonarcloud bot commented Nov 21, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

val ZonedDateTimeRegexp(ldt, tzOffset, tzId) = s
val maybeTzIds = Seq(tzId, tzOffset).map(Option.apply)

require(!maybeTzIds.forall(_.isDefined), "Either timezone ID or offset should be specified, not both")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZonedDateTime.parse allows both offset and name at the same time . I think it's good idea to accept all inputs that are valid for ZonedDateTime.parse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with allowing only geographical zone id is that it is ambiguous. Not unique.

2022-10-30T02:30:00[Europe/Prague] is both:
2022-10-30T02:30+02:00[Europe/Prague]
2022-10-30T02:30+01:00[Europe/Prague]

This is the day and hour of change from summer time to winter time:

val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00[Europe/Prague]")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00[Europe/Prague]
// end: java.time.ZonedDateTime =   2022-10-30T02:30+01:00[Europe/Prague]

Another interesting behaviour (not so relevant for us though) is this:

val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00
// end: java.time.ZonedDateTime   = 2022-10-30T03:30+02:00

When no geographical zone id is provided the ZonedDateTime will stay in the same offset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to solve this as part of other ticket #1139

@dk1844 dk1844 merged commit 7ea4c49 into develop Nov 23, 2022
@dk1844 dk1844 deleted the feature/spline-684-data-retention-2 branch November 23, 2022 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants