Feature/spline 684 data retention 2 #1113
Conversation
…with current develop - TODO test
…b.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce - naively tested with test data (multiple lineages at different times - purge with time between - correct outcome - older purged, newer kept)
# Conflicts:
#   admin/src/main/scala/za/co/absa/spline/arango/ArangoManager.scala
#   arangodb-foxx-services/src/main/routes/index.ts
admin/src/main/scala/za/co/absa/spline/arango/DataRetentionManager.scala (outdated review thread, resolved)
import {aql, db} from '@arangodb'

export function pruneBefore(timestamp) {
Was there any specific reason why you preferred breaking one complex AQL query into a series of smaller queries?
Although such an approach is probably better from the ArangoDB memory perspective, I am worried about transferring all those intermediate results (IDs) from the AQL engine to V8 and back, which I'm sure do not share any memory. It could also result in a less optimal AQL execution plan, as the AQL optimizer cannot see into the Foxx function.
I would suggest trying to combine some of the queries into bigger blocks, roughly as sketched below.
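For illustration only, a rough sketch of the two shapes I mean. The executionPlan collection and the _created attribute are placeholders here, not necessarily the actual schema:

import {aql, db} from '@arangodb'

function pruneBeforeTwoStep(timestamp: number): void {
    // Two-step variant: the intermediate keys are materialized in V8
    // and shipped back into the follow-up query as a bind parameter.
    const expiredKeys = db._query(aql`
        FOR p IN executionPlan
            FILTER p._created < ${timestamp}
            RETURN p._key
    `).toArray()
    db._query(aql`
        FOR key IN ${expiredKeys}
            REMOVE key IN executionPlan
    `)
}

function pruneBeforeCombined(timestamp: number): void {
    // Combined variant: a single query, so there is no AQL <-> V8
    // round trip and the optimizer sees the whole pipeline at once.
    db._query(aql`
        FOR p IN executionPlan
            FILTER p._created < ${timestamp}
            REMOVE p IN executionPlan
    `)
}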
This is mainly because most of the code comes from the https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce script, adapted only where needed for our purposes.
I thought about it some more. For example, the looping over the collections to purge in stage 2 could be done in AQL, but in that case we would have to forgo the logging (I don't know of a way to write to logs from AQL directly). Roughly the current shape is sketched below.
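A hypothetical sketch of what that stage-2 loop looks like on the JS side. The collection names and the _belongsTo filter are invented here for illustration; it is the per-collection logging that ties the loop to V8:

import {aql, db} from '@arangodb'
import * as Logger from '../utils/logger'

function purgeDependents(expiredPlanIds: string[]): void {
    // Iterate over the dependent collections one by one (names invented).
    for (const colName of ['operation', 'expression', 'attribute']) {
        const t0 = Date.now()
        db._query(aql`
            FOR d IN ${db._collection(colName)}
                FILTER d._belongsTo IN ${expiredPlanIds}
                REMOVE d IN ${db._collection(colName)}
        `)
        // This progress logging happens in V8 between the queries --
        // folding the whole loop into one AQL query would lose it.
        Logger.info(`Purged ${colName} in ${Date.now() - t0} ms`)
    }
}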
I would give it a try, as my logic tells me that it would be the more correct approach. Logging is not important here IMO.
Actually, I tend to think that logging here is crucial. For large prunes, the individual parts can take minutes or tens of minutes, and without logging one is blind to what is happening in the DB -- it could run for hours without any sign of its current state.
Here, I would strive to keep it as is for now.
…n-2' into feature/spline-684-data-retention-2
LGTM, except for a minor logging comment.
`).toArray()

const t1 = Date.now()
console.log(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)
Let's use the Logger helper object instead of console for logging:
import * as Logger from '../utils/logger'
...
Logger.info(`Purged ${refPlanIds.length} plans, ${t1 - t0} ms`)
Thanks, redone as suggested. I have test-run it with this logging, too, and verified that it correctly shows log entries in the ArangoDB LOG UI.
admin/src/main/scala/za/co/absa/spline/arango/AutoClosingArangoManagerProxy.scala (outdated review thread, resolved)
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
val ZonedDateTimeRegexp(ldt, tzOffset, tzId) = s
val maybeTzIds = Seq(tzId, tzOffset).map(Option.apply)

require(!maybeTzIds.forall(_.isDefined), "Either timezone ID or offset should be specified, not both")
ZonedDateTime.parse allows both an offset and a zone name at the same time. I think it's a good idea to accept all inputs that are valid for ZonedDateTime.parse.
The issue with allowing only a geographical zone ID is that it is ambiguous, i.e. not unique.
2022-10-30T02:30:00[Europe/Prague] is both:
2022-10-30T02:30+02:00[Europe/Prague]
2022-10-30T02:30+01:00[Europe/Prague]
This is the day and hour of the change from summer time to winter time:
val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00[Europe/Prague]")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00[Europe/Prague]
// end: java.time.ZonedDateTime = 2022-10-30T02:30+01:00[Europe/Prague]
Another interesting behaviour (not so relevant for us, though) is this:
val start = ZonedDateTime.parse("2022-10-30T02:30:00+02:00")
val end = start.plusHours(1)
// start: java.time.ZonedDateTime = 2022-10-30T02:30+02:00
// end: java.time.ZonedDateTime = 2022-10-30T03:30+02:00
When no geographical zone ID is provided, the ZonedDateTime will stay in the same offset.
We agreed to solve this as part of another ticket: #1139.
This PR brings back unmerged PR #762.
The original service wrapper has been generally reused (brought up to date with the latest develop). The internals of the Foxx service, where the DB prune actually happens, have been taken from https://gist.github.com/Aditya-Sood/ecc07c9f296dbdf03d4946c5d1b4efce (#684 (comment)).
Naively tested.
Based on the measurements of the implementation options for Stage 1 and Stage 2, Option A has been selected for both stages.