MARXAN-1616-scheduled-geodata-cleanup #1164
Conversation
Thanks for this - and even more, for your patience in navigating uncharted territory and the overall madness of my plans.
I am not sure about the approach you chose - see my inline notes: I understand your point (reusing existing logic), but I am also tempted by the idea that this could be done in much less code.
You've been thinking this through in far more detail than I have, though, so you may have reservations about this.
Inline review comments (resolved) on:
...pps/geoprocessing/src/migrations/geoprocessing/1657120155233-CreateProjectNukePreparation.ts
api/apps/geoprocessing/src/modules/cleanup-tasks/cleanup-tasks.service.ts
);
const missingScenarioIdsFromScenariosPuData: entitiesWithScenarioId[] = await this.geoEntityManager.query(
  `SELECT spd.scenario_id
   FROM scenarios_pu_data spd
I think we should not need this, as db cascades should take care of deleting rows from this table when we delete the related rows from projects_pu.
But all this cleanup is done just in case something gets wonky and the ON DELETE CASCADE does not clean things up properly, right? I mean, the other tables I referenced here also have cascades and I am cleaning them up anyway, right?
In general, we should trust PostgreSQL on cascades - if there are other entities that will be deleted via cascades and we're explicitly deleting them here, then we should avoid doing so (it may create more trouble than it tries to solve, especially if done outside of a db transaction). I checked the entities we are deleting in your original implementation last week and IIRC this was the only one that was already covered by cascades.
tl;dr: basically this task is about manually triggering deletes that would be left to SQL cascades if all the data were in a single db, but where we can rely on db cascades within the geodb, we should let PostgreSQL work for us 😃
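For illustration only (not code from this PR): a minimal sketch of the kind of ON DELETE CASCADE constraint being referred to, assuming scenarios_pu_data references projects_pu via a project_pu_id column (the column and constraint names are assumptions based on this thread). With such a constraint in place, deleting rows from projects_pu is enough and no explicit cleanup query is needed for scenarios_pu_data.

```typescript
// Hypothetical TypeORM migration sketch - table/column/constraint names are
// assumptions, not the actual migration shipped in this PR.
import { MigrationInterface, QueryRunner } from 'typeorm';

export class AddCascadeToScenariosPuData9999999999999 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // With ON DELETE CASCADE, removing a row from projects_pu automatically
    // removes its dependent scenarios_pu_data rows.
    await queryRunner.query(`
      ALTER TABLE scenarios_pu_data
        ADD CONSTRAINT scenarios_pu_data_project_pu_id_fk
        FOREIGN KEY (project_pu_id) REFERENCES projects_pu (id)
        ON DELETE CASCADE;
    `);
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`
      ALTER TABLE scenarios_pu_data
        DROP CONSTRAINT scenarios_pu_data_project_pu_id_fk;
    `);
  }
}
```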
});
}
@Cron(CronExpression.EVERY_1ST_DAY_OF_MONTH_AT_MIDNIGHT)
It's very much ok to hardcode the interval here, though I would still advise moving this to a config item, so that:
- it's easier to find settings that admins may want to tweak in the future
- it can be overridden per environment, in case admins so wish (prod vs staging vs testing, etc - the testing one may be useful in fact)
In order to do so we'd likely need to use the textual cron syntax, but I'd say that's ok (a sketch follows below).
And by all means, I would run this at the very least once a day - ideally every six hours or so.
Not because I expect this cleanup task to find so many dangling objects so frequently - these should all be the exception - but because restarts of the Node process(es) (new versions being deployed, restarts after failures, etc.), especially close to the moment the scheduled cron event becomes due in the event loop, may lead to the run being skipped until the next scheduled occurrence - so a monthly occurrence may be problematic.
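A minimal sketch of what the config-driven schedule could look like, assuming the node-config package and an invented config key (cleanupCronJobSettings.interval - not part of this PR); the textual cron expression defaults to "every six hours" when the key is not set.

```typescript
// Sketch only - the config key name and the default interval are assumptions.
import { Injectable } from '@nestjs/common';
import { Cron } from '@nestjs/schedule';
import * as config from 'config';

// Textual cron syntax, so the schedule can be overridden per environment;
// falls back to every six hours if no config item is provided.
const cleanupCronExpression: string = config.has('cleanupCronJobSettings.interval')
  ? config.get<string>('cleanupCronJobSettings.interval')
  : '0 */6 * * *';

@Injectable()
export class CleanupTasksService {
  @Cron(cleanupCronExpression)
  async scheduledGeodataCleanup(): Promise<void> {
    // look for dangling project/scenario geodata and delete it
  }
}
```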
On the other hand, almost all the tables we scan for dangling items have an index on project_id or scenario_id (as relevant), except blm_<partial|final>_results, but I don't think this index is going to be used in a negative constraint (not in ...), so it may still make sense to use the set difference query you used in your implementation, but then use the resulting set to delete ... where <scenario|project>_id in (<the set of scenario_ids not belonging to any current scenario>) - but this would need some digging into query planning at scale... I'd not be too concerned at this stage 😏
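To make the suggestion concrete, a rough sketch under stated assumptions (not the implementation in this PR): scenarioIdsInApiDb is a hypothetical array of ids of scenarios that still exist in the apidb, and some_scenario_scoped_table is a placeholder for any of the scenario-scoped geodb tables being cleaned up. The dangling set is computed first with a set-difference query, then the delete uses a positive membership test, which an index on scenario_id can serve where one exists.

```typescript
// Sketch only. Assumes `this.geoEntityManager` is a TypeORM EntityManager
// connected to the geodb and `scenarioIdsInApiDb` is a string[] of scenario ids
// that still exist in the apidb.
const dangling: { scenario_id: string }[] = await this.geoEntityManager.query(
  `SELECT DISTINCT spd.scenario_id
     FROM scenarios_pu_data spd
    WHERE spd.scenario_id <> ALL ($1::uuid[])`,
  [scenarioIdsInApiDb],
);

const danglingIds = dangling.map((row) => row.scenario_id);

if (danglingIds.length > 0) {
  // Positive membership test on the precomputed set, instead of a NOT IN
  // against the full list of live scenarios.
  await this.geoEntityManager.query(
    `DELETE FROM some_scenario_scoped_table WHERE scenario_id = ANY ($1::uuid[])`,
    [danglingIds],
  );
}
```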
- Adds a cron job (once per month) that will try to find any non-matching geodata for projects and scenarios and nuke it.
- It uses the same Project/Scenario Cleanup process that @angelhigueraacid introduced recently.
- It does not have tests for now (the cron job, when set to run every 50 seconds for local testing, appears to do something, but maybe it is a good idea to test it in an unused environment?).