
MARXAN-1616-scheduled-geodata-cleanup #1164

Merged
merged 7 commits into develop from MARXAN-1616-scheduled-geodata-cleanup on Jul 12, 2022

Conversation

rubvalave (Contributor) commented:

- Adds a Cron job (once per month) that will try to find any non-matching data for projects and scenarios and nuke it (a rough sketch of the overall shape is below).
- It uses the same Project/Scenario Cleanup process that @angelhigueraacid introduced recently.
- It does not have tests for now (the Cron job, set to a 50-second interval while testing, appears to do something, but maybe it is a good idea to test it in an unused environment?).
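A minimal sketch of the shape described above - the service, interface, and method names here are illustrative assumptions, not the actual identifiers from this PR, and the DI wiring for the cleanup dependencies is omitted:

```typescript
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

// Hypothetical stand-in for the existing Project/Scenario Cleanup services
// mentioned above; the real services in the codebase may look different.
interface DanglingDataCleanup {
  removeDanglingData(): Promise<void>;
}

@Injectable()
export class ScheduledGeodataCleanupService {
  constructor(
    private readonly projectsCleanup: DanglingDataCleanup,
    private readonly scenariosCleanup: DanglingDataCleanup,
  ) {}

  // Runs on the first day of each month at midnight, per the PR description.
  @Cron(CronExpression.EVERY_1ST_DAY_OF_MONTH_AT_MIDNIGHT)
  async cleanupDanglingGeoData(): Promise<void> {
    await this.projectsCleanup.removeDanglingData();
    await this.scenariosCleanup.removeDanglingData();
  }
}
```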


hotzevzl (Member) left a comment:

thanks for this - and even more, your patience in navigating uncharted territory and the overall madness of my plans.

I am not sure about the approach you chose - see my inline notes: I understand your point (reusing existing logic), but I also find it tempting to do this in much less code.

You've been thinking this through in way more detail than I have, though, so you may have reservations about this.

```typescript
);
const missingScenarioIdsFromScenariosPuData: entitiesWithScenarioId[] = await this.geoEntityManager.query(
  `SELECT spd.scenario_id
   FROM scenarios_pu_data spd
```
hotzevzl (Member) commented on this diff:

I think we should not need this, as db cascades should take care of deleting rows from this table when we delete the related rows from projects_pu
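For illustration, a cascade of this kind would look roughly like the following - the constraint and column names are assumptions for illustration, not copied from the actual schema:

```typescript
import { MigrationInterface, QueryRunner } from 'typeorm';

// Sketch only: with a foreign key like this in place, deleting a row from
// projects_pu automatically removes the dependent scenarios_pu_data rows,
// with no explicit cleanup query needed.
export class AddScenariosPuDataCascade1657000000000
  implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`
      ALTER TABLE scenarios_pu_data
        ADD CONSTRAINT scenarios_pu_data_project_pu_id_fkey
        FOREIGN KEY (project_pu_id) REFERENCES projects_pu (id)
        ON DELETE CASCADE;
    `);
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`
      ALTER TABLE scenarios_pu_data
        DROP CONSTRAINT scenarios_pu_data_project_pu_id_fkey;
    `);
  }
}
```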

rubvalave (Contributor, Author) replied:

But all this cleanup is done just in case something gets wonky and the on-delete cascades don't delete anything properly, right? I mean, the other tables I referenced here also have on-delete cascades and I am cleaning them up anyway, right?

hotzevzl (Member) replied:

in general, we should trust PostgreSQL on cascades - if there are other entities that will be deleted via cascades and we're explicitly deleting them here, then we should avoid doing this (it may create more trouble than it tries to solve, especially if done outside of a db transaction). I checked the entities we are deleting in your original implementation last week and IIRC this was the only one that was already covered by cascades.

hotzevzl (Member) added:

tl;dr: basically this task is about manually triggering deletes that would be left to SQL cascades if all the data were in a single db - but where we can rely on db cascades within the geodb, we should let PostgreSQL work for us 😃

```typescript
  });
}

@Cron(CronExpression.EVERY_1ST_DAY_OF_MONTH_AT_MIDNIGHT)
```
hotzevzl (Member) commented on this line:

it's very much ok to hardcode the interval here, though I would still advise moving this to a config item, so that:

  • it's easier to find settings that admins may want to tweak in the future
  • it can be overridden per-environment, in case admins so wish (prod vs staging vs testing, etc - the testing one may be useful in fact)

In order to do so we'd likely need to use the textual cron syntax, but I'd say that's ok (see the sketch below).
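As a rough sketch of what that could look like - the env var name and the daily default are assumptions, not something agreed in this thread:

```typescript
import { Injectable } from '@nestjs/common';
import { Cron } from '@nestjs/schedule';

// Hypothetical config item: the decorator argument is evaluated once, at
// class definition time, so a value read from the environment (or any
// config loader) at module load works with the textual cron syntax.
const CLEANUP_CRON_EXPRESSION =
  process.env.CLEANUP_CRON_EXPRESSION ?? '0 0 * * *'; // default: daily at midnight

@Injectable()
export class CleanupCronService {
  @Cron(CLEANUP_CRON_EXPRESSION)
  async run(): Promise<void> {
    // ...delegate to the actual cleanup logic here
  }
}
```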

And by all means, I would default to an interval of at most one day - that is, running at least once a day, but ideally every six hours or so.

Not because I expect this cleanup task to find so many dangling objects so frequently - these should all be exceptions - but because of restarts of the Node process(es) (new versions being deployed, restarts after failures, etc.). If a restart happens close to the moment the scheduled cron event becomes due in the event loop, we may risk skipping it and waiting until the next scheduled occurrence - so a monthly occurrence may be problematic.

On the other hand, almost all the tables we scan for dangling items have an index on project_id or scenario_id (as relevant), except blm_<partial|final>_results. I don't think such an index is going to be used in a negative constraint (not in ...), though, so it may still make sense to use the set difference query you used in your implementation, but then use the resulting set to delete ... where <scenario|project>_id in (<the set of scenario_ids not belonging to any current scenario>) - sketched below. This would need some digging into query planning at scale... I'd not be too concerned at this stage 😏
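A minimal sketch of that two-step pattern, assuming validScenarioIds has already been fetched from the apidb; the table name is illustrative of the pattern, not a claim about which tables need this:

```typescript
import { EntityManager } from 'typeorm';

// Step 1: compute the set of dangling scenario ids with a set-difference
// query; step 2: delete by explicit id list, so an index on scenario_id
// (where one exists) can be used for the delete itself.
export async function deleteDanglingRows(
  geoEntityManager: EntityManager,
  validScenarioIds: string[],
): Promise<void> {
  // Guard: with an empty valid set, `<> ALL` would match every row.
  if (validScenarioIds.length === 0) return;

  const dangling: { scenario_id: string }[] = await geoEntityManager.query(
    `SELECT DISTINCT scenario_id FROM scenarios_pu_data
      WHERE scenario_id <> ALL($1::uuid[])`,
    [validScenarioIds],
  );
  const danglingIds = dangling.map((row) => row.scenario_id);
  if (danglingIds.length > 0) {
    await geoEntityManager.query(
      `DELETE FROM scenarios_pu_data WHERE scenario_id = ANY($1::uuid[])`,
      [danglingIds],
    );
  }
}
```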

(A further review thread on ENV_VARS.md was marked as resolved.)
@rubvalave force-pushed the MARXAN-1616-scheduled-geodata-cleanup branch from b9025dc to 0717105 on July 11, 2022 at 13:12
@rubvalave merged commit 8f553c2 into develop on July 12, 2022
@rubvalave deleted the MARXAN-1616-scheduled-geodata-cleanup branch on July 12, 2022 at 10:09