Giving an invalid `--stopcp` on start "corrupts" database #4637

MetRonnie · 2022-01-31T18:19:40Z

If you start a workflow that has never been run and give it an invalid value for `--stopcp`, then after the workflow shuts down due to the error you are left with a workflow database whose tables are all empty.

Because the workflow_params table is empty, if you try to restart the workflow, Cylc thinks the database is corrupted and refuses to play ball.

ERROR - Workflow shutting down - ServiceFileError: Cannot restart - Workflow database is incompatible with Cylc 8.0rc1.dev, or is corrupted

And you can't clean the database either, because of #4450:

$ cylc clean myflow/ --rm .service
ServiceFileError: Cannot clean - Workflow database is incompatible with Cylc 8.0rc1.dev, or is corrupted

Pull requests welcome!
This is an Open Source project - please consider contributing a bug fix
yourself (please read CONTRIBUTING.md before starting any work though).


wxtim · 2022-02-10T13:22:50Z

I can confirm that I can reproduce this bug.

hjoliver · 2022-02-10T20:23:57Z

I guess we have to make the scheduler deal with this problem at start-up (as opposed to, or at least as well as, deleting the DB on early shutdown), because it could also happen after an unclean shutdown.

MetRonnie · 2022-02-10T21:07:11Z

I think it has to happen on shutdown, because you could still hit a keyboard interrupt between DB creation and population? (That actually just seemed to happen to me.)

hjoliver · 2022-02-10T23:53:21Z

Not sure I follow you, or whether you didn't follow me 🤣 ... I should have said "I guess we have to make the scheduler deal with this at restart" (not at the original start-up). I mean, even if we tried to fix this at the source, i.e. delete the "corrupted" database just before premature shutdown so that it does not affect a later restart, that might not be sufficient, because the scheduler (or the host it runs on) can be killed without a chance to clean up after itself.

wxtim · 2022-02-11T08:29:05Z

I was originally quite unsure whether to handle this at shutdown or at restart.

Why do it at shutdown

Seems to me like shutdown shouldn't leave the database in a state we're not happy with.
Handling it at restart risks accruing a long-winded set of database checks over time, as we discover other DB states where this happens.
[Human factor, not a good reason] Implementation seems fairly clear to me.

Why do it at restart

The problem is caused by the workflow missing a step during shutdown.
Scheduler might shutdown horribly and not do any cleanup.

My Verdict - shutdown (certainty: low)

I wonder whether the scheduler shutting down horribly is a different problem to the one described where the scheduler never really gets started because the config is broken. I think it's reasonable to rely on the shutdown logic to clean the database in this case.

MetRonnie · 2022-02-11T10:37:30Z

Quoting @hjoliver: "I guess we have to make the scheduler deal with this at restart"

Ah, that makes more sense! So if it finds at start-up that the workflow_params table exists and is empty, it should delete the whole database (logging a warning, presumably) and do a start rather than a restart. (We need to check that the table exists, because in previous versions of Cylc it was called suite_params; if the database exists but the workflow_params table does not, it's an incompatible DB rather than a corrupted one.)

If this functionality is in a self-contained function we could do this both on shutdown (because we ought to) and restart (because shutdown itself can be interrupted too)
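A minimal sketch of the self-contained check proposed above, assuming the workflow DB is an SQLite file. The table names workflow_params and suite_params come from the comment above; the function names `workflow_params_is_empty`, `remove_unpopulated_db`, `on_shutdown` and `choose_start_mode` are hypothetical illustrations, not Cylc's actual API. A missing workflow_params table is deliberately treated as "incompatible" and left alone, so the existing error still fires for old DBs.

```python
import logging
import sqlite3
from pathlib import Path

LOG = logging.getLogger(__name__)


def workflow_params_is_empty(db_path: Path) -> bool:
    """Return True if db_path has a workflow_params table with no rows.

    Returns False if the file is missing or the table does not exist; the
    latter suggests an incompatible (e.g. older suite_params) DB rather
    than a corrupted one, so it is left for existing error handling.
    """
    if not db_path.exists():
        return False
    conn = sqlite3.connect(db_path)
    try:
        has_table = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            ("workflow_params",),
        ).fetchone()
        if has_table is None:
            return False
        first_row = conn.execute(
            "SELECT 1 FROM workflow_params LIMIT 1"
        ).fetchone()
        return first_row is None
    finally:
        conn.close()


def remove_unpopulated_db(db_path: Path) -> bool:
    """Delete the DB if workflow_params is empty; return True if deleted."""
    if not workflow_params_is_empty(db_path):
        return False
    LOG.warning("Workflow database %s was never populated; removing it", db_path)
    db_path.unlink()
    return True


# Hypothetical call sites: the same helper runs at both points, because
# shutdown itself can be interrupted before it gets a chance to clean up.
def on_shutdown(db_path: Path) -> None:
    remove_unpopulated_db(db_path)


def choose_start_mode(db_path: Path) -> str:
    """Return "start" for a fresh run, or "restart" if a DB remains."""
    remove_unpopulated_db(db_path)
    return "restart" if db_path.exists() else "start"
```

Keeping the check in one function means it can be unit-tested in isolation and called from wherever the team later decides it belongs.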

wxtim · 2022-02-11T10:42:28Z

That makes sense to me as a proposal @MetRonnie - by making the check self-contained I can test it easily and call it wherever if we change our minds later. 😄

wxtim · 2022-02-11T14:42:57Z

From team meeting discussion 20220211T1400Z

Does this apply to cases where the config is malformed as well as opts?
- No
Does this apply to any other opts?
- No
What happens to the database when you reinstall and restart?
It looks like we should validate options and configs before creating the database.
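A minimal sketch of "validate options before creating the database", assuming integer cycling for brevity (real Cylc cycle points may also be ISO 8601 datetimes). `InputError`, `start_workflow` and the two-column workflow_params schema are hypothetical. Because the `--stopcp` value is parsed before `sqlite3.connect` is called, an invalid value exits without ever creating a DB file, so there is nothing left behind to look "corrupted" on the next run.

```python
import sqlite3
from pathlib import Path


class InputError(Exception):
    """Hypothetical stand-in for a user-input error."""


def start_workflow(db_path: Path, stopcp: str) -> None:
    # 1. Validate user options first: nothing has touched the disk yet.
    #    (Assumes integer cycling; ISO 8601 parsing is out of scope here.)
    try:
        stop_point = int(stopcp)
    except ValueError:
        raise InputError(f"Invalid --stopcp value: {stopcp!r}") from None

    # 2. Only after validation succeeds, create and populate the DB.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE workflow_params (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.execute(
            "INSERT INTO workflow_params VALUES (?, ?)",
            ("stopcp", str(stop_point)),
        )
        conn.commit()
    finally:
        conn.close()
```

This ordering only covers bad options; a scheduler killed between DB creation and population can still leave an empty table behind, which is why the restart-time check was also discussed.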

wxtim · 2022-03-18T11:39:28Z

Blocked pending resolution of #4709

MetRonnie added the bug label Jan 31, 2022

MetRonnie added this to the cylc-8.0rc2 milestone Jan 31, 2022

MetRonnie mentioned this issue Jan 31, 2022

Traceback from invalid ISOdatetime point not swallowed. #4630 (closed)

wxtim self-assigned this Feb 10, 2022

wxtim mentioned this issue Feb 11, 2022

check for databases with empty WORKFLOW_PARAMS table on workflow shut… #4679 (closed)

wxtim mentioned this issue Feb 11, 2022

don't create DB until checks on stopcp are completed #4680 (closed)

wxtim added the BLOCKED label Mar 18, 2022

wxtim modified the milestones: cylc-8.0rc2, cylc-8.0rc3 Mar 18, 2022

MetRonnie mentioned this issue Mar 18, 2022

Fix stop after cycle point inconsistency with other cycle point options #4709 (closed)

hjoliver removed the BLOCKED label Apr 11, 2022

hjoliver modified the milestones: cylc-8.0rc3, cylc-8.0rc4 Apr 11, 2022

MetRonnie mentioned this issue Apr 19, 2022

Move processing of stop cycle point to config.py #4827 (merged)

MetRonnie linked a pull request Apr 20, 2022 that will close this issue:

Move processing of stop cycle point to config.py #4827 (merged)

MetRonnie assigned MetRonnie and unassigned wxtim May 3, 2022

oliver-sanders closed this as completed in #4827 May 3, 2022

MetRonnie modified the milestones: cylc-8.0rc4, cylc-8.0rc3 May 3, 2022

MetRonnie mentioned this issue Dec 20, 2022

play: upgrade DB *after* user upgrade confirmation #5271 (merged)
