Regularly check that recipe flags are still valid #911

benoit74 · 2024-01-31T07:20:17Z

Every once in a while, offliner definitions (list of flags and validation constraints) are updated to reflect changes in scraper capabilities or usage. However, no check is performed on the flags of existing recipes, meaning they could be incorrect given the new offliner definitions.

We should check:

- Does the flag still exist for the given offliner?
- Is the flag value still valid?
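
As a rough illustration of the two checks above, here is a minimal sketch. It assumes a simplified, hypothetical representation of an offliner definition (a mapping from flag name to a validation predicate); `OFFLINER_FLAGS`, `example_offliner` and `check_recipe_flags` are made-up names, not the actual definitions or code.

```python
from typing import Any, Callable

# Hypothetical, simplified stand-in for offliner definitions:
# offliner name -> flag name -> predicate telling whether a value is acceptable.
OFFLINER_FLAGS: dict[str, dict[str, Callable[[Any], bool]]] = {
    "example_offliner": {
        "url": lambda v: isinstance(v, str) and v.startswith(("http://", "https://")),
        "depth": lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
    },
}


def check_recipe_flags(offliner: str, flags: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the flags look valid."""
    definition = OFFLINER_FLAGS.get(offliner)
    if definition is None:
        return [f"unknown offliner: {offliner}"]

    problems = []
    for name, value in flags.items():
        validator = definition.get(name)
        if validator is None:
            # Check 1: the flag is not (or no longer) part of the definition.
            problems.append(f"flag '{name}' no longer exists for {offliner}")
        elif not validator(value):
            # Check 2: the flag exists but its value fails the current constraint.
            problems.append(f"flag '{name}' has an invalid value: {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller see every issue of a recipe at once.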

kelson42 · 2024-01-31T07:22:54Z

We don't need to check it "regularly" IMO; we need to check it when we change a scraper's command-line interface. This is mostly a procedural problem. How can we secure that? I would propose that the introduction of a new scraper version goes through a precise list of checks. But maybe there is a better approach.

benoit74 · 2024-01-31T07:28:28Z

You're right.

We could have a maintenance script which is run after every update of the offliner definitions and reports inconsistencies in recipes.
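
Such a report-only script could look roughly like the sketch below. The recipe shape (`name`, `offliner`, `flags` keys) and the `validate` callable are assumptions made for illustration; the script only reports and never modifies a recipe.

```python
from typing import Any, Callable, Iterable

# Hypothetical recipe shape: {"name": ..., "offliner": ..., "flags": {...}}
Recipe = dict[str, Any]
# Hypothetical validator: (offliner, flags) -> list of problems found
Validator = Callable[[str, dict[str, Any]], list[str]]


def report_inconsistencies(
    recipes: Iterable[Recipe], validate: Validator
) -> dict[str, list[str]]:
    """Map each inconsistent recipe name to its problems; valid recipes are omitted."""
    report: dict[str, list[str]] = {}
    for recipe in recipes:
        problems = validate(recipe["offliner"], recipe["flags"])
        if problems:
            report[recipe["name"]] = problems
    return report


def format_report(report: dict[str, list[str]]) -> str:
    """Render the report as plain text, one recipe per block, sorted by name."""
    if not report:
        return "All recipes are consistent."
    lines: list[str] = []
    for name in sorted(report):
        lines.append(f"{name}:")
        lines.extend(f"  - {problem}" for problem in report[name])
    return "\n".join(lines)
```

Since it has no side effects, it can be re-run at any time, for instance after each change to the definitions.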

We would then fix inconsistencies manually, either via manual SQL queries or manual recipe updates through the UI. I don't believe fixing them automatically is a good thing, because there are many different situations to consider and we might not always want to fix all of them. There is only one offliner definition, but it is in fact linked to the scraper version used, so in some situations we might want to continue using a former scraper version and hence a former set of flags, like we currently have with zimit2 and zimit1 (except that there the flags have not changed).

kelson42 · 2024-01-31T07:36:03Z

How I usually implement this:

- Create a dedicated directory to store the dataset updates.
- A dataset update typically runs a check and (if valid) applies the change to the DB.
- These dataset update scripts (over time you might have dozens of them) are run at each deployment.
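
To make the "check, then apply" idea concrete, here is a small sketch. It uses an in-memory dict as a stand-in for the recipes stored in the DB; `DatasetUpdate`, `rename_flag` and `run_updates` are hypothetical names, and a real setup would load such updates from the dedicated directory.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical in-memory stand-in for the recipes table:
# recipe name -> {"flags": {...}}
Recipes = dict[str, dict[str, Any]]


@dataclass
class DatasetUpdate:
    name: str
    check: Callable[[Recipes], bool]  # is there anything to change?
    apply: Callable[[Recipes], None]  # perform the change


def rename_flag(old: str, new: str) -> DatasetUpdate:
    """Build an update that renames a flag in every recipe still using the old name."""

    def check(recipes: Recipes) -> bool:
        return any(old in recipe["flags"] for recipe in recipes.values())

    def apply(recipes: Recipes) -> None:
        for recipe in recipes.values():
            if old in recipe["flags"]:
                recipe["flags"][new] = recipe["flags"].pop(old)

    return DatasetUpdate(f"rename {old} to {new}", check, apply)


def run_updates(updates: list[DatasetUpdate], recipes: Recipes) -> list[str]:
    """Run every update whose check passes; return the names of those applied."""
    applied = []
    for update in updates:
        if update.check(recipes):
            update.apply(recipes)
            applied.append(update.name)
    return applied
```

Because each update checks before applying, running the whole set at every deployment is harmless: an update that was already applied finds nothing left to change.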

But I would anyway first implement the checklist, to solve the problem with a process.

benoit74 · 2024-01-31T08:18:45Z

Sorry, I don't get it. Which datasets are you talking about? If you mean the offliner definitions, these are code, and there is no plan to move them to a dataset (even in #886 we agreed we will move only the description to something data-oriented).

kelson42 · 2024-01-31T09:09:42Z

@benoit74 The datasets here are the recipes, maybe even the tasks.

benoit74 · 2024-01-31T09:59:06Z

We already have SQL update scripts, which are aimed at updating the DB schema and data. They are automatically applied at each application restart. They are run inside a transaction, so should any of them fail, the update stops.
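
The all-or-nothing behaviour described above can be illustrated with the following sketch. It uses Python's standard `sqlite3` module purely as a stand-in (this is not the actual update mechanism, and the `recipe` table used in the usage note is hypothetical): if any statement fails, the changes made by the earlier ones are rolled back.

```python
import sqlite3


def apply_updates(conn: sqlite3.Connection, statements: list[str]) -> bool:
    """Run all statements in a single transaction.

    Returns True if everything was committed, False if a statement failed
    and the whole batch was rolled back.
    """
    try:
        # As a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            for statement in statements:
                conn.execute(statement)
    except sqlite3.Error:
        return False
    return True
```

For example, calling it with `["UPDATE recipe SET depth = 5", "UPDATE missing_table SET x = 1"]` returns `False` and leaves the `recipe` rows unchanged, because the second statement fails.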

I'm not sure this is a solution however because:

- these update scripts are already possible, but nothing in the process helps / forces us to create them
- these update scripts would be very complex to write since we also want to check data validity, which is probably not an easy feat in a DB update script
- these update scripts are meant to run only once, so we won't be able to check again that all recipes are valid if needed

Let's have a look at it once we work on this issue, maybe this is still sufficient.

benoit74 added the enhancement label Jan 31, 2024

This was referenced Jan 31, 2024

Review all input validations #783

Open

Unable to correct devdoc.io recipe #910

Closed

kelson42 assigned benoit74 Feb 9, 2024

kelson42 added the prio2 label Feb 9, 2024
