Regularly check that recipe flags are still valid #911

Open
benoit74 opened this issue Jan 31, 2024 · 6 comments

@benoit74
Collaborator

Every once in a while, offliner definitions (the list of flags and their validation constraints) are updated to reflect changes in scraper capabilities or usage. However, the flags of existing recipes are not checked against the new definitions, so they may have become invalid.

We should check:

  • does the flag still exist for the given offliner?
  • is the flag value still valid?
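
The two checks listed above can be sketched as follows. This is only an illustration under stated assumptions: `FlagSpec`, `OFFLINER_DEFINITIONS`, `check_recipe_flags`, the `example` offliner and its flags are all hypothetical names, and the real offliner definitions are structured differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FlagSpec:
    """Hypothetical description of one offliner flag."""

    # None means any value is accepted for this flag.
    is_valid: Optional[Callable[[Any], bool]] = None


# Hypothetical definitions: offliner name -> flag name -> spec.
OFFLINER_DEFINITIONS = {
    "example": {
        "url": FlagSpec(is_valid=lambda v: isinstance(v, str) and v.startswith("http")),
        "workers": FlagSpec(is_valid=lambda v: isinstance(v, int) and v > 0),
        "verbose": FlagSpec(),
    },
}


def check_recipe_flags(offliner: str, flags: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the flags of one recipe."""
    definition = OFFLINER_DEFINITIONS.get(offliner)
    if definition is None:
        return [f"unknown offliner: {offliner}"]
    problems = []
    for name, value in flags.items():
        spec = definition.get(name)
        if spec is None:
            # Check 1: the flag is not part of the current definition.
            problems.append(f"flag no longer exists: {name}")
        elif spec.is_valid is not None and not spec.is_valid(value):
            # Check 2: the flag exists but its value fails validation.
            problems.append(f"invalid value for {name}: {value!r}")
    return problems
```

An empty list means the recipe is still consistent with the current definition of its offliner.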
@kelson42
Contributor

kelson42 commented Jan 31, 2024

We don't need to check it "regularly" IMO; we need to check it when we change a scraper's command-line interface. This is mostly a procedural problem. How can we secure that? I would propose that the introduction of a new scraper version go through a precise list of checks. But maybe there is a better approach.

@benoit74
Collaborator Author

You're right.

We could have a maintenance script which is run after every update of the offliner definitions and reports inconsistencies in recipes.

Inconsistencies would then be fixed manually, either via SQL queries or by updating recipes through the UI. I don't believe fixing them automatically is a good idea: there are many different situations to consider, and we might not always want to fix all of them. There is only one offliner definition, but it is in fact tied to the scraper version in use, so in some situations we might want to keep using a former scraper version and hence a former set of flags, as we currently do with zimit2 and zimit1 (except that their flags have not changed).
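
A report-only maintenance script along these lines could look like the minimal sketch below. It assumes recipes are available as plain dictionaries with `name`, `offliner` and `flags` keys, and that a validator function is supplied by the caller; `report_inconsistencies` and `demo_validate` are hypothetical names, not existing code. Nothing is modified: the script only lists what a human would need to review.

```python
from typing import Any, Callable

# A validator takes an offliner name and a recipe's flags,
# and returns a list of human-readable problems (empty if none).
Validator = Callable[[str, dict[str, Any]], list[str]]


def report_inconsistencies(
    recipes: list[dict[str, Any]], validate: Validator
) -> list[str]:
    """Collect one report line per problem, without changing any recipe."""
    report = []
    for recipe in recipes:
        for problem in validate(recipe["offliner"], recipe["flags"]):
            report.append(f"{recipe['name']}: {problem}")
    return report


def demo_validate(offliner: str, flags: dict[str, Any]) -> list[str]:
    """Stand-in validator for illustration: only two flags are allowed."""
    allowed = {"url", "workers"}
    return [f"unknown flag {name}" for name in sorted(flags) if name not in allowed]
```

Because the function is read-only, it can be run as often as needed, and a wrapper script could exit with a non-zero status whenever the report is not empty.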

@kelson42
Contributor

kelson42 commented Jan 31, 2024

How I usually implement this:

  • Create a dedicated directory to store the dataset updates
  • A dataset update is typically a script that runs a check and, if valid, applies the change to the DB
  • These dataset update scripts (over time you might have dozens of them) are run at each deployment

But I would first implement the checklist anyway, to solve the problem with a process.
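
The check-then-apply pattern for dataset updates described in the list above can be sketched as follows. This is a simplified illustration, not the proposed implementation: the database is replaced by a plain dictionary, updates are registered in a list rather than discovered in a directory, and `DatasetUpdate`, `run_updates`, `old_flag` and `new_flag` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DatasetUpdate:
    """One update: a check, and a change applied only if the check passes."""

    name: str
    check: Callable[[dict], bool]
    apply: Callable[[dict], None]


def run_updates(db: dict, updates: list[DatasetUpdate]) -> list[str]:
    """Run every update in order and return the names of those applied."""
    applied = []
    for update in updates:
        if update.check(db):
            update.apply(db)
            applied.append(update.name)
    return applied


def _needs_rename(db: dict) -> bool:
    # The update is relevant only while some recipe still uses the old flag.
    return any("old_flag" in recipe["flags"] for recipe in db["recipes"])


def _rename(db: dict) -> None:
    # Move the value of the old flag to the new flag name.
    for recipe in db["recipes"]:
        if "old_flag" in recipe["flags"]:
            recipe["flags"]["new_flag"] = recipe["flags"].pop("old_flag")


# Hypothetical registry; in practice this could be one script per file.
UPDATES = [DatasetUpdate("rename_old_flag", _needs_rename, _rename)]
```

Since each update checks before applying, running the whole list at every deployment is harmless: an update that has already been applied simply finds nothing left to do.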

@benoit74
Collaborator Author

Sorry, I don't get it. Which datasets are you speaking about? If you mean the offliner definitions, these are code, and there is no plan to move them to a dataset (even in #886 we agreed that only the description will move to something data-oriented).

@kelson42
Contributor

@benoit74 The datasets here are the recipes, maybe even the tasks.

@benoit74
Collaborator Author

We already have SQL update scripts, which are aimed at updating the DB schema and data. They are automatically applied at each application restart. They are run inside a transaction, so should any of them fail, the update stops.

I'm not sure this is a solution however because:

  • these update scripts are already possible, but nothing in the process helps or forces us to create them
  • these update scripts would be very complex to write since we also want to check data validity, which is probably not an easy feat in a DB update script
  • these update scripts are meant to run only once, so we won't be able to check again that all recipes are valid if needed

Let's have a look at it once we work on this issue, maybe this is still sufficient.
