feat: improve timing of checks depending on changes since last check #163
I don't understand what the temporary table was used for. Was it to make sure that it's thread-safe?
Other than a small nit, this looks good to me ;)
Also, I didn't investigate why, but it seems you've got a failing test.
# Conflicts: # CHANGELOG.md
Thank you for this PR! I like the logic!
I think we may want to test this PR in our dev env and monitor the `next_check_at` values, since it's actually quite difficult to test locally.
Maybe we want to add some tests too!
udata_hydra/crawl/select_batch.py (outdated)
AND catalog.last_check = checks.id
AND checks.created_at <= $1
AND checks.next_check_at >= $1
It's actually nice that this logic is kept simple with the next_check_at strategy! 👏
Thanks for the review. I wanted to confirm the logic first; I'm now fixing and adding tests in this same PR.
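The diff fragment from `select_batch.py` only shows part of the query. As a rough illustration of how the `next_check_at` strategy splits resources by their last check, here is a minimal sketch using Python's built-in `sqlite3` in place of PostgreSQL (`?` placeholders instead of `$1`). The simplified table layout, the `split_resources` and `build_demo_db` helpers, and the complementary "due" query are assumptions made for this example, not code from udata-hydra.

```python
import sqlite3
from datetime import datetime

# Hypothetical, heavily simplified schema: only the columns used by the
# fragment are kept. sqlite3 stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE checks (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,
    next_check_at TEXT NOT NULL
);
CREATE TABLE catalog (
    resource_id TEXT PRIMARY KEY,
    last_check INTEGER REFERENCES checks (id)
);
"""

# Same conditions as the fragment: the last check was created before the
# reference time and its next check date has not been reached yet.
FRESH_QUERY = """
SELECT catalog.resource_id
FROM catalog JOIN checks ON catalog.last_check = checks.id
WHERE checks.created_at <= ? AND checks.next_check_at >= ?
ORDER BY catalog.resource_id
"""

# Assumed complementary case: the last check's next check date has expired.
DUE_QUERY = """
SELECT catalog.resource_id
FROM catalog JOIN checks ON catalog.last_check = checks.id
WHERE checks.next_check_at < ?
ORDER BY catalog.resource_id
"""

# Fixed-width timestamps, so string comparison matches chronological order.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_resources(conn: sqlite3.Connection, now: datetime) -> tuple[list[str], list[str]]:
    """Return (fresh, due) resource ids, looking only at each resource's last check."""
    ts = now.strftime(TS_FORMAT)
    fresh = [rid for (rid,) in conn.execute(FRESH_QUERY, (ts, ts))]
    due = [rid for (rid,) in conn.execute(DUE_QUERY, (ts,))]
    return fresh, due


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with three hypothetical resources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO checks VALUES (?, ?, ?)",
        [
            (1, "2024-11-01 00:00:00", "2024-11-02 00:00:00"),  # next check expired
            (2, "2024-11-25 00:00:00", "2024-12-25 00:00:00"),  # still fresh
        ],
    )
    conn.executemany(
        "INSERT INTO catalog VALUES (?, ?)",
        [("res-a", 1), ("res-b", 2), ("res-c", None)],  # res-c: never checked
    )
    return conn
```

With the demo data, `split_resources(build_demo_db(), datetime(2024, 11, 26, 12))` returns `(["res-b"], ["res-a"])`. `res-c` has no last check, so the inner join leaves it out of both lists; never-checked resources would need their own branch of the batch selection.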
commit a38bd48
Author: Adrien Carpentier <adrien.carpentier@mail.numerique.gouv.fr>
Date: Mon Nov 25 11:31:06 2024 +0100

fix: handle Sentry issue for CSV downloading errors (#212)

Attempt to handle numerous Sentry errors similar to #145755 in a more explicit and generic way:
- Catch the exception when the CSV URL is not reachable and rename it more clearly, so that it can be immediately understood as non-major when browsing Sentry
- New custom `IOException` using a new parent class `ExceptionWithSentryDetails` which sends details to Sentry as tags, also inherited by the recent custom `ParseException`
- Move the `errors.py` file from `/analysis` to `/utils`, since it's more broadly used throughout the code

commit 9b0c78e
Author: Pierlou Ramade <48205215+Pierlou@users.noreply.github.com>
Date: Fri Nov 22 16:55:58 2024 +0100

Use bump'X (#226)

commit 1decfdd
Author: Pierlou Ramade <48205215+Pierlou@users.noreply.github.com>
Date: Tue Nov 26 15:59:19 2024 +0100

Get actual resource URL if 404 (#225)

Fix https://github.com/datagouv/data.gouv.fr/issues/1574 / #224

---------

Co-authored-by: Adrien Carpentier <adrien.carpentier@mail.numerique.gouv.fr>
This reverts commit 5a914c6.
# Conflicts: # CHANGELOG.md # tests/test_crawl/test_crawl.py
This is a refactoring PR to remove many unnecessary SQL queries, reusing the data already available in the code instead of re-querying it. It needs to be merged after #163.
Closes datagouv/data.gouv.fr#1312.
- Add `CHECK_DELAYS` config var: list of delays between two checks, if the resource has not been modified
- Add a migration to add a `next_check_at` datetime column to the `checks` table
- Add logic to calculate the `next_check_at` datetime when processing the check, depending on the date of the previous check and on whether the resource has changed since then, in a dedicated `calculate_next_check_date` method
- Modify the logic of `select_batch` in order to select resources whose related last check has an expired next check date
- Refactor the `store_last_modified_date` method to also store the next check date, and rename it to `update_check_with_modification_and_next_dates`
- Rename `process_check_data` to `preprocess_check_data` for more clarity

[OUTDATED] First iteration using a SQL query
- Remove the SQL query wrapper `select_rows_based_on_query`, which was used to create a temporary table and temporarily mark resources being crawled as "crawling". Since we update the resource status elsewhere in the check code, this temporary table should not be necessary anymore.
  On review, please double-check the validity of this removal.
- Add `CHECK_DELAY_DEFAULT` config var: time after which a resource must be checked if we don't know its last modification date (replaces the previous `SINCE` config var)
- Update the `select_batch_resources_to_check` method with the following logic:
  1. First add resources with priority=True to the batch (_this doesn't change_)
  2. Then add resources without a last check (i.e. resources that have never been checked) to the batch (_this doesn't change_)
  3. Then, if the total number of selected resources is still less than the batch size:
     3.1. add resources which don't have at least two checks yet, and whose last check is older than `CHECK_DELAY_DEFAULT`
     3.2. add resources for which at least one of their two most recent checks has an unknown modification date (no `detected_last_modified_at`, so we cannot compare it with another check), and whose last check is older than `CHECK_DELAY_DEFAULT`
     3.3. add resources whose two most recent checks have changed, and whose last check is older than `CHECK_DELAYS[0]` (this is to re-check resources that we know have changed)
     3.4. add resources whose last two checks have not changed, where the time between these two checks falls between two delays in `CHECK_DELAYS`, and whose last check is older than the same delay in `CHECK_DELAYS` (this is to avoid checking the same unchanging resource too often)
- Adapt the way we count `pending_checks` and `fresh_checks` in the `/api/status/crawler` endpoint response:
  - `pending_checks` is the number of resources with no check, plus the number with outdated checks (outdated checks are now counted differently, depending on the delays)
  - `fresh_checks` is the number of resources with a check, minus the number with outdated checks (outdated checks are now counted differently, depending on the delays)
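Going back to the current approach (the `next_check_at` column driven by `CHECK_DELAYS`), here is a minimal back-off sketch of what a `calculate_next_check_date` step could look like. The delay values, their unit (hours), the function signature and the "step up to the first delay longer than the last interval" rule are all assumptions for illustration, not the PR's actual implementation.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Hypothetical values and unit (hours); the real CHECK_DELAYS setting may differ.
CHECK_DELAYS = (12, 24, 24 * 7, 24 * 30)


def calculate_next_check_date(
    now: datetime,
    previous_check_at: Optional[datetime],
    has_changed: bool,
    delays_hours: Sequence[int] = CHECK_DELAYS,
) -> datetime:
    """Back-off sketch: return the date at which the resource should be checked again."""
    delays = [timedelta(hours=h) for h in delays_hours]
    # First check, or the resource changed since the previous check:
    # go back to the shortest delay.
    if previous_check_at is None or has_changed:
        return now + delays[0]
    # Unchanged: step up to the first delay longer than the last interval.
    elapsed = now - previous_check_at
    for delay in delays:
        if delay > elapsed:
            return now + delay
    # Already past the longest delay: stay at the slowest pace.
    return now + delays[-1]
```

Assuming each check runs close to its scheduled time, an unchanged resource is re-checked after 12 h, then 24 h, then 7 days, then every 30 days, and any detected change resets it to 12 h. Keeping this computation at check time is what lets the batch selection stay a simple comparison against `next_check_at`.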