unchanged source lost harvest_object_id after each reharvest #4362
Labels
bug
Software defect or bug
Comments
Closed
jbrown-xentity
moved this from 📔 Product Backlog
to 📟 Sprint Backlog [7]
in data.gov team board
Jun 16, 2023
jbrown-xentity
moved this from 📟 Sprint Backlog [7]
to 📔 Product Backlog
in data.gov team board
Jun 16, 2023
hkdctol
moved this from 📔 Product Backlog
to 📟 Sprint Backlog [7]
in data.gov team board
Aug 17, 2023
FuhuXia
moved this from 📟 Sprint Backlog [7]
to 🏗 In Progress [8]
in data.gov team board
Sep 11, 2023
The issue is caused by ckanext-spatial's out-of-date method of detecting IIS servers. Will create an upstream PR. Update:
Fixed; verified by reharvesting /harvest/national-transportation-atlas-database-ntad-metadata.
github-project-automation
bot
moved this from 👀 Needs Review [2]
to ✔ Done
in data.gov team board
Sep 25, 2023
Found this issue when researching #4348.
For each record in the source, a reharvest deletes the existing harvest_object_id and creates a new one in the DB, even if the record has not changed, while Solr keeps the old harvest_object_id. As a result, the DB and Solr are out of sync after each reharvest. Until db-solr-sync runs, the dataset page in the UI shows no harvest_object_id.
example sources:
/harvest/national-transportation-atlas-database-ntad-metadata
/harvest/edit/fema-r02
How to reproduce
Use any one of the datasets in the source as an example.
Locate the harvest_object_id in the API.
Reharvest
Now the dataset on the UI has lost its harvest source info.
Reindex the package using the ckan command.
Now the harvest_object_id in the API is updated, and the UI shows the harvest source info again.
Reharvest
Now the dataset on the UI has lost its harvest source info.
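To make the before/after comparison in the steps above less manual, here is a minimal sketch that pulls harvest_object_id out of a package dict and compares two copies of it (for example, one read from the DB and one from the search index). It assumes the id is exposed as an entry in the package's `extras` list with the key `harvest_object_id`; the helper names `harvest_object_id` and `is_out_of_sync` are hypothetical and not part of CKAN.

```python
from typing import Any, Dict, Optional


def harvest_object_id(package: Dict[str, Any]) -> Optional[str]:
    """Return the harvest_object_id from a package dict, or None if absent.

    Assumption: the id is stored as {"key": "harvest_object_id", "value": ...}
    inside the package's "extras" list.
    """
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def is_out_of_sync(db_package: Dict[str, Any], indexed_package: Dict[str, Any]) -> bool:
    """True when the two copies of a package disagree on harvest_object_id."""
    return harvest_object_id(db_package) != harvest_object_id(indexed_package)
```

Running this on the package as returned before and after a reharvest (without a reindex in between) should flag the mismatch described in this issue, provided the two dicts really come from the DB and from Solr respectively.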
Expected behavior
If a WAF record has no changes, gather should not create any fetch job.
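As a rough illustration of that expectation, the sketch below splits WAF records into new, changed, deleted, and unchanged by comparing last-modified timestamps, so that only the first three groups would need a fetch job. This is not ckanext-spatial's actual gather code; `GatherPlan`, `plan_gather`, and the URL-to-timestamp mappings are assumptions made for the example.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class GatherPlan(NamedTuple):
    new: List[str]        # in the WAF listing, not yet harvested
    changed: List[str]    # harvested before, newer timestamp in the listing
    deleted: List[str]    # harvested before, no longer in the listing
    unchanged: List[str]  # same or older timestamp: no fetch job needed


def plan_gather(remote: Dict[str, datetime], stored: Dict[str, datetime]) -> GatherPlan:
    """Compare the WAF listing (remote) with previously harvested records (stored).

    Both arguments map a record URL to its last-modified timestamp.
    """
    new = sorted(url for url in remote if url not in stored)
    deleted = sorted(url for url in stored if url not in remote)
    changed = sorted(
        url for url in remote if url in stored and remote[url] > stored[url]
    )
    unchanged = sorted(
        url for url in remote if url in stored and remote[url] <= stored[url]
    )
    return GatherPlan(new, changed, deleted, unchanged)
```

With a plan like this, records in `unchanged` would keep their existing harvest_object_id, so the DB and Solr would stay in sync across reharvests.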
Actual behavior
Unchanged records are processed anyway, and harvest_object_id ends up out of sync between the DB and Solr.