unchanged source lost harvest_object_id after each reharvest #4362

Closed
FuhuXia opened this issue Jun 15, 2023 · 2 comments
Labels
bug Software defect or bug

FuhuXia commented Jun 15, 2023

Found this issue when researching #4348.

For each record in the source, a reharvest deletes the existing harvest_object_id and creates a new harvest_object_id in the DB, even if the record has not changed, while Solr keeps the old harvest_object_id. As a result, the DB and Solr are out of sync after each reharvest. Until db-solr-sync runs, the dataset page on the UI is missing its harvest_object_id.

example sources:
/harvest/national-transportation-atlas-database-ntad-metadata
/harvest/edit/fema-r02

How to reproduce

Use any one of the datasets in the source as an example.
1. Locate the harvest_object_id in the API.
2. Reharvest.
3. The dataset on the UI has now lost its harvest source info.
4. Reindex the package using the ckan command.
5. The harvest_object_id in the API is now updated, and the UI shows the harvest source info again.
6. Reharvest.
7. The dataset on the UI has lost its harvest source info again.
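The comparison in the steps above can be sketched in a few lines. This is only an illustration: it assumes the package dict returned by the API carries harvest_object_id as an entry in its `extras` list of key/value pairs, and the helper names `get_harvest_object_id` and `is_out_of_sync` are hypothetical, not part of CKAN.

```python
def get_harvest_object_id(package):
    """Return the harvest_object_id extra of a package dict, or None.

    Assumption: extras are a list of {"key": ..., "value": ...} dicts.
    """
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def is_out_of_sync(db_package, indexed_package):
    """True when the two copies of a package disagree on harvest_object_id."""
    return get_harvest_object_id(db_package) != get_harvest_object_id(indexed_package)


# Hypothetical data: the DB copy after a reharvest vs. the stale indexed copy.
db_copy = {"name": "example", "extras": [{"key": "harvest_object_id", "value": "new-id"}]}
indexed_copy = {"name": "example", "extras": [{"key": "harvest_object_id", "value": "old-id"}]}
print(is_out_of_sync(db_copy, indexed_copy))
```

Recording the value before and after each reharvest and each reindex makes it easy to see at which step the two copies diverge.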

Expected behavior

If a WAF record has no changes, gather should not create any fetch job.
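A minimal sketch of that expectation, assuming the gather stage has a last-modified timestamp for each record URL from the WAF listing and a stored timestamp from the previous harvest. The function name `select_changed` and both mappings are hypothetical and do not reproduce the harvester's actual code.

```python
from datetime import datetime


def select_changed(remote_modified, stored_modified):
    """Return the URLs that should get a fetch job.

    remote_modified: URL -> datetime from the WAF listing (None if unknown)
    stored_modified: URL -> datetime recorded at the previous harvest
    """
    changed = []
    for url, remote_ts in remote_modified.items():
        stored_ts = stored_modified.get(url)
        # Without both timestamps we cannot prove the record is unchanged,
        # so it is conservatively treated as changed.
        if remote_ts is None or stored_ts is None or remote_ts > stored_ts:
            changed.append(url)
    return changed


remote = {
    "https://example.org/waf/a.xml": datetime(2023, 6, 1),
    "https://example.org/waf/b.xml": datetime(2023, 6, 10),
}
stored = {
    "https://example.org/waf/a.xml": datetime(2023, 6, 1),
    "https://example.org/waf/b.xml": datetime(2023, 6, 1),
}
print(select_changed(remote, stored))
```

Note the conservative branch: if the listing's timestamps cannot be read at all, every record looks changed and every record gets a fetch job.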

Actual behavior

processing unchanged record and harvest_object_id is out of sync.

@FuhuXia FuhuXia added the bug Software defect or bug label Jun 15, 2023
@hkdctol hkdctol moved this to 📔 Product Backlog in data.gov team board Jun 15, 2023
@jbrown-xentity jbrown-xentity moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Jun 16, 2023
@jbrown-xentity jbrown-xentity moved this from 📟 Sprint Backlog [7] to 📔 Product Backlog in data.gov team board Jun 16, 2023
@hkdctol hkdctol moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Aug 17, 2023
@FuhuXia FuhuXia moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Sep 11, 2023

FuhuXia commented Sep 22, 2023

The issue is caused by ckanext-spatial's out-of-date way of detecting an IIS server. Will create an upstream PR.
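The comment does not say exactly how the detection is out of date. One plausible failure mode, assumed here purely for illustration, is comparing the HTTP `Server` header against a single IIS version string, which stops matching once the server is upgraded. A version-tolerant check might look like the sketch below; `guess_waf_server` is a hypothetical name, and this is not the code from ckanext-spatial or the upstream PR.

```python
def guess_waf_server(server_header):
    """Classify a WAF host from its HTTP Server header value.

    Matches on the product name only, so any IIS version is recognised.
    """
    if not server_header:
        return "other"
    name = server_header.lower()
    if name.startswith("microsoft-iis"):
        return "iis"
    if "apache" in name:
        return "apache"
    if "nginx" in name:
        return "nginx"
    return "other"


for header in ("Microsoft-IIS/7.5", "Microsoft-IIS/10.0", "Apache/2.4.57 (Unix)", None):
    print(header, "->", guess_waf_server(header))
```

Matching on the product name rather than the full version string keeps the classification stable across server upgrades.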

update:
Upstream PR created.

@FuhuXia FuhuXia moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Sep 22, 2023

FuhuXia commented Sep 25, 2023

Fix verified by reharvesting /harvest/national-transportation-atlas-database-ntad-metadata.

@FuhuXia FuhuXia closed this as completed Sep 25, 2023
@github-project-automation github-project-automation bot moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Sep 25, 2023
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Sep 29, 2023