Replace Pelican with Cardinal in pipeline (and make detailed coverage available) #291

Open
jpmckinney opened this issue Jun 8, 2023 · 8 comments

@jpmckinney
Member

jpmckinney commented Jun 8, 2023

We are only using Pelican for field coverage, for which Cardinal is much faster.

We can store the output as part of the job, and make it available as part of the API in #268. We can also consider designing a report for the dataset's page, where a user can opt to view the detailed coverage.

We can then also use the output to report either:

  • fields related to priority use cases, e.g. gender and sustainability (SPP)
  • supported indicators (e.g. like in the usability notebooks)
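
As a rough illustration of the first bullet, the sketch below assumes the coverage output is a JSON object mapping field paths to occurrence counts. The `USE_CASE_FIELDS` mapping, the paths in it and the function name are hypothetical and not taken from the registry's code.

```python
# Hypothetical mapping of priority use cases to the field paths they rely on.
# The paths are placeholders; the real lists would come from the use case owners.
USE_CASE_FIELDS = {
    "gender": ["/parties[]/details/classifications[]/id"],
    "sustainability": ["/tender/items[]/classification/id"],
}


def summarize_use_cases(coverage, use_case_fields=USE_CASE_FIELDS):
    """Return {use_case: {path: count}}, with 0 for paths absent from coverage."""
    return {
        use_case: {path: coverage.get(path, 0) for path in paths}
        for use_case, paths in use_case_fields.items()
    }
```

A report on the dataset's page could then flag a use case as unsupported when all of its counts are zero.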
@jpmckinney jpmckinney added the "enhancement" (New feature or request) label Jun 8, 2023
@jpmckinney
Member Author

jpmckinney commented Jun 8, 2023

Edit: Moved tangential comment to #292

@jpmckinney jpmckinney added this to the Priority milestone Feb 8, 2024
@jpmckinney jpmckinney changed the title from "Replace Pelican with Cardinal in pipeline" to "Replace Pelican with Cardinal in pipeline (and make detailed coverage available)" Feb 20, 2024
@jpmckinney
Member Author

jpmckinney commented Apr 9, 2024

From Pelican we get field counts and also some collection metadata. We can get the latter via an HTTP request to Kingfisher Process in the Process task's get_status method (once is_last_completed is true): open-contracting/kingfisher-process#421
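
A minimal sketch of that flow, assuming the compiled collection is available as a dict with `id` and `is_last_completed` keys. The metadata URL below is a guess for illustration only (see kingfisher-process#421 for the actual endpoint), and `fetch_json` and `get_collection_metadata` are hypothetical helpers, not existing registry functions.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def get_collection_metadata(base_url, compiled_collection, fetch=fetch_json):
    """Return the collection metadata once processing has finished, else None."""
    if not compiled_collection.get("is_last_completed"):
        return None  # still processing: no request is made
    # Assumed endpoint path, not confirmed against Kingfisher Process.
    url = f"{base_url}/api/collections/{compiled_collection['id']}/metadata/"
    return fetch(url)
```

Passing `fetch` as a parameter keeps the sketch testable without a running Kingfisher Process instance.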

@jpmckinney
Member Author

jpmckinney commented Apr 19, 2024

  • Remove consume_exception=True from the request() call in task/process.py, because it will no longer perform a presence check
  • Eliminate parse_date, as Kingfisher Process returns dates in a normal format
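
For the second bullet, a sketch of what could replace parse_date, assuming "a normal format" means ISO 8601 timestamps. The `to_date` name and the sample value are hypothetical.

```python
from datetime import datetime


def to_date(value):
    """Parse an ISO 8601 timestamp and return its date part as YYYY-MM-DD."""
    return datetime.fromisoformat(value).date().isoformat()
```

For example, `to_date("2024-04-19T10:30:00+00:00")` yields `"2024-04-19"`.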


sentry-io bot commented Oct 3, 2024

Sentry Issue: REGISTRY-PELICAN-FRONTEND-B

@jpmckinney
Member Author

I linked a Sentry issue where a Pelican API request is quite slow on some collections (20s).

@jpmckinney
Member Author

jpmckinney commented Nov 8, 2024

We can maybe also (semi-)automate the additional_data fields – at minimum checking whether "No extensions or additional fields are used." is still true. We can also check that a declared extension is actually used (we shouldn't report extensions unless the fields they define are used).
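
Both checks could be sketched roughly as follows, assuming the coverage output maps field paths to occurrence counts. The `extension_fields` and `standard_paths` inputs (the paths each declared extension defines, and the paths defined by the core standard) are hypothetical and would have to be built from the extension and standard schemas.

```python
def unused_extensions(coverage, extension_fields):
    """Return the declared extensions none of whose fields occur in the data."""
    return sorted(
        name
        for name, paths in extension_fields.items()
        if not any(coverage.get(path, 0) > 0 for path in paths)
    )


def uses_additional_fields(coverage, standard_paths):
    """Return True if any occurring path is not defined by the core standard."""
    return any(
        count > 0 and path not in standard_paths
        for path, count in coverage.items()
    )
```

If `uses_additional_fields` returns True, the "No extensions or additional fields are used." statement would need review.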

@jpmckinney
Member Author

> We can store the output as part of the job, and make it available as part of the API in #268

Once that's done, we can simplify the field list notebook: https://github.com/open-contracting/notebooks-ocds/blob/main/component_get_field_list_all_registry.ipynb

@yolile
Member

yolile commented Dec 4, 2024

> Once that's done, we can simplify the field list notebook

Or, add a button somewhere in the data registry to export the list of fields from all publications (if there is demand for that)
