A collection of Jupyter notebooks for working with data from:
- Kingfisher Process
- Data Registry
- Field lists, such as those from a field-level mapping
Note: If you encounter unfamiliar errors, try the Runtime > Disconnect and delete runtime menu item. If the error still occurs, please open an issue.
To use a notebook:
- Click the Open In Colab button
- Click the File > Save a copy in Drive menu item
- Make your changes (e.g. `collection_ids`, `schema_name`, etc.)
If you make any improvements or fixes, please follow the Contributing guide below to merge your changes back into this repository.
You can also use a notebook without creating a copy. However, if you re-open the notebook, any changes and outputs will be lost.
Notebook | Open in Colab | Description |
---|---|---|
Publisher analysis template | | Analyze data from a specific publisher. |
Meta analysis template | | Analyze data from multiple publishers, or perform other types of analysis on the Kingfisher Process database. |
Basic criteria feedback template | | Provide feedback on the OCDS basic criteria. |
Structure and format feedback template | | Provide feedback on structure and format errors reported by lib-cove-ocds. |
Data quality feedback template | | Provide detailed feedback on structure, format, conformance and quality issues. |
Usability checks template | | Provide feedback on data usability for OCDS datasets. |
Red flags checks template | | Provide feedback on red flags for OCDS datasets. |
Notebook | Open in Colab | Description |
---|---|---|
Usability checks using a field list | | Provide feedback on data usability for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
Usability checks using the Data Registry | | Provide feedback on data usability using data from the Data Registry. |
Relevant checks using a field list | | Provide feedback on data relevance for prospective publishers, using a field list, such as one from a field-level mapping. |
Relevant checks using the Data Registry | | Provide feedback on data relevance using data from the Data Registry. |
Relevant checks for all the Data Registry publications | | Provide feedback on data relevance by downloading all the publications from the Data Registry. |
Red flags checks using the Data Registry | | Provide feedback on coverage for red flags using data from the Data Registry. |
Red flags checks using a field list | | Provide feedback on red flags for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
Field list for all the Data Registry publications | | Extract the fields published by all the publications from the Data Registry. |
To ease maintenance, the notebooks are made up of reusable components. To see which components are used in each notebook, refer to the `NOTEBOOKS` variable in `manage.py`.
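As a rough illustration of how a notebook-to-components mapping can be queried, here is a minimal sketch. It assumes `NOTEBOOKS` is a dictionary from notebook names to ordered lists of component names. That shape, the example entries and the helpers `components_for` and `notebooks_using` are all hypothetical, so check `manage.py` for the actual structure.

```python
# Hypothetical shape: notebook name -> ordered list of component names.
# The real NOTEBOOKS variable in manage.py may be structured differently.
NOTEBOOKS = {
    "template_publisher_analysis": [
        "environment",
        "kingfisher_process_setup",
        "kingfisher_process_errors",
        "structure_scope",
    ],
    "template_usability_checks_field_list": [
        "environment",
        "field_list_setup",
        "usability_setup",
    ],
}


def components_for(notebook: str) -> list[str]:
    """Return the components that make up a notebook, in order."""
    return list(NOTEBOOKS[notebook])


def notebooks_using(component: str) -> list[str]:
    """Return the notebooks that include a component, sorted by name."""
    return sorted(name for name, parts in NOTEBOOKS.items() if component in parts)


print(components_for("template_usability_checks_field_list"))
print(notebooks_using("environment"))
```

A reverse lookup like `notebooks_using` can help when editing a shared component, since it shows which notebooks the change would affect.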
Reminder: If you edit the Check structure and format or Check quality components and change the headings or add new sections, check whether the related Document template in this process note needs an update.
Component name | Open in Colab | Tasks |
---|---|---|
Environment | | Install requirements, import packages, load extensions and configure the notebook. |
Cardinal setup | | Install Cardinal requirements, define coverage functions and calculate the field list for a given file. |
Charts setup | | Install charts requirements, import charts packages and define plot functions. |
Kingfisher Process setup | | Connect to the database. Choose the collection(s) and schema to work with. |
Field list setup | | Load the field list. |
Data Registry download data setup | | Define the functions to list publications and download JSONL files from the registry. |
Data Registry download data | | Define the forms to select a publication and year and download the selected publication. |
Kingfisher Process errors | | Check for data collection and processing errors. |
Structure scope | | Check how many releases and records your data contains. Check the date range and stages of the contracting process covered by your data. |
Usability setup | | Define the usability functions. |
Red flags setup | | Define the red flags functions. |
Usability scope | | Calculate general statistics. |
Structure checks | | Check for structure and format errors reported by lib-cove-ocds. |
Conformance checks | | Check against the OCDS conformance criteria. |
Quality checks | | Check for conformance and quality issues that require manual review. |
Usability checks using Kingfisher with coverage | | |
Usability checks using a field list without coverage | | |
Relevant checks using a field list | | Given a field list, check whether it passes the "relevant" criteria. |
Relevant checks against all the publications from the Data Registry | | Download all the publications from the registry and perform the "relevant" checks against the active ones. |
Red flags checks using a field list without coverage | | |
Use the buttons above to open the components from the `main` branch for editing in Google Colaboratory (Colab).
To open a component from a different branch, use Colab's GitHub browser.
To encourage reuse, limit the scope of a component. The current scopes are:
- Environment: Set up Google Colaboratory in general.
- Setup: Set up Google Colaboratory for a data source.
- Errors: Review any issues in loading the data.
- Scope: Understand the scope of the data.
- Check: Perform a category of checks.
To add a component:
- Create a new notebook.
- Set a title using H2 formatting and add your cells, following the style guide for SQL statements.

To edit a component:
- Open the component in Colab.
- Add or edit cells, following the style guide for SQL statements.
In Colab:
- Click Edit > Clear all outputs.
- Click File > Save a copy in GitHub.
- Uncheck 'Include a link to Colaboratory'.
- Select your branch, enter a commit message and click OK.
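To illustrate what clearing all outputs means for the saved file, here is a minimal sketch that removes outputs and execution counts from a notebook's JSON. It assumes the standard nbformat 4 layout (a top-level `cells` list, with `outputs` and `execution_count` on code cells). The `clear_outputs` function and the `example` notebook are hypothetical and not part of this repository; in practice, the Colab menu item does this for you.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook (nbformat 4 JSON) with code cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook, standing in for a component's .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Example component"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "source": ["print(1)"],
        },
    ],
}

cleaned = clear_outputs(example)
print(cleaned["cells"][1]["outputs"])
```

Committing notebooks without outputs keeps diffs limited to the cells that actually changed, which makes them easier to review.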
To add a component to a notebook:
- Add the component to the entry for the notebook in the `NOTEBOOKS` variable in `manage.py`.

To add a notebook:
- Add an entry for the notebook and its components to the `NOTEBOOKS` variable in `manage.py`.
- Update the 'Notebooks' section of `README.md`.

To merge your changes:
- Create a pull request.
- Request a review from a data support manager.
- If the reviewer requests changes, make the changes then repeat this step.

Once approved, you can merge your own changes.
For small changes, you can review the raw diff in the GitHub review interface.
For larger changes, you can review and comment on a visual diff by clicking the button. You need to authorize the app the first time you open it.
- Install requirements: `pip install -r requirements.txt`
- Install the pre-commit script: `pre-commit install`