Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dev_tools/check_notebooks.py #244

Merged
merged 3 commits into from
Dec 15, 2021

Conversation

mpharrigan
Copy link
Collaborator

Add a script to run full notebook checks locally.

Don't we already have notebook checks?

Short: not really. Most of the notebooks get "tested" during doc building, rendering, and pushing which happens outside of GitHub. For some reason I don't get the emails when these fail but usually @MichaelBroughton will ping me.

A small portion of notebooks are too long-running, so their outputs are saved and the notebooks are not executed during doc building and are untested.

Some notebooks were written with one data generation notebook and one data consumption (analysis) notebook. The original doc build system supported this. The current doc build system requires independent notebooks. Some domains have been ported to a regime where a fetch function will download the example dataset from figshare for the analysis notebooks. This hasn't been done for all domains, so there are output-saved notebooks that are not tested.

The original notebook tests for ReCirq were done as part of the CI but coupled to the sphinx build system, which was removed cc #198.

How good of a test is this?

Short: good, but expensive. Since we want to support the flow where a user has launched an individual notebook into a vanilla colab and have everything work, the checks in this script are very expensive and realistic. It will create a new venv virtual environment for each. As such, running the full thing is expensive.

Is this part of the CI?

Short: no. This is an expensive but fully automated and complete end-to-end tests of the notebooks to save me the trouble of going through and manually checking colabs when e.g. I do a Cirq bump. The script in its current form is designed for manual execution.

To have automated checks as part of the CI, we'd likely have to do something fancy like Cirq where we only do isolated env tests in particular circumstances and/or only check changed notebooks.

But this is a good starting point for a CI notebook check.

Change-Id: I578c7a45d1946d89b8de1ca15d3353ccc8578f76
Copy link
Collaborator

@dstrain115 dstrain115 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I like the approach.

"""Helper function to actually do the work in `check_notebook`.

`check_notebook` has all the context managers and exception handling,
which would otherwise result in highly indented code.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Args to docstring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'docs/hfvqe/molecular_data.ipynb',
'docs/hfvqe/quickstart.ipynb',
'docs/fermi_hubbard/publication_results.ipynb',
'docs/fermi_hubbard/experiment_example.ipynb',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Can we instead search for *.ipynb and have exceptions if needed rather than explicitly list each notebook?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost preemptively wrote a justification (excuse) for this.

Since this is so expensive, I found it very useful to be able to have some control over which notebooks are being tested. A full solution would use e.g. command line arguments to choose either a list of specific notebooks or glob for all the notebooks. The latter case would need exceptions as noted and to be careful about .ipynb_checkpoints and/or notebooks not added to git.

I figured this all could be done in follow up work, especially if and when this script is modified to run automatically. Especially since automated tests would need lots of control over which notebooks are being tested (e.g. "test only modified ones")

Change-Id: I07c189f27b2c7c2adcf184e56437abab2e8eafe8
Change-Id: I4308d149b9f088f565c68aad304ad6ee420e976b
@mpharrigan mpharrigan merged commit dc52158 into quantumlib:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants