RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks #834
base: main
Conversation
**Alternative proposal**

The package versions we should have in our images are the ones given in the docs at https://access.redhat.com/articles/rhoai-supported-configs (section "Supported workbench images"). This table is generated (or was written manually, before red-hat-data-services#476) based on our ImageStream manifests at https://github.com/opendatahub-io/notebooks/tree/main/manifests/base (and https://github.com/red-hat-data-services/notebooks/tree/main/manifests/base for RHOAI).

We should not duplicate that information in the test ipython notebook files. The tests should check that what's in the image corresponds to what's given in the manifests.
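For context, a quick way to see the manifest data Jiri is referring to (a sketch assuming mikefarah's yq v4; the manifest path is illustrative):

```shell
# print the advertised python dependencies for one workbench image
yq '.spec.tags[0].annotations["opendatahub.io/notebook-python-dependencies"]' \
    manifests/base/jupyter-minimal-notebook-imagestream.yaml
# the value is a JSON list of {"name": ..., "version": ...} entries that the
# tests could compare against what is actually installed in the image
```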
So, given that today in our tests we are duplicating this information... do you want me to address your suggestion in this PR or a follow-up PR? For emphasis - I wholly agree with your proposal - I've just scope-creeped the heck outta this present PR already 😎
We did not do enough PRs of this kind for me to discern a pattern, but on my previous team, there was a preference for moving in small steps, e.g. first move a script embedded in the Makefile to its own file unchanged, then modify said script. Guess this is largely a matter of personal taste. And also of review speed; it's hard to split things up when reviews for the parts are not forthcoming.
Force-pushed from b4be87f to f69e4f5
/test all
I got to this for a brief review. I agree with Jiri to re-use what we define in our manifest files in imagestreams for testing purposes instead of duplicating it - but yeah, let's do it step by step as you wrote above. So, I am happy to see these tests being extracted from the Makefile, making it a bit more readable. What I'm not a big fan of here is that we're creating yet another bash script for this. I think we should utilize the existing pytest framework with the testcontainers for as many tests as possible so that we can run these in a unified environment. Anyway, since this aims only to extract these out of the Makefile and to keep this compatible with the existing approach, I'm not against this in general right now. But for the future, I think we should plan to migrate this again - this time to pytest, where most of our tests will hopefully be.
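For illustration, the pytest-plus-testcontainers direction Jan describes might look roughly like this (a hypothetical sketch - the image tag is a placeholder, and it reaches through testcontainers-python to docker-py's `exec_run`):

```python
from testcontainers.core.container import DockerContainer


def test_python_version_in_workbench_image():
    # image tag is illustrative only
    image = "quay.io/opendatahub/workbench-images:jupyter-minimal-ubi9-python-3.11"
    with DockerContainer(image) as container:
        # run a command inside the already-running workbench container
        exit_code, output = container.get_wrapped_container().exec_run(
            ["python3", "--version"]
        )
        assert exit_code == 0
        assert b"3.11" in output
```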
I'm with Jan, pulling the bash code out of the Makefile is a good improvement in itself.
Force-pushed from ee21ef4 to 5dd5618
RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks
Force-pushed from 5dd5618 to 7388f35
/test notebooks-ubi9-e2e-tests (weird GCP errors on building the cluster - hopefully transient 🤞)
/test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror
@andyatmiami: The following tests failed, say /retest to rerun all failed tests.
```diff
   initialDelaySeconds: 15
   periodSeconds: 10
   timeoutSeconds: 5
   successThreshold: 1
   failureThreshold: 3
 readinessProbe:
   httpGet:
     path: /notebook/opendatahub/jovyan/api
     port: notebook-port
     scheme: HTTP
-  initialDelaySeconds: 10
-  periodSeconds: 5
+  initialDelaySeconds: 15
+  periodSeconds: 10
+  timeoutSeconds: 5
```
In production, these timeouts come from the Dashboard; maybe we should use what the Dashboard sets? Or possibly be stricter, and not more generous, here.
Dashboard also takes a user-supplied configured value for resources. And at some level, the amount of compute resources available to the workload is going to influence the liveness/readiness probe performance. I don't personally believe it's a good apples-to-apples comparison.

That being said - I didn't spend time determining which "lever" of the probes was the problem - I suspect it's actually the `timeoutSeconds`, as compute-heavy workloads like TF, PYT, and Intel-ML (whatever that is 😎) do a lot of initial churning when firing up... `timeoutSeconds: 1` is the default behavior, so I fear following your recommendation would simply "revert" my change - and I have plenty of evidence from running these tests to know that, without any changes, these workloads WILL DIE, leading to flaky tests.

However, if the desire is to keep these properties as close to Dashboard as possible, I can spend a little more time profiling to perhaps land on something like `timeoutSeconds: 3` or `4` (with all other values unchanged) that will keep things consistently passing. I do think some deviation from the Dashboard probe settings is unavoidable though (just to set expectations).
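For reference, a sketch of the compromise being floated (hypothetical values, not what the PR currently sets; the Dashboard-aligned numbers are taken from the pre-change probe in the diff above):

```yaml
readinessProbe:
  httpGet:
    path: /notebook/opendatahub/jovyan/api
    port: notebook-port
    scheme: HTTP
  initialDelaySeconds: 10  # Dashboard-era value
  periodSeconds: 5         # Dashboard-era value
  timeoutSeconds: 3        # kubernetes default is 1; 3-4 is the range discussed above
```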
```shell
nbdime_version=${nbdime_version} nbgitpuller_version=${nbgitpuller_version} "${yqbin}" '. + [{"name": "nbdime", "version": strenv(nbdime_version)},{"name": "nbgitpuller", "version": strenv(nbgitpuller_version)}]' |
    "${yqbin}" -N -o json '[ .[] | (.name | key) = "key" | (.version | key) = "value" ] | from_entries')
```
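The second `yq` invocation above does the heavy lifting: it rewrites each `{name, version}` entry as a `{key, value}` pair and collapses the list with `from_entries`. A self-contained rerun of just that step (a sketch assuming mikefarah's yq v4; the package versions are made up):

```shell
yq -N -o json '[ .[] | (.name | key) = "key" | (.version | key) = "value" ] | from_entries' <<'EOF'
- name: nbdime
  version: 4.0.1
- name: nbgitpuller
  version: 1.2.1
EOF
# prints: { "nbdime": "4.0.1", "nbgitpuller": "1.2.1" }
```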
# Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes |
```diff
- # Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
+ # Following disabled shellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
```
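For context, the pattern being commented on looks roughly like this (a hypothetical reconstruction - the pod name, notebook paths, and `papermill` invocation are illustrative, not lifted from the script):

```shell
# shellcheck disable=SC2016  # single quotes are deliberate: ${1}/${2} must
# expand inside the pod's shell, not on the machine running this script
kubectl exec "${notebook_pod}" -- /bin/sh -c \
    'papermill "${1}" "${2}" --kernel python3' -- "test_notebook.ipynb" "output.ipynb"
```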
Description

Our testing framework had a fundamental issue in that it would, for certain images, naively run the test suite of parent images for a child image. In the case of `tensorflow` images, this caused problems because the (child) `tensorflow` image would **purposefully** _downgrade_ a package version due to compatibility issues. Furthermore, when attempting to verify the changes to address this core issue, numerous other bugs were encountered that had to be fixed.

Significant changes introduced in this commit:

- logic of the `test-%` Makefile target was moved into a shell script named `test_jupyter_with_papermill.sh`
- test resources that need to be copied to the pod are now retrieved from the locally checked-out repo
  - previously they were pulled from a remote branch via `wget` (defaulting to the `main` branch); this ensures our PR checks are now always leveraging any updated files
- `test_notebook.ipynb` files now expect an `expected_versions.json` file to exist within the same directory
  - the `expected_versions.json` file is derived from the relevant `...-notebook-imagestream.yaml` manifest and leveraged when asserting on dependency versions
  - admittedly the duplication of various helper functions across all the notebook files is annoying, but it helps keep the `.ipynb` files self-contained
- the `...-notebook-imagestream.yaml` manifests had annotations updated to include any package that has a version-asserting unit test in `test_notebook.ipynb`
- the CUDA tensorflow unit test that converts to ONNX was updated to be actually functional

Minor changes introduced in this commit:

- use `ifdef OS` in the Makefile to avoid warnings about an undefined variable
- all `test_notebook.ipynb` files:
  - have an `id` attribute defined in metadata
  - specify the `nbformat` as `4.5`
- the more compute-intensive notebooks had `readinessProbe` and `livenessProbe` settings updated to be less aggressive
  - liveness checks were observed sporadically failing while the notebook was being tested, and this update seemed to help
- the `trustyai` notebook now runs the minimal and datascience papermill tests (similar to `tensorflow` and `pytorch`) instead of including the test code within its own `test_notebook.ipynb` file
- various "quality of life" improvements were introduced into `test_jupyter_with_papermill.sh`:
  - properly invoke tests for any valid/supported target; previously certain test targets required manual intervention in order to run end to end
  - improved logging (particularly when running with the `-x` flag)
  - more modular structure to hopefully improve readability
  - the script now honors proper shell return-code semantics (i.e. returns `0` on success)

It should also be noted that while most `intel` notebooks are now passing the papermill tests, there are still issues with the `intel` `tensorflow` unit tests. More detail is captured in the JIRA ticket related to this commit. However, we are imminently removing `intel` logic from the `notebooks` repo entirely... so I didn't wanna burn any more time trying to get that last notebook to pass as it will be removed shortly!

Further details on `expected_versions.json`:

- `yq` (which is installed via the `Makefile`) is used to:
  - query the relevant imagestream manifest to parse out the `["opendatahub.io/notebook-software"]` and `["opendatahub.io/notebook-python-dependencies"]` annotations from the first element of the `spec.tags` attribute
  - inject name/version elements for `nbdime` and `nbgitpuller` (which are asserted against in `minimal` notebook tests)
  - convert this YAML list to a JSON object of the form `{<package>: <version>}`
- this JSON object is then copied into the running notebook workload, in the same directory that `test_notebook.ipynb` resides
- each `test_notebook.ipynb` has a couple of helper functions defined to interact with this file (sketched below):
  - `def load_expected_versions() -> dict:`
  - `def get_expected_version(dependency_name: str) -> str:`
  - the argument provided to the `get_expected_version` function should match the `name` attribute of the JSON structure defined in the imagestream manifest

Related-to: https://issues.redhat.com/browse/RHOAIENG-16587
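A minimal sketch of those two helpers (the bodies here are illustrative; only the signatures above are prescribed by this PR):

```python
import json
from pathlib import Path


def load_expected_versions() -> dict:
    # expected_versions.json is copied alongside the notebook before it executes
    return json.loads(Path("expected_versions.json").read_text())


def get_expected_version(dependency_name: str) -> str:
    # dependency_name must match the `name` attribute from the
    # imagestream manifest annotation entries
    return load_expected_versions()[dependency_name]
```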
How Has This Been Tested?

Testing can be performed simply by invoking `make` with a target that matches the `test-%` pattern-matching rule, e.g.:

```shell
gmake test-jupyter-pytorch-ubi9-python-3.11 -e NOTEBOOK_REPO_BRANCH_BASE=https://raw.githubusercontent.com/andyatmiami/notebooks/fix-e2e-asserts
```

for basic sanity checking. However, all known/supported/expected `test-%` patterns should be verified.

I have also created a utility in https://gitlab.cee.redhat.com/astonebe/notebook-utils that can be used to more efficiently test notebooks "in bulk". While it still runs sequentially, it handles deploying, testing, and undeploying a given list of targets.
Merge criteria: