
RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks #834

Open
andyatmiami wants to merge 1 commit into main from fix-e2e-asserts
Conversation

@andyatmiami (Contributor) commented Jan 12, 2025

Description

Our testing framework had a fundamental issue: for certain images, it would naively run the test suite of parent images against a child image. In the case of tensorflow images this caused problems, because the (child) tensorflow image purposefully downgrades a package version due to compatibility issues. Furthermore, while verifying the changes that address this core issue, numerous other bugs were encountered that needed to be fixed.

Significant changes introduced in this commit:

  • logic of the test-% Makefile target was moved into a shell script named test_jupyter_with_papermill.sh
  • test resources that need to be copied to the pod are now retrieved from the locally checked-out repo
    • previously they were pulled from a remote branch via wget (defaulting to the main branch)
    • this ensures our PR checks always exercise any updated files
  • test_notebook.ipynb files now expect an expected_versions.json file to exist within the same directory
    • the expected_versions.json file is derived from the relevant ...-notebook-imagestream.yaml manifest and leveraged when asserting on dependency versions
    • admittedly, duplicating various helper functions across all the notebook files is annoying - but it keeps the .ipynb files self-contained
  • ...-notebook-imagestream.yaml manifests had their annotations updated to include any package whose version is asserted by a unit test in test_notebook.ipynb
  • the CUDA tensorflow unit test that converts to ONNX was updated to be actually functional (a sketch of such a conversion check appears after this list)
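The PR body does not reproduce the fixed test itself, so purely as a hedged illustration: a minimal sketch of what a functional TensorFlow-to-ONNX conversion check could look like. The use of tf2onnx and every identifier below are assumptions for illustration, not code from this PR.

```python
# Hypothetical sketch -- not the actual unit test from this PR.
import onnx
import tensorflow as tf
import tf2onnx

# A trivial Keras model, just to exercise the conversion path.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
spec = (tf.TensorSpec((None, 4), tf.float32, name="input"),)

# Convert to ONNX and verify the resulting graph is well-formed.
model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)
onnx.checker.check_model(model_proto)
assert model_proto.graph.node, "conversion produced an empty ONNX graph"
```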

Minor changes introduced in this commit:

  • use ifdef OS in the Makefile to avoid warnings about an undefined variable
  • all test_notebook.ipynb files:
    • have an id attribute defined in metadata
    • specify the nbformat as 4.5
  • the more compute-intensive notebooks had their readinessProbe and livenessProbe settings updated to be less aggressive
    • liveness checks were sporadically failing while the notebook was being tested - and this update seemed to help
  • the trustyai notebook now runs the minimal and datascience papermill tests (similar to tensorflow and pytorch) instead of including the test code within its own test_notebook.ipynb file
  • various "quality of life" improvements were introduced into test_jupyter_with_papermill.sh (an illustrative sketch of the structure follows this list):
    • tests are properly invoked for any valid/supported target
      • previously, certain test targets required manual intervention in order to run end to end
    • improved logging (particularly when running with the -x flag)
    • more modular structure to hopefully improve readability
    • the script now honors proper shell return-code semantics (i.e. returns 0 on success)
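As referenced above, a minimal sketch of the kind of modular structure and return-code discipline described here; all function and variable names are invented for this sketch, not taken from test_jupyter_with_papermill.sh:

```bash
#!/usr/bin/env bash
# Illustrative only -- names below are hypothetical.
set -uo pipefail

_log() { printf '[%s] %s\n' "$(date -u +%H:%M:%S)" "$*" >&2; }

run_papermill_test() {
    local target="${1}"
    _log "running papermill test for ${target}"
    # ... deploy the notebook, copy test resources, run papermill, undeploy ...
}

main() {
    run_papermill_test "${1:?usage: ${0} <test-target>}"
}

# Exit status propagates to make: 0 on success, non-zero on failure.
main "$@"
```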

It should also be noted that while most intel notebooks now pass the papermill tests, there are still issues with the intel tensorflow unit tests. More detail is captured in the JIRA ticket related to this commit. However, since we are imminently removing intel logic from the notebooks repo entirely, I didn't want to burn any more time getting that last notebook to pass - it will be removed shortly!

Further details on expected_versions.json:

  • yq (which is installed via the Makefile) is used to:
    • query the relevant imagestream manifest to parse out the ["opendatahub.io/notebook-software"] and ["opendatahub.io/notebook-python-dependencies"] annotations from the first element of the spec.tags attribute
    • inject name/version elements for nbdime and nbgitpuller (which are asserted against in minimal notebook tests)
    • convert this YAML list to a JSON object of the form: {<package>: <version>} (a sketch of this derivation appears after this list)
  • this JSON object is then copied into the running notebook workload, into the same directory in which test_notebook.ipynb resides
  • each test_notebook.ipynb defines a couple of helper functions to interact with this file (also sketched after this list):
    • def load_expected_versions() -> dict:
    • def get_expected_version(dependency_name: str) -> str:
      • the argument provided to get_expected_version should match the name attribute of the JSON structure defined in the imagestream manifest
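A hedged sketch of that derivation (the manifest path and the exact yq expression below are assumptions, and the nbdime/nbgitpuller injection step is omitted for brevity):

```bash
manifest="manifests/base/jupyter-minimal-notebook-imagestream.yaml"

# Parse the python-dependencies annotation from the first spec.tags element
# and emit a {<package>: <version>} JSON object.
yq -N -o json '
  .spec.tags[0].annotations["opendatahub.io/notebook-python-dependencies"]
  | from_yaml
  | [ .[] | {"key": .name, "value": .version} ]
  | from_entries
' "${manifest}" > expected_versions.json
```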
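And a minimal sketch of what those two helpers might look like inside a notebook cell (the bodies are an assumption; only the signatures come from the PR):

```python
import json
from pathlib import Path

def load_expected_versions() -> dict:
    # expected_versions.json is copied next to test_notebook.ipynb before the run
    return json.loads(Path("expected_versions.json").read_text())

def get_expected_version(dependency_name: str) -> str:
    # dependency_name must match the "name" attribute from the imagestream manifest
    return load_expected_versions()[dependency_name]
```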

Related-to: https://issues.redhat.com/browse/RHOAIENG-16587

How Has This Been Tested?

Testing can be performed simply by invoking make with a target that matches the test-% pattern-matching rule.

For basic sanity checking:

```shell
gmake test-jupyter-pytorch-ubi9-python-3.11 -e NOTEBOOK_REPO_BRANCH_BASE=https://raw.githubusercontent.com/andyatmiami/notebooks/fix-e2e-asserts
```

However, all known/supported/expected test-% patterns should be verified.

I have also created a utility in https://gitlab.cee.redhat.com/astonebe/notebook-utils that can be used to more efficiently test notebooks "in bulk". While it still runs sequentially, it handles deploying, testing, and undeploying a given list of targets.

  • I used this utility script to test all notebooks that have a corresponding imagestream manifest.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.


openshift-ci bot commented Jan 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Jan 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andyatmiami. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek (Member):

Alternative proposal

The package versions we should have in our images are the ones given in the docs at https://access.redhat.com/articles/rhoai-supported-configs (section "Supported workbench images"). This table is generated (or, before red-hat-data-services#476, written manually) based on our ImageStream manifests at https://github.com/opendatahub-io/notebooks/tree/main/manifests/base (and https://github.com/red-hat-data-services/notebooks/tree/main/manifests/base for RHOAI):

```yaml
opendatahub.io/notebook-python-dependencies: |
  [
    {"name": "ROCm-PyTorch", "version": "2.4"},
    {"name": "Tensorboard", "version": "2.16"},
    {"name": "Kafka-Python-ng", "version": "2.2"},
    {"name": "Matplotlib", "version": "3.9"},
    {"name": "Numpy", "version": "2.1"},
```

We should not duplicate that information in the test ipython notebook files. The tests should check that what's in the image corresponds to what's given in the manifests.
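To make the proposal concrete, a hedged sketch of such a manifest-driven assertion, reusing the get_expected_version helper sketched earlier (the prefix-match strategy is illustrative, since the manifest pins only major.minor):

```python
from importlib.metadata import version

expected = get_expected_version("Numpy")   # e.g. "2.1", from the manifest annotation
installed = version("numpy")               # e.g. "2.1.3", installed in the image
assert installed.startswith(expected), f"numpy {installed} does not match expected {expected}"
```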

@andyatmiami (Author) commented Jan 12, 2025

So, given that today our tests do duplicate this information...

do you want me to address your suggestion in this PR or a follow-up PR?

For emphasis - I wholly agree with your proposal - I've just scope-creeped the heck out of this PR already 😎

@jiridanek (Member):

> do you want me to address your suggestion in this PR or a follow-up PR?

We haven't done enough PRs of this kind for me to discern a pattern, but on my previous team there was a preference for moving in small steps: e.g., first move the script embedded in the Makefile to its own file unchanged, then modify said script.

I guess this is largely a matter of personal taste. And also of review speed; it's hard to split things up when reviews for the parts are not forthcoming.

@andyatmiami (Author):

/test all

@jstourac (Member):

I got to this for a brief review. I agree with Jiri that we should re-use what we define in our manifest files in imagestreams for testing purposes instead of duplicating it - but yeah, let's do it step by step as you wrote above. So, I am happy to see these tests being extracted from the Makefile, making it a bit more readable. What I'm not a big fan of here is that we're creating yet another bash script for this. I think we should utilize the existing pytest framework with testcontainers for as many tests as possible so that we can run them in a unified environment.

Anyway, since this aims only to extract these out of the Makefile and to keep things compatible with the existing approach, I'm not against it in general right now. But in the future, I think we should consider migrating this again - this time to pytest, where most of our tests will hopefully live.

@jiridanek (Member):

I'm with Jan, pulling the bash code out of Makefile is a good improvement in itself.

@andyatmiami andyatmiami force-pushed the fix-e2e-asserts branch 2 times, most recently from ee21ef4 to 5dd5618 Compare January 17, 2025 19:51
@andyatmiami andyatmiami changed the title fix(test): provide ability to override versions used in asserts during papermill tests fix(test): ensure papermill tests run successfully for all supported notebooks Jan 17, 2025
@andyatmiami andyatmiami changed the title fix(test): ensure papermill tests run successfully for all supported notebooks RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks Jan 17, 2025
@andyatmiami andyatmiami marked this pull request as ready for review January 17, 2025 19:57
@openshift-ci openshift-ci bot requested review from dibryant and jiridanek January 17, 2025 19:58
@opendatahub-io opendatahub-io deleted a comment from openshift-ci bot Jan 17, 2025
@opendatahub-io opendatahub-io deleted a comment from openshift-ci bot Jan 17, 2025
@andyatmiami (Author):

/test notebooks-ubi9-e2e-tests

(weird GCP errors on building the cluster - hopefully transient 🤞 )

@andyatmiami (Author):

PASSING TESTS!


@andyatmiami (Author):

/test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror


openshift-ci bot commented Jan 17, 2025

@andyatmiami: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rocm-notebooks-e2e-tests | 7388f35 | link | true | /test rocm-notebooks-e2e-tests |
| ci/prow/notebooks-ubi9-e2e-tests | 7388f35 | link | true | /test notebooks-ubi9-e2e-tests |
| ci/prow/intel-notebooks-e2e-tests | 7388f35 | link | true | /test intel-notebooks-e2e-tests |
| ci/prow/images | 7388f35 | link | true | /test images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +39 to +51:

```diff
     initialDelaySeconds: 15
     periodSeconds: 10
     timeoutSeconds: 5
     successThreshold: 1
     failureThreshold: 3
   readinessProbe:
     httpGet:
       path: /notebook/opendatahub/jovyan/api
       port: notebook-port
       scheme: HTTP
-    initialDelaySeconds: 10
-    periodSeconds: 5
+    initialDelaySeconds: 15
+    periodSeconds: 10
+    timeoutSeconds: 5
```
@jiridanek (Member) commented Jan 18, 2025

In production, these timeouts come from the dashboard; maybe we should use what the dashboard sets? Or possibly be stricter, and not more generous, here.

https://github.com/opendatahub-io/odh-dashboard/blob/440d8fef243d9ae78ed5a3a4c51002d655521cb6/frontend/src/api/k8s/notebooks.ts#L160-L176

@andyatmiami (Author) commented Jan 18, 2025

Dashboard also takes a user-supplied configured value for resources.

And at some level, the amount of compute resources available to the workload is going to influence liveness/readiness probe performance.

I don't personally believe it's a good apples-to-apples comparison.

That being said - I didn't spend time determining which "lever" of the probes was the problem - I suspect it's actually timeoutSeconds, as compute-heavy workloads like TF, PYT, and Intel-ML (whatever that is 😎) do a lot of initial churning when firing up...

timeoutSeconds: 1 is the default behavior, so I fear following your recommendation would simply "revert" my change... and I have plenty of evidence from running these tests to know that without any changes these workloads WILL DIE, leading to flaky tests.

However, if the desire is to keep these properties as close to Dashboard as possible, I can spend a little more time profiling to perhaps land on something like timeoutSeconds: 3 | 4 (with all other values unchanged) that will keep things consistently passing. A sketch of that middle ground follows. I do think some deviation from the Dashboard probe settings is unavoidable though (just to set expectations).
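As noted, a sketch of that proposed middle ground (values are illustrative, not from the PR):

```yaml
readinessProbe:
  httpGet:
    path: /notebook/opendatahub/jovyan/api
    port: notebook-port
    scheme: HTTP
  initialDelaySeconds: 10   # dashboard default, unchanged
  periodSeconds: 5          # dashboard default, unchanged
  timeoutSeconds: 3         # raised from the default of 1 to tolerate startup churn
```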

```shell
nbdime_version=${nbdime_version} nbgitpuller_version=${nbgitpuller_version} "${yqbin}" '. + [{"name": "nbdime", "version": strenv(nbdime_version)},{"name": "nbgitpuller", "version": strenv(nbgitpuller_version)}]' |
    "${yqbin}" -N -o json '[ .[] | (.name | key) = "key" | (.version | key) = "value" ] | from_entries')

# Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
```

Suggested change:

```diff
-# Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
+# Following disabled shellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
```
