
RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks #834

Open
andyatmiami wants to merge 1 commit into main from fix-e2e-asserts
Conversation

@andyatmiami (Contributor) commented Jan 12, 2025

Description

Our testing framework had a fundamental issue: for certain images, it would naively run the test suite of parent images against a child image. In the case of tensorflow images this caused problems, because the (child) tensorflow image purposefully downgrades a package version due to compatibility issues. Furthermore, while verifying the changes that address this core issue, numerous other bugs were encountered that needed to be fixed.

Significant changes introduced in this commit:

  • logic of the test-% Makefile target was moved into a shell script named test_jupyter_with_papermill.sh
  • test resources that need to be copied to the pod are now retrieved from the locally checked-out repo
    • previously they were pulled from a remote branch via wget (defaulting to the main branch)
    • this ensures our PR checks always exercise any updated files
  • test_notebook.ipynb files now expect an expected_versions.json file to exist within the same directory
    • the expected_versions.json file is derived from the relevant ...-notebook-imagestream.yaml manifest and leveraged when asserting on dependency versions
    • admittedly, duplicating various helper functions across all the notebook files is annoying - but it keeps the .ipynb files self-contained
  • ...-notebook-imagestream.yaml manifests had their annotations updated to include any package whose version is asserted by a unit test in test_notebook.ipynb
  • the CUDA tensorflow unit test that converts to ONNX was updated to be actually functional (a sketch of such a conversion check appears after this list)
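The PR body does not reproduce the fixed test itself, so purely as a hedged illustration: a minimal sketch of what a functional TensorFlow-to-ONNX conversion check could look like. The use of tf2onnx and every identifier below are assumptions for illustration, not code from this PR.

```python
# Hypothetical sketch -- not the actual unit test from this PR.
import onnx
import tensorflow as tf
import tf2onnx

# A trivial Keras model, just to exercise the conversion path.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
spec = (tf.TensorSpec((None, 4), tf.float32, name="input"),)

# Convert to ONNX and verify the resulting graph is well-formed.
model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)
onnx.checker.check_model(model_proto)
assert model_proto.graph.node, "conversion produced an empty ONNX graph"
```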

Minor changes introduced in this commit:

  • use ifdef OS in the Makefile to avoid warnings about an undefined variable
  • all test_notebook.ipynb files:
    • have an id attribute defined in metadata
    • specify the nbformat as 4.5
  • the more compute-intensive notebooks had their readinessProbe and livenessProbe settings updated to be less aggressive
    • liveness checks were sporadically failing while the notebook was being tested - and this update seemed to help
  • the trustyai notebook now runs the minimal and datascience papermill tests (similar to tensorflow and pytorch) instead of including the test code within its own test_notebook.ipynb file
  • various "quality of life" improvements were introduced into test_jupyter_with_papermill.sh (an illustrative sketch of the structure follows this list):
    • tests are properly invoked for any valid/supported target
      • previously, certain test targets required manual intervention in order to run end to end
    • improved logging (particularly when running with the -x flag)
    • more modular structure to hopefully improve readability
    • the script now honors proper shell return-code semantics (i.e. returns 0 on success)
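As referenced above, a minimal sketch of the kind of modular structure and return-code discipline described here; all function and variable names are invented for this sketch, not taken from test_jupyter_with_papermill.sh:

```bash
#!/usr/bin/env bash
# Illustrative only -- names below are hypothetical.
set -uo pipefail

_log() { printf '[%s] %s\n' "$(date -u +%H:%M:%S)" "$*" >&2; }

run_papermill_test() {
    local target="${1}"
    _log "running papermill test for ${target}"
    # ... deploy the notebook, copy test resources, run papermill, undeploy ...
}

main() {
    run_papermill_test "${1:?usage: ${0} <test-target>}"
}

# Exit status propagates to make: 0 on success, non-zero on failure.
main "$@"
```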

It should also be noted that while most intel notebooks now pass the papermill tests, there are still issues with the intel tensorflow unit tests. More detail is captured in the JIRA ticket related to this commit. However, since we are imminently removing intel logic from the notebooks repo entirely, I didn't want to burn any more time getting that last notebook to pass - it will be removed shortly!

Further details on expected_versions.json:

  • yq (which is installed via the Makefile) is used to:
    • query the relevant imagestream manifest to parse out the ["opendatahub.io/notebook-software"] and ["opendatahub.io/notebook-python-dependencies"] annotations from the first element of the spec.tags attribute
    • inject name/version elements for nbdime and nbgitpuller (which are asserted against in minimal notebook tests)
    • convert this YAML list to a JSON object of the form: {<package>: <version>} (a sketch of this derivation appears after this list)
  • this JSON object is then copied into the running notebook workload, into the same directory in which test_notebook.ipynb resides
  • each test_notebook.ipynb defines a couple of helper functions to interact with this file (also sketched after this list):
    • def load_expected_versions() -> dict:
    • def get_expected_version(dependency_name: str) -> str:
      • the argument provided to get_expected_version should match the name attribute of the JSON structure defined in the imagestream manifest
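A hedged sketch of that derivation (the manifest path and the exact yq expression below are assumptions, and the nbdime/nbgitpuller injection step is omitted for brevity):

```bash
manifest="manifests/base/jupyter-minimal-notebook-imagestream.yaml"

# Parse the python-dependencies annotation from the first spec.tags element
# and emit a {<package>: <version>} JSON object.
yq -N -o json '
  .spec.tags[0].annotations["opendatahub.io/notebook-python-dependencies"]
  | from_yaml
  | [ .[] | {"key": .name, "value": .version} ]
  | from_entries
' "${manifest}" > expected_versions.json
```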
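And a minimal sketch of what those two helpers might look like inside a notebook cell (the bodies are an assumption; only the signatures come from the PR):

```python
import json
from pathlib import Path

def load_expected_versions() -> dict:
    # expected_versions.json is copied next to test_notebook.ipynb before the run
    return json.loads(Path("expected_versions.json").read_text())

def get_expected_version(dependency_name: str) -> str:
    # dependency_name must match the "name" attribute from the imagestream manifest
    return load_expected_versions()[dependency_name]
```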

Related-to: https://issues.redhat.com/browse/RHOAIENG-16587

How Has This Been Tested?

Testing can be performed simply by invoking make with a target that matches the test-% pattern-matching rule.

For basic sanity checking:

```shell
gmake test-jupyter-pytorch-ubi9-python-3.11 -e NOTEBOOK_REPO_BRANCH_BASE=https://raw.githubusercontent.com/andyatmiami/notebooks/fix-e2e-asserts
```

However, all known/supported/expected test-% patterns should be verified.

I have also created a utility in https://gitlab.cee.redhat.com/astonebe/notebook-utils that can be used to more efficiently test notebooks "in bulk". While it still runs sequentially, it handles deploying, testing, and undeploying a given list of targets.

  • I used this utility script to test all notebooks that have a corresponding imagestream manifest.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.


openshift-ci bot commented Jan 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Jan 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andyatmiami. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek (Member):

Alternative proposal

The package versions we should have in our images are the ones given in the docs at https://access.redhat.com/articles/rhoai-supported-configs (section "Supported workbench images"). This table is generated (or, before red-hat-data-services#476, written manually) based on our ImageStream manifests at https://github.com/opendatahub-io/notebooks/tree/main/manifests/base (and https://github.com/red-hat-data-services/notebooks/tree/main/manifests/base for RHOAI):

```yaml
opendatahub.io/notebook-python-dependencies: |
  [
    {"name": "ROCm-PyTorch", "version": "2.4"},
    {"name": "Tensorboard", "version": "2.16"},
    {"name": "Kafka-Python-ng", "version": "2.2"},
    {"name": "Matplotlib", "version": "3.9"},
    {"name": "Numpy", "version": "2.1"},
```

We should not duplicate that information in the test ipython notebook files. The tests should check that what's in the image corresponds to what's given in the manifests.
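To make the proposal concrete, a hedged sketch of such a manifest-driven assertion, reusing the get_expected_version helper sketched earlier (the prefix-match strategy is illustrative, since the manifest pins only major.minor):

```python
from importlib.metadata import version

expected = get_expected_version("Numpy")   # e.g. "2.1", from the manifest annotation
installed = version("numpy")               # e.g. "2.1.3", installed in the image
assert installed.startswith(expected), f"numpy {installed} does not match expected {expected}"
```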

@andyatmiami (Author) commented Jan 12, 2025

So, given that today our tests do duplicate this information...

do you want me to address your suggestion in this PR or a follow-up PR?

For emphasis - I wholly agree with your proposal - I've just scope-creeped the heck out of this PR already 😎

@jiridanek (Member):

> do you want me to address your suggestion in this PR or a follow-up PR?

We haven't done enough PRs of this kind for me to discern a pattern, but on my previous team there was a preference for moving in small steps: e.g., first move the script embedded in the Makefile to its own file unchanged, then modify said script.

I guess this is largely a matter of personal taste. And also of review speed; it's hard to split things up when reviews for the parts are not forthcoming.

@andyatmiami (Author):

/test all

@jstourac (Member):

I got to this for a brief review. I agree with Jiri that we should re-use what we define in our manifest files in imagestreams for testing purposes instead of duplicating it - but yeah, let's do it step by step as you wrote above. So, I am happy to see these tests being extracted from the Makefile, making it a bit more readable. What I'm not a big fan of here is that we're creating yet another bash script for this. I think we should utilize the existing pytest framework with testcontainers for as many tests as possible so that we can run them in a unified environment.

Anyway, since this aims only to extract these out of the Makefile and to keep things compatible with the existing approach, I'm not against it in general right now. But in the future, I think we should consider migrating this again - this time to pytest, where most of our tests will hopefully live.

@jiridanek (Member):

I'm with Jan, pulling the bash code out of Makefile is a good improvement in itself.

@andyatmiami andyatmiami force-pushed the fix-e2e-asserts branch 2 times, most recently from ee21ef4 to 5dd5618 Compare January 17, 2025 19:51
@andyatmiami andyatmiami changed the title fix(test): provide ability to override versions used in asserts during papermill tests fix(test): ensure papermill tests run successfully for all supported notebooks Jan 17, 2025
@andyatmiami andyatmiami changed the title fix(test): ensure papermill tests run successfully for all supported notebooks RHOAIENG-16587: fix(test): ensure papermill tests run successfully for all supported notebooks Jan 17, 2025
@andyatmiami andyatmiami marked this pull request as ready for review January 17, 2025 19:57
@openshift-ci openshift-ci bot requested review from dibryant and jiridanek January 17, 2025 19:58
@opendatahub-io opendatahub-io deleted a comment from openshift-ci bot Jan 17, 2025
@opendatahub-io opendatahub-io deleted a comment from openshift-ci bot Jan 17, 2025
@andyatmiami (Author):

/test notebooks-ubi9-e2e-tests

(weird GCP errors on building the cluster - hopefully transient 🤞 )

@andyatmiami (Author):

PASSING TESTS!


@andyatmiami (Author):

/test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror


openshift-ci bot commented Jan 17, 2025

@andyatmiami: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/rocm-notebooks-e2e-tests | 7388f35 | link | true | /test rocm-notebooks-e2e-tests |
| ci/prow/notebooks-ubi9-e2e-tests | 7388f35 | link | true | /test notebooks-ubi9-e2e-tests |
| ci/prow/intel-notebooks-e2e-tests | 7388f35 | link | true | /test intel-notebooks-e2e-tests |
| ci/prow/images | 7388f35 | link | true | /test images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +39 to +51:

```diff
     initialDelaySeconds: 15
     periodSeconds: 10
     timeoutSeconds: 5
     successThreshold: 1
     failureThreshold: 3
   readinessProbe:
     httpGet:
       path: /notebook/opendatahub/jovyan/api
       port: notebook-port
       scheme: HTTP
-    initialDelaySeconds: 10
-    periodSeconds: 5
+    initialDelaySeconds: 15
+    periodSeconds: 10
+    timeoutSeconds: 5
```
@jiridanek (Member) commented Jan 18, 2025

In production, these timeouts come from the dashboard; maybe we should use what the dashboard sets? Or possibly be stricter, and not more generous, here.

https://github.com/opendatahub-io/odh-dashboard/blob/440d8fef243d9ae78ed5a3a4c51002d655521cb6/frontend/src/api/k8s/notebooks.ts#L160-L176

@andyatmiami (Author) commented Jan 18, 2025

Dashboard also takes a user-supplied configured value for resources.

And at some level, the amount of compute resources available to the workload is going to influence liveness/readiness probe performance.

I don't personally believe it's a good apples-to-apples comparison.

That being said - I didn't spend time determining which "lever" of the probes was the problem - I suspect it's actually timeoutSeconds, as compute-heavy workloads like TF, PYT, and Intel-ML (whatever that is 😎) do a lot of initial churning when firing up...

timeoutSeconds: 1 is the default behavior, so I fear following your recommendation would simply "revert" my change... and I have plenty of evidence from running these tests to know that without any changes these workloads WILL DIE, leading to flaky tests.

However, if the desire is to keep these properties as close to Dashboard as possible, I can spend a little more time profiling to perhaps land on something like timeoutSeconds: 3 | 4 (with all other values unchanged) that will keep things consistently passing. A sketch of that middle ground follows. I do think some deviation from the Dashboard probe settings is unavoidable though (just to set expectations).
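As noted, a sketch of that proposed middle ground (values are illustrative, not from the PR):

```yaml
readinessProbe:
  httpGet:
    path: /notebook/opendatahub/jovyan/api
    port: notebook-port
    scheme: HTTP
  initialDelaySeconds: 10   # dashboard default, unchanged
  periodSeconds: 5          # dashboard default, unchanged
  timeoutSeconds: 3         # raised from the default of 1 to tolerate startup churn
```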

```shell
nbdime_version=${nbdime_version} nbgitpuller_version=${nbgitpuller_version} "${yqbin}" '. + [{"name": "nbdime", "version": strenv(nbdime_version)},{"name": "nbgitpuller", "version": strenv(nbgitpuller_version)}]' |
    "${yqbin}" -N -o json '[ .[] | (.name | key) = "key" | (.version | key) = "value" ] | from_entries')

# Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
```

Suggested change:

```diff
-# Following disabled schellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
+# Following disabled shellcheck intentional as the intended behavior is for those ${1}, ${2} variables to only be expanded when running within kubernetes
```
