Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Habana to main #1884

Merged
merged 38 commits into from
Sep 29, 2023
Merged

Habana to main #1884

merged 38 commits into from
Sep 29, 2023

Conversation

andrewballantyne
Copy link
Member

@andrewballantyne andrewballantyne commented Sep 29, 2023

Closes: #1450

Description

Merging Habana Feature into main.

How Has This Been Tested?

How I am testing Habana

  • Ability to create accelerator profile and see it in all locations in the UI: workbench, jupyter spawner, model serving, and in both add and edit mode
  • Model server details shows correct accelerator name and number of accelerators
  • Resources and tolerations are successfully added and removed for new and edited workbenches and serving runtimes
  • Existing settings keeps accelerator settings for both runtimes and workbenches
  • Auto detect Nvidia gpu works when tolerations and resource identifier match
  • Setting accelerator to none removes the identifier from the resources and tolerations are removed as well
    • In the case of existing settings -> tolerations are not removed because we don’t know which ones to remove
  • Workbench and serving runtimes will show that images/runtimes/accelerators are recommended if the identifier and annotation match
  • Error and warning cases are correct based on recommended annotation
  • Can update the count without changing the accelerator

NOTE
Migration and accelerator detection cannot be tested without a gpu cluster

Test Impact

Tests will be coming after the feature is merged

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

Gkrumbach07 and others added 30 commits July 3, 2023 08:15
* added cr

* added accelerator profile crd

* added to kustomize
add copy to clipboard icon to tooltips
fixed detected accelerator count

connected accelerator detection

added accelerator UI user flow

hide accelerator dropdown when empty

switched the format of the notebook identifier

added accelerator name to serving runtime resource

added serving runtimes accelerators
commit 9387956
Author: Gage Krumbach <gkrumbach@gmail.com>
Date:   Fri Jun 30 14:56:37 2023 -0500

    added accelerator UI user flow

    fixed detected accelerator count

    connected accelerator detection

    added accelerator UI user flow

    hide accelerator dropdown when empty

    switched the format of the notebook identifier

    added accelerator name to serving runtime resource

    added serving runtimes accelerators
commit 26da289
Author: Gage Krumbach <gkrumbach@gmail.com>
Date:   Tue Aug 1 16:40:25 2023 -0500

    fix error state in migration

commit 391cbca
Author: Gage Krumbach <gkrumbach@gmail.com>
Date:   Tue Aug 1 15:09:25 2023 -0500

    added accelerator detection line

commit 50839ac
Author: Gage Krumbach <gkrumbach@gmail.com>
Date:   Thu Jul 27 13:52:24 2023 -0500

    added gpu migration
fix lint errors in accelerator support
update deployed notebooks and sr on migrate

fixed error logging

remove container migration

Added support for "keep what i have"

soft migrate nvidia gpus to profiles

fix

handle exisiting settings

refactored hooks

remove useRef

simplify functions

small changes to hook

merge hooks together

update cluster role

small changes

bug fixes

small type fix

fixed type issues
revert add rbac accelerator role
making image/servingruntime naming dynamic

fix count going back to 0

prevent 0 count and ux style fix

removed usage of unknown when not needed

remove double array usage

improved backend logging

fix logging undefined error

make ?? consistent

fixed "||" and fixed unknown / none details
@openshift-ci openshift-ci bot requested review from ppadti and uidoyen September 29, 2023 15:58
@andrewballantyne andrewballantyne changed the title F/accelerator support Habana to main Sep 29, 2023
@Gkrumbach07
Copy link
Member

@andrewballantyne added testing instructions

@Gkrumbach07
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Sep 29, 2023
@andrewballantyne
Copy link
Member Author

/approve

Going ahead with the merge. We'll have another week before code freeze / release. We can adjust small efforts along the way. QE has been verifying the feature for the past couple weeks.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewballantyne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 284aa98 into main Sep 29, 2023
3 checks passed
@andrewballantyne andrewballantyne deleted the f/accelerator-support branch November 23, 2023 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[UI] Habana Support Part 1
4 participants