Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented visualizations for the NIM metrics #3612

Merged
merged 19 commits into from
Jan 17, 2025

Conversation

LinoyBitan1
Copy link
Contributor

@LinoyBitan1 LinoyBitan1 commented Jan 2, 2025

Description

JIRA - https://issues.redhat.com/browse/NVPE-31

  • Implemented visualizations for the NIM metrics. the graphs added to Nim tab-

    1 - Graph for KV Cache usage over time
    2 - Line graph for Running, Waiting and Max Request Count
    3 - Line graph with Total Prompt Token Count and Total Generation Token Count
    4 - Area chart with Time to First Token
    5 - Area chart with Time per Output Token
    6 - Donut chart with Success Request and Failed Requests

  • odh-model-controller creates a ConfigMap next to the InferenceService called -metrics-dashboard with a data indicating if metrics are available for the runtime and if so, the ConfigMap will also contain a JSON object specifying the queries required for the graphs mentioned above.

  • odh-dashboard grabs the above-mentioned ConfigMap based on its name, and if metrics are available for the runtime, it will parse the JSON object and create the required graphs.

NIM Metrics tab-

image
NIM Metrics graphs-

image
image
image
image
image

How Has This Been Tested?

Use the following doc -
NIM Metrics_ frontend.pdf

Test Impact

  • Cypress tests added
  • Unit tests of utility functions
  • Tests of new hooks to follow after.

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Jan 2, 2025
Copy link
Contributor

openshift-ci bot commented Jan 2, 2025

Hi @LinoyBitan1. Thanks for your PR.

I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI label Jan 2, 2025
@andrewballantyne
Copy link
Member

FYI @emilys314

@LinoyBitan1 LinoyBitan1 changed the title Add nim metrics Implemented visualizations for the NIM metrics Jan 5, 2025
@LinoyBitan1
Copy link
Contributor Author

This is our (Matan Talvi and Linoy Bitan) first PR.

@LinoyBitan1 LinoyBitan1 marked this pull request as ready for review January 5, 2025 12:26
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress This PR is in WIP state label Jan 5, 2025
@emilys314
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests. and removed needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI labels Jan 7, 2025
@emilys314
Copy link
Contributor

Can you run the linter / formatter? The CI checks will not pass unless it's following the linting rules See https://github.com/opendatahub-io/odh-dashboard/blob/main/CONTRIBUTING.md#linter-testing
vs code can show the errors. Or run npm run test:lint

@emilys314
Copy link
Contributor

Can you also add in the PR description the jira item this is associated with?

@LinoyBitan1
Copy link
Contributor Author

LinoyBitan1 commented Jan 8, 2025

done

@LinoyBitan1 LinoyBitan1 closed this Jan 8, 2025
@LinoyBitan1 LinoyBitan1 reopened this Jan 8, 2025
@LinoyBitan1
Copy link
Contributor Author

Can you run the linter / formatter? The CI checks will not pass unless it's following the linting rules See https://github.com/opendatahub-io/odh-dashboard/blob/main/CONTRIBUTING.md#linter-testing vs code can show the errors. Or run npm run test:lint

The linter and formatter were run, and we've added the updates in the commit along with a few changes in the tests. Let us know if there's anything else needed!

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 83.80952% with 51 lines in your changes missing coverage. Please review.

Project coverage is 85.82%. Comparing base (3ea00fd) to head (eebe460).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...end/src/api/prometheus/kservePerformanceMetrics.ts 45.16% 34 Missing ⚠️
...metrics/kserve/content/NIMCurrentRequestsGraph.tsx 80.00% 3 Missing ⚠️
...pts/metrics/kserve/useNimMetricsGraphDefinition.ts 84.21% 3 Missing ⚠️
...s/modelServing/screens/metrics/MetricsPageTabs.tsx 62.50% 3 Missing ⚠️
...etrics/kserve/content/NIMRequestsOutcomesGraph.tsx 84.61% 2 Missing ⚠️
...ontend/src/api/prometheus/NimPerformanceMetrics.ts 98.63% 1 Missing ⚠️
.../src/concepts/metrics/kserve/NimMetricsContext.tsx 96.00% 1 Missing ⚠️
...pts/metrics/kserve/content/NIMTokensCountGraph.tsx 92.30% 1 Missing ⚠️
...ts/metrics/kserve/content/NimPerformanceGraphs.tsx 95.65% 1 Missing ⚠️
frontend/src/concepts/metrics/kserve/utils.ts 75.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3612      +/-   ##
==========================================
- Coverage   85.82%   85.82%   -0.01%     
==========================================
  Files        1419     1433      +14     
  Lines       32644    32957     +313     
  Branches     9167     9222      +55     
==========================================
+ Hits        28018    28284     +266     
- Misses       4626     4673      +47     
Files with missing lines Coverage Δ
frontend/src/concepts/metrics/kserve/const.ts 100.00% <100.00%> (ø)
...ts/metrics/kserve/content/NIMKVCacheUsageGraph.tsx 100.00% <100.00%> (ø)
...rics/kserve/content/NIMTimeForFirstTokenGraphs.tsx 100.00% <100.00%> (ø)
...rics/kserve/content/NIMTimePerOutputTokenGraph.tsx 100.00% <100.00%> (ø)
...cepts/metrics/kserve/content/NimMetricsContent.tsx 100.00% <100.00%> (ø)
...ages/modelServing/screens/metrics/MetricsChart.tsx 89.24% <100.00%> (+0.61%) ⬆️
...s/modelServing/screens/metrics/nim/ModelGraphs.tsx 100.00% <100.00%> (ø)
...es/modelServing/screens/metrics/nim/NimMetrics.tsx 100.00% <100.00%> (ø)
...nd/src/pages/modelServing/screens/metrics/types.ts 100.00% <100.00%> (ø)
...rving/screens/metrics/useMetricsPageEnabledTabs.ts 100.00% <100.00%> (ø)
... and 12 more

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ea00fd...eebe460. Read the comment docs.

Copy link
Contributor

@emilys314 emilys314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to add a link to the metrics view in the model list for NIM projects, currently there is no way to get there? frontend/src/pages/modelServing/screens/global/InferenceServiceTableRow.tsx

image

@emilys314
Copy link
Contributor

@LinoyBitan1 LinoyBitan1 force-pushed the add-nim-metrics branch 3 times, most recently from d36865b to 859f097 Compare January 15, 2025 13:04
Copy link
Contributor

@emilys314 emilys314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor comments (sorry lol), but everything else looks good and is working as intended

…e initIntercepts

Signed-off-by: Olga Lavtar <olavtar@redhat.com>
@emilys314
Copy link
Contributor

/lgtm
/approve

Copy link
Contributor

openshift-ci bot commented Jan 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: emilys314

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit b263b57 into opendatahub-io:main Jan 17, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants