
Add memory consumption estimation for models in profile API. #853

Merged: 2 commits merged into 2.x on Apr 17, 2023


@jngz-es (Collaborator) commented Apr 14, 2023

Description

Add CPU/GPU memory consumption estimation for DL models to the profile API. An example response follows:

```json
{
  "nodes" : {
    "jJA5JA5ES1CG71Yugpc84g" : {
      "models" : {
        "FMtWfYcB6ZKIXgDcq6sw" : {
          "model_state" : "DEPLOYED",
          "predictor" : "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingModel@253a6c05",
          "target_worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "mem_size_estimation_cpu" : 105529143,
          "mem_size_estimation_gpu" : 105529143
        },
        "p1hsfIcBSHStRf2jiLu6" : {
          "model_state" : "DEPLOYED",
          "predictor" : "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingModel@76d713a3",
          "target_worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "mem_size_estimation_cpu" : 148025802,
          "mem_size_estimation_gpu" : 148025802
        }
      }
    }
  }
}
```
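As a rough illustration of how a client might consume the new fields, the sketch below sums the per-model estimates for one node. It assumes the values are in bytes (the PR does not state the unit); `ModelMemEstimate` and `NodeMemSummary` are hypothetical names, not part of ml-commons, and JSON parsing is skipped by hard-coding the two values from the example response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one model's estimates from the profile response.
// Units are assumed to be bytes; the PR does not state them.
final class ModelMemEstimate {
    final long cpuBytes; // mem_size_estimation_cpu
    final long gpuBytes; // mem_size_estimation_gpu

    ModelMemEstimate(long cpuBytes, long gpuBytes) {
        this.cpuBytes = cpuBytes;
        this.gpuBytes = gpuBytes;
    }
}

final class NodeMemSummary {
    // Sum of the CPU estimates for all models deployed on a node.
    static long totalCpuBytes(Map<String, ModelMemEstimate> models) {
        long total = 0L;
        for (ModelMemEstimate m : models.values()) {
            total += m.cpuBytes;
        }
        return total;
    }

    // Sum of the GPU estimates for all models deployed on a node.
    static long totalGpuBytes(Map<String, ModelMemEstimate> models) {
        long total = 0L;
        for (ModelMemEstimate m : models.values()) {
            total += m.gpuBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Model IDs and values copied from the example response for node jJA5JA5ES1CG71Yugpc84g.
        Map<String, ModelMemEstimate> models = new LinkedHashMap<>();
        models.put("FMtWfYcB6ZKIXgDcq6sw", new ModelMemEstimate(105529143L, 105529143L));
        models.put("p1hsfIcBSHStRf2jiLu6", new ModelMemEstimate(148025802L, 148025802L));
        System.out.println("CPU total: " + totalCpuBytes(models)); // CPU total: 253554945
        System.out.println("GPU total: " + totalGpuBytes(models)); // GPU total: 253554945
    }
}
```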

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jing Zhang <jngz@amazon.com>
@codecov-commenter commented:
Codecov Report

Merging #853 (15b99eb) into 2.x (036bda0) will decrease coverage by 0.14%.
The diff coverage is 73.68%.


@@             Coverage Diff              @@
##                2.x     #853      +/-   ##
============================================
- Coverage     84.81%   84.67%   -0.14%     
- Complexity     1628     1631       +3     
============================================
  Files           135      135              
  Lines          6079     6116      +37     
  Branches        596      601       +5     
============================================
+ Hits           5156     5179      +23     
- Misses          666      674       +8     
- Partials        257      263       +6     
| Flag | Coverage Δ |
|---|---|
| ml-commons | 84.67% <73.68%> (-0.14%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| ...n/java/org/opensearch/ml/model/MLModelManager.java | 79.44% <0.00%> (-0.19%) ⬇️ |
| ...java/org/opensearch/ml/profile/MLModelProfile.java | 62.50% <60.00%> (-0.47%) ⬇️ |
| ...va/org/opensearch/ml/model/MLModelCacheHelper.java | 90.06% <77.27%> (-2.19%) ⬇️ |
| ...ain/java/org/opensearch/ml/model/MLModelCache.java | 86.76% <100.00%> (+0.40%) ⬆️ |
| ...va/org/opensearch/ml/rest/RestMLProfileAction.java | 94.73% <100.00%> (+0.11%) ⬆️ |

... and 1 file with indirect coverage changes


```java
public synchronized void setMemSizeEstimation(String modelId, MLModelFormat format, Long size) {
    Long memSize = getMemSizeEstimation(format, size);
    log.debug("Updating memSizeEstimation of Model {} to {}", modelId, memSize);
    getExistingModelCache(modelId).setMemSizeEstimationCPU(memSize);
    // ...
```
A collaborator commented:

I see we set the same value for CPU and GPU. Does that mean CPU and GPU memory consumption are almost the same?

@dhrubo-os (Collaborator) commented Apr 14, 2023:

I have the same question. GitHub should have a +1 icon :)

Also, what if the model is trained on CPU but runs inference on GPU? Would the memory consumption be similar to a model that is both trained and run on GPU?

@jngz-es (Collaborator, author) replied:

Yes, in my experiments CPU and GPU memory consumption were similar.
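Given that observation, a minimal sketch of a cache setter that writes one estimate into both the CPU and GPU fields might look like the following. `ModelCacheEntry` and `ModelCacheRegistry` are hypothetical stand-ins modeled loosely on names in the reviewed snippet (`getExistingModelCache`, `setMemSizeEstimationCPU`); this is an illustration, not the PR's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-model cache entry holding the two estimates.
final class ModelCacheEntry {
    private Long memSizeEstimationCPU;
    private Long memSizeEstimationGPU;

    void setMemSizeEstimationCPU(Long v) { memSizeEstimationCPU = v; }
    void setMemSizeEstimationGPU(Long v) { memSizeEstimationGPU = v; }
    Long getMemSizeEstimationCPU() { return memSizeEstimationCPU; }
    Long getMemSizeEstimationGPU() { return memSizeEstimationGPU; }
}

// Hypothetical registry of cached models, keyed by model ID.
final class ModelCacheRegistry {
    private final Map<String, ModelCacheEntry> cache = new ConcurrentHashMap<>();

    void register(String modelId) {
        cache.putIfAbsent(modelId, new ModelCacheEntry());
    }

    ModelCacheEntry getExistingModelCache(String modelId) {
        ModelCacheEntry entry = cache.get(modelId);
        if (entry == null) {
            throw new IllegalArgumentException("Model not found in cache: " + modelId);
        }
        return entry;
    }

    // One estimate is written to both fields, reflecting the observation in this
    // thread that CPU and GPU consumption were similar. Synchronized, like the
    // reviewed method, so concurrent callers of this method do not interleave writes.
    synchronized void setMemSizeEstimation(String modelId, long memSize) {
        ModelCacheEntry entry = getExistingModelCache(modelId);
        entry.setMemSizeEstimationCPU(memSize);
        entry.setMemSizeEstimationGPU(memSize);
    }

    public static void main(String[] args) {
        ModelCacheRegistry registry = new ModelCacheRegistry();
        registry.register("FMtWfYcB6ZKIXgDcq6sw");
        registry.setMemSizeEstimation("FMtWfYcB6ZKIXgDcq6sw", 105529143L);
        ModelCacheEntry entry = registry.getExistingModelCache("FMtWfYcB6ZKIXgDcq6sw");
        System.out.println(entry.getMemSizeEstimationCPU() + " " + entry.getMemSizeEstimationGPU());
        // 105529143 105529143
    }
}
```

If separate CPU and GPU estimates are needed later, only the body of `setMemSizeEstimation` has to change.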

```java
Double scale = 1.0;
switch (format) {
    case ONNX:
        scale = 1.5;
        // ...
```
A collaborator commented:

Can we add a comment explaining these magic numbers?

@jngz-es (Collaborator, author) replied:

It is a rough estimate; we plan to replace it with a more accurate method.
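A sketch of the kind of explanatory comment the reviewer asked for, attached to a self-contained version of the scaling heuristic. Only the ONNX factor of 1.5 and the default of 1.0 come from the reviewed snippet; the `ModelFormat` enum, the `MemSizeEstimator` class, and the assumption that `size` is the model's size in bytes are illustrative.

```java
// Hypothetical subset of model formats for this sketch.
enum ModelFormat { ONNX, TORCH_SCRIPT }

final class MemSizeEstimator {
    // Rough heuristic, not a measurement: estimated memory = model size * scale.
    // ONNX uses 1.5 (from the reviewed snippet); every other format keeps 1.0.
    private static final double DEFAULT_SCALE = 1.0;
    private static final double ONNX_SCALE = 1.5;

    // modelSizeBytes is assumed to be the model's size in bytes.
    // The cast truncates the scaled value toward zero.
    static long estimate(ModelFormat format, long modelSizeBytes) {
        double scale = DEFAULT_SCALE;
        switch (format) {
            case ONNX:
                scale = ONNX_SCALE;
                break;
            default:
                break;
        }
        return (long) (modelSizeBytes * scale);
    }

    public static void main(String[] args) {
        System.out.println(estimate(ModelFormat.ONNX, 100_000_000L));         // 150000000
        System.out.println(estimate(ModelFormat.TORCH_SCRIPT, 100_000_000L)); // 100000000
    }
}
```

Keeping the factors as named constants with a comment makes the approximation explicit and leaves a single place to change when the heuristic is replaced.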

ylwu-amzn previously approved these changes Apr 17, 2023
dhrubo-os previously approved these changes Apr 17, 2023
@jngz-es jngz-es dismissed stale reviews from dhrubo-os and ylwu-amzn via cd2215c April 17, 2023 17:10
@jngz-es jngz-es requested review from ylwu-amzn and dhrubo-os April 17, 2023 17:11
@jngz-es jngz-es merged commit dd2799a into opensearch-project:2.x Apr 17, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 17, 2023
Signed-off-by: Jing Zhang <jngz@amazon.com>
(cherry picked from commit dd2799a)
jngz-es added a commit that referenced this pull request Apr 17, 2023
…856)

Signed-off-by: Jing Zhang <jngz@amazon.com>
(cherry picked from commit dd2799a)

Co-authored-by: Jing Zhang <jngz@amazon.com>
rbhavna pushed a commit to rbhavna/ml-commons that referenced this pull request Jun 16, 2023
rbhavna added a commit that referenced this pull request Jun 16, 2023
…991)

Signed-off-by: Jing Zhang <jngz@amazon.com>
Co-authored-by: Jing Zhang <jngz@amazon.com>
zane-neo pushed a commit to zane-neo/ml-commons that referenced this pull request Aug 23, 2023
4 participants