Add memory consumption estimation for models in profile API. #853
Conversation
Signed-off-by: Jing Zhang <jngz@amazon.com>
Codecov Report
@@ Coverage Diff @@
## 2.x #853 +/- ##
============================================
- Coverage 84.81% 84.67% -0.14%
- Complexity 1628 1631 +3
============================================
Files 135 135
Lines 6079 6116 +37
Branches 596 601 +5
============================================
+ Hits 5156 5179 +23
- Misses 666 674 +8
- Partials 257 263 +6
Flags with carried forward coverage won't be shown.
... and 1 file with indirect coverage changes
public synchronized void setMemSizeEstimation(String modelId, MLModelFormat format, Long size) {
    Long memSize = getMemSizeEstimation(format, size);
    log.debug("Updating memSizeEstimation of Model {} to {}", modelId, memSize);
    getExistingModelCache(modelId).setMemSizeEstimationCPU(memSize);
I see we set the same value for CPU and GPU. Does that mean the CPU and GPU memory consumption is almost the same?
I have the same question. GitHub should have a +1 icon :)
Also, what if the model is trained on CPU but performs inference on GPU? Will the memory consumption be similar to a model that is both trained and run for inference on GPU?
Yes, from my experiments the CPU and GPU memory consumption is similar.
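For context, the diff excerpt above is truncated before the GPU line. A minimal sketch of the full setter as implied by this thread, assuming a setMemSizeEstimationGPU counterpart inferred by symmetry with the CPU setter:

// Sketch only: the setMemSizeEstimationGPU name is an assumption based
// on the reviewer's observation that the same value is set for both.
public synchronized void setMemSizeEstimation(String modelId, MLModelFormat format, Long size) {
    Long memSize = getMemSizeEstimation(format, size);
    log.debug("Updating memSizeEstimation of Model {} to {}", modelId, memSize);
    getExistingModelCache(modelId).setMemSizeEstimationCPU(memSize);
    // Same estimate applied to the GPU field, per the review discussion.
    getExistingModelCache(modelId).setMemSizeEstimationGPU(memSize);
}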
Double scale = 1.0;
switch (format) {
    case ONNX:
        scale = 1.5;
Can we add a comment explaining these magic numbers?
It is a rough estimation; we plan to replace it with a more accurate method later.
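To make the heuristic under discussion concrete, a minimal sketch, assuming the estimate is the model file size multiplied by a format-dependent scale factor (only the ONNX factor of 1.5 appears in the excerpt above; the default case and the final multiplication are assumptions):

// Hypothetical sketch of the rough estimation discussed in this thread.
private static Long getMemSizeEstimation(MLModelFormat format, Long fileSize) {
    Double scale = 1.0;
    switch (format) {
        case ONNX:
            scale = 1.5;
            break;
        default:
            break;
    }
    // Rough heuristic: runtime memory ~= scale * model file size.
    return (long) (fileSize * scale);
}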
Signed-off-by: Jing Zhang <jngz@amazon.com> (cherry picked from commit dd2799a)
…rch-project#853) Signed-off-by: Jing Zhang <jngz@amazon.com>
Description
Add CPU/GPU memory consumption estimation for DL models in the profile API. An example response is as follows:
{
  "nodes" : {
    "jJA5JA5ES1CG71Yugpc84g" : {
      "models" : {
        "FMtWfYcB6ZKIXgDcq6sw" : {
          "model_state" : "DEPLOYED",
          "predictor" : "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingModel@253a6c05",
          "target_worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "mem_size_estimation_cpu" : 105529143,
          "mem_size_estimation_gpu" : 105529143
        },
        "p1hsfIcBSHStRf2jiLu6" : {
          "model_state" : "DEPLOYED",
          "predictor" : "org.opensearch.ml.engine.algorithms.text_embedding.TextEmbeddingModel@76d713a3",
          "target_worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "worker_nodes" : [
            "jJA5JA5ES1CG71Yugpc84g"
          ],
          "mem_size_estimation_cpu" : 148025802,
          "mem_size_estimation_gpu" : 148025802
        }
      }
    }
  }
}
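For reference, a response like the one above comes from the ml-commons profile endpoint; a typical request (node- and model-level filters omitted) looks like:

GET /_plugins/_ml/profile/models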
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.