Corrected performance data when batch size is greater than 1 #1100
base: master
Conversation
A note "// If in 10 ms a batch of 5 new tokens is generated then TPOT is 10 / 5 = 2 tok/ms." seems valid to me (except the unit should be 2 ms/tok, not 2 tok/ms).
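For concreteness, here is the arithmetic from the quoted note as a minimal, self-contained snippet (the numbers are taken directly from the note):

```cpp
#include <cstdio>

int main() {
    // Numbers from the note above: one inference takes 10 ms and
    // produces a batch of 5 new tokens.
    const double inference_duration_ms = 10.0;
    const double batch_size = 5.0;

    // Per-token time (TPOT): 10 / 5 = 2, in ms per token (not tokens per ms).
    const double tpot_ms_per_token = inference_duration_ms / batch_size;
    std::printf("TPOT = %.1f ms/token\n", tpot_ms_per_token);
    return 0;
}
```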
@pavel-esir Looks like we can't simply remove this line; can you create a PR? The request is to report the duration for a whole batch.
Thanks for opening this discussion, I will take a look.
@peterchen-intel if you need the time points/durations of each batch, you can get them from, e.g., the raw performance metrics; all fields of `RawPerfMetrics` are accessible. Do we still need a new PR?
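A minimal sketch of reading those raw metrics, assuming the public `raw_metrics` member of `PerfMetrics` and the `m_durations`/`m_batch_sizes` fields that appear in the snippet quoted below ("model_dir" is a placeholder path to an exported OpenVINO model):

```cpp
#include <openvino/genai/llm_pipeline.hpp>
#include <iostream>

int main() {
    ov::genai::LLMPipeline pipe("model_dir", "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 32;
    auto result = pipe.generate("Hello", config);

    // Raw per-step timings. Whether m_durations[i] holds the time of the
    // i-th inference or a derived per-token time is exactly the question
    // being discussed in this thread.
    const auto& raw = result.perf_metrics.raw_metrics;
    for (size_t i = 0; i < raw.m_durations.size(); ++i) {
        std::cout << "step " << i << ": "
                  << raw.m_durations[i].count() << " us, "
                  << raw.m_batch_sizes[i] << " token(s)\n";
    }
    return 0;
}
```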
@peterchen-intel I have added an example of getting the generation times of each token/batch of tokens from the raw performance metrics here: #1118
@pavel-esir Can we expose m_durations[i] as the time for the token(s) from one inference (which generates batch_sizes[i] tokens)? The current m_durations[i] is not convincing, since the pipeline does not generate one token in m_durations[i] when batch_sizes[i] > 1; it generates batch_sizes[i] tokens with one inference in m_durations[i].
openvino.genai/src/cpp/src/perf_metrics.cpp (line 115 at fa324cf):

    raw_metrics.m_durations[i] /= batch_sizes[i];
m_durations[i] should be the duration of one inference, which may generate one or more (batch_size > 1) tokens. When batch_size > 1, it means batch_size tokens are generated together in m_durations[i] time, not that one token is generated in m_durations[i] / batch_size time.
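To restate the two interpretations in code (a hypothetical sketch with made-up helper names, not the repository's implementation):

```cpp
#include <cstddef>
#include <vector>

// Interpretation A (requested above): durations[i] is the wall time of the
// i-th inference, which produced batch_sizes[i] tokens in a single step.
double mean_time_per_inference(const std::vector<double>& durations) {
    double total = 0.0;
    for (double d : durations) total += d;
    return total / durations.size();
}

// Interpretation B (what the element-wise division aims at): an average
// per-token time, obtained by spreading each inference's duration evenly
// over the batch_sizes[i] tokens it produced.
double mean_time_per_token(const std::vector<double>& durations,
                           const std::vector<std::size_t>& batch_sizes) {
    double total = 0.0;
    std::size_t tokens = 0;
    for (std::size_t i = 0; i < durations.size(); ++i) {
        total += durations[i];
        tokens += batch_sizes[i];
    }
    return total / tokens;
}
```

With batch_sizes[i] > 1, the two figures differ by roughly a factor of the batch size, which is exactly the discrepancy described above.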