Various request time metrics #121

Merged on Aug 7, 2024 (70 commits)

@Bslabe123 Bslabe123 commented Jul 24, 2024

Added the following metrics:

  • jetstream_queue_duration
  • jetstream_time_to_first_token
  • jetstream_time_per_output_token
  • jetstream_time_per_prefill_token
  • jetstream_time_per_request
  • jetstream_wait_time_per_request
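Each of these metrics is exposed as a histogram, so every observed duration is counted in all buckets whose `le` upper bound it does not exceed. The following is a minimal standalone sketch of that cumulative-bucket bookkeeping in plain Python. It is not the code from this PR and not the `prometheus_client` API; the class name `CumulativeHistogram`, the metric name `demo_time_per_request`, and the bucket bounds are hypothetical and chosen only for illustration.

```python
import bisect
import math


class CumulativeHistogram:
    """Toy histogram with Prometheus-style cumulative `le` buckets (illustrative only)."""

    def __init__(self, name, bounds):
        self.name = name
        # An implicit +Inf bucket catches every observation above the last bound.
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per-bucket (non-cumulative) counts
        self.total = 0.0                      # running sum of observed values
        self.n = 0                            # number of observations

    def observe(self, seconds):
        # bisect_left returns the first bound >= seconds, i.e. "less than or equal".
        idx = bisect.bisect_left(self.bounds, seconds)
        self.counts[idx] += 1
        self.total += seconds
        self.n += 1

    def render(self):
        """Return lines shaped like the text exposition format (labels omitted)."""
        lines, running = [], 0
        for bound, count in zip(self.bounds, self.counts):
            running += count  # buckets are cumulative in the exposed output
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {float(running)}')
        lines.append(f"{self.name}_count {float(self.n)}")
        lines.append(f"{self.name}_sum {self.total}")
        return lines


# Hypothetical metric name and bounds, used only for this demo.
hist = CumulativeHistogram("demo_time_per_request", [1.0, 2.5, 5.0])
for latency in (0.4, 2.5, 3.0, 7.2):
    hist.observe(latency)
print("\n".join(hist.render()))
```

With the four sample observations, the `le="2.5"` bucket reports 2.0 (it includes the 0.4 s and the 2.5 s observations) and the `+Inf` bucket always equals the `_count` line, which is the same relationship visible in the scrape results further down.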

Scrape results after running the following command twice:

seq 50 | xargs -P 50 -n 1 curl --request POST --header "Content-type: application/json" -s localhost:8000/generate --data '{
    "prompt": "Can you provide a comprehensive and detailed overview of the history and development of artificial intelligence.",
    "max_tokens": 200
}'
# HELP jetstream_queue_duration The total time each request spends enqueued in seconds
# TYPE jetstream_queue_duration histogram
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.01"} 2.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.02"} 5.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.05"} 9.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.1"} 15.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.2"} 16.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.5"} 17.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1.0"} 17.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2.0"} 17.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="5.0"} 33.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="10.0"} 51.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="20.0"} 66.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="50.0"} 100.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="100.0"} 100.0
jetstream_queue_duration_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_queue_duration_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_queue_duration_sum{id="maxengine-server-58dc8c7895-2h8ks"} 1640.7883876687847
# HELP jetstream_queue_duration_created The total time each request spends enqueued in seconds
# TYPE jetstream_queue_duration_created gauge
jetstream_queue_duration_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.722895603926823e+09
# HELP jetstream_time_to_first_token Time to first token per request in seconds
# TYPE jetstream_time_to_first_token histogram
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.001"} 0.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.005"} 0.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.01"} 0.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.02"} 45.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.04"} 47.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.06"} 80.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.08"} 83.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.1"} 87.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.25"} 98.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.5"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.75"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1.0"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2.5"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="5.0"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="7.5"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="10.0"} 99.0
jetstream_time_to_first_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_time_to_first_token_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_time_to_first_token_sum{id="maxengine-server-58dc8c7895-2h8ks"} 19.157443011994474
# HELP jetstream_time_to_first_token_created Time to first token per request in seconds
# TYPE jetstream_time_to_first_token_created gauge
jetstream_time_to_first_token_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.7228956039332345e+09
# HELP jetstream_time_per_output_token Average time per output token per request in seconds
# TYPE jetstream_time_per_output_token histogram
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.01"} 0.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.025"} 64.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.05"} 80.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.075"} 80.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.1"} 95.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.15"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.2"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.3"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.4"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.5"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.75"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1.0"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2.5"} 100.0
jetstream_time_per_output_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_time_per_output_token_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_time_per_output_token_sum{id="maxengine-server-58dc8c7895-2h8ks"} 3.2044022676919535
# HELP jetstream_time_per_output_token_created Average time per output token per request in seconds
# TYPE jetstream_time_per_output_token_created gauge
jetstream_time_per_output_token_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.7228956241800778e+09
# HELP jetstream_time_per_prefill_token Prefill time per token per request in seconds
# TYPE jetstream_time_per_prefill_token histogram
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1e-05"} 0.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2e-05"} 0.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="5e-05"} 0.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.0001"} 0.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.0002"} 0.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.0005"} 93.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.001"} 94.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.002"} 94.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.005"} 97.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.01"} 97.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.02"} 99.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.05"} 99.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.1"} 99.0
jetstream_time_per_prefill_token_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_time_per_prefill_token_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_time_per_prefill_token_sum{id="maxengine-server-58dc8c7895-2h8ks"} 0.8837934013507844
# HELP jetstream_time_per_prefill_token_created Prefill time per token per request in seconds
# TYPE jetstream_time_per_prefill_token_created gauge
jetstream_time_per_prefill_token_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.722895603904875e+09
# HELP jetstream_time_per_request End to end request latency in seconds
# TYPE jetstream_time_per_request histogram
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1.0"} 0.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2.5"} 0.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="5.0"} 64.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="10.0"} 80.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="15.0"} 80.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="20.0"} 95.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="30.0"} 100.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="40.0"} 100.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="50.0"} 100.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="60.0"} 100.0
jetstream_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_time_per_request_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_time_per_request_sum{id="maxengine-server-58dc8c7895-2h8ks"} 644.0848558060825
# HELP jetstream_time_per_request_created End to end request latency in seconds
# TYPE jetstream_time_per_request_created gauge
jetstream_time_per_request_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.7228956241802106e+09
# HELP jetstream_wait_time_per_request Time each request is not being prefilled or decoded
# TYPE jetstream_wait_time_per_request histogram
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.01"} 2.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.02"} 3.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.05"} 9.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.1"} 15.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.2"} 16.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="0.5"} 17.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="1.0"} 17.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="2.0"} 17.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="5.0"} 33.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="10.0"} 51.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="20.0"} 66.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="50.0"} 100.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="100.0"} 100.0
jetstream_wait_time_per_request_bucket{id="maxengine-server-58dc8c7895-2h8ks",le="+Inf"} 100.0
jetstream_wait_time_per_request_count{id="maxengine-server-58dc8c7895-2h8ks"} 100.0
jetstream_wait_time_per_request_sum{id="maxengine-server-58dc8c7895-2h8ks"} 1641.1120911570033
# HELP jetstream_wait_time_per_request_created Time each request is not being prefilled or decoded
# TYPE jetstream_wait_time_per_request_created gauge
jetstream_wait_time_per_request_created{id="maxengine-server-58dc8c7895-2h8ks"} 1.7228956241802614e+09
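As a rough aid for reading the scrape results above, the sketch below derives a mean and approximate percentiles for `jetstream_time_per_request` from its `_sum`, `_count`, and `_bucket` lines. The bucket values are copied from the output above; the function names and the interpolation rule (linear within the matching bucket, similar in spirit to PromQL's `histogram_quantile`) are assumptions made for illustration and are not part of this PR.

```python
import math

# Cumulative (upper_bound, count) pairs copied from the
# jetstream_time_per_request_bucket lines in the scrape results.
TIME_PER_REQUEST_BUCKETS = [
    (1.0, 0.0), (2.5, 0.0), (5.0, 64.0), (10.0, 80.0), (15.0, 80.0),
    (20.0, 95.0), (30.0, 100.0), (40.0, 100.0), (50.0, 100.0),
    (60.0, 100.0), (math.inf, 100.0),
]
TIME_PER_REQUEST_SUM = 644.0848558060825    # jetstream_time_per_request_sum
TIME_PER_REQUEST_COUNT = 100.0              # jetstream_time_per_request_count


def mean_seconds(total, count):
    """Average observation: _sum divided by _count."""
    return total / count if count else math.nan


def estimate_quantile(q, buckets):
    """Approximate the q-quantile by linear interpolation inside one bucket.

    Assumes `buckets` is sorted by upper bound, counts are cumulative,
    and the lowest bucket starts at 0 seconds.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            if count == prev_count:
                return bound       # empty bucket: fall back to its upper bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


print("mean:", mean_seconds(TIME_PER_REQUEST_SUM, TIME_PER_REQUEST_COUNT))
print("p50 estimate:", estimate_quantile(0.50, TIME_PER_REQUEST_BUCKETS))
print("p95 estimate:", estimate_quantile(0.95, TIME_PER_REQUEST_BUCKETS))
```

For this run the mean end-to-end latency is about 6.44 s, the p50 estimate is about 4.45 s, and the p95 estimate is 20 s. The percentile figures are only as precise as the bucket boundaries allow.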

@Bslabe123 Bslabe123 requested a review from vipannalla as a code owner July 24, 2024 23:55

@FanhaiLu1 FanhaiLu1 left a comment

Can you fix the check errors?

jetstream/core/orchestrator.py (resolved)
@JoeZijunZhou JoeZijunZhou merged commit d681995 into AI-Hypercomputer:main Aug 7, 2024
3 checks passed