Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Optimize /metrics, fix unbounded request queue issue and fix race conditions in metricsBufferResponse mode #22494

Merged
merged 17 commits into from
Apr 13, 2024

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Apr 12, 2024

Fixes #22477

Motivation

There are multiple problems in the /metrics endpoint:

  • requests are handled one-by-one and added into a queue
    • when requests have timed out, they get processed regardless. time outs are detected in the later phases of the processing. Instead, timed out requests should be short-circuited in the beginning of the processing.
  • processing is single threaded and therefore the throughput is low
  • the metricsBufferResponse mode added in Improve /metrics endpoint performance  #14453 improves things, but it contains some race conditions where a buffer can get released when it is in use.

Modifications

  • fix the request timeout issue
  • enable multithreading. currently fixed to 4 threads since higher concurrency requires more memory and leads to OOM issues.
    • by default, combine concurrent requests to reuse the same result. this improves throughput even when metricsBufferResponse mode isn't enabled
  • remove the previous metricsBufferResponse solution since the TimeWindow and WindowWrap classes arean't needed at all. The concurrent request combining solution can also be used for metricsBufferResponse solution.
  • optimize some details of stats generation which were allocating a lot of objects.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 12, 2024
@lhotari lhotari self-assigned this Apr 12, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 12, 2024
@lhotari
Copy link
Member Author

lhotari commented Apr 12, 2024

Performance test results with the changes.

Reproducing the results:

gh pr checkout 22494
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true
rm -rf data
PULSAR_MEM="-Xms2g -Xmx4g -XX:MaxDirectMemorySize=6g" PULSAR_GC="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch" PULSAR_EXTRA_OPTS="-Djute.maxbuffer=20000000" PULSAR_STANDALONE_USE_ZOOKEEPER=1 bin/pulsar standalone -nss -nfw 2>&1 | tee standalone.log
git clone https://github.com/lhotari/pulsar-playground
cd pulsar-playground
./gradlew shadowJar
# create the topics
java -cp build/libs/pulsar-playground-all.jar com.github.lhotari.pulsar.playground.TestScenarioCreateLongNamedTopics
git clone https://github.com/lhotari/pulsar-playground
cd pulsar-playground/experiments/metrics-performance

❯ k6 run load_test.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load_test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 10000 iterations shared among 100 VUs (maxDuration: 10m0s, gracefulStop: 30s)


     data_received..................: 3.8 TB 8.1 GB/s
     data_sent......................: 880 kB 1.9 kB/s
     http_req_blocked...............: avg=35.04µs  min=1µs      med=3µs     max=3.92ms   p(90)=6µs      p(95)=8µs
     http_req_connecting............: avg=25.17µs  min=0s       med=0s      max=3.02ms   p(90)=0s       p(95)=0s
     http_req_duration..............: avg=4.7s     min=457.09ms med=4.68s   max=9.71s    p(90)=8.15s    p(95)=8.58s
       { expected_response:true }...: avg=4.7s     min=457.09ms med=4.68s   max=9.71s    p(90)=8.15s    p(95)=8.58s
     http_req_failed................: 0.00%  ✓ 0         ✗ 10000
     http_req_receiving.............: avg=185.35ms min=86.9ms   med=183.9ms max=469.76ms p(90)=214.08ms p(95)=224.08ms
     http_req_sending...............: avg=58.44µs  min=3µs      med=15µs    max=41.26ms  p(90)=28µs     p(95)=38µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s      max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=4.51s    min=302.02ms med=4.5s    max=9.56s    p(90)=7.96s    p(95)=8.4s
     http_reqs......................: 10000  21.179615/s
     iteration_duration.............: avg=4.7s     min=457.16ms med=4.69s   max=9.71s    p(90)=8.15s    p(95)=8.58s
     iterations.....................: 10000  21.179615/s
     vus............................: 5      min=5       max=100
     vus_max........................: 100    min=100     max=100


running (07m52.2s), 000/100 VUs, 10000 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  07m52.2s/10m0s  10000/10000 shared iters

@lhotari
Copy link
Member Author

lhotari commented Apr 12, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

@lhotari
Copy link
Member Author

lhotari commented Apr 12, 2024

Most recent test run without turning on the caching:

❯ k6 run load_test.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load_test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 10m30s max duration (incl. graceful stop):
              * default: 10000 iterations shared among 100 VUs (maxDuration: 10m0s, gracefulStop: 30s)


     data_received..................: 3.8 TB 8.5 GB/s
     data_sent......................: 880 kB 2.0 kB/s
     http_req_blocked...............: avg=42.58µs  min=1µs      med=3µs      max=4.93ms   p(90)=5µs      p(95)=7µs
     http_req_connecting............: avg=33.7µs   min=0s       med=0s       max=4ms      p(90)=0s       p(95)=0s
     http_req_duration..............: avg=4.46s    min=298.14ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
       { expected_response:true }...: avg=4.46s    min=298.14ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
     http_req_failed................: 0.00%  ✓ 0         ✗ 10000
     http_req_receiving.............: avg=178.21ms min=75.75ms  med=176.97ms max=359.87ms p(90)=204.48ms p(95)=213.44ms
     http_req_sending...............: avg=37.42µs  min=4µs      med=13µs     max=25.6ms   p(90)=25µs     p(95)=33µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=4.29s    min=143.93ms med=4.26s    max=8.56s    p(90)=7.62s    p(95)=8.03s
     http_reqs......................: 10000  22.281335/s
     iteration_duration.............: avg=4.46s    min=298.19ms med=4.44s    max=8.7s     p(90)=7.8s     p(95)=8.2s
     iterations.....................: 10000  22.281335/s
     vus............................: 22     min=22      max=100
     vus_max........................: 100    min=100     max=100


running (07m28.8s), 000/100 VUs, 10000 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  07m28.8s/10m0s  10000/10000 shared iters

@lhotari
Copy link
Member Author

lhotari commented Apr 12, 2024

I did allocation profiling with Async Profiler. Used similar commands as in #22494 (comment) , but with async profiler activated with OPTS.

OPTS="-agentpath:$HOME/tools/async-profiler/lib/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=$PWD/profile.jfr" PULSAR_MEM="-Xms2g -Xmx4g -XX:MaxDirectMemorySize=6g" PULSAR_GC="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+AlwaysPreTouch" PULSAR_EXTRA_OPTS="-Djute.maxbuffer=20000000" PULSAR_STANDALONE_USE_ZOOKEEPER=1 bin/pulsar standalone -nss -nfw 2>&1 | tee standalone.log

With OPTS="-agentpath:$HOME/tools/async-profiler/lib/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=$PWD/profile.jfr", it's possible to do CPU, allocation and lock profiling all at once. I run async-profiler on a Linux box to get best accuracy.

After profiling, I then use this shell script function to generate multiple flamegraph htmls out of the JFR file:
https://github.com/lhotari/pulsar-contributor-toolbox/blob/c150c3d9afc23d4865c2e3283c087e1c1261b4ee/functions/pulsar-contributor-toolbox-functions.sh#L1438-L1458

@lhotari lhotari requested a review from asafm April 13, 2024 10:36
@dao-jun
Copy link
Member

dao-jun commented Apr 13, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

I guess it is related to CompositeByteBuf#consolidateIfNeeded, once the number of components exceeds MAX_COMPONENT, it will combine all the components to one component, memory copy happens.

@lhotari
Copy link
Member Author

lhotari commented Apr 13, 2024

One interesting detail is that in the load test, the system couldn't keep up with the load without making the optimization to direct buffer allocation in commit eb53342.

I guess it is related to CompositeByteBuf#consolidateIfNeeded, once the number of components exceeds MAX_COMPONENT, it will combine all the components to one component, memory copy happens.

Memory copying isn't the biggest problem. A bigger problem is direct memory OOM that seems to happen when the memory space is so fragmented that there isn't free space for the allocation requests. The Netty pool has a maximum chunk size (8MB default) and all larger allocations are huge allocations that aren't pooled. That's why I think the added logic helps since it pre-allocates up to 8MB chunks.
Memory copying will happen, but that seems to be fine.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work

@merlimat merlimat merged commit 7009071 into apache:master Apr 13, 2024
49 of 50 checks passed
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
lhotari added a commit that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (#22494)

(cherry picked from commit 7009071)

# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/BucketDelayedDeliveryTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/persistent/PersistentTopicTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/SchemaServiceTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/stats/PrometheusMetricsTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/transaction/buffer/TransactionBufferClientTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/transaction/pendingack/PendingAckPersistentTest.java
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/web/WebServiceTest.java
lhotari added a commit to lhotari/pulsar that referenced this pull request Apr 15, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
@lhotari
Copy link
Member Author

lhotari commented Apr 15, 2024

With 'Accept-Encoding: gzip header in a k6 test, the performance isn't great, however the system is able to stay available and about 25% of the requests timed out under load.
It would be able to optimize the solution further to cache the Gzipped response, but that seems premature optimization.

mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 16, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
+ "the connection due to a timeout ({} ms elapsed): {}", time, e + "");
} else {
log.error("Failed to generate prometheus stats, {} ms elapsed", time, e);
// set hard timeout to 2 * timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this makes me happy we're moving to a consolidated maintained exporters in OTel :)

pgier pushed a commit to pgier/pulsar that referenced this pull request Aug 23, 2024
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants