all: Use `gomaxprocs` library to set GOMAXPROCS #8175

marclop · 2022-05-20T08:44:29Z

Motivation/summary

Adds a new explicit dependency on Uber's automaxprocs library
(`go.uber.org/automaxprocs`), which checks whether a CFS quota has been
set and adjusts the Go runtime's GOMAXPROCS setting accordingly. This
prevents the APM Server from exhausting its CFS-allowed bandwidth and
ending up in a situation where the majority of threads are sleeping
(throttled by the CFS scheduler), which reduces the amount of work that
can be done.

The initial macro-benchmarks I ran seem to indicate that the change has
a positive effect on the APM Server's throughput. In the comparison
below, `old` is the run with automaxprocs and `new` is the run without
it:

$ benchstat automaxprocs.txt noautoprocs.txt
name                 old time/op              new time/op              delta
1000Transactions-64  99.5ms ±25%             119.2ms ±13%   +19.78%  (p=0.032 n=5+5)
AgentGo-64            657ms ±21%              1602ms ± 5%  +143.86%  (p=0.008 n=5+5)
AgentNodeJS-64        336ms ± 5%               875ms ±13%  +160.50%  (p=0.008 n=5+5)
AgentPython-64        1.14s ±21%               3.37s ± 4%  +196.62%  (p=0.008 n=5+5)
AgentRuby-64          540ms ±15%              1437ms ± 5%  +166.17%  (p=0.008 n=5+5)
OTLPTraces-64        82.9µs ± 3%              98.1µs ± 8%   +18.34%  (p=0.016 n=4+5)

name                 old error_responses/sec  new error_responses/sec  delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64             0.00                     0.00           ~     (all equal)
AgentNodeJS-64         0.00                     0.00           ~     (all equal)
AgentPython-64         0.00                     0.00           ~     (all equal)
AgentRuby-64           0.00                     0.00           ~     (all equal)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old errors/sec           new errors/sec           delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64              162 ±18%                  66 ± 5%   -59.44%  (p=0.008 n=5+5)
AgentNodeJS-64          146 ± 6%                  56 ±12%   -61.51%  (p=0.008 n=5+5)
AgentPython-64         64.1 ±18%                21.4 ± 4%   -66.61%  (p=0.008 n=5+5)
AgentRuby-64            250 ±16%                  93 ± 5%   -62.66%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old events/sec           new events/sec           delta
1000Transactions-64  4.80k ±128%               4.85k ±76%      ~     (p=0.841 n=5+5)
AgentGo-64            8.06k ±18%               3.27k ± 5%   -59.46%  (p=0.008 n=5+5)
AgentNodeJS-64        6.00k ± 6%               2.31k ±12%   -61.49%  (p=0.008 n=5+5)
AgentPython-64        6.26k ±18%               2.09k ± 4%   -66.65%  (p=0.008 n=5+5)
AgentRuby-64          7.00k ±16%               2.62k ± 4%   -62.64%  (p=0.008 n=5+5)
OTLPTraces-64         12.1k ± 3%               10.2k ± 7%   -15.31%  (p=0.016 n=4+5)

name                 old metrics/sec          new metrics/sec          delta
1000Transactions-64   0.69 ±301%               0.08 ±138%      ~     (p=0.714 n=5+4)
AgentGo-64              373 ±17%                 150 ± 5%   -59.85%  (p=0.008 n=5+5)
AgentNodeJS-64          360 ± 4%                 139 ±13%   -61.44%  (p=0.008 n=5+5)
AgentPython-64        1.89k ±18%               0.63k ± 4%   -66.73%  (p=0.008 n=5+5)
AgentRuby-64            516 ±16%                 194 ± 2%   -62.48%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                    0.03 ±300%      ~     (p=0.889 n=5+4)

name                 old spans/sec            new spans/sec            delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64            5.32k ±18%               2.16k ± 5%   -59.44%  (p=0.008 n=5+5)
AgentNodeJS-64        3.16k ± 6%               1.22k ±12%   -61.51%  (p=0.008 n=5+5)
AgentPython-64        3.63k ±18%               1.21k ± 4%   -66.62%  (p=0.008 n=5+5)
AgentRuby-64          4.12k ±16%               1.54k ± 5%   -62.66%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old txs/sec              new txs/sec              delta
1000Transactions-64  4.80k ±128%               4.85k ±76%      ~     (p=0.841 n=5+5)
AgentNodeJS-64        2.33k ± 6%               0.90k ±12%   -61.50%  (p=0.008 n=5+5)
AgentRuby-64          2.12k ±16%               0.80k ± 2%   -62.22%  (p=0.016 n=5+4)
OTLPTraces-64         12.1k ± 3%               10.2k ± 7%   -15.33%  (p=0.016 n=4+5)

name                 old alloc/op             new alloc/op             delta
1000Transactions-64  3.23MB ±36%              3.11MB ±24%      ~     (p=1.000 n=5+5)
AgentNodeJS-64       3.49MB ± 5%              6.01MB ± 8%   +72.34%  (p=0.008 n=5+5)
AgentRuby-64         4.69MB ±15%              9.59MB ± 0%  +104.68%  (p=0.016 n=5+4)
OTLPTraces-64        1.41kB ± 0%              1.38kB ± 1%    -2.08%  (p=0.016 n=4+5)

name                 old allocs/op            new allocs/op            delta
1000Transactions-64   6.58k ± 8%               6.40k ± 4%      ~     (p=0.548 n=5+5)
AgentNodeJS-64        6.69k ± 4%              10.45k ± 8%   +56.12%  (p=0.008 n=5+5)
AgentRuby-64          11.9k ±18%               23.8k ± 0%   +99.41%  (p=0.016 n=5+4)
OTLPTraces-64          9.00 ± 0%                8.00 ± 0%      ~     (p=0.079 n=4+5)

Gist with all the results: https://gist.github.com/marclop/2de5da81c5be7cd6dbab5080623c0ba1

Checklist

Update CHANGELOG.asciidoc
Update package changelog.yml (only if changes to apmpackage have been made)
Documentation has been updated

How to test these changes

docker-compose up -d
Run cd systemtest/cmd/runapm && go run main.go and note the image that is used.
Inspect the created container and obtain the FLEET_ENROLLMENT_TOKEN from the runapm container (you can stop the runapm container afterwards).
Create a docker-compose.override.yml in the root of the APM Server with these contents:

services:
  apm-server:
    image: USE_BUILDAPM_RESULTING_IMAGE
    cpus: 2.7
    ports:
      - 8200:8200
    environment:
      - "FLEET_URL=https://fleet-server:8220"
      - "FLEET_CA=/etc/pki/tls/certs/fleet-ca.pem"
      - "FLEET_ENROLL=1"
      - "FLEET_ENROLLMENT_TOKEN=USE_RUNAPM_ENROLLMENT_TOKEN"
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:8200 | grep -q 'true'"]
      retries: 300
      interval: 1s
    depends_on:
      elasticsearch: { condition: service_healthy }
      kibana: { condition: service_healthy }
      fleet-server: { condition: service_healthy }
    volumes:
      - "./testing/docker/fleet-server/certificate.pem:/etc/pki/tls/certs/fleet-ca.pem"

Run docker-compose up -d and verify that the apm-server container is reachable.
cd systemtest/cmd/apmbench && go build .
./apmbench -benchtime=15s -count=5 -warmup-events=0 -agents=64

Related issues

Closes #7967

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>

mergify · 2022-05-20T08:59:46Z

This pull request does not have a backport label. Could you fix it @marclop? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-7.x is the label to automatically backport to the 7.x branch.
backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

apmmachine · 2022-05-20T09:14:34Z

💚 Build Succeeded

Build stats

Start Time: 2022-05-24T12:16:32.781+0000
Duration: 27 min 17 sec

Test stats 🧪

Test	Results
Failed	0
Passed	3990
Skipped	13
Total	4003

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/hey-apm : Run the hey-apm benchmark.
/package : Generate and publish the docker images.
/test windows : Build & tests on Windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine · 2022-05-20T09:14:43Z

🌐 Coverage report

Name	Metrics % (`covered/total`)	Diff
Packages	100.0% (`42/42`)	💚
Files	91.878% (`181/197`)	👍
Classes	93.392% (`424/454`)	👍
Methods	89.238% (`1078/1208`)	👍 0.083
Lines	76.826% (`13115/17071`)	👎 -0.02
Conditionals	100.0% (`0/0`)	💚

simitt · 2022-05-24T12:15:04Z

Given the issues that @marclop encountered with packaging local apm-servers, I suggest marking this as ready for review and getting it merged into main (8.3), and then defining a test plan covering which deployment sizes to test on which cloud providers and instance sizes. The cloud benchmarks should be run promptly after FF so that we can react to any issues we might detect.
@axw @marclop any objections on this plan?

axw · 2022-05-24T12:59:40Z

@simitt sounds good.

axw

LGTM!

axw · 2022-06-13T03:25:29Z

Removing test-plan, will be covered by #8278

marclop added the enhancement label May 20, 2022

mergify bot added the backport-skip Skip notification from the automated backport with mergify label May 20, 2022

marclop marked this pull request as ready for review May 24, 2022 12:15

Merge branch 'main' into f/use-autogomaxprocs

c9d43af

axw approved these changes May 24, 2022

marclop merged commit 58a6c2c into elastic:main May 24, 2022

marclop deleted the f/use-autogomaxprocs branch May 24, 2022 13:27

simitt added test-plan v8.3.0 labels May 25, 2022

axw removed the test-plan label Jun 13, 2022
