all: Use gomaxprocs library to set GOMAXPROCS #8175

Merged
marclop merged 2 commits into elastic:main from f/use-autogomaxprocs
May 24, 2022

Conversation

marclop
Contributor

@marclop marclop commented May 20, 2022

Motivation/summary

Adds a new explicit dependency on Uber's automaxprocs library, which checks
whether a CFS quota has been set and adjusts the Go runtime's GOMAXPROCS
setting accordingly. This prevents the APM Server from exhausting its
CFS-allowed bandwidth and ending up in a situation where the majority of
threads are sleeping (throttled by the CFS scheduler), which reduces the
amount of work that can be done.
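
As a rough illustration of the adjustment described above, the sketch below derives a GOMAXPROCS value from a CFS quota and period. The function name `maxProcsFromQuota`, the rounding-down behaviour, and the floor of 1 are assumptions made for this example; it is not the library's code, and this page does not show how the PR wires the library in (it is commonly enabled with a blank import of `go.uber.org/automaxprocs`).

```go
package main

import "fmt"

// maxProcsFromQuota is a hypothetical helper: it returns a GOMAXPROCS value
// for a CFS quota/period pair (both in microseconds). A non-positive quota
// is treated as "no limit", in which case fallback is returned.
// Rounding down with a minimum of 1 is an assumption of this sketch.
func maxProcsFromQuota(quotaUs, periodUs int64, fallback int) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return fallback
	}
	procs := int(quotaUs / periodUs) // integer division rounds down
	if procs < 1 {
		return 1
	}
	return procs
}

func main() {
	// A 2.7 CPU limit corresponds to a 270000us quota per 100000us period.
	fmt.Println(maxProcsFromQuota(270000, 100000, 8)) // 2
	// Half a CPU still needs at least one running thread.
	fmt.Println(maxProcsFromQuota(50000, 100000, 8)) // 1
	// No quota set: keep the fallback (for example runtime.NumCPU()).
	fmt.Println(maxProcsFromQuota(-1, 100000, 8)) // 8
}
```

In a real program the result would be passed to `runtime.GOMAXPROCS`, so the Go scheduler does not run more threads in parallel than the quota can sustain.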

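The initial macro-benchmarks I ran seem to indicate that the change
has a positive effect on the APM Server's throughput. Note that, given the
argument order of the benchstat command, the "old" columns are the run with
automaxprocs and the "new" columns are the run without it: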

$ benchstat automaxprocs.txt noautoprocs.txt
name                 old time/op              new time/op              delta
1000Transactions-64  99.5ms ±25%             119.2ms ±13%   +19.78%  (p=0.032 n=5+5)
AgentGo-64            657ms ±21%              1602ms ± 5%  +143.86%  (p=0.008 n=5+5)
AgentNodeJS-64        336ms ± 5%               875ms ±13%  +160.50%  (p=0.008 n=5+5)
AgentPython-64        1.14s ±21%               3.37s ± 4%  +196.62%  (p=0.008 n=5+5)
AgentRuby-64          540ms ±15%              1437ms ± 5%  +166.17%  (p=0.008 n=5+5)
OTLPTraces-64        82.9µs ± 3%              98.1µs ± 8%   +18.34%  (p=0.016 n=4+5)

name                 old error_responses/sec  new error_responses/sec  delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64             0.00                     0.00           ~     (all equal)
AgentNodeJS-64         0.00                     0.00           ~     (all equal)
AgentPython-64         0.00                     0.00           ~     (all equal)
AgentRuby-64           0.00                     0.00           ~     (all equal)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old errors/sec           new errors/sec           delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64              162 ±18%                  66 ± 5%   -59.44%  (p=0.008 n=5+5)
AgentNodeJS-64          146 ± 6%                  56 ±12%   -61.51%  (p=0.008 n=5+5)
AgentPython-64         64.1 ±18%                21.4 ± 4%   -66.61%  (p=0.008 n=5+5)
AgentRuby-64            250 ±16%                  93 ± 5%   -62.66%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old events/sec           new events/sec           delta
1000Transactions-64  4.80k ±128%               4.85k ±76%      ~     (p=0.841 n=5+5)
AgentGo-64            8.06k ±18%               3.27k ± 5%   -59.46%  (p=0.008 n=5+5)
AgentNodeJS-64        6.00k ± 6%               2.31k ±12%   -61.49%  (p=0.008 n=5+5)
AgentPython-64        6.26k ±18%               2.09k ± 4%   -66.65%  (p=0.008 n=5+5)
AgentRuby-64          7.00k ±16%               2.62k ± 4%   -62.64%  (p=0.008 n=5+5)
OTLPTraces-64         12.1k ± 3%               10.2k ± 7%   -15.31%  (p=0.016 n=4+5)

name                 old metrics/sec          new metrics/sec          delta
1000Transactions-64   0.69 ±301%               0.08 ±138%      ~     (p=0.714 n=5+4)
AgentGo-64              373 ±17%                 150 ± 5%   -59.85%  (p=0.008 n=5+5)
AgentNodeJS-64          360 ± 4%                 139 ±13%   -61.44%  (p=0.008 n=5+5)
AgentPython-64        1.89k ±18%               0.63k ± 4%   -66.73%  (p=0.008 n=5+5)
AgentRuby-64            516 ±16%                 194 ± 2%   -62.48%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                    0.03 ±300%      ~     (p=0.889 n=5+4)

name                 old spans/sec            new spans/sec            delta
1000Transactions-64    0.00                     0.00           ~     (all equal)
AgentGo-64            5.32k ±18%               2.16k ± 5%   -59.44%  (p=0.008 n=5+5)
AgentNodeJS-64        3.16k ± 6%               1.22k ±12%   -61.51%  (p=0.008 n=5+5)
AgentPython-64        3.63k ±18%               1.21k ± 4%   -66.62%  (p=0.008 n=5+5)
AgentRuby-64          4.12k ±16%               1.54k ± 5%   -62.66%  (p=0.008 n=5+5)
OTLPTraces-64          0.00                     0.00           ~     (all equal)

name                 old txs/sec              new txs/sec              delta
1000Transactions-64  4.80k ±128%               4.85k ±76%      ~     (p=0.841 n=5+5)
AgentNodeJS-64        2.33k ± 6%               0.90k ±12%   -61.50%  (p=0.008 n=5+5)
AgentRuby-64          2.12k ±16%               0.80k ± 2%   -62.22%  (p=0.016 n=5+4)
OTLPTraces-64         12.1k ± 3%               10.2k ± 7%   -15.33%  (p=0.016 n=4+5)

name                 old alloc/op             new alloc/op             delta
1000Transactions-64  3.23MB ±36%              3.11MB ±24%      ~     (p=1.000 n=5+5)
AgentNodeJS-64       3.49MB ± 5%              6.01MB ± 8%   +72.34%  (p=0.008 n=5+5)
AgentRuby-64         4.69MB ±15%              9.59MB ± 0%  +104.68%  (p=0.016 n=5+4)
OTLPTraces-64        1.41kB ± 0%              1.38kB ± 1%    -2.08%  (p=0.016 n=4+5)

name                 old allocs/op            new allocs/op            delta
1000Transactions-64   6.58k ± 8%               6.40k ± 4%      ~     (p=0.548 n=5+5)
AgentNodeJS-64        6.69k ± 4%              10.45k ± 8%   +56.12%  (p=0.008 n=5+5)
AgentRuby-64          11.9k ±18%               23.8k ± 0%   +99.41%  (p=0.016 n=5+4)
OTLPTraces-64          9.00 ± 0%                8.00 ± 0%      ~     (p=0.079 n=4+5)

Gist with all the results: https://gist.github.com/marclop/2de5da81c5be7cd6dbab5080623c0ba1
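
To make the direction of the deltas concrete, here is a small worked calculation using the AgentGo events/sec row from the table. It assumes the delta is the relative change from the "old" column to the "new" column; the helper name `percentDelta` is made up for this example, and the inputs are the rounded values shown in the table, so the result differs slightly from the reported -59.46%.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldVal to newVal, in percent.
func percentDelta(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// AgentGo events/sec: 8.06k in the "old" column (automaxprocs.txt),
	// 3.27k in the "new" column (noautoprocs.txt).
	withAuto, withoutAuto := 8060.0, 3270.0

	// Relative change when going from the old file to the new file.
	fmt.Printf("%.1f%%\n", percentDelta(withAuto, withoutAuto)) // -59.4%

	// The same numbers read the other way: throughput ratio with automaxprocs.
	fmt.Printf("%.2fx\n", withAuto/withoutAuto) // 2.46x
}
```

Read this way, the negative events/sec deltas mean the run without automaxprocs processed fewer events, which matches the claim that the change improves throughput.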

Checklist

How to test these changes

  1. docker-compose up -d
  2. cd systemtest/cmd/runapm && go run main.go, copy the image that is used.
  3. Inspect the created container and obtain the FLEET_ENROLLMENT_TOKEN from the runapm container (you can stop the runapm container afterwards).
  4. Create a docker-compose.override.yml in the root of the APM Server with these contents:
services:
  apm-server:
    image: USE_BUILDAPM_RESULTING_IMAGE
    cpus: 2.7
    ports:
      - 8200:8200
    environment:
      - "FLEET_URL=https://fleet-server:8220"
      - "FLEET_CA=/etc/pki/tls/certs/fleet-ca.pem"
      - "FLEET_ENROLL=1"
      - "FLEET_ENROLLMENT_TOKEN=USE_RUNAPM_ENROLLMENT_TOKEN"
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:8200 | grep -q 'true'"]
      retries: 300
      interval: 1s
    depends_on:
      elasticsearch: { condition: service_healthy }
      kibana: { condition: service_healthy }
      fleet-server: { condition: service_healthy }
    volumes:
      - "./testing/docker/fleet-server/certificate.pem:/etc/pki/tls/certs/fleet-ca.pem"
  5. docker-compose up -d. Verify that the apm-server container is reachable.
  6. cd systemtest/cmd/apmbench && go build .
  7. ./apmbench -benchtime=15s -count=5 -warmup-events=0 -agents=64
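
When following the steps above, it can help to confirm that the `cpus: 2.7` setting actually produced a CFS quota inside the container. The sketch below parses the contents of a cgroup v2 `cpu.max` file, which holds `<quota> <period>` with the literal `max` meaning no limit. The function name `parseCPUMax` is hypothetical, and the sample strings are assumptions about what such a container would report; on cgroup v1 hosts the quota and period live in separate files (`cpu.cfs_quota_us` and `cpu.cfs_period_us`) instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the contents of a cgroup v2 cpu.max file.
// It returns the CPU limit in cores and whether a limit is set at all.
func parseCPUMax(contents string) (cores float64, limited bool, err error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max contents: %q", contents)
	}
	if fields[0] == "max" {
		// No quota configured: the container may use every host CPU.
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("invalid period in cpu.max: %q", fields[1])
	}
	return quota / period, true, nil
}

func main() {
	// Sample contents; in a container this would typically be read from
	// /sys/fs/cgroup/cpu.max (path assumed, cgroup v2 only).
	for _, sample := range []string{"270000 100000\n", "max 100000\n"} {
		cores, limited, err := parseCPUMax(sample)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if !limited {
			fmt.Println("no CPU limit set")
			continue
		}
		fmt.Printf("CPU limit: %.1f cores\n", cores)
	}
}
```

If the file reports `max`, the quota was not applied and the benchmark would not exercise the throttling scenario this change targets.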

Related issues

Closes #7967

Signed-off-by: Marc Lopez Rubio <marc5.12@outlook.com>
@mergify
Contributor

mergify bot commented May 20, 2022

This pull request does not have a backport label. Could you fix it @marclop? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label May 20, 2022
@apmmachine
Contributor

apmmachine commented May 20, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-05-24T12:16:32.781+0000

  • Duration: 27 min 17 sec

Test stats 🧪

Test Results
Failed 0
Passed 3990
Skipped 13
Total 4003

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • /test windows : Build & tests on Windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@apmmachine
Contributor

apmmachine commented May 20, 2022

🌐 Coverage report

Name          Metrics % (covered/total)  Diff
Packages      100.0% (42/42)             💚
Files         91.878% (181/197)          👍
Classes       93.392% (424/454)          👍
Methods       89.238% (1078/1208)        👍 0.083
Lines         76.826% (13115/17071)      👎 -0.02
Conditionals  100.0% (0/0)               💚

@simitt
Contributor

simitt commented May 24, 2022

Given the issues that @marclop encountered with packaging local apm-servers, I suggest marking this as ready for review and getting it merged into main (8.3), and then defining a test plan covering which deployment sizes to test on which cloud providers and instance sizes. The cloud benchmarks should be run promptly after FF so that we can react to any issues we might detect.
@axw @marclop any objections to this plan?

@marclop marclop marked this pull request as ready for review May 24, 2022 12:15
@axw
Member

axw commented May 24, 2022

@simitt sounds good.

Member

@axw axw left a comment

LGTM!

@marclop marclop merged commit 58a6c2c into elastic:main May 24, 2022
@marclop marclop deleted the f/use-autogomaxprocs branch May 24, 2022 13:27
@axw
Member

axw commented Jun 13, 2022

Removing test-plan, will be covered by #8278

@axw axw removed the test-plan label Jun 13, 2022
Labels
backport-skip Skip notification from the automated backport with mergify enhancement v8.3.0
Development

Successfully merging this pull request may close these issues.

Investigate auto-setting GOMAXPROCS
4 participants