Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java_tools 10.3 regressed bazel performance, before java_tools javac11 v10.4 improved it. #12574

Closed
joeleba opened this issue Nov 26, 2020 · 10 comments
Assignees
Labels
area-java-toolchains javabase, java_toolchain flags, JDK selection, java_toolchain rules, java_tools repository P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules

Comments

@joeleba
Copy link
Member

joeleba commented Nov 26, 2020

Description of the problem / feature request:

Our bazel-bench pipeline runs nightly benchmarking of a sample of the bazel commits throughout the day, by using the bazel binaries built at those commits to build the Bazel project itself (bazel build //src:bazel). We've identified the following:

  1. 5c92b11 regressed wall time by ~48%.
Bazel commit: 5c92b11dc5e1decf1b2a30b72d9bdaaf6e3e4380, Project commit: c282526c071389cd6f88cb77565283b257316267, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  116.968s ( -0.93%)   117.305s ( -0.48%)     2.287s    0.00000 
     cpu:   79.557s ( +0.29%)    80.480s ( +0.51%)     1.464s    0.40000 
  system:   19.167s ( -1.83%)    19.240s ( -2.58%)     0.148s    0.40000 
  memory:   94.667s ( +0.00%)    95.000s ( +0.00%)     0.471s    0.00000 

Bazel commit: c282526c071389cd6f88cb77565283b257316267, Project commit: c282526c071389cd6f88cb77565283b257316267, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  173.391s (+48.24%)   173.878s (+48.23%)     1.361s    0.90000 
     cpu:   81.277s ( +2.16%)    81.930s ( +1.80%)     2.404s    0.40000 
  system:   18.187s ( -5.11%)    18.450s ( -4.11%)     0.794s    0.40000 
  memory:   94.000s ( -0.70%)    94.000s ( -1.05%)     0.000s    0.40000 

image

  1. cd3480e improved wall time by ~30%.
Bazel commit: 1517286783762647b4922ed1c656262002abedf1, Project commit: 1517286783762647b4922ed1c656262002abedf1, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  184.560s             185.592s              1.563s             
     cpu:   82.817s              84.350s              2.996s             
  system:   19.877s              19.230s              1.111s             
  memory:   95.000MB             95.000MB             0.000MB            

Bazel commit: cd3480e4d53160d20c634134d3917c4e53ac1c49, Project commit: 1517286783762647b4922ed1c656262002abedf1, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  128.454s  (-30.40%)  129.437s  (-30.26%)   1.443s     0.90000 
     cpu:   86.013s  ( +3.86%)   87.650s  ( +3.91%)   2.625s     0.40000 
  system:   20.917s  ( +5.23%)   20.820s  ( +8.27%)   0.227s     0.40000 
  memory:   94.000MB ( -1.05%)   94.000MB ( -1.05%)   0.000MB    0.90000 

image

Feature requests: what underlying problem are you trying to solve with this feature?

Maybe we can use this to figure out what in java_tools regressed the performance of Bazel and what corrected it (partially), and whether there's something more we can do.

What operating system are you running Bazel on?

Ubuntu18.04

Any other information, logs, or outputs that you want to share?

To reproduce, clone bazelbuild/bazel-bench and run:

# 1st benchmark.
bazel run benchmark --  \
--bazel_source=https://github.com/bazelbuild/bazel.git \
--project_source=https://github.com/bazelbuild/bazel.git \
--collect_memory \
--runs=3 \
--bazel_commits=5c92b11dc5e1decf1b2a30b72d9bdaaf6e3e4380,c282526c071389cd6f88cb77565283b257316267 \
--project_commits=c282526c071389cd6f88cb77565283b257316267 \
-- build //src:bazel

# 2nd benchmark
bazel run benchmark --  \
--bazel_source=https://github.com/bazelbuild/bazel.git \
--project_source=https://github.com/bazelbuild/bazel.git \
--collect_memory \
--data_directory=/tmp/.bazel-bench/out \
--runs=3 \
--bazel_commits=1517286783762647b4922ed1c656262002abedf1,cd3480e4d53160d20c634134d3917c4e53ac1c49 \
--project_commits=1517286783762647b4922ed1c656262002abedf1 \
-- build //src:bazel
@joeleba joeleba added P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules area-java-toolchains javabase, java_toolchain flags, JDK selection, java_toolchain rules, java_tools repository labels Nov 26, 2020
@comius
Copy link
Contributor

comius commented Nov 26, 2020

Do I understand correctly that wall time is completely restored?
If I get mathematics right: wall * 1.4823 * (1 - 0.3026) = wall * 1.034. So after v10.4 wall is 3.4% bigger?
Looking at absolute deltas in wall 173 - 117 = 184 - 128 = 56 they are exactly the same.
Set priority back to P1 if you disagree.

I have PR waiting to merge in java_tools v11.0: #12552

Changelog for v10.3:

$ git log --pretty=format:"%h %an %s" 3f460b4...c282526 -- src/java_tools/ third_party/ijar/ tools/jdk/BUILD.java_tools
afacc9cb68 Ivo List Use selects based on constraint_values in java_tools.
65e818e65b Ivo List Resolve labels in java_toolchain_default relative to java_tools repo
7cc85772b4 Ivo List Reenable jacocoagent test.
5c8e584e88 steinman Move WorkRequestHandler from build jar to Worker package. This is part of the cleanup associated with enabling JSON workers.
febf4438cd cushon Restore GenClass's --temp_dir flag once more
e16f4d7206 cushon Remove obsolete flags
cca33299a5 eaftan Automated rollback of commit 366de6958ff449b273079be551c443ace2d8e7fa.
41b41947cb cushon Delete javac-turbine

Changelog for v10.4:

$ git log --pretty=format:"%h %an %s" c282526...cd3480e4 -- src/java_tools/ third_party/ijar/ tools/jdk/BUILD.java_tools
3fb0f211de ilist Replace "javac" attribute with "tools" attribute in java_toolchain.
c9cdec2471 Ivo List Remove jarjar from java_tools.
7d52cc518d cushon Remove unused GenClass --temp_dir flag
e68b1e3dd7 Ivo List Add predefined configurations to java_toolchain_default
2bf37381e0 Benjamin Peterson ijar: Avoid running past the end of a static buffer.
a4251eab69 steinman Refactor WorkRequestHandler to be an interface, of which Proto is one implementation.

Potential culprits look to me:
65e818e, 5c8e584, e68b1e3, a4251ea

cc @cushon, @susinmotion

@comius comius added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Nov 26, 2020
@meisterT
Copy link
Member

It would be great if the commits of the java_tools releases contain these changelogs, so it's easier to understand what went into them.

I think Ivo is correct, that the regression is completely fixed by the 10.4 release, because there's another regression in the 8fce67f..b1d1485 range that adds up:
image

Ivo: it would be still good to know what caused the regression (and what fixed it) before we pull the trigger on the 4.0 release.

@joeleba
Copy link
Member Author

joeleba commented Nov 27, 2020

Let me know if you need help performing the benchmarks.

@comius comius added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Nov 27, 2020
@comius
Copy link
Contributor

comius commented Nov 27, 2020

Let me know if you need help performing the benchmarks.

Can you also run benchmark on #12552? It has release 11.0. I don't have a workstation only cloudtop.

@joeleba
Copy link
Member Author

joeleba commented Nov 27, 2020

Let me know if you need help performing the benchmarks.

Can you also run benchmark on #12552? It has release 11.0. I don't have a workstation only cloudtop.

Sure, gimme a moment.

Edit: here's the result on a Linux machine, 12 cores, runs=7. It looks safe.

RESULTS:
Bazel commit: /tmp/bazel-head, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  185.647s              185.445s                2.224s             
     cpu:   55.616s               54.410s                1.792s             
  system:   18.043s               18.060s                0.330s             
  memory:   94.000MB              94.000MB               0.000MB            

Bazel commit: /tmp/comius-bazel, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  184.544s  ( -0.59%)   184.456s  ( -0.53%)     1.626s     0.03730 
     cpu:   54.484s  ( -2.03%)    54.990s  ( +1.07%)     1.940s     0.42483 
  system:   17.364s  ( -3.76%)    17.260s  ( -4.43%)     0.392s     0.94697 
  memory:   94.000MB ( +0.00%)    94.000MB ( +0.00%)     0.000MB    0.00000 

@comius
Copy link
Contributor

comius commented Nov 27, 2020

TL;DR culprits were 5c8e584 (regression) and a4251ea (improvement), both refactoring WorkRequestHandler, but they got into different releases.

Releasing java_tools at:
commit 4 no regression febf4438cd cushon Restore GenClass's --temp_dir flag once more

commit 5 38% regression in wall 5c8e584e88 steinman Move WorkRequestHandler from build jar to Worker package. This is part of the cleanup associated with enabling JSON workers.

commit 8 -26% regression in wall a4251eab69 steinman Refactor WorkRequestHandler to be an interface, of which Proto is one implementation.

1,38*(1-0.26)=1,02

Thanks @joeleba!

@comius
Copy link
Contributor

comius commented Nov 27, 2020

Raw data:

RESULTS:
Bazel commit: /tmp/benchmark-4-febf4438cd_HEAD_1, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  163.424s             163.606s               0.718s            
     cpu:   54.550s              54.520s               1.046s            
  system:   16.302s              16.280s               0.094s            
  memory:   92.400s              92.000s               0.490s            

Bazel commit: /tmp/benchmark-4-febf4438cd_HEAD, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  163.633s ( +0.13%)   162.889s ( -0.44%)     1.184s    0.12698 
     cpu:   53.980s ( -1.04%)    53.710s ( -1.49%)     1.565s    0.12698 
  system:   16.528s ( +1.39%)    16.290s ( +0.06%)     0.549s    0.12698 
  memory:   92.800s ( +0.43%)    93.000s ( +1.09%)     0.400s    0.12698 

RESULTS:
Bazel commit: /tmp/benchmark-5-5c8e584e88_HEAD_1, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  162.196s             161.956s               1.264s            
     cpu:   53.400s              53.060s               0.952s            
  system:   16.470s              16.430s               0.294s            
  memory:   92.800s              93.000s               0.400s            

Bazel commit: /tmp/benchmark-5-5c8e584e88_HEAD, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: None
  metric          mean                median          stddev      pval   
    wall:  226.739s (+39.79%)   224.517s (+38.63%)     7.650s    0.99206 
     cpu:   53.824s ( +0.79%)    54.330s ( +2.39%)     1.454s    0.12698 
  system:   16.730s ( +1.58%)    16.730s ( +1.83%)     0.140s    0.64286 
  memory:   92.200s ( -0.65%)    92.000s ( -1.08%)     0.748s    0.12698 

RESULTS:
Bazel commit: /tmp/benchmark-8-a4251eab69_HEAD_1, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  222.846s             221.188s              5.016s             
     cpu:   53.466s              54.300s              1.425s             
  system:   16.564s              16.500s              0.177s             
  memory:   92.200MB             92.000MB             0.400MB            

Bazel commit: /tmp/benchmark-8-a4251eab69_HEAD, Project commit: c85ecbed46d4703c7d6bf6b52dab98972dbe9ea1, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  164.262s  (-26.29%)  162.785s  (-26.40%)   3.078s     0.99206 
     cpu:   56.008s  ( +4.75%)   55.070s  ( +1.42%)   2.373s     0.64286 
  system:   16.352s  ( -1.28%)   16.160s  ( -2.06%)   0.411s     0.64286 
  memory:   92.800MB ( +0.65%)   93.000MB ( +1.09%)   0.400MB    0.64286

@meisterT
Copy link
Member

aha, the first commit brought back the System.gc that @larsrc-google recently removed and the second commit removed it again

@comius
Copy link
Contributor

comius commented Nov 27, 2020

we might want to consider automatic releasing or at least benchmarking the commits that go into java_tools... i almost have a script for that by now

@joeleba
Copy link
Member Author

joeleba commented Nov 27, 2020

It's probably not too challenging to set up a bazel-bench pipeline to benchmark the java_tools commits. We can talk more next week if you're interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-java-toolchains javabase, java_toolchain flags, JDK selection, java_toolchain rules, java_tools repository P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules
Projects
None yet
Development

No branches or pull requests

3 participants