Speed and accuracy improvements to java benchmark #14229
Conversation
ryanhamilton commented on Dec 2, 2015
- Increase the number of iterations to let the JIT compiler kick in. (This still isn't optimal: some calls are so quick, and the clocks so coarse, that our measurements remain inaccurate.)
- Refactor to call PerfPure static functions from PerfBlas rather than repeating code in two places. Ideally all similar code should also be moved.
- Use parseUnsignedInt rather than valueOf: it's more accurate and faster.
- Remove the custom quicksort routine and use Arrays.sort. It's faster, does the same thing, and is more idiomatic.
- Change printf to behave closer to the Julia code, i.e. use printf rather than string concatenation.
- Replace the recursive Fibonacci with a loop; it is much faster.
- Add OS detection so the tests can run on Windows. I recommend not using /dev/null at all, since it is special-cased on most platforms, so the test only shows performance for /dev/null, not for files in general.
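For context, the recursive-vs-iterative Fibonacci change described above can be sketched roughly as follows. This is an illustration, not the PR diff; the class and method names are mine:

```java
public class FibSketch {
    // Doubly recursive form used by the benchmark; it deliberately
    // measures the cost of function calls.
    static int fibRecursive(int n) {
        return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Iterative form along the lines the PR proposes: O(n) additions,
    // no call overhead, same results.
    static int fibIterative(int n) {
        int a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            int next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(fibRecursive(20)); // 6765
        System.out.println(fibIterative(20)); // 6765
    }
}
```

The two agree on every input, which is exactly why the maintainers object below: the loop is faster precisely because it no longer exercises the recursion the benchmark is meant to measure.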
Thanks for the improvements. However, the entire point of the Fibonacci benchmark is to get some idea of the cost of recursion.
W.r.t. the recursive fib, I think the idea is to implement the same algorithm in Julia and across languages; the micro-benchmark algorithm in Julia uses recursion, so that you are comparing apples with apples. Same thing for quicksort.
Ditto with the quicksort. We want to know how fast user code that shuffles array elements is, not how fast the hyper-optimized system quicksort is.
Thanks for working on this @ryanhamilton. Having fair benchmarks is really important.
Yes, thanks for the improvements, which appear to be the ones listed above.
The number of JIT iterations is a bit debatable, but arguably not wrong. This is a tricky issue, since some languages like C and Fortran require no JIT, Julia has a first-time-only JIT, and other systems like Java and JavaScript have JITs that kick in after an indeterminate amount of time. How many iterations are fair?
Agreed it's tricky, but I think the only clean answers are "the first iteration" or "in the asymptotic limit." Anything else feels pretty arbitrary. Besides, I imagine Julia might go that way someday to decrease the cost of run-time compilation 😉.
@StefanKarpinski I would say let it run 50 times and calculate the time taken per run; then do it again for 70 runs and check whether you got a similar result, i.e. within a 5% change. Keep going until you reach a known confidence level.
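The scheme described in that comment could be sketched as a small harness that grows the run count until two successive per-run estimates agree within a tolerance. This is a hypothetical sketch, not code from the PR, and all names are mine:

```java
public class AdaptiveTiming {
    // Time `task` over `runs` iterations and return nanoseconds per run.
    static double timePerRunNanos(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    // Grow the run count (50 -> 70 -> 98 -> ...) until two successive
    // per-run estimates agree within `tolerance` (e.g. 0.05 for 5%),
    // with a hard cap so a noisy task cannot loop forever.
    static double measure(Runnable task, int initialRuns, double tolerance) {
        int runs = initialRuns;
        double previous = timePerRunNanos(task, runs);
        while (runs < 1_000_000) {
            runs = runs * 7 / 5;
            double current = timePerRunNanos(task, runs);
            if (Math.abs(current - previous) / previous < tolerance) {
                return current;
            }
            previous = current;
        }
        return previous;
    }

    public static void main(String[] args) {
        double nanos = measure(() -> {
            long s = 0;
            for (int i = 0; i < 10_000; i++) s += i;
        }, 50, 0.05);
        System.out.println(nanos > 0.0);
    }
}
```

A real harness would also want to discard warm-up runs and report variance rather than a single stabilized estimate; this only illustrates the "keep doubling-ish until it settles" idea.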
I mean this in the most productive way for the Julia project... What is the goal of these benchmarks? Because as a technical user, when I'm asked to evaluate and decide whether we should use Julia for part of our solution, I am going to dig into the technical details, and if I find the technical parts lacking I will be unimpressed. If you don't have benchmarks for such a purpose, I advise making them one of your priorities so that:
There's a paragraph right after the benchmarks table on the home page explaining their point:
If you find that unconvincing or uninteresting, that's fine – there are lots of examples of real world use cases that we didn't design where Julia is close to C and Fortran in performance.
Regarding the benchmarks for performance tracking: we used to have an installation of PyPy's speed center, but it proved somewhat unreliable and not very useful, so it was discontinued. @jrevels is currently working on an improved version of that to get performance-tracking CI back.
@StefanKarpinski That text is extremely upfront and detailed. I'm impressed. I should also read better :) That just leaves:
Objections about the purpose of the benchmarks keep being raised. Maybe we should put that paragraph before the benchmarks, with a warning in bold and red like "Please read the following disclaimer before interpreting these benchmarks." I'm afraid some people conclude the Julia team is cheating, as has already been claimed in some blog posts.
Just for reference, the issue tracking the development of our CI performance testing system is #13893 |
Closing as unlikely to get through. Will make smaller PRs instead. |
Benchmark issue popped up on hacker news: https://news.ycombinator.com/item?id=10735840 |
Speaking of Java: overall, the method body is way too big and the JVM will give up on many optimizations. I would not consider most of the code idiomatic.
We could change this to use
Testing recursion is the explicit purpose of that benchmark. "Tail call optimization" is not usually an optimization at all – it's often slower than just pushing a stack frame. Also, since that algorithm is doubly recursive, you can't eliminate the recursive calls entirely.
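To make that point concrete, here is a minimal illustration (mine, not from the benchmark source) of why the doubly recursive fib has no tail call to eliminate: both recursive results feed a pending addition, so the caller's frame must stay live across each call.

```java
public class DoublyRecursive {
    // Neither recursive call is in tail position: the result of each
    // feeds the pending addition, so the frame cannot be discarded.
    static int fib(int n) {
        if (n < 2) return n;
        int left = fib(n - 1);   // frame survives this call
        int right = fib(n - 2);  // ...and this one
        return left + right;     // the addition happens after both return
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}
```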
All of the languages are doing both, so that's fine.
If that avoids locking overhead, then we should do it. Why would using this make it non-deterministic?
What method body? This is all straightforward, purely static code using primitive data types and hardly any objects, so a compiler for a static language like Java should have no difficulty optimizing it.
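For what it's worth, HotSpot really does skip JIT compilation of methods whose bytecode exceeds an internal size limit unless `-XX:-DontCompileHugeMethods` is passed, so the usual mitigation is to keep the measured kernel in its own small method. A hedged sketch of that pattern (the names and the workload are illustrative, not from the benchmark):

```java
public class HotLoopExtraction {
    // Small, self-contained kernel: comfortably under HotSpot's
    // huge-method bytecode limit, so the JIT can compile and inline it.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Even if a large driver method is never JIT-compiled, the
        // measured work lives in the small kernel above.
        System.out.println(sumOfSquares(1000)); // 332833500
    }
}
```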