
Speed and accuracy improvements to java benchmark #14229

Closed
wants to merge 1 commit into from

Conversation

ryanhamilton

  • Increase the number of iterations to let the JIT compiler kick in. (This still isn't optimal: some calls are so quick, and the clocks so coarse, that our measurements remain inaccurate.)
  • Refactor to call PerfPure static functions from PerfBlas rather than repeating code in two places. Ideally all similar code should also be moved.
  • Use parseUnsignedInt rather than valueOf; it's more accurate and faster.
  • Remove the custom quicksort routine and use Arrays.sort. It's faster, does the same thing, and is more idiomatic.
  • printf changed to behave closer to the Julia code, i.e. use printf rather than concatenation.
  • Replace the recursive Fibonacci with a loop; it's much faster.
  • Put in OS detection to allow the tests to run on Windows (see the sketch after this list). I recommend not using /dev/null at all, as it is special-cased on most platforms, so the test only shows performance for /dev/null, not for files in general.

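A minimal sketch of the kind of OS detection the last bullet describes (the helper name and structure here are illustrative, not the PR's actual diff):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

public class NullDevice {
    // Pick the platform's null device: "NUL" on Windows, "/dev/null" elsewhere.
    static String nullDevicePath() {
        String os = System.getProperty("os.name").toLowerCase();
        return os.contains("win") ? "NUL" : "/dev/null";
    }

    public static void main(String[] args) throws IOException {
        try (PrintStream out = new PrintStream(new FileOutputStream(nullDevicePath()))) {
            out.printf("%d %d%n", 1, 2); // output is discarded, as in the printfd benchmark
        }
    }
}
```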
@jiahao
Member

jiahao commented Dec 2, 2015

Thanks for the improvements. However, the entire point of the Fibonacci benchmark is to have some idea of the cost of recursion.

@vchuravy
Member

vchuravy commented Dec 2, 2015

W.r.t. the recursive fib, I think the idea is to implement the same algorithm in Julia and across languages, and the micro-benchmark in Julia uses recursion, so that you are comparing apples with apples. Same thing for quicksort.

@StefanKarpinski
Member

Ditto with the quicksort. We want to know how fast user code that shuffles array elements is, not how fast the hyper-optimized system quicksort is.
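For reference, the style of hand-written quicksort these micro-benchmarks use across languages (a standard in-place sketch; the suite's exact pivot choice and types may differ):

```java
// A standard in-place quicksort over doubles (illustrative, not copied
// from the benchmark suite).
static void qsort(double[] a, int lo, int hi) {
    if (lo >= hi) return;
    double pivot = a[(lo + hi) >>> 1]; // unsigned shift avoids (lo+hi)/2 overflow
    int i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            double t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    qsort(a, lo, j); // recursion on both halves is part of what is being measured
    qsort(a, i, hi);
}
```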

@StefanKarpinski
Member

The parseUnsignedInt change is also invalid since other languages parse the numbers as signed integers.
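For what it's worth, a small illustration of the mismatch (my own example, not from the PR):

```java
public class ParseDemo {
    public static void main(String[] args) {
        // Signed parsing accepts negatives, matching the other languages.
        System.out.println(Integer.parseInt("-123"));        // -123
        // Unsigned parsing only agrees on non-negative input...
        System.out.println(Integer.parseUnsignedInt("123")); // 123
        // ...and rejects negatives outright.
        try {
            Integer.parseUnsignedInt("-123");
        } catch (NumberFormatException e) {
            System.out.println("parseUnsignedInt rejects negative input");
        }
    }
}
```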

@Keno
Member

Keno commented Dec 2, 2015

Thanks for working on this @ryanhamilton. Having fair benchmarks is really important.

@StefanKarpinski
Member

Yes, thanks for the improvements, which appear to be these:

  • PerfPure/PerfBlas code deduplication
  • printf change
  • Windows portability

The number of JIT iterations is a bit debatable, but arguably not wrong. This is a tricky issue since some languages like C and Fortran require no JIT, Julia has a first-time-only JIT, and other systems like Java and JavaScript have JITs that kick in after an indeterminate amount of time. How many iterations are fair?

@timholy
Member

timholy commented Dec 2, 2015

How many iterations are fair?

Agreed it's tricky, but I think the only clean answers are "the first iteration" or "in the asymptotic limit." Anything else feels pretty arbitrary.

Besides, I imagine julia might go that way someday to decrease the cost of run-time compilation 😉.

@ryanhamilton
Author

@StefanKarpinski I would say let it run 50 times and calculate the time taken per run, then do it again for 70 runs: did you get a similar result, i.e. within 5%? Keep going until you reach a known confidence level.
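A sketch of that adaptive scheme (my own illustration, not code from the PR): keep growing the run count until two successive per-run estimates agree within 5%.

```java
public class AdaptiveBench {
    static double nsPerRun(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) task.run();
        return (System.nanoTime() - start) / (double) runs;
    }

    static double measure(Runnable task) {
        int runs = 50;
        double prev = nsPerRun(task, runs);
        while (true) {
            runs = runs * 7 / 5; // 50 -> 70 -> 98 -> ...
            double cur = nsPerRun(task, runs);
            if (Math.abs(cur - prev) / prev < 0.05) return cur; // within 5%: accept
            prev = cur;
        }
    }

    public static void main(String[] args) {
        // Caveat: a real harness must keep the task's result alive, or the
        // JIT may eliminate the very work being timed.
        double ns = measure(() -> {
            long s = 0;
            for (int i = 0; i < 1_000; i++) s += i;
            if (s == 42) System.out.print(""); // cheap sink against dead-code elimination
        });
        System.out.printf("%.1f ns per run%n", ns);
    }
}
```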

@ryanhamilton
Author

I mean this in the most productive way for the Julia project... What is the goal of these benchmarks?
Is it to benchmark how code written by a naive programmer in a Julia style would perform in other languages?

Because as a technical user, when I'm asked to evaluate and decide if we should use Julia for part of our solution, I am going to dig into the technical details, and if I find the technical parts lacking I will be unimpressed.

If you don't have benchmarks for such a purpose, I advise making them one of your priorities so that:

  1. You know the effect of your changes on real-world performance. This system for PyPy is a good example: http://speed.pypy.org/timeline/ It gives me confidence that those guys have put thought into studying their progress at making their system faster.
  2. You can convince tech leads in companies to use Julia.

@StefanKarpinski
Member

There's a paragraph right after the benchmarks table on the home page explaining their point:

These benchmarks, while not comprehensive, do test compiler performance on a range of common code patterns, such as function calls, string parsing, sorting, numerical loops, random number generation, and array operations. It is important to note that these benchmark implementations are not written for absolute maximal performance (the fastest code to compute fib(20) is the constant literal 6765). Rather, all of the benchmarks are written to test the performance of specific algorithms, expressed in a reasonable idiom in each language. In particular, all languages use the same algorithm: the Fibonacci benchmarks are all recursive while the pi summation benchmarks are all iterative; the “algorithm” for random matrix multiplication is to call LAPACK, except where that’s not possible, such as in JavaScript. The point of these benchmarks is to compare the performance of specific algorithms across language implementations, not to compare the fastest means of computing a result, which in most high-level languages relies on calling C code.

If you find that unconvincing or uninteresting, that's fine – there are lots of examples of real world use cases that we didn't design where Julia is close to C and Fortran in performance.

@Keno
Member

Keno commented Dec 2, 2015

Regarding the benchmarks for performance tracking, we used to have an installation of pypy's speed center, but it proved somewhat unreliable and not very useful, so it was discontinued. @jrevels is currently working on an improved version of that to get performance-tracking CI back.

@ryanhamilton
Author

@StefanKarpinski That text is extremely upfront and detailed. I'm impressed. I should also read better :)

That just leaves:

  • The issue of iterations
  • /dev/null for the printfd tests. It's a special case that I think is handled differently on certain platforms compared to standard files.

@nalimilan
Member

Objections about the purpose of the benchmarks keep coming up. Maybe we should put that paragraph before the benchmarks, with a warning in bold red like "Please read the following disclaimer before interpreting these benchmarks." I'm afraid some people will conclude the Julia team is cheating, as has already been claimed in some blog posts.

@jrevels
Member

jrevels commented Dec 2, 2015

@jrevels is currently working on an improved version of that to get performance-tracking CI back.

Just for reference, the issue tracking the development of our CI performance testing system is #13893

@ryanhamilton
Author

Closing as unlikely to get through. Will make smaller PRs instead.

@ryanhamilton
Author

Benchmark issue popped up on hacker news: https://news.ycombinator.com/item?id=10735840

@ViralBShah ViralBShah added the potential benchmark Could make a good benchmark in BaseBenchmarks label Dec 16, 2015
@bestsss

bestsss commented Dec 16, 2015

Speaking of Java:

quicksort: (hi+low)/2 doesn't account for integer overflow.

Fib: Java doesn't have tail call optimizations (and likely won't have them, as the stack trace is needed for the security manager). So using the naive fib in Java is OK if you wish to test recursion alone.

The parseInt benchmark is actually dominated by the int->String conversion, NOT the int parsing.

A lot of timings depend on Random.nextXXX, which is thread safe and involves a CAS on x86. Using ThreadLocalRandom is the preferred way, but that means losing determinism.

Overall the method body is way too big and the JVM will give up on many optimizations. I would not consider most of the code idiomatic.

@StefanKarpinski
Member

Speaking of Java:
quicksort: (hi+low)/2 doesn't account for integer overflow.

We could change this to use >>> instead of integer division. I don't think it matters, however, since any compiler worth its salt will optimize integer division by 2 to an arithmetic right shift by one, which is just as fast.
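Concretely (an illustration of the overflow, not the PR's code): with int indices near Integer.MAX_VALUE the sum wraps negative, while the unsigned shift recovers the correct midpoint.

```java
public class MidpointDemo {
    public static void main(String[] args) {
        int lo = 2, hi = Integer.MAX_VALUE - 1;
        int bad  = (lo + hi) / 2;   // lo + hi overflows to a negative value
        int good = (lo + hi) >>> 1; // unsigned shift reads the wrapped bits correctly
        System.out.println(bad + " vs " + good); // -1073741824 vs 1073741824
    }
}
```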

Fib: Java doesn't have tail call optimizations (and likely won't have them, as the stack trace is needed for the security manager). So using the naive fib in Java is OK if you wish to test recursion alone.

Testing recursion is the explicit purpose of that benchmark. "Tail call optimization" is not usually an optimization at all – it's often slower than just pushing a stack frame. Also, since that algorithm is doubly recursive, you can't eliminate the recursive calls entirely.
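For reference, the shape of the benchmark under discussion (a standard doubly recursive Fibonacci; the suite's exact code may differ slightly):

```java
// Two recursive calls per level: only one of them could ever be in tail
// position, so the other must push a stack frame regardless.
static int fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```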

The parseInt benchmark is actually dominated by the int->String conversion, NOT the int parsing.

All of the languages are doing both, so that's fine.
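A hedged sketch of the round trip being timed (the suite's exact radix and value range may differ); both conversion directions are exercised in every language:

```java
public class RoundTrip {
    public static void main(String[] args) {
        int n = 1234567;
        String s = Integer.toString(n); // int -> String (the part claimed to dominate)
        int m = Integer.parseInt(s);    // String -> int
        System.out.println(m == n);     // true
    }
}
```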

A lot of timings depend on Random.nextXXX, which is thread safe and involves a CAS on x86. Using ThreadLocalRandom is the preferred way, but that means losing determinism.

If that avoids locking overhead, then we should do it. Why would using this make it non-deterministic?
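On the determinism point (my own illustration, though the API behavior is as documented): java.util.Random can be seeded for reproducible streams, whereas ThreadLocalRandom cannot be; its setSeed throws UnsupportedOperationException.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class SeedDemo {
    public static void main(String[] args) {
        Random r = new Random(42);       // seeded: the same stream on every run
        System.out.println(r.nextInt());

        ThreadLocalRandom t = ThreadLocalRandom.current();
        System.out.println(t.nextInt()); // internally seeded: differs between runs
        // t.setSeed(42);                // would throw UnsupportedOperationException
    }
}
```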

Overall the method body is way too big and the JVM will give up on many optimizations. I would not consider most of the code idiomatic.

What method body? This is all straightforward purely static code using primitive data types, hardly any objects, so a compiler for a static language like Java should have no difficulty optimizing it.

@jrevels jrevels removed the potential benchmark Could make a good benchmark in BaseBenchmarks label Jan 27, 2016