Performance improvements #41
Neat, thanks! The …
Great idea, I didn't see that :-)
Neat! That's very... validating :)
@korny, I wonder whether you have the latest data on the speed of Rouge 4.0 compared to the others?
I only have benchmarks up to 0.3.5 right now. Were there any performance-related changes since then?
"speedup 2" were some experimental changes for Rouge that I played around with.
Thanks. I was just wondering. :) On Friday, August 23, 2013 at 5:31 AM, Kornelius Kalnbach wrote:
One of the reasons Jekyll isn't using Rouge (instead of Pygments) is performance concerns. Judging from your graph, Rouge actually seems comparable in terms of speed. Maybe it is time to revisit this issue: jekyll/jekyll#930
@korny, which version of Ruby are you testing against? I know they've focused a lot on performance lately, and I'd be interested to see how it shakes out on, e.g., the falcon fork. Do you have a script to automate that graph? I'd love to play around with it.
@jayferd Tests were done with Ruby 2.0.0p247 (the latest one). Didn't write a script, just ran some basic commands using …

Ruby 2 is much faster than 1.8, and marginally faster than 1.9. I think it's fair to compare Ruby 2.0 with Python 2.7 or 3, but I don't know about specific setups. I'm always testing on a 2011 MacBook Pro (i7), which has awesome Ruby (= single-thread) performance.

The biggest issue with my stats is that I don't know enough about Python to really tweak it. For example, I only tested with CPython 2.7 and 3 (3 is slower), but there seem to be faster implementations out there. We should ask the Pygments guys how to benchmark it properly.

Also, how about exploring precompilation or some other nice hacks to make Rouge even faster? CodeRay shows that Ruby can scan code much faster using ad-hoc scanners instead of a DSL. If we could somehow compile the DSL-based scanners into single-method scanners, it would make CodeRay mostly obsolete.
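The idea of compiling a DSL-based scanner into a single-method scanner can be illustrated with a toy example. This is a hypothetical sketch: RULES, lex_with_rules, compile_rules and LEX_COMPILED are invented names, the rule table only loosely resembles a rule-based lexer, and none of it is how Rouge or CodeRay is actually implemented. The first function interprets an ordered rule table at every position; the second generates one if/elsif chain from the same table and evals it once, so both should produce identical tokens.

```ruby
require 'strscan'

# Hypothetical rule table in the spirit of a DSL-based lexer:
# ordered (regex, token kind) pairs, first match wins.
RULES = [
  [/\s+/,          :space],
  [/\d+/,          :number],
  [/[A-Za-z_]\w*/, :ident],
  [/[-+*\/=()]/,   :operator],
].freeze

# Interpreter style: walk the rule table at every scanner position.
def lex_with_rules(source, rules = RULES)
  ss = StringScanner.new(source)
  tokens = []
  until ss.eos?
    matched = rules.any? do |regex, kind|
      text = ss.scan(regex)
      tokens << [kind, text] if text
      text
    end
    # No rule matched: emit one error character so the loop always advances.
    tokens << [:error, ss.getch] unless matched
  end
  tokens
end

# "Compiled" style: generate Ruby source for a single if/elsif chain from
# the same table and eval it once, so lexing no longer iterates the table.
def compile_rules(rules = RULES)
  branches = rules.each_with_index.map do |(regex, kind), i|
    keyword = i.zero? ? 'if' : 'elsif'
    "#{keyword} (text = ss.scan(#{regex.inspect}))\n" \
      "  tokens << [#{kind.inspect}, text]"
  end
  eval(<<~RUBY)
    lambda do |source|
      ss = StringScanner.new(source)
      tokens = []
      until ss.eos?
        #{branches.join("\n")}
        else
          tokens << [:error, ss.getch]
        end
      end
      tokens
    end
  RUBY
end

LEX_COMPILED = compile_rules
```

For example, both lex_with_rules("x = 42") and LEX_COMPILED.call("x = 42") yield the same five tokens (identifier, space, operator, space, number). Whether this pays off for real lexers with states and callbacks is an open question this toy does not answer.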
I should also mention that the Ruby lexer in particular catches a few more edge cases than the one in Pygments. I don't remember which ones exactly, but it's possible that this makes it slower, especially since the Rouge code contains so much complicated regex syntax. I'll take a minute soon to make sure Pygments actually lexes the Rouge code correctly.
As for the results, I'd say that Rouge is in the same ballpark as Pygments (and the Ruby wrappers around it), but still a bit slower. The most interesting output format would be HTML snippets with CSS classes. There are two ways to look at it: speed (kilobytes of input code per second) or time (how many seconds it takes to highlight a file).
(Charts: Speed, Time)
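Both views can come from the same measurement: time a call, then divide the input size by that time to get throughput. A minimal sketch, assuming the highlighter is passed in as a block; measure and its repetitions: parameter are hypothetical names, not part of any shootout script from this thread.

```ruby
# Time a highlighting call and express the result both ways:
#   time  = seconds per call (what a reader waits for)
#   speed = kilobytes of input per second (throughput)
def measure(source, repetitions: 10, &highlight)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  repetitions.times { highlight.call(source) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  seconds = elapsed / repetitions
  kilobytes = source.bytesize / 1024.0
  { seconds: seconds, kb_per_second: kilobytes / seconds }
end

# Example (assumes the rouge gem is installed; not run here):
#   measure(File.read('code.rb')) { |src| Rouge.highlight(src, 'ruby', 'html') }
```

Averaging over several repetitions smooths out timer noise, which matters most for the small inputs where a single call takes well under a millisecond.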
Cool! I just pushed a branch I'd like to try this out on (…)
@korny 👍
@korny: Awesome, thank you! Could you also tell us what you used to make these nice graphs, please?
@robin850 That's just Numbers :-) It's pretty awesome for making some quick graphs, because it's more flexible than Excel when it comes to layout. Here's the file: http://rubychan.de/share/Shootout.numbers
Cool, I just merged a few commits that tightened things up a bit, the main one being that lexers now specify tokens as constants rather than strings. So instead of …
Interesting. Did you find this to be faster?
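Whether the constant form is faster depends on how the string form was resolved, which the thread doesn't spell out. Assuming string names were looked up each time a token was emitted, constants move that lookup to load time, as in this hypothetical sketch (TokenSketch, REGISTRY, emit_by_name and emit_by_constant are invented names, not Rouge's real Token classes).

```ruby
module TokenSketch
  # Hypothetical token type, identified by its qualified name.
  Token = Struct.new(:qualname)

  REGISTRY = {
    'Keyword'      => Token.new('Keyword'),
    'Name.Builtin' => Token.new('Name.Builtin'),
  }.freeze

  # Constant style: resolved once, when this file is loaded.
  Keyword     = REGISTRY.fetch('Keyword')
  NameBuiltin = REGISTRY.fetch('Name.Builtin')

  # String style: the rule stores a name, so every emitted token pays for
  # hashing the string and looking it up at lex time.
  def self.emit_by_name(name, text)
    [REGISTRY.fetch(name), text]
  end

  # Constant style: the rule already holds the token object.
  def self.emit_by_constant(token, text)
    [token, text]
  end
end
```

Both calls return the same pair; the constant version only skips the per-token lookup. If string names had instead been resolved once when the rules were defined, most of the difference would disappear, so only a benchmark can say whether it is measurably faster in practice.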
I benchmarked again with the latest Rouge master, and it's now faster than Pygments for both HTML and Ruby! Looks very promising. If @jayferd tweaks the lexers a bit more, I'm pretty sure Rouge can finally beat Pygments in all categories… (I also re-benchmarked Pygments.rb and Albino; the numbers were strange. Pygments.rb is supposed to be faster than Albino because it seems to avoid reloading the Python bridge every time, but Albino can't be much faster than the calling …
@korny: Thank you! :-)
:D Yep, that's why I waited to merge it until I'd run the shootout.
I think we're doing this wrong. What I did was benchmarking performance for very large files (200 kB and up). For a blog generator like Jekyll, the most typical input for the syntax highlighter would be a short code example, somewhere between 1 and 20 lines of code. So we're talking about less than 1 kB. This matters a lot because even Pygments.rb still has a small overhead for calling the Python engine. For HTML output, this overhead is ~1ms. Every call takes at least this long. So for a small input (let's take 42 bytes), Rouge beats Pygments by a large margin:
If we look at an input size of 400 bytes, Rouge is still faster:
For 1000 bytes, the winner starts to depend on the language:
(Of course, this all depends heavily on the kind of code you input. For a single multi-line comment you get different numbers than a code golfing competition winner. I tried to use real-world code examples, but YMMV.) At the high end (10 kB and up), Rouge is clearly slower than Pygments (except for Ruby code):
For very large files (1 MB), Pygments wins:
So, instead of looking at raw theoretical performance for large data sets, I suggest we look at actual real-world cases that would affect Jekyll's performance.
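The pattern in these measurements fits a simple cost model in which every call pays a fixed overhead plus a per-byte cost: time(n) = overhead + n / throughput. The sketch below encodes that model. Only the ~1 ms per-call overhead for Pygments.rb comes from this thread; both throughput figures and Rouge's zero overhead are made-up placeholders picked for easy arithmetic, so the printed numbers show the shape of the curves, not real measurements. Engine, crossover_bytes and the *_MODEL constants are hypothetical names.

```ruby
# Cost model: each call pays a fixed overhead, then a per-byte cost.
#   time(n) = overhead + n / throughput
Engine = Struct.new(:name, :overhead_s, :bytes_per_s) do
  def time(bytes)
    overhead_s + bytes / bytes_per_s.to_f
  end
end

# Input size (bytes) at which engines a and b take equally long,
# or nil if their cost lines never cross at a positive size.
def crossover_bytes(a, b)
  per_byte_gap = 1.0 / a.bytes_per_s - 1.0 / b.bytes_per_s
  return nil if per_byte_gap.zero?
  n = (b.overhead_s - a.overhead_s) / per_byte_gap
  n.positive? ? n : nil
end

# Placeholder numbers: only the ~1 ms Pygments.rb overhead is from the
# thread; the throughputs and Rouge's zero overhead are invented.
ROUGE_MODEL    = Engine.new('Rouge', 0.0, 500_000)
PYGMENTS_MODEL = Engine.new('Pygments.rb', 0.001, 1_000_000)

# Input sizes mirror the ones discussed above (42 B up to 1 MB).
[42, 400, 1_000, 10_000, 1_000_000].each do |n|
  printf("%9d bytes: %-12s %.6f s  %-12s %.6f s\n", n,
         ROUGE_MODEL.name, ROUGE_MODEL.time(n),
         PYGMENTS_MODEL.name, PYGMENTS_MODEL.time(n))
end
puts "crossover at ~#{crossover_bytes(ROUGE_MODEL, PYGMENTS_MODEL).round} bytes"
```

The point of the model is that a fixed per-call cost dominates for tiny inputs and becomes irrelevant for large ones, so an engine with lower throughput can still win on typical blog-sized snippets.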
Nice observation. It's good to know that Rouge is faster for smaller files. But a side effect of improving speed for files in general is that small files might get faster too. We should definitely bring this up with Jekyll. On Saturday, August 31, 2013 at 6:43 AM, Kornelius Kalnbach wrote:
Yeah, perhaps a better test would be to lex all the demos sequentially a number of times (since a blog post is likely to have lots of small snippets). That would test across all the lexers as well, and I think some of them are pretty inefficient (although honestly, Ruby is one of the most complex ones). The other angle is eliminating hangs, which certain inputs can cause (see #77 and #78).
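A rough sketch of that benchmark shape: lex every demo snippet once per pass, repeat the pass several times, and report the average time per pass along with the slowest language. bench_snippets and its lex: hook are hypothetical; the hook stands in for whatever call actually lexes a snippet.

```ruby
# Lex every demo snippet once per pass, repeat for several passes, and
# report the average time per pass plus the language that took longest.
# `lex` is a hypothetical hook: any callable taking (language, snippet).
def bench_snippets(demos, lex:, passes: 20)
  per_language = Hash.new(0.0)
  total = 0.0
  passes.times do
    demos.each do |language, snippet|
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      lex.call(language, snippet)
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      per_language[language] += elapsed
      total += elapsed
    end
  end
  {
    seconds_per_pass: total / passes,
    slowest: per_language.max_by { |_language, seconds| seconds }&.first,
  }
end

# With the rouge gem installed (an assumption; not run here), the hook
# could be something like:
#   ->(lang, src) { Rouge::Lexer.find(lang).new.lex(src).to_a }
```

Reporting time per pass keeps results comparable across different pass counts, and tracking the slowest language points directly at the lexers worth tuning first.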
Any further updates on these tests, guys? Have recent optimizations made Rouge faster than Pygments in all categories, for all file sizes?
I ran the shootout again with the latest versions of Ruby 2.3, Python 2.7, Rouge, and Pygments.rb. Lower times are better.
Rouge is clearly faster than Pygments.rb in all cases.
Awesome. Thanks @korny. Just what I was looking for.
Hey, I'm playing around with Rouge right now. It's awesome to have a really good port of Pygments in Ruby.
The only thing it lacks right now is comparable speed. In my tests (Ruby 1.9.3 and Python 2.7.2), rougify is about 4-7x slower than pygmentize.
To help things, I found two hot spots that are easy to optimize. Fixing them should speed up highlighting by about 40%:
code.rb is the Rouge code (348K)
code.json is a simple JSON log file (216K)
Maybe we can find more such optimizations. Based on my own experience, I firmly believe that Ruby code can be as fast as Python code without sacrificing too much beauty.