use global regexes (cached) for small speed improvement #2501
I wonder if this will help with speed, given how much we use regexes: http://jsperf.com/regex-caching-lp
(I added a few browsers there.) Curiously, it looks like it's dramatically faster only in Chrome :). I guess it could still be a neat piece of code to use for those regexes that execute the most (e.g. vars, properties, …).
V8 likely implements regex literals strictly per ECMAScript 5, which states that a regex literal is compiled each time the literal is evaluated, as if it had been declared using the `RegExp` constructor. The other engines are probably still implementing the ECMAScript 3 behavior, which states that regex literals are compiled once, before the containing program or function is compiled and executed, and any use of the corresponding literal is treated as a reference to the same compiled object. Or they're using a hybrid form, where the regex state machine is compiled and created only once, but fresh `RegExp` instances wrapping it are handed out on each evaluation.
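For illustration, a minimal sketch of the observable difference (the engine attributions above are spec readings, not tested claims):

```js
// A regex literal inside a function: how many RegExp objects does it create?
function getRegex() {
  return /foo/g;
}

var a = getRegex();
var b = getRegex();

// ES5 semantics: a fresh object per evaluation, so this logs false.
// ES3 semantics: the literal maps to one shared compiled object, so it logs true.
console.log(a === b);

// Sharing also means shared state: under the ES3 behavior, setting
// a.lastIndex would be visible through b as well.
```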
You should see what caching can do when the regex is dynamically created from a string based on options or user input etc. For example:

```js
var cache = {};

function createRegex(str) {
  return cache[str] || (cache[str] = new RegExp('foo' + str + 'bar'));
}
```

I created this not long ago. Really, the code above is all that's needed if the only input is a string; still, it's pretty interesting how much of a difference caching can make.
If you look at parsers like acorn (http://marijnhaverbeke.nl/acorn/), and a few others that have a reputation for speed, you'll notice that there are barely any regexes at all; instead they use sequences of OR'd character checks (see the sketch below). In Less, we often have a series of OR checks for regexes, and those regexes themselves have backtracking OR switches. Pragmatically, it's easy to maintain, but that pattern is more likely to negatively impact performance.

I've been thinking about how we can very gradually improve the speed of individual node matching. I was reading this - https://github.com/felixge/faster-than-c (a very good read about making JS faster than C) - and he advocates benchmark-driven development. I was wondering if we could create some kind of test harness, like our own kind of JSperf, that has sequences of .less input, and tests the performance of matching in a single matching function in the Less library (and verifies that the output is as expected). Then a dev could tweak that function, run more tests, save / graph the results. That way we don't have to KNOW what would be faster so much as try something and see if it's faster. If it's faster on all supported platforms, make a PR for that particular function, badda bing, badda boom. I'm not sure the best way to go about that, but it would be great to be able to quickly test all these great ideas.
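As a concrete illustration of that style - a character-code check standing in for a small regex (the function name and ranges here are just for the example):

```js
// Recognize an identifier-start character with plain numeric comparisons
// instead of something like /[A-Za-z_$]/.test(ch).
function isIdentifierStart(code) {
  return (code >= 97 && code <= 122) || // a-z
         (code >= 65 && code <= 90)  || // A-Z
         code === 95 ||                 // _
         code === 36;                   // $
}

isIdentifierStart('a'.charCodeAt(0)); // true
isIdentifierStart('1'.charCodeAt(0)); // false
```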
Also, would there be any value in looking at JS-based CSS parsers like https://github.com/css/gonzales to see if there are ideas we could port? Surely there would be relevant pattern matching in something like that. I know each of our entities gets a lot more complex, but just an idea.
That type of parser formally separates itself into a lexer/tokenizer and an LALR(k) parser. The power and speed of that approach come from employing generalized regular expression matches on a very small scale (mostly for recognizing identifiers, filtering whitespace, etc.) and limiting yourself to a fixed set of hand-coded lookaheads by hand-writing a parser that accepts the lexer's tokens via a set of production rules formulated as a BNF grammar.

You could do that with Less as well, but you'd need to build on a compliant CSS parser first, which can tokenize and parse CSS selectors, which (if I understood correctly from the original discussion) is really where regex lookahead starts to hurt a lot.

I remember CSS 2.1 having a BNF grammar for the language in the W3C recommendation. I assume CSS 3 is no different and does the same. You could still short-circuit a lot of the grammar, since you don't need to know every little detail of it.
@rjgotten Is this what you meant? http://dev.w3.org/csswg/css-syntax/ They really break it down. Just came across that recently.
I looked briefly at switching to a parser generator - basically it's a rewrite. I'd love to use a parser generator to make it all generated.
I've read that parser generators can leave you with a different set of problems (with those authors advocating hand-written parsing instead), but I don't really know one way or the other. I think as long as we can benchmark the results, however we get there is all good.
Can you run this through on a few browsers? http://jsperf.com/regex-string-size If the size of the string is Less's performance problem (as shown by chunkInput), I'd expect to see the speed decreasing with the size of the string. Any idea what's wrong with the test if that isn't happening? Got to run for a flight or I'd have done more testing...
On my setup, Safari performed best, with all the non-cached operations happening at roughly the same rate, and the cached operations at a different rate. Chrome 41 shows performance issues as the string size increases. Chrome 42 does not show as much of an issue. Firefox trails way behind in last place, with every operation performing at the same rate, cached or not.

I think we can expect that every JS engine will optimize each regex differently, and even the same regex will perform differently with each input. So, nothing will beat testing. I did another arbitrary test here (http://jsperf.com/regex-string-size/2), just comparing OR within a regex vs. testing two regexes and matching one or the other. I also changed up where the backtracking happens in the regex. And the browsers don't all change performance at the same rate. In fact, Firefox does better at an OR between two regexes than at one of the regexes which backtracks in a different place, while Chrome and Safari always do much better with a single regex.

At the very least, the data shows caching is generally better than not caching, except in Firefox. But then, it's certainly not worse. I think we can start with what @jonschlinkert said. For any place in the code where we're testing against multiple regexes, it should still be faster to dynamically create a regex and test against a single one. And even faster if the regex isn't being redefined every time the function is visited (implement caching; see the sketch below). And then run benchmarks on changed functions.
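A minimal sketch of that last point (the regex and function names are illustrative, not taken from the Less codebase):

```js
// Before: the literal lives inside the hot function, so (depending on the
// engine) a RegExp object may be created on every single call.
function isImportantSlow(str) {
  return /^! *important/.test(str);
}

// After: hoisted to module scope - created exactly once, reused on each call.
var IMPORTANT_RE = /^! *important/;
function isImportantFast(str) {
  return IMPORTANT_RE.test(str);
}
```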
Cached vs non-cached seems to indicate regex setup cost does not increase with regex complexity, so it might be a small win. I tried adding a few current-character checks to avoid running the regexes used the most. Will post back here next time I work on it...
I think it's not just the size of the input string but "the size + some lookahead regexes". I.e. only certain regex patterns' timings may increase exponentially as the input size grows.

P.S. (a bit offtopic) When making the highlighter I found this tool to be quite useful for debugging such "bad" patterns (it really helps to detect some problematic parts earlier, though you still need to invent/predict the ambiguous inputs that may put the regex down yourself, so it's still kind of a tedious thing).
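A classic example of such a "bad" pattern - nested quantifiers that backtrack exponentially on a near-miss input (timings will vary by engine):

```js
// Each extra 'a' roughly doubles the work before the match fails, because the
// nested quantifiers can partition the input in exponentially many ways.
var evil = /^(a+)+$/;

console.time('backtracking');
evil.test('aaaaaaaaaaaaaaaaaaaaaaaaa!'); // a run of a's plus a non-matching tail
console.timeEnd('backtracking');
```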
I agree, it's a ton of work to try to predict what will work. That's why it would be great if we could staple a benchmarker onto the Less lib directly (only loaded for tests / development). Change code, and if the results are accurate, and it's faster, keep the code. If we could turn on the benchmarks just for function X and run all the Less tests (which would throw a bunch of .less patterns and file sizes at it, yes?), it seems like it would give a more "real-world" picture.
@seven-phases-max The highlighter and other tools are cool too; we definitely should check those out. Let's hit this from all sides. Maybe we could make "performance" the next major feature (other than features in progress)?
I've fixed the benchmark test and made it more accurate - I was using that. But that's in node / V8... I'm most interested in fixing the regexes making parsing slow on non-V8 engines with large files... the hardest part will be finding which ones those are.

I'm not really interested in performance that much - I think code readability is more important - but at the moment we need to fix non-V8 browsers. Once fixed (if possible) I can permanently remove the chunker.
Well, if you want to do something about both: start by separating your parser into a real lexer/tokenizer and a token-based parser. Right now, you still have a lot of tokenization happening inside the actual parsing routines. And if your tokenization is happening mostly based on regexes, that's really like extending an invitation to all the degenerate backtracking cases to come and join the party.

Performance-wise, the good thing about having a separate lexer is that you generally only need regexes to recognize a small pool of token types from text, and as tokens almost exclusively consist of one word, there is very little chance of degenerate backtracking in a long chain while recognizing a token. The parser no longer receives a stream of text, but a stream of strongly-typed tokens. Tokens are always the same (even if they can be used in different roles). The 'real' backtracking that disambiguates language constructs will then happen inside the parser only and won't have to involve complex backtracking in regexes anymore.

The maintenance/readability upshot of this is that you get named constants for your tokens that are much easier to work with, and your parser will contain only the isolated parser logic, making it much easier to follow than if it also carried the added cognitive load of dealing with the inlined regexes and text-stream processing.

Also, if you formally separate lexer and parser, you can more easily handle another current issue, namely the one regarding comment parsing. Structured tokens mean you can attach debug information like line and column number and a sample of surrounding input directly to each token. That enables you to keep that process intact while choosing to omit comments whenever you 'fetch a next token' from the lexer. I.e. you pretty much get the whole 'selectively allow comments' idea for free.
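A rough sketch of the shape this takes - all the names below are hypothetical, not a proposal for concrete Less internals:

```js
// Strongly-typed tokens with debug info attached.
var TokenType = {
  IDENT: 'ident', AT: 'at', COLON: 'colon', SEMI: 'semi',
  LBRACE: 'lbrace', RBRACE: 'rbrace', COMMENT: 'comment', EOF: 'eof'
};

function Token(type, value, line, column) {
  this.type = type;     // one of the named constants above
  this.value = value;   // the matched text
  this.line = line;     // debug info travels with the token for free
  this.column = column;
}

// The parser fetches tokens; comments stay in the stream but can be
// transparently skipped unless the caller asks for them.
function nextToken(lexer, allowComments) {
  var tok = lexer.read(); // hypothetical lexer API
  while (!allowComments && tok.type === TokenType.COMMENT) {
    tok = lexer.read();
  }
  return tok;
}
```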
Yes, it would be nice, but it's a big change that I don't really have time to do - I inherited the existing parser and never felt like I had enough time to re-write it. It is a different approach from the current "chunker", which just aims to split the input up into things about 200 bytes long, guessing boundaries, but it comes unstuck on things like …
I started with something simple - replacing regexes that don't have any logic in them. They aren't the ones causing the problem, but still.. f8de5bc
We can do that already - that isn't the problem with comments; the problem is that how they are represented in the AST is all wrong.
A way towards a lexer might be to start using `save` more, and slowly convert regexes to only use `str` and `char`. At that point a lexer can easily be introduced and the `str` and `char` calls swapped out for token comparisons (a rough sketch follows below).
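A minimal sketch of what that intermediate stage might look like - the save/restore/str/char-style helpers follow the naming in the comment above; the exact shape is assumed, not Less's actual parser-input API:

```js
// A tiny text-backed "parser input". Later the same interface could be backed
// by an array of tokens, with char/str becoming token-type comparisons and
// save/restore just moving an index instead of re-scanning a string.
function ParserInput(text) {
  this.text = text;
  this.i = 0;
  this.saved = [];
}
ParserInput.prototype.save = function () { this.saved.push(this.i); };
ParserInput.prototype.restore = function () { this.i = this.saved.pop(); };
ParserInput.prototype.forget = function () { this.saved.pop(); };
ParserInput.prototype.char = function (c) {
  if (this.text.charAt(this.i) === c) { this.i++; return c; }
  return null;
};
ParserInput.prototype.str = function (s) {
  if (this.text.substr(this.i, s.length) === s) { this.i += s.length; return s; }
  return null;
};

// Usage: backtracking without any regexes involved.
var input = new ParserInput('@media screen');
input.save();
if (input.char('@') && input.str('media')) {
  input.forget();  // committed: drop the save point
} else {
  input.restore(); // backtrack to the save point
}
```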
Yeah; I kind of glossed over the fact that comments are the one thing that's already handled at the lexer level (i.e. parser-input). My bad.
That probably means the benchmarking piece may need to be browser-based somehow. But I would also say it's reasonable to expect that if we address performance bottlenecks in V8, it's very likely to solve things on other engines. For instance, if there are pieces which cause exponential slowdown, and V8 is starting with a small amount of slowdown, then just addressing that small piece should speed up other engines by orders of magnitude. Not in every case, but I don't really see this as optimizing for a particular JS engine (although that's nice when it happens, in the case of V8), but just figuring out more efficient algorithms. So, I'm interested in those other engines too; I just don't think they need to be specifically targeted in order for performance to increase.
I just came across this relevant quote by programmer Jamie Zawinski: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

lol
Well, the problem is not in regular expressions but in how one uses them. As a C++ programmer who had never used regexes before ~2012 or so, I can assure you ;). Things like "OK, I'll just replace this regex with a bunch of my spaghetti…"
I know, no principle in programming is absolute. Just funny because it contains some truth.
haha, awesome
Just FYI, @lukeapage (et al), I decided to follow up with this: I've been looking at a few libraries and reading on different parser generator methods and algorithms. However, I don't want to clutter this thread until I find a good candidate (which I may have). Just wanted to mention it on the off chance anyone else was doing that work. Once I find something that MAY work, then I may need to try to generate the parser in order to compare performance.
Also, 390ms would still be a win if it were 150ms instead, so I'm going to look at parser generators anyway when I have time. I'll share if/when I find something interesting.
Matthew, I've managed to remove a lot of regexes from the current parser - were those measurements from master or the last release?
@lukeapage Were you asking @rjgotten? I was referring to his measurements.
Yes, sorry, got a bit muddled replying on my phone while walking outside... The first bit was for you, the second bit for @rjgotten.
The bulk of the compiler's work. It's the phase where variables are resolved, mixins are called, etc. (Afaict.) In the post-visit phase the nested selectors are resolved into their final complete form. And then the complete thing gets handed off to the CSS generation phase, which is simply where the finished tree is serialized out to CSS text.
From master.
Just curious, what's the reason that things are converted to plain but nested rulesets and then again reduced to CSS rulesets? Couldn't the intermediary step be skipped? I mean, I know they're sequential, but I'm just wondering if there would be a way to reduce object creation. After all, there's still the matter of frequent huge GC dumps, although that may have been mitigated with chunkInput turned back on.
It's not that they are specifically converted to plain rulesets. They just become plain because their internal stuff (e.g. vars and mixins) is "expanded". So, yes, it's just one step ("flatten rulesets") after another ("expand ruleset internals")...
@matthew-dean Regarding reduction of object creation; I think @lukeapage hit the nail on the head with the scopes involved in mixin call evaluation.

I have a rather complex system in a current project that maps a set of names to character codes, for use with font-based icons. The system supports leaving a character code undefined for one named icon, at which point it should take the code of the previous icon, incremented by one. Very nice for custom icon font definitions, as you can keep a neat and tidy ordering without having to remap codes the whole time. However, it never managed to scale well and gave me ridiculously long compilation times when I got into larger icon sets. Anyway, it has quite a bit of branching logic inside a tight loop that has to iterate over the list-of-lists I used to model the dictionary / map, which means mixins ... with guard conditions ... and overloaded signature matching ... on each iteration.

As a matter of experiment, today I removed the feature where undefined codes are automatically filled based on increments, and I wrote a few dedicated custom functions to directly perform key lookups on a list-of-lists (I love my new …). Doing away with the mixin-based branching logic and iteration this way shaved 7 seconds off of the eval tree phase of the compilation of that project's file set - and that was with a fairly limited set of just 50 icons.

As an aside: …
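For reference, the custom-function route looks roughly like this - a sketch of a Less plugin registering a lookup function. The function name, list shape, and lookup logic are illustrative, not the actual code from that project:

```js
// Registers icon-code(@icons, name): scans a list-of-lists of (name, code)
// pairs and returns the matching code node directly, with no mixin recursion.
module.exports = {
  install: function (less, pluginManager) {
    less.functions.functionRegistry.add('icon-code', function (list, key) {
      var pairs = list.value;
      for (var i = 0; i < pairs.length; i++) {
        var pair = pairs[i].value;
        if (pair[0].value === key.value) {
          return pair[1]; // one O(n) scan, instead of mixin matching per step
        }
      }
    });
  }
};
```

In a stylesheet that would then be called as something like `content: icon-code(@icons, home);` - the exact node shapes depend on how the list is declared, so treat the property accesses above as assumptions.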
It's a problem of the language. I don't think @lukeapage or anyone else can do too much about this; i.e. of course there can be some performance improvement tricks here and there, but nothing can change the whole picture: all this stuff just has to be re-evaluated in each scope, because every scope may have different variables and mixins that may change the behaviour of another mixin call/match.

(Meanwhile there's a way to "iterate" over a list w/o any loops at all... It just has to be a different kind of list, not one held in a variable ;). Of course, since this method is also mixin-based, it probably can't be as fast as a dedicated JS function.)
I guess the takeaway here is that if you need to do heavy-lifting stuff and need to do it fast, you really need the JS escape hatch. ;-)
WebGL to go! (Who needs these stone-age text-based icon descriptions when you can iterate and render right on the GPU :)
@lukeapage Would lazy scope generation be an option? The disadvantage is that the code may end up more complicated. Where in the code is the scope generated?
About parser tokenization/generation: less4j is written that way and I am not sure the benefits are worth a complete rewrite. Adding features like detached rulesets is somewhat faster, I think, but you are losing the complete control you have now. Css/less does not break down into nice strongly typed tokens, so I have to rely on semantic predicates and "tricks" more than I like.

Less.js knows the context; the code it uses to parse a selector is completely unrelated to the code used for property names or anything else. Less4j on the other hand has to use the same tokenizer for everything, which leads to very small, almost weakly typed tokens. A similar problem happens with whitespace: it is ignorable almost everywhere except in some places. It was solvable, of course, but somewhat easier in a hand-made parser. A lot of what is in css is like that - many general rules have at least some exceptions.

If I were doing a third compiler, I would likely go with generation again (depending on the grammar), but it is pretty tight. It may be that the current less.js architecture has downsides I do not know about; I did not look that much into the parser part. What I have seen looked straightforward.
But that's what a tokenizer (or rather: lexer) is and does: it generates tokens. It's then up to the parser to decide how particular sequences of tokens are allowed to fit together and what meaning is attributed to those sequences. You may think those tokens 'weakly typed', but in fact they are very specifically typed: identifier, opening-brace, closing-brace, etc. The wider and weaker you make that definition, the harder it becomes to manage token extraction correctly without complicating the lexer into a full-blown parser with the need for complex backtracking etc. implemented directly on the character stream with regexes.

Tokenization is meant as a pre-processing step that 'pre-shrinks' the input set, if you will, and prevents you from having to run painfully large regular expressions over huge input strings.

If fully automated generation isn't your thing, you could always write a lexer/parser separation by hand. Full control and all the maintenance (and theoretical performance) benefits.
Some grammars are a better fit for that than others. Parsing java (minus error handling) or python this way is likely to be easy and straightforward - they were designed in a clean way that makes it easy. Css, and thus less, has features that make it a less perfect fit. The thing about css/less is that it sometimes resembles multiple grammars merged into one file. I seriously considered having a special grammar for selectors, where I would parse them only into a superficial blob and then run them through a secondary parser.

Take identifiers - one way or the other they are pushed into the parser instead of the lexer. Less does not have keywords, so a selector inside extend needs to be parsed almost as a normal one, except for the `all` at the end. Nothing unsolvable, but if I were able to re-tokenize part of the grammar, that would be awesome. ANTLR actually supports something like that - you can switch tokenizers on the fly, but not in the way needed here.

In java, you can throw away most whitespace. In python, whitespace generally matters. In less, it doesn't, except in a few exceptional places - so you end up sacrificing performance or doing tricks for those. As I said, it is nothing unsolvable. Less4j is very close to less.js and has everything I am aware of minus options and new features. But it does slow you down every time something like that pops up, and it forces me into maintainability or performance trade-offs I would prefer to avoid.

Side-note: it happened to me twice that I googled a problem and the answer on the antlr mailing list was: "best to redesign your grammar". They were right, but we can not really do that.
The other thing to be considered is error handling and error reporting. I have read about teams switching from a generator to a hand-made parser for that reason - it turned out easier that way. If you really want to get it right with a parser generator, your grammar can become big/convoluted due to error handling rules. I think ANTLR 4 was trying to make this easier, but I have not looked at it yet. (ANTLR 4 should also be faster and more programmer friendly.) It is something to be considered before a switch, because you do not want to end up switching to another generator, or back to hand-made, because of that.
One problem with ANTLR is that it seems to generate extremely verbose parsers. The grammar definitions seem fairly complicated to design / maintain. An auto-generated parser may be the way, but I'm not sure ANTLR is that solution, IMO.
What about something like this: https://github.com/Hardmath123/nearley
You're not thinking small enough, and as a result you're still making the tokenizer too powerful. For at-rules and variables the @ sign can be its own token. When the parser asks the lexer for the next token it does so with an explicit instruction to not allow whitespace in between. The parser then expects the lexer to respond with an identifier token. If the lexer doesn't reply with an identifier token, the match fails and the parsing context is popped.

If you work with generalized tokens this way, then when the parsing context is popped you can also shift all previously lexed tokens back onto the front of the token 'stream', as the tokens themselves are free of context. This means you don't waste cycles on re-lexing. You only waste a bit of memory keeping processed tokens around in a buffer until you can be 100% sure they can be discarded.
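In sketch form (all names hypothetical), the pushback mechanism is little more than a buffer in front of the lexer:

```js
// A token stream that supports pushing already-lexed tokens back. Because
// tokens carry no parsing context, a failed speculative parse can return
// them unchanged and nothing has to be re-lexed.
function TokenStream(lexer) {
  this.lexer = lexer;
  this.buffer = []; // tokens waiting to be (re)consumed
}

TokenStream.prototype.next = function (opts) {
  return this.buffer.length > 0
    ? this.buffer.shift()
    : this.lexer.read(opts); // e.g. { allowWhitespaceBefore: false }
};

TokenStream.prototype.pushBack = function (tokens) {
  this.buffer = tokens.concat(this.buffer);
};
```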
Doesn't that require semantic predicates, e.g. a js/java/whatever condition? They were supposed to be used as a last resort, since they slow down the algorithm and can mess up prediction (from what I remember; it probably also depends on what generator you use). I was doing it that semantic-predicates way in less4j (minus the last part - the generator controls that, not me) and just finished removing them - moving as much as possible into the "pure" grammar.
A remark not related to the original @lukeapage goal, but to note something important before you go too far along that "parser refactoring" road (as of the recent posts here). Before, there were already rumours that for typical real-world code most of the compilation time is spent not in parsing but in the eval phase. In other words, even if parsing were made essentially free, the total compilation time would not improve all that much.
I'm going to second what @seven-phases-max is stating; the results I get from a few of my own (quite large) less codebases also indicate that the bottleneck on anything of substantial size and complexity is in the ruleset/mixin eval phase. I notice there's a lot of array cloning (via things like slice and concat) going on there. And of course: type stability, type stability, type stability. Can't really state that one enough.
Well, let's attack it from all sides, shall we? But in order to do that, we need to be able to quantify improvements: that is, being able to hook in timers (in a dev process) for the time spent in each function. I think we could do something like have a dev function that monkey-patches selected Less functions. That way you wouldn't have any code left in production builds. It could even be a Less plugin. Basically, you override each function with a function of the same name that takes a performance.now() measurement before and after the original function call, and increments the total as you go. Output the totals per function at the end; easy peasy.

Once we have that, we can quantify improvements when compiling different Less stylesheets, at least as they run in current versions of V8. Otherwise we'd just be guessing whether replacing an array slice or concat makes any measurable difference. I know we have total time in our benchmarks, but it would be nice to have function-level benchmarking, so that retooling a specific function is easier to measure.
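Something like this, perhaps (a sketch; the instrumented target in the comment is just an example, and in Node performance.now() would need a substitute such as process.hrtime()):

```js
// Wrap selected functions, accumulate wall-clock time per name, dump at the end.
var totals = {};

function instrument(obj, name) {
  var original = obj[name];
  obj[name] = function () {
    var start = performance.now();
    var result = original.apply(this, arguments);
    totals[name] = (totals[name] || 0) + (performance.now() - start);
    return result;
  };
}

// e.g. instrument(less.tree.Ruleset.prototype, 'eval');
// ... compile some stylesheets ...
// console.log(totals); // time spent per instrumented function, in ms
```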
That's more or less what the JS profiler in both Firefox's and Chrome's developer tools is already designed to do. (Sidenote: Firefox's is also pretty nice in that it supports an 'inverted call tree' setting. It makes it easy to identify the biggest spenders at the lower levels of the call stacks without having to drill down through a huge graph manually.)

There's a NodeJS package that creates bindings to the V8 profiler that's active in Node 0.11, which I think gives exactly the kind of telemetry data you're looking for:
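Presumably that's the v8-profiler package; assuming so, usage is along these lines (a sketch based on its documented startProfiling/stopProfiling API - treat the details as assumptions):

```js
var fs = require('fs');
var less = require('less');
var profiler = require('v8-profiler');

profiler.startProfiling('compile');
less.render(fs.readFileSync('big.less', 'utf8'), function (err, output) {
  var profile = profiler.stopProfiling('compile');
  profile.export(function (error, result) {
    // The .cpuprofile file can be loaded into Chrome DevTools for inspection.
    fs.writeFileSync('compile.cpuprofile', result);
    profile.delete();
  });
});
```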
@rjgotten Oh nice! I'll take a look.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.