Improve TokenUtils performance through caching #9964

marcelgerber · 2014-11-18T22:22:31Z

Testing

Test 1
Using this file: https://gist.github.com/MarcelGerber/5c11f1031af292044708, place your cursor in the <script> tag and press Ctrl-Space.

Test 2
Using same file, place your cursor in the <body> tag and press Cmd/Ctrl-E.

redmunds · 2014-12-06T00:48:49Z

src/language/HTMLUtils.js


-        while (TokenUtils.moveNextToken(ctx, false)) {


@marcelgerber Thanks for digging into this one. That's an impressive improvement.

I'm wondering if the main slow down is in TokenUtils.moveNextToken() because it has to get the full line text for every token. I wonder if the line text can be cached in the "context" object to improve performance? Brackets seems to choke on files with long lines in many places so that would provide a benefit for many other functions and not just this one. Care to give that a try?

@redmunds is not getting the line, is mainly ctx.editor.getTokenAt() which has to parse the entire line up to the given position every time, and essentially parsing the same content many times.

What we could do is, run cm.getLineTokens(line), save the tokens in the ctx and make moveNextToken use those tokens until it gets to a new line, where it will call cm.getLineTokens(line), save it and return the first one.

BTW, why are the TokenUtils function that require a ctx and not a class that can be created one with methods that will not require the ctx?

marcelgerber · 2014-12-06T22:05:08Z

@redmunds @TomMalbran I managed to come up with a caching solution in TokenUtils, which seems to work out quite well.
I'd appreciate your feedback!

redmunds · 2014-12-08T17:18:44Z

@marcelgerber This is a great start! I verified that it fixes #9717, but it will need lots more testing.

I think the (1 sec) timeout should be replaced by listening for the Document "change" event. If there's a change to the line that's cached, then clear it.

Also, there can be multiple contexts at one time, so it might be worth having a cache for each context object. This is usually used to looking ahead to next (or back at previous) token when parsing without affecting current context, so it will usually use the same cached line, but this is something to keep in mind as a future improvement.

redmunds · 2014-12-08T17:24:53Z

src/utils/TokenUtils.js

+            return editor.getTokenAt(pos, precise);
+        }
+        var cachedTokens = _manageCache(editor, pos.line),
+            token = _.find(cachedTokens, function (token) {


_.find() iterates over list which is a sequential search. For really long lines of text (e.g. minified files), it's probably worth implementing a binary search here.

Another idea is to keep track of last token, and pass in a first/last/prev/next/random flag to provide a hint where to start search. But, this may only work if each context has it's own cache.

Binary search sounds promising. I'll try that once I got the time.

I think it might be worth it for the future to have a Token class, with move methods and an internal cache, since it looks like after having a context we want to move it searching for something.

I actually found out that Lodash has a binary search function (_.sortedIndex - is there anything that's not in Lodash?!) and well, I used it :)

redmunds · 2014-12-08T17:31:05Z

The Editor.getModeForRange() method calls TokenUtils.getModeAt() for every token in a range, so this is another opportunity for improvement using this cache.

redmunds · 2014-12-08T17:35:08Z

Done with code review.

marcelgerber · 2014-12-08T17:53:09Z

Ugh, didn't notice there's a JSHint error.
@redmunds Could you just look for a case where multiple contexts are used at the same time?
The problem with caching multiple is that it can get memory-heavy real quick. A minimized file can easily have 15,000 tokens a line, and I don't think we want have really big arrays gambling around...

Ah, and the problem with getModeAt() is that it wants precise results, which is not something we usually want to have cached.

redmunds · 2014-12-08T18:56:03Z

Could you just look for a case where multiple contexts are used at the same time?

Search both language/CSSUtils.js and language/HTMLUtils.js and for moveNextToken and movePrevToken you will see many cases.

The problem with caching multiple is...

As I said, this just needs to be kept in mind -- we might not need this. Seems like the worst performance is in minified files where everything is in (mostly) 1 line, so a single cache should work.

marcelgerber · 2014-12-09T20:14:52Z

Just switched getModeAt to use, ehm, cm.getModeAt. I wonder why we hadn't used it before. It's way more performant.

About doc.change events:
The problem here is that we are only passed a CM editor, so AFAIK there's no way to get the corresponding Brackets doc.

redmunds · 2014-12-09T22:14:31Z

About doc.change events: The problem here is that we are only passed a CM editor

That's only true for getModeAt(). All of the other funcs seem to have editor so use editor.document.

_.sortedIndex

The problem with _.sortedIndex is that it matches an exact end value. I think this code needs to find a token whose range contains a given pos.

UPDATE: it looks like you can pass a function to _.sortedIndex so that should work.

redmunds · 2014-12-09T22:25:52Z

src/utils/TokenUtils.js

@@ -158,14 +204,10 @@ define(function (require, exports, module) {
     * @return {mode:{Object}, name:string}
     */
    function getModeAt(cm, pos) {
-        var outerMode = cm.getMode(),


This function was written this way for nested mode support, so we need to make sure that cm.getModeAt(pos) supports that. I'm seeing the HTMLUtils InlineEditorProviders "should find style blocks" unit test fail.

I haven't tested it, but the CM manual said it returns the mode at the given pos.
I'll look into the unit tests and the Travis failure soon.

For me, it works.
I checked it in a file with mixed mode (HTML/JS) and the unit tests pass for me as well.

I only brought up TokenUtils.getModeAt() because I thought that it may benefit from the cache, but that does not appear to be the case. To keep these improvements isolated so they can be measured separately, this change should be moved to another pull request.

Reverted the change.

marcelgerber · 2014-12-10T06:20:22Z

It's actually not a Brackets Editor object, but these are actually only CodeMirror instances - so I can't use editor.document.
And yeah, I verified that _.sortedIndex returns an index even for numbers not exactly in the array.

redmunds · 2014-12-11T17:08:29Z

these are actually only CodeMirror instances - so I can't use editor.document.

You can listen to the CodeMirror instance "changes" event:

        cm.on("changes", function (instance, changeList) {

        });

marcelgerber · 2014-12-11T18:25:30Z

I have implemented the (still somewhat experimental) CM changes handler.

redmunds · 2014-12-19T19:37:44Z

@marcelgerber This looks good. I found another scenario where this helps and added a test case in description. Squash commits so I can merge this.

marcelgerber · 2014-12-19T21:09:27Z

@redmunds Ready for merge.

redmunds · 2014-12-19T21:26:15Z

Travis CI is failing. It does not appear to be related to this code, but I want to make sure this passes before merging.

redmunds · 2014-12-19T22:58:20Z

Merging.

Improve TokenUtils performance through caching

redmunds added the PR Triage Complete label Dec 4, 2014

redmunds self-assigned this Dec 4, 2014

redmunds reviewed Dec 6, 2014
View reviewed changes

marcelgerber force-pushed the find-blocks-perf branch from 3a1e205 to 7869026 Compare December 6, 2014 22:03

marcelgerber changed the title ~~Improve HTMLUtils.findBlocks performance~~ Improve TokenUtils performance through caching Dec 6, 2014

redmunds reviewed Dec 8, 2014
View reviewed changes

redmunds mentioned this pull request Dec 9, 2014

Opening big minimized css files freezes Brackets #10032

Closed

redmunds reviewed Dec 9, 2014
View reviewed changes

marcelgerber force-pushed the find-blocks-perf branch 2 times, most recently from 01c1925 to 2d920f5 Compare December 10, 2014 17:58

marcelgerber force-pushed the find-blocks-perf branch from 898fe35 to 23774c2 Compare December 11, 2014 17:42

marcelgerber force-pushed the find-blocks-perf branch 2 times, most recently from 18af3ec to 03bd448 Compare December 19, 2014 21:08

Improve TokenUtils performance through caching

792467c

marcelgerber force-pushed the find-blocks-perf branch from 03bd448 to 792467c Compare December 19, 2014 22:29

redmunds added a commit that referenced this pull request Dec 19, 2014

Merge pull request #9964 from MarcelGerber/find-blocks-perf

5ecc4ff

Improve TokenUtils performance through caching

redmunds merged commit 5ecc4ff into adobe:master Dec 19, 2014

marcelgerber deleted the find-blocks-perf branch December 19, 2014 23:03

redmunds mentioned this pull request Dec 27, 2014

Enhance QuickEdit performance #9541

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve TokenUtils performance through caching #9964

Improve TokenUtils performance through caching #9964

marcelgerber commented Nov 18, 2014

redmunds Dec 6, 2014

TomMalbran Dec 6, 2014

marcelgerber commented Dec 6, 2014

redmunds commented Dec 8, 2014

redmunds Dec 8, 2014

redmunds Dec 8, 2014

marcelgerber Dec 8, 2014

TomMalbran Dec 8, 2014

marcelgerber Dec 9, 2014

redmunds commented Dec 8, 2014

redmunds commented Dec 8, 2014

marcelgerber commented Dec 8, 2014

redmunds commented Dec 8, 2014

marcelgerber commented Dec 9, 2014

redmunds commented Dec 9, 2014

redmunds Dec 9, 2014

marcelgerber Dec 10, 2014

marcelgerber Dec 10, 2014

redmunds Dec 11, 2014

marcelgerber Dec 11, 2014

marcelgerber commented Dec 10, 2014

redmunds commented Dec 11, 2014

marcelgerber commented Dec 11, 2014

redmunds commented Dec 19, 2014

marcelgerber commented Dec 19, 2014

redmunds commented Dec 19, 2014

redmunds commented Dec 19, 2014

Improve TokenUtils performance through caching #9964

Improve TokenUtils performance through caching #9964

Conversation

marcelgerber commented Nov 18, 2014

Testing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelgerber commented Dec 6, 2014

redmunds commented Dec 8, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

redmunds commented Dec 8, 2014

redmunds commented Dec 8, 2014

marcelgerber commented Dec 8, 2014

redmunds commented Dec 8, 2014

marcelgerber commented Dec 9, 2014

redmunds commented Dec 9, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marcelgerber commented Dec 10, 2014

redmunds commented Dec 11, 2014

marcelgerber commented Dec 11, 2014

redmunds commented Dec 19, 2014

marcelgerber commented Dec 19, 2014

redmunds commented Dec 19, 2014

redmunds commented Dec 19, 2014