Skip to content
This repository has been archived by the owner on Sep 6, 2021. It is now read-only.

Improve TokenUtils performance through caching #9964

Merged
merged 1 commit into from
Dec 19, 2014

Conversation

marcelgerber
Copy link
Contributor

For #9717.

Testing

Test 1
Using this file: https://gist.github.com/MarcelGerber/5c11f1031af292044708, place your cursor in the <script> tag and press Ctrl-Space.

Test 2
Using same file, place your cursor in the <body> tag and press Cmd/Ctrl-E.

@redmunds redmunds self-assigned this Dec 4, 2014

while (TokenUtils.moveNextToken(ctx, false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcelgerber Thanks for digging into this one. That's an impressive improvement.

I'm wondering if the main slow down is in TokenUtils.moveNextToken() because it has to get the full line text for every token. I wonder if the line text can be cached in the "context" object to improve performance? Brackets seems to choke on files with long lines in many places so that would provide a benefit for many other functions and not just this one. Care to give that a try?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@redmunds is not getting the line, is mainly ctx.editor.getTokenAt() which has to parse the entire line up to the given position every time, and essentially parsing the same content many times.

What we could do is, run cm.getLineTokens(line), save the tokens in the ctx and make moveNextToken use those tokens until it gets to a new line, where it will call cm.getLineTokens(line), save it and return the first one.

BTW, why are the TokenUtils function that require a ctx and not a class that can be created one with methods that will not require the ctx?

@marcelgerber
Copy link
Contributor Author

@redmunds @TomMalbran I managed to come up with a caching solution in TokenUtils, which seems to work out quite well.
I'd appreciate your feedback!

@marcelgerber marcelgerber changed the title Improve HTMLUtils.findBlocks performance Improve TokenUtils performance through caching Dec 6, 2014
@redmunds
Copy link
Contributor

redmunds commented Dec 8, 2014

@marcelgerber This is a great start! I verified that it fixes #9717, but it will need lots more testing.

I think the (1 sec) timeout should be replaced by listening for the Document "change" event. If there's a change to the line that's cached, then clear it.

Also, there can be multiple contexts at one time, so it might be worth having a cache for each context object. This is usually used to looking ahead to next (or back at previous) token when parsing without affecting current context, so it will usually use the same cached line, but this is something to keep in mind as a future improvement.

return editor.getTokenAt(pos, precise);
}
var cachedTokens = _manageCache(editor, pos.line),
token = _.find(cachedTokens, function (token) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_.find() iterates over list which is a sequential search. For really long lines of text (e.g. minified files), it's probably worth implementing a binary search here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is to keep track of last token, and pass in a first/last/prev/next/random flag to provide a hint where to start search. But, this may only work if each context has it's own cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary search sounds promising. I'll try that once I got the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth it for the future to have a Token class, with move methods and an internal cache, since it looks like after having a context we want to move it searching for something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually found out that Lodash has a binary search function (_.sortedIndex - is there anything that's not in Lodash?!) and well, I used it :)

@redmunds
Copy link
Contributor

redmunds commented Dec 8, 2014

The Editor.getModeForRange() method calls TokenUtils.getModeAt() for every token in a range, so this is another opportunity for improvement using this cache.

@redmunds
Copy link
Contributor

redmunds commented Dec 8, 2014

Done with code review.

@marcelgerber
Copy link
Contributor Author

Ugh, didn't notice there's a JSHint error.
@redmunds Could you just look for a case where multiple contexts are used at the same time?
The problem with caching multiple is that it can get memory-heavy real quick. A minimized file can easily have 15,000 tokens a line, and I don't think we want have really big arrays gambling around...

Ah, and the problem with getModeAt() is that it wants precise results, which is not something we usually want to have cached.

@redmunds
Copy link
Contributor

redmunds commented Dec 8, 2014

Could you just look for a case where multiple contexts are used at the same time?

Search both language/CSSUtils.js and language/HTMLUtils.js and for moveNextToken and movePrevToken you will see many cases.

The problem with caching multiple is...

As I said, this just needs to be kept in mind -- we might not need this. Seems like the worst performance is in minified files where everything is in (mostly) 1 line, so a single cache should work.

@marcelgerber
Copy link
Contributor Author

Just switched getModeAt to use, ehm, cm.getModeAt. I wonder why we hadn't used it before. It's way more performant.

About doc.change events:
The problem here is that we are only passed a CM editor, so AFAIK there's no way to get the corresponding Brackets doc.

@redmunds
Copy link
Contributor

redmunds commented Dec 9, 2014

About doc.change events: The problem here is that we are only passed a CM editor

That's only true for getModeAt(). All of the other funcs seem to have editor so use editor.document.

_.sortedIndex

The problem with _.sortedIndex is that it matches an exact end value. I think this code needs to find a token whose range contains a given pos.

UPDATE: it looks like you can pass a function to _.sortedIndex so that should work.

@@ -158,14 +204,10 @@ define(function (require, exports, module) {
* @return {mode:{Object}, name:string}
*/
function getModeAt(cm, pos) {
var outerMode = cm.getMode(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was written this way for nested mode support, so we need to make sure that cm.getModeAt(pos) supports that. I'm seeing the HTMLUtils InlineEditorProviders "should find style blocks" unit test fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested it, but the CM manual said it returns the mode at the given pos.
I'll look into the unit tests and the Travis failure soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it works.
I checked it in a file with mixed mode (HTML/JS) and the unit tests pass for me as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only brought up TokenUtils.getModeAt() because I thought that it may benefit from the cache, but that does not appear to be the case. To keep these improvements isolated so they can be measured separately, this change should be moved to another pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the change.

@marcelgerber
Copy link
Contributor Author

It's actually not a Brackets Editor object, but these are actually only CodeMirror instances - so I can't use editor.document.
And yeah, I verified that _.sortedIndex returns an index even for numbers not exactly in the array.

@marcelgerber marcelgerber force-pushed the find-blocks-perf branch 2 times, most recently from 01c1925 to 2d920f5 Compare December 10, 2014 17:58
@redmunds
Copy link
Contributor

these are actually only CodeMirror instances - so I can't use editor.document.

You can listen to the CodeMirror instance "changes" event:

        cm.on("changes", function (instance, changeList) {

        });

@marcelgerber
Copy link
Contributor Author

I have implemented the (still somewhat experimental) CM changes handler.

@redmunds
Copy link
Contributor

@marcelgerber This looks good. I found another scenario where this helps and added a test case in description. Squash commits so I can merge this.

@marcelgerber marcelgerber force-pushed the find-blocks-perf branch 2 times, most recently from 18af3ec to 03bd448 Compare December 19, 2014 21:08
@marcelgerber
Copy link
Contributor Author

@redmunds Ready for merge.

@redmunds
Copy link
Contributor

Travis CI is failing. It does not appear to be related to this code, but I want to make sure this passes before merging.

@redmunds
Copy link
Contributor

Merging.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants