let scanner check for whitespace #1077

GitMensch · 2021-07-06T10:39:35Z

removing the need to create a substring just for the check after which it will be thrown away

This should lead to a minor speedup and less work for the garbage collector and removes a FIXME; but may break the API because there's no default implementation - @angelozerr if you want me to add it I'll do it, using the getTokenText and issue the whitespace-check on this.
It will also bring me to the next entry in the memory handling which currently points to that place

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.String.substring(String.java:1969)
	at org.eclipse.lemminx.dom.parser.XMLScanner.getTokenText(XMLScanner.java:1015)
	at org.eclipse.lemminx.dom.DOMParser.parse(DOMParser.java:371)
	at org.eclipse.lemminx.XMLTextDocumentService.lambda$new$0(XMLTextDocumentService.java:162)
	at org.eclipse.lemminx.XMLTextDocumentService$$Lambda$1/1854731462.apply(Unknown Source)
	at org.eclipse.lemminx.commons.ModelTextDocument.lambda$getModel$0(ModelTextDocument.java:66)
	at org.eclipse.lemminx.commons.ModelTextDocument$$Lambda$21/1909505440.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	... 6 more

removing the need to create a substring just for the check after which it will be thrown away

angelozerr · 2021-07-06T11:30:40Z

Thank so much @GitMensch for your PR. Do you see some improvement with your big XML file?

I think the correct fix is to move the code with whitespace compute in the scanner, because the DOM document is dirty as soon as user type somthing. In otherwords,I think we should remove all code which compute whitespace and remove DOMCharacter#setWhitespace(boolean w) method and implement the DOMCharacter#isWhitespace() which have a Boolean for isWhitespace. When isWhitespace() methosis called, isWhitespace is null for the first time,so you compute in this case if there are some whitespace and you cachethe result inisWhitespace Boolean.

GitMensch · 2021-07-06T11:40:05Z

Do you see some improvement with your big XML file?

Haven't checked anything else but the junit tests to still work and checked "on paper" that we do exactly the same just with removing some function calls and most important the substring. It should at least present another crash :-)

I guess I may be able to place a jar file somewhere and test that. How to create that (I guess the org.eclipse.lemminx-uber.jar)?

The approach you mention sounds like the correct thing to do and improve the overall performance much more if that function is called often.

Can I suggest you to apply your changes (ignoring this PR then) and I test the result?
If you don't see the necessary time for that then I'd suggest to pull this and come back to the DOMCharacter class later (I've not inspected that so far).

angelozerr · 2021-07-06T16:15:28Z

I agrée with you your patch avoid creating string instance. Since tests ate working i merged your pr. Thank you!

GitMensch · 2021-07-06T18:32:18Z

Thanks, someone should have a look at the approach you've mentioned someday, so I'll create an issue for that.

Now tested against the 32MB XML file with -Xmx256, not stopping there but on the substring to get resolve the element tag:

			case StartTag: {
				DOMElement element = (DOMElement) curr;
				element.tag = scanner.getTokenText();
				curr.end = scanner.getTokenEnd();
				break;
			}

And that is likely not a reasonable place to optimize...
One related part that may be reasonable is to do cache the Token text in the XMLScanner, set it to null when the tokenOffset is changed, but I don't know if that would be of any use...

Insights?

let scanner check for whitespace

1083b50

removing the need to create a substring just for the check after which it will be thrown away

angelozerr merged commit a5c533c into eclipse-lemminx:master Jul 6, 2021

angelozerr added the debt label Jul 6, 2021

angelozerr added this to the 0.18.0 milestone Jul 6, 2021

GitMensch deleted the blank-token branch July 6, 2021 18:32

GitMensch mentioned this pull request Jul 6, 2021

Move check for whitespace to DOMCharacterData #1080

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

let scanner check for whitespace #1077

let scanner check for whitespace #1077

GitMensch commented Jul 6, 2021

angelozerr commented Jul 6, 2021 •

edited

Loading

GitMensch commented Jul 6, 2021

angelozerr commented Jul 6, 2021

GitMensch commented Jul 6, 2021

let scanner check for whitespace #1077

let scanner check for whitespace #1077

Conversation

GitMensch commented Jul 6, 2021

angelozerr commented Jul 6, 2021 • edited Loading

GitMensch commented Jul 6, 2021

angelozerr commented Jul 6, 2021

GitMensch commented Jul 6, 2021

angelozerr commented Jul 6, 2021 •

edited

Loading