Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

let scanner check for whitespace #1077

Merged
merged 1 commit into from
Jul 6, 2021

Conversation

GitMensch
Copy link
Contributor

removing the need to create a substring just for the check after which it will be thrown away

This should lead to a minor speedup and less work for the garbage collector and removes a FIXME; but may break the API because there's no default implementation - @angelozerr if you want me to add it I'll do it, using the getTokenText and issue the whitespace-check on this.
It will also bring me to the next entry in the memory handling which currently points to that place

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.String.substring(String.java:1969)
	at org.eclipse.lemminx.dom.parser.XMLScanner.getTokenText(XMLScanner.java:1015)
	at org.eclipse.lemminx.dom.DOMParser.parse(DOMParser.java:371)
	at org.eclipse.lemminx.XMLTextDocumentService.lambda$new$0(XMLTextDocumentService.java:162)
	at org.eclipse.lemminx.XMLTextDocumentService$$Lambda$1/1854731462.apply(Unknown Source)
	at org.eclipse.lemminx.commons.ModelTextDocument.lambda$getModel$0(ModelTextDocument.java:66)
	at org.eclipse.lemminx.commons.ModelTextDocument$$Lambda$21/1909505440.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
	... 6 more

removing the need to create a substring just for the check after which it will be thrown away
@angelozerr
Copy link
Contributor

angelozerr commented Jul 6, 2021

Thank so much @GitMensch for your PR. Do you see some improvement with your big XML file?

I think the correct fix is to move the code with whitespace compute in the scanner, because the DOM document is dirty as soon as user type somthing. In otherwords,I think we should remove all code which compute whitespace and remove DOMCharacter#setWhitespace(boolean w) method and implement the DOMCharacter#isWhitespace() which have a Boolean for isWhitespace. When isWhitespace() methosis called, isWhitespace is null for the first time,so you compute in this case if there are some whitespace and you cachethe result inisWhitespace Boolean.

@GitMensch
Copy link
Contributor Author

Do you see some improvement with your big XML file?

Haven't checked anything else but the junit tests to still work and checked "on paper" that we do exactly the same just with removing some function calls and most important the substring. It should at least present another crash :-)

I guess I may be able to place a jar file somewhere and test that. How to create that (I guess the org.eclipse.lemminx-uber.jar)?

The approach you mention sounds like the correct thing to do and improve the overall performance much more if that function is called often.

Can I suggest you to apply your changes (ignoring this PR then) and I test the result?
If you don't see the necessary time for that then I'd suggest to pull this and come back to the DOMCharacter class later (I've not inspected that so far).

@angelozerr angelozerr merged commit a5c533c into eclipse-lemminx:master Jul 6, 2021
@angelozerr
Copy link
Contributor

I agrée with you your patch avoid creating string instance. Since tests ate working i merged your pr. Thank you!

@angelozerr angelozerr added the debt This issue or enhancement is related to technical debt label Jul 6, 2021
@angelozerr angelozerr added this to the 0.18.0 milestone Jul 6, 2021
@GitMensch GitMensch deleted the blank-token branch July 6, 2021 18:32
@GitMensch
Copy link
Contributor Author

Thanks, someone should have a look at the approach you've mentioned someday, so I'll create an issue for that.

Now tested against the 32MB XML file with -Xmx256, not stopping there but on the substring to get resolve the element tag:

			case StartTag: {
				DOMElement element = (DOMElement) curr;
				element.tag = scanner.getTokenText();
				curr.end = scanner.getTokenEnd();
				break;
			}

And that is likely not a reasonable place to optimize...
One related part that may be reasonable is to do cache the Token text in the XMLScanner, set it to null when the tokenOffset is changed, but I don't know if that would be of any use...

Insights?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debt This issue or enhancement is related to technical debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants