Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SetDocument runs every time you switch the document #292

Open
plusplusben opened this issue Oct 29, 2014 · 14 comments
Open

SetDocument runs every time you switch the document #292

plusplusben opened this issue Oct 29, 2014 · 14 comments

Comments

@plusplusben
Copy link

If sizzle switches between parsing a dom document and an xml document, it calls SetDocument every time it switches. Do you think it would make sense to cache more than just the current document settings?

I created a quick jsfiddle here http://jsfiddle.net/5j9xxocs/ . If you run a profile and then click on the "ClickMe" button, you can see that Sizzle's SetDocument is pretty expensive and runs twice for every iteration. I work on an application that uses Sizzle and switches between parsing XML and DOM frequently. I made a rudimentary fix on my local version of Sizzle that caches the settings for both document types and was able to pretty massively improve the performance of the app.

Would this be something worth adding to Sizzle? Or is my use case pretty edge?

@timmywil
Copy link
Member

I would suspect this is not a common issue, but if we did want to cache all setDocument results, I'd want to limit the number of documents cached (to something like 5). The majority of cases where the document switches could probably be covered by a cache size of 2-5.

@dmethvin
Copy link
Member

Are there any risks to caching a document, especially an XML one? I could see there being some downsides if you ended up caching a large XML document that would otherwise be garbage collected. With a single doc cache that's not very likely. If it's truly rare, another solution would be to load a second copy of Sizzle.

@timmywil
Copy link
Member

There aren't risks to caching one document. The risk comes when there are many documents.

@zachelrath
Copy link

It seems like it would be more effective to have a cache based on "type" of Document, with separate entries only for each unique type of Document actually encountered, since a given browser should handle each type of Document consistently. For instance, If you're rapidly switching between separate XML documents, they're all going to have the same feature supports, but they're different Documents. So if the caching was along the lines of "if isXML, use XML doc support cache, if its a DOM fragment, use DOM fragment support cache, etc...".

With this approach, you don't have to deal with an arbitrary size for the number of Documents to cache, or the garbage collection problem, since actual Documents are not being cached.

Back to the Document-based caching approach, though, another possibility to address the garbage collection problem would be to remove cache entries whenever a given node that is an ownerDocument is removed or appended to another document, at which point that node can no longer be an ownerDocument.

@dmethvin
Copy link
Member

It seems like it would be more effective to have a cache based on "type" of Document, with separate entries only for each unique type of Document actually encountered, since a given browser should handle each type of Document consistently.

You'd still have the potential issue of holding onto an XML document that would otherwise be garbage collectable, for example something fetched via AJAX. The last fetched XML doc would never be released until a new one arrived, if ever.

For instance, If you're rapidly switching between separate XML documents, they're all going to have the same feature supports, but they're different Documents. So if the caching was along the lines of "if isXML, use XML doc support cache, if its a DOM fragment, use DOM fragment support cache, etc...".

If there were really only two document implementations (XML and HTML) the setDocument() overhead could be reduced by caching feature detects the first time the document type was encountered. I don't think that's always the case though. Different implementations of XML documents can occur within the same page (for example, MSXML.DOMDocument, XML+XHTML, and XMLDocument).

another possibility to address the garbage collection problem would be to remove cache entries whenever a given node that is an ownerDocument is removed or appended to another document, at which point that node can no longer be an ownerDocument.

How would Sizzle know that happened? Also, someone could parse an XML string, select some nodes, and let the XML root go out of scope. This in particular seems like a hairy problem to tackle for an edge case.

@gibson042
Copy link
Member

This in particular seems like a hairy problem to tackle for an edge case.

Exactly. There are lots of good ideas and good intentions here, but in practice it's thorny. Pull requests would be evaluated on size, speed, memory consumption, and client compatibility, but I don't foresee myself taking this on.

@timmywil
Copy link
Member

Consensus seems to be patch welcome. We would review a PR, but will not tackle this in the immediate future.

@gibson042
Copy link
Member

A thought for when this is picked up: cache setDocument output by documentElement nodeName/namespaceURI/etc. It's likely that implementations only privilege HTML and maybe SVG anyway, so we should be able to easily support cases that aren't absolutely bonkers.

@gibson042 gibson042 reopened this Mar 24, 2015
avernet added a commit to orbeon/orbeon-forms that referenced this issue Apr 9, 2015
- Works around jQuery jquery/sizzle#292
- Which happens when using jQuery to get values from the DOM
avernet added a commit to orbeon/orbeon-forms that referenced this issue Apr 10, 2015
- Works around jQuery jquery/sizzle#292
- Which happens when using jQuery to get values from the DOM
@timespinner
Copy link

timespinner commented Oct 14, 2016

Any chance this can be reviewed? We are seeing that this dramatically affects performance in an application we are building that alternates reading/writing to a xml document and using DOM elements.

@wojwal
Copy link

wojwal commented Oct 18, 2016

Hi guys,

I also faced this huge performance downgrade when Sizzle is constantly switching the document it works with (via setDocument). As a quick-fix I just applied a way how this behaviour can be disabled. That would solve the problem at least in some cases.

Hopefully this pull request will be accepted? :)

@mgol
Copy link
Member

mgol commented Aug 26, 2019

Copying my comments from #390:

If anyone wants to tackle the issue, this comment from Richard Gibson would be a good start: #292 (comment).

Note: in jQuery 4.0 that no longer uses Sizzle (instead inlining it & simplifying greatly) setDocument will not do as much as it does right now in Sizzle. This is the current code on master, without comments it's just:

function setDocument( node ) {
	var subWindow,
		doc = node ? node.ownerDocument || node : preferredDoc;

	if ( doc === document || doc.nodeType !== 9 ) {
		return;
	}

	document = doc;
	documentElement = document.documentElement;
	documentIsHTML = !jQuery.isXMLDoc( document );

	if ( preferredDoc !== document &&
		( subWindow = document.defaultView ) && subWindow.top !== subWindow ) {

		subWindow.addEventListener( "unload", unloadHandler );
	}
}

(code from https://github.com/jquery/jquery/blob/29a9544a4fb743491a42f827a6cf8627b7b99e0f/src/selector.js#L419-L445)
and it may even get smaller before the final release, who knows.

@mgol
Copy link
Member

mgol commented Jan 21, 2020

To show an example of the performance difference above, this is the test case provided in this issue using jQuery 3.4.1:
https://jsbin.com/cozegop/1/edit
If you open the browser console, you'll see the time measured; on my machine in Firefox 72.0.2 on macOS Catalina it's about 100 ms each time. The same test case with the jQuery version from master (i.e. future jQuery 4.0):
https://jsbin.com/pubobej/1/edit?html,js,output
prints times between 10 & 30 ms.

@mgol
Copy link
Member

mgol commented Dec 15, 2022

An update: jQuery 3.7.0 will not include Sizzle anymore, it has its own internal copy that's smaller than Sizzle.

I run the above test case with the current 3.x-git version and it's also pretty fast. This is mainly because DOM manipulation in jQuery uses jQuery.contains internally (in the isAttached util used by buildFragment) - an util previously being an alias of Sizzle.contains - which calls setDocument in Sizzle as it needs that to detect support for various native methods needed in contains. In jQuery 3.x-git there's only a single implementation of jQuery.contains so setDocument no longer needs to be called.

Test case: https://jsbin.com/yubaket/1/edit?html,js,output

@mgol mgol removed the patchwelcome label Apr 17, 2023
@mgol
Copy link
Member

mgol commented Sep 7, 2023

We're going to sunset Sizzle soon. That said, I won't migrate this issue to jQuery as in jQuery 4.x it will not be an issue anymore due to how simple setDocument is there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

8 participants