Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[idea] Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) #1570

Closed
wants to merge 4 commits into from

Conversation

syranide
Copy link
Contributor

Part 1 of #1550

This PR removes data-reactid from the DOM and reconstructs it on-demand, then storing it as a hidden property __reactID on the node instead. Nodes and instances are traversed as pairs, in sync, when reconstructing.

Peformance comparison:
We have the option to choose between keeping the current on-demand traversal of nodes, or switching to immediate traversal after insertion into the DOM. Running my old super synthetic benchmark, creating 11 level deep trees of plain divs, measuring mount+unmount. The lazy method is roughly 15% faster it seems, adding just one attribute to all nodes reduces it to about 9%, adding one more reduces it to about 5% (for IE8 and Chrome).

However, it is important to note that these numbers are significantly inflated as they do not include reflow (which seems to increase running time of my tests by about 75%), actual rendering nor any other common React overhead such as composite components, handling event listeners, etc. It should also be noted that as of current, React will immediately traverse to any node containing an onClick-listener or an image with onLoad/onError-listeners or if you call getDOMNode in componentDidMount. Additionally, any updates applied to the DOM also require traversal to each affected node. None of which are included in this benchmark.

So while end-user measurable performance impact is hard to quantify, I think it's safe to say that at worst it may be effectively increased by 5% (excluding rendering), add a bunch of attributes, add some React overhead, onClick-listeners and we're down to perhaps 1-3%? In that context, I think it's important to consider that there are important real-life implications of on-demand traversal:

For immediate traversal; testing whether a node is a React-node is always O(1). We also avoid a lot of complex code that is otherwise there to ensure that a node exists in the cache, which means that going from node to instance to node is super cheap, which may open up some interesting potential down the road. We also detect invalid DOM nesting on insertion (for free). The size of React decreases "measurably" and code complexity is decreased.

For on-demand traversal; we must be able to traverse up the component instance parent tree and each node instance must immediately register itself with ReactMount. Testing whether a node is a React node is O(1) for cached React nodes and "up to" O(depth) for uncached React nodes or non-React nodes. We also lose the guaranteed/immediate validation of DOM nesting (although we can enable it for DEV).

There are a lot of improvements that this PR enables that is independent of immediate or on-demand traversal, but lets not make this wall of text any longer. I can see the benefit of both methods, but I think immediate traversal intrigues me the most given its modest cost, it's also uniform and deterministic during insertion and over time (not that it really matters given the modest performance implications, perhaps).


#1568 and #1569 must be fixed before this PR can be taken (or simple temporary workarounds have to be put in place). Add #1578 too and React leaves no lasting print in the visible DOM.

This PR has commits for both the immediate and on-demand traversal, both are fairly final implementations (obviously pending some decisions / feedback though).


@chenglou
Copy link
Contributor

Oh man =D
For 1: possibility of removing renderToStaticMarkup, which is only there because of dangling checksum and react-id from renderComponentToString. By removing react-id I think there's enough benefit to remove the renderToStaticMarkup API altogether.

@syranide
Copy link
Contributor Author

@chenglou You bet! I saw the discussion on IRC and while data-checksum may keep it from going away. However, we can actually get rid of data-checksum as we now traverse the DOM and can guarantee that an "incompatible DOM" cannot be reused, but we cannot guarantee that it's identical (different styles, texts, etc) which at the very least means that we probably want to keep it for the dev-build (but possibly drop it for the prod build, or make it optional for renderComponentToString perhaps?).

string renderComponentToString(ReactComponent component, boolean noChecksum)

Is one possibility.

@chenglou
Copy link
Contributor

string renderComponentToString(ReactComponent component, boolean noChecksum)

That was actually the initial API for renderToStaticMarkup, but boolean args wasn't encouraged so I changed it. I was more interested in killing the API altogether.

@syranide
Copy link
Contributor Author

I did a simple test of unboxing ReactTextComponent spans, and I couldn't even measure a noticeable difference (artificial test tree with tons of ReactTextComponents). A neater solution is to not generate the spans at all but to split/insert TextNodes during traversal instead.

However, knowing of the (by now very old and fixed) FF bug where it automatically split TextNodes (hard to test different FF versions) I'm unsure if there may be issues and if it's perhaps preferable to solve this higher up in React (i.e, not manage the TextNodes individually, but rather as a single TextNode).

@sophiebits
Copy link
Collaborator

I'm pretty sure I want to solve this higher up. At any rate, it should be a separate diff.

@syranide
Copy link
Contributor Author

@spicyj Yeah, the more I think about it, that seems like the right way.

@syranide
Copy link
Contributor Author

I did some preliminary benchmarking with the perf-tests and it looks really good! Even with just this one piece in place. It's hard to give firm numbers, but it's everything from 0-25% improvement, averaging at about 5-15% in tests that actually seem affected.

So, the current implementation is lazy again, it adds 3 new properties:

  1. _parentNodeID is added to all React component instances, since we must be able to traverse up the instance tree to support lazy evaluation. A parentID is simply passed along-side rootID to component constructors and stored on the instance.
  2. reactDOMInstancesByReactRootID is required by getNode, it is used to translate a React ID to an instance to a React node component, which is required to lazy evaluate its parents. This is currently in mountComponent.
  3. owningInstancesForNodeByReactRootID is required only for DEV, it's used for constructing the "human readable path" to the error, as opposed to our old method of displaying the React ID, which is no longer visible (perhaps React DevTools could have a feature for finding a React ID?).

There are some ugly hacks while we wait for #1568 and #1569, all tests pass.

I have currently left some commented code in ReactMount. I can happily delete it, I wrote it under the assumption that _renderedChildren is an ordered list, which it wasn't, but given that _mountIndex is on the kill list I left it there for now. Perhaps the best solution would be to get rid of the dependence on _mountIndex and instead call evaluateNode/InstanceParents before processing updates for a node/instance. That way it could still rely on _renderedChildren being the correct order (which it is before the DOM is mutated).

registerDOMInstance is currently run on mountComponent which means it's run for renderComponentToString and renderComponentToStaticMarkup, which is bad! What's the right solution here? Simply add a check in mountComponent for transaction.renderingToString?

Something to consider; if we immediately traverse the DOM the user gets instant feedback on invalid nesting, not when you happen to access or modify it. It would be super easy to make it do that in DEV but keep the current behavior in PROD, but I fear that having separate behaviors for DEV and PROD could be dangerous. A possibility is to have DEV immediately only traverse the DOM and not cache the node/instance pairs, traversing the DOM is not significantly expensive (up to 10% of DOM construction cost it seems, not including rendering and layouting).

There are many more improvements I want to follow up with, but one step at a time.

   raw     gz Compared to master @ 46de927a28eb788b4329589658dbd31a014e4edd
 +3204   +375 build/react-with-addons.js
  +474   +132 build/react-with-addons.min.js
 +3204   +375 build/react.js
  +474   +140 build/react.min.js

@syranide syranide changed the title Remove data-reactid and traverse immediately after insertion instead Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) May 24, 2014
@syranide
Copy link
Contributor Author

Pushed another commit that uses Map when available instead of adding a property to the DOM nodes. Measurably faster for IE11, couldn't measure any difference for Chrome (with ES6).

@sebmarkbage
Copy link
Collaborator

Interesting stuff. I do like the simplicity of this. I wonder if there's a way we can bail out quickly (sort of a shouldComponentUpdate thing) for subtrees we know that we won't need to touch.

We should also evaluate just using document.createElement instead of batch generation, or a mix depending on heuristics/browser.

@syranide
Copy link
Contributor Author

@sebmarkbage I believe @petehunt experimented a bit with cloning bare subtrees and reusing them. I think he managed to get some positive/interesting results (lists of similar/identical components are ideal and common).

I'm not sure what you mean about "bail out quickly", the latest commit in this PR currently implements the same lazy approach that React already uses (some earlier commit implements the non-lazy approach). There's only one practical disadvantage and it's that events originating on non-React nodes must always traverse up to the nearest React component or the document element (for the lazy approach). However as I mention/detail in the OP, the non-lazy approach is perhaps only 1-3% slower in the practical worst-case and brings a lot of other potentially interesting benefits, so that is worth considering as well IMHO (I haven't really done extensive benchmarking, but I really don't think I'm far off).

While I got you on the hook, #1569 is that something you may have fixed in your "rumored" refactor?

@syranide syranide changed the title Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) [not yet] Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) Sep 29, 2014
@syranide syranide changed the title [not yet] Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) [idea] Remove data-reactid and traverse immediately after insertion instead (nope, still lazy!) Sep 29, 2014
@sebmarkbage
Copy link
Collaborator

It's a pain to rebase these things and risky to make the switch immediately. I think we need feature flags for stuff like this. Probably build flags that can be stripped out.

Flagging as needing revision so that we can add build flags in a different PR and then this can go in under a new build flag.

@syranide
Copy link
Contributor Author

syranide commented Oct 1, 2014

@sebmarkbage I'm not expecting this to be merged any time soon, however, when _parentInstance makes its official appearance... It should then be quite simple to implement the full range of alternatives between today and no react ID (even behind flags I'm quite confident), some of which are far "less controversial" than this. But until there's a _parentInstance (one way or another) this is a no-go.

@syranide
Copy link
Contributor Author

syranide commented Oct 2, 2014

Hmm, pondering this some more, I'm sure I can make this happen without _parentInstance quite easily by simply relying on ReactInstanceHandles instead. It won't be as superficially neat, but it should allow us to test the full range of alternatives in relative isolation; data-react and no React ID, including lazy and non-lazy traversal.

@sebmarkbage How does that sound to you?

@syranide
Copy link
Contributor Author

I went ahead and implemented it using ReactInstanceHandles for fun, it worked out really quite well, but the current core architecture is not conducive keeping it lazy (VDOM/DOM are inconsistent when ReactDOMIDOperations calls getNode).

But immediat traversal is possible as-is and easy and I'm quite sure is better than what we have today (as evidenced by previous perf tests), returning to lazy traversal is possible down the line... but I'm personally quite convinced that the benefits of immediate traversal outweigh the supposedly minor traverse overhead (which is also easier to hide than stuttering animation).

The implementation is actually quite isolated but I imagine you guys don't want to go ahead with until you can test performance with confidence (including lazy traversal).

@syranide
Copy link
Contributor Author

Most of the useful stuff in this PR has already been dissected into other PRs and what's left isn't very useful.

@gaearon
Copy link
Collaborator

gaearon commented May 30, 2015

What is the status of this? Are there plans to eventually get rid of data-reactid? Is this on hold due to more pressing issues? Thanks!

@syranide
Copy link
Contributor Author

@gaearon There's a PR (can't find it) for adding internal parent references which is necessary for the event system to work without data-reactid and there is one or two bugs in React which conflicts (but they can be workaround if they are willing). There is one alternative method that can be implemented without parent references, but... they expressed a desire to extensively benchmark and compare the various approaches before moving ahead.

As for server-side rendering without data-reactid, reconstructing data-reactid client-side is doable today and shouldn't come at a significant cost (and certainly favorable if you can drop data-reactid from the server-side generated markup). I've brought this up before but no one really seemed to care much so I never moved ahead with it.

@callmevlad
Copy link

I've brought this up before but no one really seemed to care much so I never moved ahead with it.

@syranide I care a lot and willing to put up a sizable bounty to get this implemented. I'd love to hear your thoughts on the scope of changes still required to remove data-reactid from the DOM. (My email is in my profile just in case.)

@sophiebits
Copy link
Collaborator

@callmevlad Curious: Why do you care?

@callmevlad
Copy link

@spicyj At Webflow we have a design interface that wraps web inspector-type tools around an iframe that contains a website's actual DOM (as it will be published to the final hosted production environment). A huge selling point of Webflow has been that the contents of that iframe are 100% what-you-see-is-what-you-get and the-same-code-your-developer-would-write, and we routinely demo this in pitches by inspecting elements live as they are modified with Webflow's tools. It goes a long way to convince stubborn frontend developers at larger companies who believe that generating great code is not possible - by directly demonstrating it to them by looking at the DOM.

Now that we're shifting the rendering of the site canvas to React, that "WYSIWYG" property no longer holds true. Inspecting the DOM is decorated heavily with data-reactids, which now makes the code "look generated" and we start hearing things like "that's not how I would write the HTML, therefore we should stick to hand-coding our websites" from potential users/customers. We'd love to avoid that as much as possible.

Also, it just looks much nicer without all the IDs inside DevTools since you can see much more of the DOM in the same amount of real estate :)

@sophiebits
Copy link
Collaborator

Cool, thanks for the context. Webflow looks sweet. :)

@syranide
Copy link
Contributor Author

syranide commented Jun 1, 2015

@callmevlad What can be done somewhat non-invasively (i.e. an easy to maintain fork) at the moment is the "immediate traversal" version of this PR. There are two alternatives:

  1. The very simplest being to simply immediately traverse any rendered nodes, copy data-reactid into node.reactid, remove data-reactid and update internalGetID to read from node.reactid instead (more or less). You still have to ship data-reactid with server-side markup though.
  2. Again, immediately traverse any rendered nodes but reconstruct data-reactid (quite easy too) instead and store it in node.reactid. The rest is same as above, but this doesn't require you to ship data-reactid with the server-side markup.

I can't imagine the devs being OK with having this "hack" in the release version so it would have to be in a separate fork. But as I said earlier I could imagine the server-side part of alternative 2 being shipped though (but not my call) which would make the client-side fork near trivial. I don't imagine the performance impact of this being significant.

For reference, most of the changes in this PR is because of the lazy traversal and updated tests. It's possible I overlooked something minor above, but this PR was functional so it's definitely doable.

@natew
Copy link

natew commented Oct 20, 2015

@syranide Sorry to bring up a possible dead issue, but this is also desirable to me / my team. It seems like your summary is that it's doable, why would devs not be ok with this? Anyway, just wanted to voice more support for this, for a similar reason to Webflow.

@syranide
Copy link
Contributor Author

@natew Good news #5190 #5205

@chyzwar
Copy link

chyzwar commented Jan 25, 2016

Is there is any chance this will land in 0.15 or 0.16?

@zpao
Copy link
Member

zpao commented Jan 25, 2016

It will be a part of 0.15

@callmevlad
Copy link

So glad this finally landed - thanks @syranide @spicyj @gaearon @zpao for helping push this through! 👏 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants