-
Notifications
You must be signed in to change notification settings - Fork 47k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fizz] Improve text separator byte efficiency #24630
Conversation
Previously text separators were inserted following any Text node in Fizz. This increases bytes sent when streaming and in some cases such as title elements these separators are not interpretted as comment nodes and leak into the visual aspects of a page as escaped text. The reason simple tracking on the last pushed type doesn't work is that Segments can be filled in asynchronously later and so you cannot know in a single pass whether the preceding content was a text node or not. This commit adds a concetp of TextEmbedding which provides a best effort signal to Segments on whether they are embedded within text. This allows the later resolution of that Segment to add text separators when possibly necessary but avoid them when they are surely not. The current implementation can only "peek" head if the segment is a the Root Segment or a Suspense Boundary Segment. In these cases we know there is no trailing text embedding and we can eliminate the separator at the end of the segment if the last emitted element was Text. In normal Segments we cannot peek and thus have to assume there might be a trailing text embedding and we issue a separator defensively. This should be rare in practice as it is assumed most components that will cause segment creation will also emit some markup at the edges.
Comparing: 05c34de...277beb8 Critical size changesIncludes critical production bundles, as well as any change greater than 2%:
Significant size changesIncludes any change greater than 0.2%: Expand to show
|
You can see if you can remove this hack and see what happens to renderToString tests. |
@@ -279,6 +279,7 @@ function encodeHTMLTextNode(text: string): string { | |||
} | |||
|
|||
const textSeparator = stringToPrecomputedChunk('<!-- -->'); | |||
let lastPushedText = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use ResponseState
for this instead of module scope. It's a little safer because you can technically call renderToString
from within renderToString
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call
The method by which we get segment markup into the DOM differs depending on when the Segment resolves. If a Segment resovles before flushing begins for it's parent it will be emitted inline with the parent markup. In these cases separators may be necessary because they are how we clue the browser into breakup up text into distinct nodes that will later match up with what will be hydrated on the client. If a Segment resolves after flushing has happened a script will be used to patch up the DOM in the client. when this happens if there are any text nodes on the boundary of the patch they won't be "merged" and thus will continue to have distinct representation as Nodes in the DOM. Thus we can avoid doing any separators at the boudnaries in these cases. After applying these changes the only time you will get text separators as follows * in between serial text nodes that emmit at the same time - these are necessary and cannot be eliminated unless we stop relying on the browser to automatically parse the correct text nodes when processing this HTML * after a final text node in a non-boundary segment that resolves before it's parent has flushed - these are sometimes extraneous, like when the next emitted thing is a non-Text node. In all other cases text separators should be omitted which means the general byte efficiency of this approach should be pretty good
return NO_TEXT_EMBED; | ||
} | ||
|
||
export function textEmbeddingForSegment(): TextEmbedding { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this whole feature is DOM specific and ideally we wouldn't punish other environments - it's also really DOM that's our main (and currently only target). I wouldn't be too hurt if this feature was just built into ReactFizzServer instead. Including the lastPushedText part.
Since otherwise we leak a lot of concepts and have to make up flags etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I had this thought as I kept adding more functions to the format config. glad you see it the same way and makes sense. I'll move things over so we can clean up the interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to just use the segment.textEmbedding flag as the running tally for lastPushedText?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not totally, for boundaries, we need to track lastPushedText internally but we never need to push a separator at the end b/c we always have the suspense comment node. I suppose if I could know I was in a segment that was a boundary or the root I could use that as an alternative way to discriminating those cases
and actually for delayed segments we don't want to emit a final separator but they aren't boundaries so I need a way to know if it is delayed or not when I get to the final push
but I can just track that all as another property on the Segment
@@ -1652,6 +1653,7 @@ function flushSubtree( | |||
// We're emitting a placeholder for this segment to be filled in later. | |||
// Therefore we'll need to assign it an ID - to refer to it by. | |||
const segmentID = (segment.id = request.nextSegmentId++); | |||
segment.textEmbedding = textEmbeddingForDelayedSegment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it always the case that nothing has been written to this segment yet? I think maybe yes, but not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If some extra separators come along for the ride I don't think it should matter. As long as we are using the patchup in DOM we can handle separators but don't need them. so in this path we know we are getting a placeholder and thus will also get a patchup. Changing the embedding here just means that we have an opportunity to drop them if they have not yet been emitted.
That said... I do not know how it could have had anything written without being complete and thus avoiding this path
works beautifully without the hack |
// Adopt the parent segment's leading text embed | ||
segment.lastPushedText, | ||
// Assume we are text embedded at the trailing edge | ||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is where it gets non-deterministic because if you didn't suspend here it would not do this and would eliminate the unnecessary trailing edge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I had thought of solving this was to track a flag on both side of the segment. Like if the segment ends with text and another flag if the parent has written text right after that segment and another flag if a segment starts with text.
Then in the flush phase you can insert a separator during the flush if you're inlining two segments that ends and starts with text that are next to each other, or if there's a parent chunk in between you can tell if the parent wrote text after the segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I think I could make that work but I think I'll merge this and think about it some more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is non-deterministic but it's a better tradeoff than what's in main right now.
you nerdsniped me: #24637 |
Summary: This sync includes the following changes: - **[dd4950c90](facebook/react@dd4950c90 )**: [Flight] Implement useId hook ([#24172](facebook/react#24172)) //<Josh Story>// - **[26a5b3c7f](facebook/react@26a5b3c7f )**: Explicitly set `highWaterMark` to 0 for `ReadableStream` ([#24641](facebook/react#24641)) //<Josh Larson>// - **[aec575914](facebook/react@aec575914 )**: [Fizz] Send errors down to client ([#24551](facebook/react#24551)) //<Josh Story>// - **[a2766387e](facebook/react@a2766387e )**: [Fizz] Improve text separator byte efficiency ([#24630](facebook/react#24630)) //<Josh Story>// - **[f7860538a](facebook/react@f7860538a )**: Fix typo in useSyncExternalStore main entry point error ([#24631](facebook/react#24631)) //<François Chalifour>// - **[1bed20731](facebook/react@1bed20731 )**: Add a module map option to the Webpack Flight Client ([#24629](facebook/react#24629)) //<Sebastian Markbåge>// - **[b2763d3ea](facebook/react@b2763d3ea )**: Move hydration code out of normal Suspense path ([#24532](facebook/react#24532)) //<Andrew Clark>// - **[357a61324](facebook/react@357a61324 )**: [DevTools][Transition Tracing] Added support for Suspense Boundaries ([#23365](facebook/react#23365)) //<Luna Ruan>// - **[2c8a1452b](facebook/react@2c8a1452b )**: Fix ignored setState in Safari when iframe is touched ([#24459](facebook/react#24459)) //<dan>// - **[62662633d](facebook/react@62662633d )**: Remove enableFlipOffscreenUnhideOrder ([#24545](facebook/react#24545)) //<Ricky>// - **[34da5aa69](facebook/react@34da5aa69 )**: Only treat updates to lazy as a new mount in legacy mode ([#24530](facebook/react#24530)) //<Ricky>// - **[46a6d77e3](facebook/react@46a6d77e3 )**: Unify JSResourceReference Interfaces ([#24507](facebook/react#24507)) //<Timothy Yung>// - **[6cbf0f7fa](facebook/react@6cbf0f7fa )**: Fork ReactSymbols ([#24484](facebook/react#24484)) //<Ricky>// - **[a10a9a6b5](facebook/react@a10a9a6b5 )**: Add test for hiding children after layout destroy ([#24483](facebook/react#24483)) //<Ricky>// - **[b4eb0ad71](facebook/react@b4eb0ad71 )**: Do not replay erroring beginWork with invokeGuardedCallback when suspended or previously errored ([#24480](facebook/react#24480)) //<Josh Story>// - **[99eef9e2d](facebook/react@99eef9e2d )**: Hide children of Offscreen after destroy effects ([#24446](facebook/react#24446)) //<Ricky>// - **[ce1386028](facebook/react@ce1386028 )**: Remove enablePersistentOffscreenHostContainer flag ([#24460](facebook/react#24460)) //<Andrew Clark>// - **[72b7462fe](facebook/react@72b7462fe )**: Bump local package.json versions for 18.1 release ([#24447](facebook/react#24447)) //<Andrew Clark>// - **[22edb9f77](facebook/react@22edb9f77 )**: React `version` field should match package.json ([#24445](facebook/react#24445)) //<Andrew Clark>// - **[6bf3deef5](facebook/react@6bf3deef5 )**: Upgrade react-shallow-renderer to support react 18 ([#24442](facebook/react#24442)) //<Michael サイトー 中村 Bashurov>// Changelog: [General][Changed] - React Native sync for revisions bd4784c...d300ceb jest_e2e[run_all_tests] Reviewed By: cortinico, kacieb Differential Revision: D36874368 fbshipit-source-id: c0ee015f4ef2fa56e57f7a1f6bc37dd05c949877
[Fizz] Improve text separator byte efficiency
Previously text separators were inserted following any Text node in Fizz. This increases bytes sent when streaming and in some cases such as title elements these separators are not interpreted as comment nodes and leak into the visual aspects of a page as escaped text.
The reason simple tracking on the last pushed type doesn't work is that Segments can be filled in asynchronously later and so you cannot know in a single pass whether the preceding content was a text node or not. This commit adds a concept of TextEmbedding which provides a best effort signal to Segments on whether they are embedded within text. This allows the later resolution of that Segment to add text separators when possibly necessary but avoid them when they are surely not.
The current implementation can only "peek" head if the segment is a the Root Segment or a Suspense Boundary Segment. In these cases we know there is no trailing text embedding and we can eliminate the separator at the end of the segment if the last emitted element was Text. In normal Segments we cannot peek and thus have to assume there might be a trailing text embedding and we issue a separator defensively. This should be rare in practice as it is assumed most components that will cause segment creation will also emit some markup at the edges.
Another strategy employed is to take advantage of the two methods by which we get Nodes into the DOM. The method by which we get segment markup into the DOM differs depending on when the Segment resolves.
If a Segment resovles before flushing begins for it's parent it will be emitted inline with the parent markup. In these cases separators may be necessary because they are how we clue the browser into breakup up text into distinct nodes that will later match up with what will be hydrated on the client.
If a Segment resolves after flushing has happened a script will be used to patch up the DOM in the client. when this happens if there are any text nodes on the boundary of the patch they won't be "merged" and thus will continue to have distinct representation as Nodes in the DOM. Thus we can avoid doing any separators at the boudnaries in these cases.
After applying these changes the only time you will get text separators as follows
In all other cases text separators should be omitted which means the general byte efficiency of this approach should be pretty good