-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blocks: Remember raw source block for invalid blocks. #38923
Conversation
Size Change: +55 B (0%) Total Size: 1.16 MB
ℹ️ View Unchanged
|
a779af7
to
8a73c1c
Compare
cc: @youknowriad thanks for reviewing the first draft of this. this one I believe is ready for merging plus proposes a more stable API that will serve us better as we clean up all the places that corrupt or erase existing content. |
// Preserve the original unprocessed version of the block | ||
// that we received so that we can preserve it in its | ||
// existing state when we save. | ||
updatedBlock.source = rawBlock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what other places we will need the rawBlock
? Could we just pass the serialized raw block here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I am not understanding is: how is this different from updatedBlock.originalContent
? Indeed, this value is already available in BlockListBlock
at props.block.originalContent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my first attempts were to pass the serialized raw block but we actually use various forms of the original block when performing comparisons of an "invalid block" and various ways of resolving it. further we have various ways of resolving it, some involving splitting that block into multiple blocks.
originalContent
is fundamentally flawed in that it only looks at the innerHTML
property of the raw block (to the neglect of inner blocks) which is the cause for a host of data loss and corruption issues and is also one of the reasons we can't convert non-nested blocks to nested blocks without creating those opportunities for data loss. technically it's the combination of our reliance on originalContent
, innerHTML
, and the way we call getSaveContent(…)
to re-generate markup for invalid blocks; this PR is one step in undoing that so that we can preserve that unrecognized content and resolve issues in less risky ways. in other words the difference is that this is fix to a defect that originalContent
introduced and unfortunately we can't fix originalContent
without changing everything that depends on it.
I understand it's making this part of exposed surface of the block but we use this information, as we do with originalContent
, in different places around the editor and I've personally concluded we have to carry it around if we want to avoid the corruption, data loss, and UX problems (such as demonstrated in the screenshot above where it looks like the list lost all its bullet points). Ultimately I would like to get rid of originalContent
but as noted in #38922 (and in the PR description above) that's a much broader change and I couldn't get it in a comprehensive yet focused PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to access rawBlocks on save if it's stored somewhere. Let's make the following distinction: It could be a part of the block object, or it could live outside of the block object.
I pondered a different variations of the latter, such as saving rawBlocks in a redux store, but I don't see a reliable way to retrieve them later. We would need a function such as ( block ) => lookupKey
that returns the same result throughout the entire lifecycle of the block
object. However, the clientId
may change just like the block props and about any other detail. I don't think such function exists at the moment.
Generating a stable id like block.stableLifecycleId = uuid()
would make it possible, but it would then require additional machinery like maintaining a lookup table, garbage collection, and such. It would buy us some memory, but the rawBlock would still become a public API.
And so, to me it seems like a sensible start. Perhaps we could call it __unstableSource
instead of source
to make the stakes lower in case it doesn't pan out somewhere down the road, but other than that I like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I missed this comment @adamziel.
honestly I am fairly persuaded by the source
/__unstableSource
method, but I want to make sure we exhaust our search of other options that I may be less persuaded at initially.
in this discussion I think it's become even more clear that the block node itself is the place where this should live, and I'll rename to __unstableSource
Thanks for testing @scruffian. can you share the markup you used to break those columns? I recognized what's happening, but at least in my head it's a different problem. I believe that the block in the picture is throwing a JS error (can you check the console). If it's related to the new fallback serialization I bet I'm assuming a field is present that isn't. |
8a73c1c
to
ee6b540
Compare
…lue. While working on #38923 with invalid blocks it happened that a JavaScript error surfaced when working with a block that specified an incorrect layout type. In the case at hand the block asked for the `flerx` alignment instead of the `flex` alignment. It's reasonable to expect blocks to call for unrecognized layouts, especially if a plugin introduces new layouts and then blocks are loaded without that plugin's support. When `getLayoutType` is unable to find a recognized layout it returns `undefined` which causes numerous issues with calling code that expects a valid layout. In this patch we're guarding for those cases and if no known layout can be found for the specified layout type then we return the default layout instead. This introduces data corruption so it could be that this solution is more of a coverup than a fix. There is no way however to ignore that layout and preserve the behavior while still editing the block. On the other hand a block should not fail validation due to an implementation bug in the core editor and if this fix truly resolves the issue we should find invalidated blocks and prevent editing anyway (unfortunately this is not the case in this patch).
Thanks for your review and patience with me @scruffian; it took me a while to get back and investigate this. I think I did see this on my own when testing but I overlooked it because it wasn't related to the breakage that this PR addresses. In fact that does already exist in |
Actually it's fairly trivial to go ahead and show the preview even with JS crashes but it does require that we always save a copy of Right now there's no way of telling at the parsing stage if a JS error will later hide a block preview. We can either show that preview anyway (and keep around a reference to the raw parsed block) or maintain the current behavior which is to hide everything in that block (and let the garbage collector get around to clearing up that now-unused reference - it's always located when parsing anyways). |
I'm still unclear about how this approach is better than #38460 In terms of performance (memory), both add bytes to the block object, and if we're keeping the property for just invalid block (which I think is the right thing to do), I'd argue that preserving the string is probably better than preserving the objects (more data). In terms of APIs, I think keeping the original HTML makes sense, keeping an intermediary block format that is confusing because we can end up with two different block formats in different parts of our code base. If that's limited to the parsing function, that's fine. For the use-case of the JS error of a valid block, I don't think we should anything special there, the block author needs to be aware of the JS error to solve it, and I'm not sure that showing a preview is a good idea. In all cases, I still prefer to only add the property for invalid blocks to keep the impact small. |
I wonder if #39183 would help with the UI aspect of making the invalid blocks non-editable |
Thanks @youknowriad for the review!
The text-only approach led to multiple dead-ends particularly around resolving block errors. It's complicated to describe right now because so many pieces are broken (for example, we currently wipe out inner blocks when running certain block resolutions). There's certain need in comparing while ignoring inner blocks, but only if we ignore them after re-generating their content. I did explore a number of options including passing the raw strings around and then running the parser on those snippets as needed, but it definitely wasn't that elegant and I feel like it exposed even more new API surface area. Ultimately the reason it's not just better, but workable where #38460 isn't, is that we use the structural form of a broken block for various contexts where we'd end up needing to store multiple text copies of the block in memory if that's all we stored:
Further now we have the recursive question: what to do about those inner blocks. We descend and store all these strings and inner blocks and then recurse in another way if any of those inner blocks also happens to be broken. By storing the raw parsed content we get this:
I'm still not sold on the performance argument without any measurement. Let's zoom out a bit and ask where the impact is likely to accrue: only in posts with broken blocks; only in posts with lots of broken blocks. Put in context:
So I view this as the cost of that safety. Is an object we already have worse than a string we have to generate and process? I don't know but I'm open to discuss any data we can find that speaks to it.
I hear that but I think we can't avoid this. We'll carry around that string and have different forms of the string plus inner blocks if we go the text route. Notable points of safety here:
We don't have to swallow the error, but I'm curious why we wouldn't want to show a preview of the block. There's nothing wrong with the blocks in these cases and the markup is possibly valid. If we don't show a preview we uphold the current state of showing very different versions of the post in the editor vs. when rendered. Would be very easy to mess up and publish something you didn't mean to because large amounts of content was hidden inside a block whose JS threw.
Regardless this PR is only addressing the case for blocks that fail to validate during parse. I think eventually though we will have to decide if we want to preserve blocks whose implementations are buggy or let one error cascade into data loss. At that point we'll probably have to made a tradeoff between potential performance impact and trust and data security.
"invalid" blocks are currently non-editable. there are a couple of problems though revolving around the fact that corruption is introduced during the loading of blocks out of the parser (meaning the defects are introduced before locking) and also that for some "invalid" blocks the editor doesn't realize they're invalid because it ignores inner blocks and so it wouldn't have any reason to lock them either. |
ee6b540
to
22cb97e
Compare
Thanks for expanding on your reasoning @dmsnell I feel like there's no perfect solution here, maybe the "source" is just something internal to the framework and not exposed to the user (at least for now) which would ease some of my concerns. In that sense, doing what @adamziel suggest could be a potential approach, aka
I might have misunderstood the solution proposed, I think the error should be prominent as the JS error can be anything and we shouldn't just replace it with the preview at the risk of confusing the user. |
One thing I thought about but didn't explore further is storing the source separately, either in a global or in the data stores. A block implementation could request it, or the The problem with that approach in my head was just the complexity it introduced in comparison to the more pragmatic approach of storing this data on the block node itself. It doesn't seem unreasonable to me to communicate this source information through some global in the parser module, but then we'd have to be more careful about purging that store once a block is deleted, something we don't have to do when storing it on the block itself. Happy to explore other approaches if you want me to. At least at this point I know how to start fitting it together once we track the source in whichever manner we do.
yeah I agree, and I look at this from the perspective that we're already in a broken condition by the point where this code matters. things are already broken, how can we stop the bleeding?
what were you thinking of for this? something like I mentioned with globals/stores? or just |
@dmsnell I believe that the
This bit us in the widgets editor, see #28379:
It says "outside of a block editor instance," so maybe it wouldn't be a problem for this PR after all – I'm not sure, more investigation is needed to decide. Either way, this PR is the simplest way forward, and if the API is unstable then the door to using a global store remain open. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more change is needed (__unstableSource
) to get this PR across the finish line, but since the ideas and the implementation seem solid, I'm approving preemptively.
22cb97e
to
602c44f
Compare
Updated to While doing this I wondered about our usage of the We have a few instances of Other than that, any objections to this PR? After this one we can start improving the handling of block resolution, JS errors in the block implementations, and invaliding based on |
No objections left for me here 👍 |
Part of #38922 When the editor is unable to validate a block it should preserve the broken content in the post and show an accurate representation of that underlying markup in the absence of being able to interact with it. Currently when showing a preview of an invalid block in the editor we attempt to re-generate the save output for a block given the attributes we originally parsed. This is a flawed approach, however, because by the nature of being invalid we know that there is a problem with those attributes as they are. In this patch we're introducing the `source` attribute on a block which only exists for invalid blocks at the time of this patch. That `source` carries the original un-processed data for a block node and can be used to reconstruct the original markup without using garbage data and without inadvertently changing it through the series of autofixes, deprecations, and the like that happen during normal block loading. The noticable change is in `block-list/block` where we will be showing that reconstruction rather than the re-generated block content. Previously it was the case that the preview might represent a corrupted version of the block or show the block as if emptied of all its content. Now, however, the preview sould accurately reflect the HTML in the source post even when it's invalid or unrecognized according to the editor. Further work should take advantage of the `source` property to provide a more consistent and trusting experience for working with unrecognized content.
Potentially this is a conflict with #38794 and once `isValidBlockContent` is gone we might be able to remove this change.
14a881a
to
0b721f8
Compare
Following the work of #38923 where we started preserving the source content for blocks that the editor fails to recognize, this patch uses that information in the Code Editor to ensure that we don't accidentally corrupt or lose content when making edits there. Previously the `serializeBlock` function, which generates the HTML for the Code Editor, would try to re-generated the saved content for blocks that we know are invalid. Because we know the input is invalid we can guarantee that the output will have even more problems. Now that `serializeBlock` function is aware of the source content for a block, and if the block is marked invalid _and_ that source exists it will pass that along to the output and skip the processing and re-generation of the saved content. This ensures that errors don't cascade and that unrelated edits to other blocks from the Code Editor won't destroy content in the marked-invalid blocks.
…lue. While working on #38923 with invalid blocks it happened that a JavaScript error surfaced when working with a block that specified an incorrect layout type. In the case at hand the block asked for the `flerx` alignment instead of the `flex` alignment. It's reasonable to expect blocks to call for unrecognized layouts, especially if a plugin introduces new layouts and then blocks are loaded without that plugin's support. When `getLayoutType` is unable to find a recognized layout it returns `undefined` which causes numerous issues with calling code that expects a valid layout. In this patch we're guarding for those cases and if no known layout can be found for the specified layout type then we return the default layout instead. This introduces data corruption so it could be that this solution is more of a coverup than a fix. There is no way however to ignore that layout and preserve the behavior while still editing the block. On the other hand a block should not fail validation due to an implementation bug in the core editor and if this fix truly resolves the issue we should find invalidated blocks and prevent editing anyway (unfortunately this is not the case in this patch).
Following the work of #38923 where we started preserving the source content for blocks that the editor fails to recognize, this patch uses that information in the Code Editor to ensure that we don't accidentally corrupt or lose content when making edits there. Previously the `serializeBlock` function, which generates the HTML for the Code Editor, would try to re-generated the saved content for blocks that we know are invalid. Because we know the input is invalid we can guarantee that the output will have even more problems. Now that `serializeBlock` function is aware of the source content for a block, and if the block is marked invalid _and_ that source exists it will pass that along to the output and skip the processing and re-generation of the saved content. This ensures that errors don't cascade and that unrelated edits to other blocks from the Code Editor won't destroy content in the marked-invalid blocks.
Following the work of #38923 where we started preserving the source content for blocks that the editor fails to recognize, this patch uses that information in the Code Editor to ensure that we don't accidentally corrupt or lose content when making edits there. Previously the `serializeBlock` function, which generates the HTML for the Code Editor, would try to re-generated the saved content for blocks that we know are invalid. Because we know the input is invalid we can guarantee that the output will have even more problems. Now that `serializeBlock` function is aware of the source content for a block, and if the block is marked invalid _and_ that source exists it will pass that along to the output and skip the processing and re-generation of the saved content. This ensures that errors don't cascade and that unrelated edits to other blocks from the Code Editor won't destroy content in the marked-invalid blocks.
…#39523) Following the work of #38923 where we started preserving the source content for blocks that the editor fails to recognize, this patch uses that information in the Code Editor to ensure that we don't accidentally corrupt or lose content when making edits there. Previously the `serializeBlock` function, which generates the HTML for the Code Editor, would try to re-generated the saved content for blocks that we know are invalid. Because we know the input is invalid we can guarantee that the output will have even more problems. Now that `serializeBlock` function is aware of the source content for a block, and if the block is marked invalid _and_ that source exists it will pass that along to the output and skip the processing and re-generation of the saved content. This ensures that errors don't cascade and that unrelated edits to other blocks from the Code Editor won't destroy content in the marked-invalid blocks.
Previously when running "Convert to Blocks" on the invalid-block resolution dialog we have been converting only a portion of the invalid block due to the way we examine the `originalContent.` Since #38923 we have had `__unstableBlockSource` available which tracks the entire contents of the original block that failed to validate. In this patch we're using that source information in order to split apart the invalid block and then separately parse each of its constituent components. The result of this change is that we're able to preserve more block content when resolving an invalid block than we were before. For example, supposing we have a broken container block full of valid inner blocks, we are now able to extract all of those inner blocks and preserve them whereas before we would lose all block information and the stack would turn into an empty classic block.
…lue. While working on #38923 with invalid blocks it happened that a JavaScript error surfaced when working with a block that specified an incorrect layout type. In the case at hand the block asked for the `flerx` alignment instead of the `flex` alignment. It's reasonable to expect blocks to call for unrecognized layouts, especially if a plugin introduces new layouts and then blocks are loaded without that plugin's support. When `getLayoutType` is unable to find a recognized layout it returns `undefined` which causes numerous issues with calling code that expects a valid layout. In this patch we're guarding for those cases and if no known layout can be found for the specified layout type then we return the default layout instead. This introduces data corruption so it could be that this solution is more of a coverup than a fix. There is no way however to ignore that layout and preserve the behavior while still editing the block. On the other hand a block should not fail validation due to an implementation bug in the core editor and if this fix truly resolves the issue we should find invalidated blocks and prevent editing anyway (unfortunately this is not the case in this patch).
…lue. While working on #38923 with invalid blocks it happened that a JavaScript error surfaced when working with a block that specified an incorrect layout type. In the case at hand the block asked for the `flerx` alignment instead of the `flex` alignment. It's reasonable to expect blocks to call for unrecognized layouts, especially if a plugin introduces new layouts and then blocks are loaded without that plugin's support. When `getLayoutType` is unable to find a recognized layout it returns `undefined` which causes numerous issues with calling code that expects a valid layout. In this patch we're guarding for those cases and if no known layout can be found for the specified layout type then we return the default layout instead. This introduces data corruption so it could be that this solution is more of a coverup than a fix. There is no way however to ignore that layout and preserve the behavior while still editing the block. On the other hand a block should not fail validation due to an implementation bug in the core editor and if this fix truly resolves the issue we should find invalidated blocks and prevent editing anyway (unfortunately this is not the case in this patch).
Status
While this PR looks like it introduces a property that doesn't get much
use (
source
) I'd like to go ahead and merge it in to get the most basicfix (invalid block previews are notoriously broken) while setting the stage
for the more thorough fixes to come later.
Everything that relies on
originalContent
and most of what relies oninnerHTML
is broken but fixing those issues is so expansive that I cannotfocus on them in one mega PR. Thankfully, because they are all separate
problems I do believe that we can fix them piecemeal and let "fixes one
thing" take priority over "fixes everything."
Description
Part of #38922
Replaces #38922
When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.
Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.
In this patch we're introducing the
source
attribute on a block whichonly exists for invalid blocks at the time of this patch. That
source
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.
The noticable change is in
block-list/block
where we will be showingthat reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.
Further work should take advantage of the
source
property to providea more consistent and trusting experience for working with unrecognized
content.
Testing Instructions
To test this you will want to compare the block preview for invalid
blocks in the editor. This change limits itself to that preview and
leaves other broken UI bits broken, to be addressed in follow-up work
for the sake of focus during review and testing.
Two primary cases to asses are when the block attributes fail to
properly parse and when there are inner blocks involved.
Failed attribute parsing
We've removed the
<ul>
tag from the following list. The threebullet points render fine as HTML but are hidden in the editor
because it loads an empty string for the
values
attribute, sinceit was unable to find
LI
content inside aUL
as the attributesourcing describes it should.
trunk
Inner blocks on invalid block
Because inner blocks are ignored by the existing
originalContent
attribute they will disappear in the preview for invalid blocks even if they are themselves valid.trunk
Types of changes
Introduces a new property on a loaded block which refers to the
un-processed block data when it was originally loaded and which
is used to better represent the unrecognized content in the editor.
Checklist:
*.native.js
files for terms that need renaming or removal).