-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blocks: Replace js-beautify with element beautification #6716
Conversation
An example of where whitespace can be seen as problematic on master is for the List block, where there are whitespace nodes within The two options for solutions I have in mind are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice!
@@ -1,4 +1,5 @@ | |||
<!-- wp:pullquote --> | |||
<blockquote class="wp-block-pullquote alignnone"> | |||
<p>Testing pullquote block...</p><cite>...with a caption</cite></blockquote> | |||
<p>Testing pullquote block...</p><cite>...with a caption</cite> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated. Ugh, maybe we should wrap cite elements in a footer or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should wrap cite elements in a footer or something?
Would the benefit there be merely to put cite
on its own line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it doesn't make sense to have phrasing content next to block level content (cite
and p
on the same level). I guess that's just a personal thing.
element/serialize.js
Outdated
* https://developer.mozilla.org/en-US/docs/Web/HTML/Inline_elements | ||
* | ||
* [ ...document.querySelectorAll( '.threecolumns code' ) ] | ||
* .map( ( el ) => el.textContent.replace( /(^<|>$)/g, '' ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following this comment and how it is related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following this comment and how it is related.
In case anyone is curious how the set is determined, or in case it ever needs to be updated in the future (assuming the structure of the referenced documentation page doesn't change).
I'll add a bit more clarity to the comment, as I did with "Extracted from" in the block validator which has similar comments:
gutenberg/blocks/api/validation.js
Lines 84 to 91 in 7ad3fa4
* Extracted from: https://html.spec.whatwg.org/multipage/indices.html#attributes-3 | |
* | |
* Object.keys( [ ...document.querySelectorAll( '#attributes-1 > tbody > tr' ) ] | |
* .filter( ( tr ) => /^("(.+?)";?\s*)+/.test( tr.lastChild.textContent.trim() ) ) | |
* .reduce( ( result, tr ) => Object.assign( result, { | |
* [ tr.firstChild.textContent.trim() ]: true | |
* } ), {} ) ).sort(); | |
* |
@@ -1,3 +1,5 @@ | |||
<!-- wp:video --> | |||
<figure class="wp-block-video"><video controls src="https://awesome-fake.video/file.mp4"></video></figure> | |||
<figure class="wp-block-video"> | |||
<video controls src="https://awesome-fake.video/file.mp4"></video> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice!
@@ -1,3 +1,3 @@ | |||
<!-- wp:paragraph --> | |||
<p>This is an old-style text block. Changed to <code>paragraph</code> in #2135.</p> | |||
<p>This is an old-style text block. Changed to <code>paragraph</code> in #2135.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. So beautify is also dropping multiple spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. So beautify is also dropping multiple spaces?
Yes, presumably because in a browser there would be no rendered difference (whitespace collapsing).
element/serialize.js
Outdated
content += '<' + type + attributes; | ||
|
||
if ( SELF_CLOSING_TAGS.has( type ) ) { | ||
return content + '/>'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a space for self closing tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a space for self closing tags?
Heh, I went back and forth on this a few times. I don't feel particularly strongly about it, and ultimately weighed consistency of this "beauty" before deciding to remove it: Why is there a space after the last attribute when a tag is self-closing, but not when it has a separate explicit closing tag? Why do we otherwise not include the space before slash in a closing tag, e.g. < /figure>
?
I'd agree that at least by anecdotal observation, the space before the self-closing slash is common for well-formatted HTML, though I couldn't personally come to a conclusion as to why this is objectively the case.
I'll see if I can add it without conflicts, at least so that we're consistent with the previous output, and if future debate is warranted, I'm open to discussing further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter to me. Just thought it would avoid some changes in the tests. Maybe we can also drop the /
if it only needs to be HTML5 compatible. I guess not since not all themes will be HTML5.
element/serialize.js
Outdated
'dfn', | ||
'em', | ||
'i', | ||
'img', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange to me that img
is included in this list but not audio
, video
, iframe
etc. Should it also be dropped here? Maybe we should limit this list to phrasing content excluding embedded content? This would also mean dropping form elements as it's not phrasing content. https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange to me that
img
is included in this list but notaudio
,video
,iframe
etc. Should it also be dropped here?
As noted in above comment, this list was generated from MDN's documentation and aligns precisely with the list described here:
https://developer.mozilla.org/en-US/docs/Web/HTML/Inline_elements#Elements
More specifically, the specification describing the processing of whitespace makes a distinction on "inline formatting context": https://www.w3.org/TR/css-text-3/#white-space-rules
Trying to dig into where the basis for img
as inline originated, I couldn't find the exact specification, and it may instead be on the basis of inline
being the initial value for the display
CSS property and most user-agent stylesheets ([1], [2]) not otherwise specifying a display
value for img
. That said, these stylesheets also don't tend to include elements like audio
and such, and indeed in my testing browsers report their display
as inline
:
https://codepen.io/aduth/pen/WJBNBM?editors=1111
Maybe instead then we should invert this to be inclusive only on block content? Since this will be easier to maintain into the future, since new elements will be added whose default is not set and will assume the initial value of inline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to follow the lead of the specification here, unless I've misinterpreted it, or there's some other reason js-beautify chose to use phrasing content (what's the relation between phrasing content and inline elements?).
* @return {string} Indented content. | ||
*/ | ||
export function getIndentedContent( string ) { | ||
return string.replace( /(.*\S)/g, '\t$1' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite getting this regular expression. Why not /\S/.test( string )
?
c4b8ff5
to
9707d21
Compare
I took a pass at bringing this up-to-date, and ran into a number of other frustrations, largely around how we re-parse beautified content, since the added whitespace is considered as meaningful nodes.
with ( wp.blocks ) parse( serialize( createBlock( 'core/heading', { content: [ 'Lorem ', { type: 'div', props: { children: [ 'ipsum' ] } } ] } ) ) )[ 0 ].attributes.content I would expect the parsed
expect( wp.blocks.parse( serialize( block ) )[ 0 ].attributes.content ).toBe( 'Invalid' ); But in fact it fails:
It's obvious then the presence of inner blocks can affect the parsed attributes of its parent. The test has yet another result if the created block has no children.
See the included changes for I chatted briefly with @mtias a bit about the value of beautifying the markup. I'm beginning to think it's causing more trouble than its worth, at least for beautification within a block. The original issue for adding whitespace (#633) really only sought to address what, at the time, was a giant blob of text for the entire post. Maybe it's enough to have some newlines between blocks, and no other beautification within, if that alone doesn't cause much trouble. Will have to see how well this works with inner blocks. It seems like if this will make parsing more predictable, and respect existing whitespace like raised in #4456 and #633 (comment) . (cc also @youknowriad re: #663) |
@aduth So is the plan to drop beautification? |
Closing in favor of #7892 |
Fixes #4456
This pull request seeks to replace
js-beautify
with a beautification option built-in towp.element.renderToString
. This reduces the bundle size ofbuild/blocks/index.js
from 242kb (87kb gzipped) to 194kb (68.9kb gzipped), a savings of 48kb (18.1kb gzipped). Further, it should have a runtime performance benefit, as the save content generated for a post no longer needs to be re-parsed and re-generated by the beautifier as a post-processing step after serialization, and is instead built into the serialization itself. Furtherer, it should hold us more accountable to legitimate issues which were made non-obvious by prior beautification behavior obscuring (e.g. whitespace collapsing).In-Progress: There are some issues with raw transforms resulting in failing tests. This is specifically caused by the behaviors of
children
matchers in respecting HTML whitespace (vianode.childNodes
). This is considered a blocker, but should be explored separately: Either ignoring whitespace-only nodes fromchildren
, or ensuring that whitespace is collapsed before passing HTML into achildren
matcher.