Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocks: Replace js-beautify with element beautification #6716

Closed
wants to merge 2 commits into from

Conversation

aduth
Copy link
Member

@aduth aduth commented May 11, 2018

Fixes #4456

This pull request seeks to replace js-beautify with a beautification option built-in to wp.element.renderToString. This reduces the bundle size of build/blocks/index.js from 242kb (87kb gzipped) to 194kb (68.9kb gzipped), a savings of 48kb (18.1kb gzipped). Further, it should have a runtime performance benefit, as the save content generated for a post no longer needs to be re-parsed and re-generated by the beautifier as a post-processing step after serialization, and is instead built into the serialization itself. Furtherer, it should hold us more accountable to legitimate issues which were made non-obvious by prior beautification behavior obscuring (e.g. whitespace collapsing).

In-Progress: There are some issues with raw transforms resulting in failing tests. This is specifically caused by the behaviors of children matchers in respecting HTML whitespace (via node.childNodes). This is considered a blocker, but should be explored separately: Either ignoring whitespace-only nodes from children, or ensuring that whitespace is collapsed before passing HTML into a children matcher.

Whitespace nodes

@aduth aduth added Framework Issues related to broader framework topics, especially as it relates to javascript [Status] In Progress Tracking issues with work in progress labels May 11, 2018
@aduth
Copy link
Member Author

aduth commented May 14, 2018

An example of where whitespace can be seen as problematic on master is for the List block, where there are whitespace nodes within values caused purely by the beautification of the generated markup. I don't think that these should be considered as part of the children result.

image

The two options for solutions I have in mind are:

  • Ignore whitespace-only nodes in children.
    • What about when those nodes are meaningful? Like a paragraph with <strong>Foo</strong> <span>Bar</span>, the whitespace between the two nodes is important.
  • Allow a block to specify childSelector in its children source configuration.

cc @iseulde @youknowriad

Copy link
Member

@ellatrix ellatrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice!

@@ -1,4 +1,5 @@
<!-- wp:pullquote -->
<blockquote class="wp-block-pullquote alignnone">
<p>Testing pullquote block...</p><cite>...with a caption</cite></blockquote>
<p>Testing pullquote block...</p><cite>...with a caption</cite>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated. Ugh, maybe we should wrap cite elements in a footer or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should wrap cite elements in a footer or something?

Would the benefit there be merely to put cite on its own line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it doesn't make sense to have phrasing content next to block level content (cite and p on the same level). I guess that's just a personal thing.

* https://developer.mozilla.org/en-US/docs/Web/HTML/Inline_elements
*
* [ ...document.querySelectorAll( '.threecolumns code' ) ]
* .map( ( el ) => el.textContent.replace( /(^<|>$)/g, '' ) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following this comment and how it is related.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following this comment and how it is related.

In case anyone is curious how the set is determined, or in case it ever needs to be updated in the future (assuming the structure of the referenced documentation page doesn't change).

I'll add a bit more clarity to the comment, as I did with "Extracted from" in the block validator which has similar comments:

* Extracted from: https://html.spec.whatwg.org/multipage/indices.html#attributes-3
*
* Object.keys( [ ...document.querySelectorAll( '#attributes-1 > tbody > tr' ) ]
* .filter( ( tr ) => /^("(.+?)";?\s*)+/.test( tr.lastChild.textContent.trim() ) )
* .reduce( ( result, tr ) => Object.assign( result, {
* [ tr.firstChild.textContent.trim() ]: true
* } ), {} ) ).sort();
*

@@ -1,3 +1,5 @@
<!-- wp:video -->
<figure class="wp-block-video"><video controls src="https://awesome-fake.video/file.mp4"></video></figure>
<figure class="wp-block-video">
<video controls src="https://awesome-fake.video/file.mp4"></video>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice!

@@ -1,3 +1,3 @@
<!-- wp:paragraph -->
<p>This is an old-style text block. Changed to <code>paragraph</code> in #2135.</p>
<p>This is an old-style text block. Changed to <code>paragraph</code> in #2135.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. So beautify is also dropping multiple spaces?

Copy link
Member Author

@aduth aduth May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. So beautify is also dropping multiple spaces?

Yes, presumably because in a browser there would be no rendered difference (whitespace collapsing).

content += '<' + type + attributes;

if ( SELF_CLOSING_TAGS.has( type ) ) {
return content + '/>';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a space for self closing tags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a space for self closing tags?

Heh, I went back and forth on this a few times. I don't feel particularly strongly about it, and ultimately weighed consistency of this "beauty" before deciding to remove it: Why is there a space after the last attribute when a tag is self-closing, but not when it has a separate explicit closing tag? Why do we otherwise not include the space before slash in a closing tag, e.g. < /figure> ?

I'd agree that at least by anecdotal observation, the space before the self-closing slash is common for well-formatted HTML, though I couldn't personally come to a conclusion as to why this is objectively the case.

I'll see if I can add it without conflicts, at least so that we're consistent with the previous output, and if future debate is warranted, I'm open to discussing further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter to me. Just thought it would avoid some changes in the tests. Maybe we can also drop the / if it only needs to be HTML5 compatible. I guess not since not all themes will be HTML5.

'dfn',
'em',
'i',
'img',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange to me that img is included in this list but not audio, video, iframe etc. Should it also be dropped here? Maybe we should limit this list to phrasing content excluding embedded content? This would also mean dropping form elements as it's not phrasing content. https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange to me that img is included in this list but not audio, video, iframe etc. Should it also be dropped here?

As noted in above comment, this list was generated from MDN's documentation and aligns precisely with the list described here:

https://developer.mozilla.org/en-US/docs/Web/HTML/Inline_elements#Elements

More specifically, the specification describing the processing of whitespace makes a distinction on "inline formatting context": https://www.w3.org/TR/css-text-3/#white-space-rules

Trying to dig into where the basis for img as inline originated, I couldn't find the exact specification, and it may instead be on the basis of inline being the initial value for the display CSS property and most user-agent stylesheets ([1], [2]) not otherwise specifying a display value for img. That said, these stylesheets also don't tend to include elements like audio and such, and indeed in my testing browsers report their display as inline:

https://codepen.io/aduth/pen/WJBNBM?editors=1111

Maybe instead then we should invert this to be inclusive only on block content? Since this will be easier to maintain into the future, since new elements will be added whose default is not set and will assume the initial value of inline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to follow the lead of the specification here, unless I've misinterpreted it, or there's some other reason js-beautify chose to use phrasing content (what's the relation between phrasing content and inline elements?).

* @return {string} Indented content.
*/
export function getIndentedContent( string ) {
return string.replace( /(.*\S)/g, '\t$1' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite getting this regular expression. Why not /\S/.test( string )?

@aduth aduth force-pushed the add/element-beautify branch from c4b8ff5 to 9707d21 Compare June 1, 2018 18:50
@aduth
Copy link
Member Author

aduth commented Jun 1, 2018

I took a pass at bringing this up-to-date, and ran into a number of other frustrations, largely around how we re-parse beautified content, since the added whitespace is considered as meaningful nodes.

  1. With pasting the following into the console on the editor screen (even on master branch):
with ( wp.blocks ) parse( serialize( createBlock( 'core/heading', { content: [ 'Lorem ', { type: 'div', props: { children: [ 'ipsum' ] } } ] } ) ) )[ 0 ].attributes.content

I would expect the parsed content should match more-or-less what we had passed in for the created block, but instead it produces: ["Lorem↵ ", {…}, "↵"]

  1. For inner blocks, in a test case in InnerBlocks, we should be able to add an assertion:
expect( wp.blocks.parse( serialize( block ) )[ 0 ].attributes.content ).toBe( 'Invalid' );

But in fact it fails:

Expected value to be:
      "Invalid"
    Received:
      "Invalid
    	
    "

It's obvious then the presence of inner blocks can affect the parsed attributes of its parent. The test has yet another result if the created block has no children.

  1. We lean on beautification in the raw handler where we should probably have more cleaning logic built-in, e.g. whitespace collapsing.

See the included changes for ms-word-out.html, specifically the table markup. These changes are expected only because when we process the HTML, we're not doing anything to trim or collapse whitespace, so they remain in the full lifecycle of the application.


I chatted briefly with @mtias a bit about the value of beautifying the markup. I'm beginning to think it's causing more trouble than its worth, at least for beautification within a block. The original issue for adding whitespace (#633) really only sought to address what, at the time, was a giant blob of text for the entire post. Maybe it's enough to have some newlines between blocks, and no other beautification within, if that alone doesn't cause much trouble. Will have to see how well this works with inner blocks.

It seems like if this will make parsing more predictable, and respect existing whitespace like raised in #4456 and #633 (comment) .

(cc also @youknowriad re: #663)

@ellatrix
Copy link
Member

@aduth So is the plan to drop beautification?

@aduth
Copy link
Member Author

aduth commented Jul 11, 2018

Closing in favor of #7892

@aduth aduth closed this Jul 11, 2018
@aduth aduth deleted the add/element-beautify branch July 11, 2018 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Framework Issues related to broader framework topics, especially as it relates to javascript [Status] In Progress Tracking issues with work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants