Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to minify HTML output #391

Closed
alexgleason opened this issue Oct 13, 2020 · 9 comments · Fixed by #394
Closed

Option to minify HTML output #391

alexgleason opened this issue Oct 13, 2020 · 9 comments · Fixed by #394
Assignees

Comments

@alexgleason
Copy link

We're using Earmark in Pleroma, a self-hosted social media platform, to allow users to write posts in Markdown.

Pleroma is federated, so it has to handle posts from other ActivityPub implementations. Sometimes these implementations will use newlines instead of <br> tags to indicate line breaks, so we apply white-space: pre-wrap; on the text content to preserve this formatting. There's more info in this issue.

This is a problem when using Earmark, since it will always output prettified HTML. Eg for this input: hello\n\nworld! we get this output: <p>\nhello</p>\n<p>\nworld!</p>\n. With white-space: pre-wrap; applied it treats these \n very literally and creates a lot of unwanted whitespace.

I'd like the ability to pass an option, maybe like %{minify: true}, to format the output like so: <p>hello</p><p>world!</p>

For now I've created a function that seems to handle most cases:

def minify(text, "text/html") do
  text
  |> String.replace(">\n", ">")
  |> String.replace(">  ", ">")
  |> String.replace("  <", "<")
end

(There also seems to be issues with extraneous spaces, which the last two lines try to resolve, eg:)

iex(4)> Earmark.as_html! "<ul><li>one</li><li>two</li><li>three</li><li>four</li></ul>"
"<ul>\n  <li>one</li><li>two</li><li>three</li><li>four</li></ul>\n"
@RobertDober
Copy link
Collaborator

Sorry for this inconvenience, this will be fixed but I had foreseen to do that later because so far nobody was really impacted (ex_doc does not use the transformer anymore).

I cannot promise to get this into 1.4, it really depends on my upcoming schedule.

A PR would be great the goal is to have quite compact, but reasonable HTML, something like not indented with line breaks before and after block tags (but not all, e.g. not <code>)

@RobertDober
Copy link
Collaborator

Oh BTW I was not thinking about an option, this should be the default behavior IMHO

@RobertDober
Copy link
Collaborator

However I was talking about something completely different, 😓

I was talking about rendering Markdown to HTML, not HTML, please note that, that kind of input is not even supported (it kinda works by chance)

Earmark.as_html! "<ul><li>one</li><li>two</li><li>three</li><li>four</li></ul>"

Please see #390 and https://github.com/pragdave/earmark#html-blocks

As mentioned this shall be handled RobertDober/earmark_parser#7

I will keep this issue open and change the rendering to a more compact (and possibly minified) output.

But before the related EarmarkParser issue, nothing can be done

@alexgleason
Copy link
Author

alexgleason commented Oct 14, 2020

Solving it with Markdown to HTML is what we're trying to do. My example above was just to demonstrate a case with extraneous spaces, but the issue is mainly about newlines added from Markdown to HTML output. Thank you!

@RobertDober
Copy link
Collaborator

RobertDober commented Oct 14, 2020

IIUC we have two issues

  • Rendering is bogus because of two too (cheating 😉) much whitespace (which is a bug)
  • HTML is not totally parsed (which is specified) and might lead to unexpected results, will go away with EarmarkParser's issue Parse HTML recursively RobertDober/earmark_parser#7

Therefor in theory I can fix the first bug if I have time 😨 (or could accept a bug request)

Can you confirm please?

@alexgleason
Copy link
Author

alexgleason commented Oct 14, 2020

Rendering is bogus because of two too (cheating 😉) much whitespace (which is a bug)

Yes, you're correct. The issue is extraneous newlines. Thank you!

@feld
Copy link

feld commented Nov 18, 2020

Hello, checking in on this

@rinpatch
Copy link
Contributor

I have a draft commit for this rinpatch@2e8b4dc, it works perfectly in my testing, however causes 110 tests to fail due to expected fixtures containing newlines. Unfortunately my attempts to automate the fixture rework ran into some edge cases and modifying it all manually is not my idea of fun to say the least. I might get back to it at some point, but would appreciate if someone who has a smarter idea could attempt it.

@RobertDober
Copy link
Collaborator

@rinpatch can you please make a PR against branch i391_minify_html, I will gladly have a look

rinpatch pushed a commit to rinpatch/earmark that referenced this issue Nov 25, 2020
As discussed in
pragdave#393 (comment).
I did not name the option "single_line", since newlines in code blocks
are preserved even with it on and since Earmark passes
the single newlines from the input to the final output.

Also removes some string concatenations and replaces single-character
binaries with character codes.

I did not find a file the tests for this feature would fit in, so
I created a new one at `test/acceptance/html/compact_mode_test.exs`

Closes pragdave#391
rinpatch pushed a commit to rinpatch/earmark that referenced this issue Nov 25, 2020
As discussed in
pragdave#393 (comment).
I did not name the option "single_line", since newlines in code blocks
are preserved even with it on and since Earmark passes
the single newlines from the input to the final output.

Also removes some string concatenations and replaces single-character
binaries with character codes.

I did not find a file the tests for this feature would fit in, so
I created a new one at `test/acceptance/html/compact_output_test.exs`

Closes pragdave#391
rinpatch pushed a commit to rinpatch/earmark that referenced this issue Nov 25, 2020
As discussed in
pragdave#393 (comment).
I did not name the option "single_line", since newlines in code blocks
are preserved even with it on and since Earmark passes
the single newlines from the input to the final output.

Also removes some string concatenations and replaces single-character
binaries with character codes.

I did not find a file the tests for this feature would fit in, so
I created a new one at `test/acceptance/html/compact_output_test.exs`

Closes pragdave#391
rinpatch pushed a commit to rinpatch/earmark that referenced this issue Nov 25, 2020
As discussed in
pragdave#393 (comment).
I did not name the option "single_line", since newlines in code blocks
are preserved even with it on and since Earmark passes
the single newlines from the input to the final output.

Also removes some string concatenations and replaces single-character
binaries with character codes.

I did not find a file the tests for this feature would fit in, so
I created a new one at `test/acceptance/html/compact_output_test.exs`

Closes pragdave#391
RobertDober pushed a commit that referenced this issue Nov 27, 2020
As discussed in
#393 (comment).
I did not name the option "single_line", since newlines in code blocks
are preserved even with it on and since Earmark passes
the single newlines from the input to the final output.

Also removes some string concatenations and replaces single-character
binaries with character codes.

I did not find a file the tests for this feature would fit in, so
I created a new one at `test/acceptance/html/compact_output_test.exs`

Closes #391
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants