Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syntax highlighting to code tags #2015

Closed

Conversation

Plastikmensch
Copy link

@Plastikmensch Plastikmensch commented Dec 16, 2022

Fixes #2011

Important changes:

  • Adds a new dependency: highlight.js
  • Changes advanced_text_formatter to allow data attribute on code tags
  • Adds data and title attribute to code blocks
  • Modifies MASTODON_STRICT

Preview:
Screenshot_2022-12-16_22-45-17

Some considerations:
As this adds a title attribute, when hovering it shows the specified language.
I think that's a good idea, as it might not always be obvious which language code is actually written in.
This also means it has to be sanitised to not show anything a user specifies.
Therefore it is checked, whether it's a valid language or not and if it isn't, the title attribute gets removed.
I didn't want to change MASTODON_OUTGOING directly, as I don't know where it's all used and I didn't want to accidentally mess with other stuff.
I chose the github-dark theme as it's what's most people will be familiar with.
Ideally more styles would be included and users can choose in which style code is shown, but I considered this out of scope for the initial implementation.

@Plastikmensch
Copy link
Author

Plastikmensch commented Dec 16, 2022

Ah, I see one issue with this approach:
title attribute doesn't federate, therefore highlighting only works on the instance it was posted to :/
It actually does, but is missing on remote toots, that's why highlighting doesn't work on them.

Edit:
Seems like I figured it out. 'code' => %w(title) needs to be added to MASTODON_STRICT instead, as it is used in html_aware_formatter.rb

@prplecake
Copy link

Contrast needs to be fixed in light themes.

image

@prplecake
Copy link

And in fact this doesn't fix #2011 as line highlighting has been so far forgotten.

@Plastikmensch
Copy link
Author

Plastikmensch commented Dec 17, 2022

Seems like I figured it out. 'code' => %w(title) needs to be added to MASTODON_STRICT instead, as it is used in html_aware_formatter.rb

I'm not too happy about modifying MASTODON_STRICT, but the alternative would be to create another modified constant in html_aware_formatter.rb and at that point I might as well edit it directly.

Contrast needs to be fixed in light themes.

How people can use light themes without their eyes burning is beyond me, but fixed.

And in fact this doesn't fix #2011 as line highlighting has been so far forgotten.

"Fixes" is just a closing keyword used to link issues. I could've also used close(s|d)?|resolve(s|d)?, but that's semantics and irrelevant.

<<~HTML
<pre><code>#{ERB::Util.h(code).gsub("\n", '<br/>')}</code></pre>
<pre><code title="#{language}">#{ERB::Util.h(code).gsub("\n", '<br/>')}</code></pre>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure title is the best option here, but I don't know what would be better. I thought about lang but programming languages are explicitly out of scope.

Copy link
Author

@Plastikmensch Plastikmensch Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also considering using a style class attribute instead, but as mentioned, using the title attribute has the advantage of showing which code language a code block was written in when hovering it.
I think title is the best option.

Copy link
Author

@Plastikmensch Plastikmensch Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmh... Now that highlighting requires the class attribute anyway, I think it's actually better to also change title to class. I would still like the element to have a title attribute though.

Ah, that doesn't work, because vanilla also sanitises class names. So while setting a title in the backend, converting it to class and then removing it again in the frontend isn't ideal, that's the only approach that works and doesn't allow arbitrary classes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a data-codelang or something? Do you know if other fediverse software have such a feature, and if so, how they communicate the language being used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a data attribute sounds good.
It seems Misskey has syntax highlighting, but looking at their implementation, I can't figure out which attribute they use, if any.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looks like Misskey just uses a class attribute: class="language-<langcode>" on both pre and code elements, which gets then interpreted by their highlighter prism.js, and a "lang" prop internally.
I'm not sure that's what you wanted to know though.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

@Plastikmensch Plastikmensch force-pushed the add-syntax-highlighting branch 2 times, most recently from 862b738 to cde4ad3 Compare January 18, 2023 16:39

Outdated base version: https://github.com/primer/github-syntax-dark
Current colors taken from GitHub's CSS
*/.hljs{color:#c9d1d9;background:#0d1117}.hljs-doctag,.hljs-keyword,.hljs-meta .hljs-keyword,.hljs-template-tag,.hljs-template-variable,.hljs-type,.hljs-variable.language_{color:#ff7b72}.hljs-title,.hljs-title.class_,.hljs-title.class_.inherited__,.hljs-title.function_{color:#d2a8ff}.hljs-attr,.hljs-attribute,.hljs-literal,.hljs-meta,.hljs-number,.hljs-operator,.hljs-selector-attr,.hljs-selector-class,.hljs-selector-id,.hljs-variable{color:#79c0ff}.hljs-meta .hljs-string,.hljs-regexp,.hljs-string{color:#a5d6ff}.hljs-built_in,.hljs-symbol{color:#ffa657}.hljs-code,.hljs-comment,.hljs-formula{color:#8b949e}.hljs-name,.hljs-quote,.hljs-selector-pseudo,.hljs-selector-tag{color:#7ee787}.hljs-subst{color:#c9d1d9}.hljs-section{color:#1f6feb;font-weight:700}.hljs-bullet{color:#f2cc60}.hljs-emphasis{color:#c9d1d9;font-style:italic}.hljs-strong{color:#c9d1d9;font-weight:700}.hljs-addition{color:#aff5b4;background-color:#033a16}.hljs-deletion{color:#ffdcd7;background-color:#67060c}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have permission to use this CSS bit?

Copy link
Author

@Plastikmensch Plastikmensch Jan 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's part of highlight.js, which is licensed under BSD 3-Clause, which permits usage.
GitHub's CSS itself is licenced under MIT.
I'm not sure if the license file included in the module is sufficient compliance or if we would need to add an additional copyright notice.

I copied the styles from the module, because I couldn't import them directly.
Actually I tried it again and importing from the module worked this time.
Maybe I had the wrong path or something when I initially tried it.
But this also means including the style files is unnecessary.

<<~HTML
<pre><code>#{ERB::Util.h(code).gsub("\n", '<br/>')}</code></pre>
<pre><code data-codelang="#{language}">#{ERB::Util.h(code).gsub("\n", '<br/>')}</code></pre>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't suppose language is guaranteed to be escaped?

Copy link
Author

@Plastikmensch Plastikmensch Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what kind of escaping would be suitable here.
At least in my testing I couldn't cause any unexpected behaviour, nevertheless I agree accepting user input as is is never a good idea.
Ideally, I would check here if language is a valid code language, but this would create overhead.
What I could do though is checking that language only contains allowed characters via regex: /^[a-zA-Z0-9.#+-]+$/

@@ -81,6 +81,7 @@ module Config
'blockquote' => %w(cite),
'ol' => %w(start reversed),
'li' => %w(value),
'code' => [:data],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to only allow data-codelang, I can imagine arbitrary data- attributes causing other issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes and I tried to limit it to only data-codelang, but nothing I tried worked, it just removed all data- attributes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm I must have had a typo in last time I tried or there was some kind of other issue which prevented it from working.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

@github-actions
Copy link

This pull request has resolved merge conflicts and is ready for review.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

@github-actions
Copy link

This pull request has resolved merge conflicts and is ready for review.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

@github-actions
Copy link

This pull request has resolved merge conflicts and is ready for review.

@Plastikmensch
Copy link
Author

Plastikmensch commented Mar 17, 2023

Just noticed that highlighting is missing in the reply-indicator, because it doesn't use the status_content component and while I could move the highlightCode function to a util and reuse it in reply_indicator.jsx, the reply indicator uses a lighter background by default which messes with the styling, but this also means invalid codelangs are not removed.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

@github-actions
Copy link

github-actions bot commented Apr 4, 2023

This pull request has merge conflicts that must be resolved before it can be merged.

Adds a new node dependency.
Also adds required scss.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
This creates a new constant used to sanitize the html tree, which allows title attributes on code tags.

Adds title attribute specifying language to code tags.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Adds a new html util which converts html entities to their symbols

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Adds a new function which runs on contentHtml to add styling to code blocks.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Adds new check which makes sure title attribute is set correctly

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Removes modified sanitation and adds it directly to `MASTODON_STRICT`

This is needed so that the title attritube is not being stripped for remote toots.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Removes the ".scss" extension from the import of the highlight style

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Removed convertToChar from `utils/html.js`

Let highlight.js run directly on DOM instead of using RegEx.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Instead of using RegEx to replace br tags, use DOM

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Use data-codelang instead of title attribute for user set code language.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Little oversight.
Changes the wording of the test case to use language instead of title, as title isn't used anymore.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Having user specified input display anywhere without sanitizing and removing it when invalid is a bad idea.

This removes the data attribute when it's not a valid or supported code language.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
These are third-party styles, which introduce lots of lint problems as they don't adhere to style rules.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Removes highlight.scss files.
Removes entries in ignoreFiles.

Stylesheets are imported from highlight.js module directly.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
This style change was previously skipped and I missed it while rebasing.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Instead of passing arbitrary user input to the frontend and checking if it's valid there, use Sanitize to strip invalid code languages.

For that a yaml file was added containing all valid language codes, which needs to be kept in sync with highlight.js.
This is inconvenient, but shouldn't be a problem.

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
@Plastikmensch Plastikmensch force-pushed the add-syntax-highlighting branch from 17c35ce to 9c48ed6 Compare July 17, 2023 14:37
Messed up while rebasing...

Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
Signed-off-by: Plastikmensch <plastikmensch@users.noreply.github.com>
@github-actions
Copy link

This pull request has resolved merge conflicts and is ready for review.

@github-actions
Copy link

This pull request has merge conflicts that must be resolved before it can be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support syntax highlighting for fenced code block
3 participants