Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent use of plaintext code fencing in docs #32938

Closed
5 tasks
zeke opened this issue Apr 20, 2020 · 15 comments
Closed
5 tasks

Inconsistent use of plaintext code fencing in docs #32938

zeke opened this issue Apr 20, 2020 · 15 comments
Labels
doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.

Comments

@zeke
Copy link
Contributor

zeke commented Apr 20, 2020

Hey party people!

I'm working on a project that parses all the markdown files in the doc directory of this repo, and I've come across a number of instances of inconsistent naming of the plaintext info string (i.e. the language) in GitHub Flavored Markdown code fencing:

language instances
text 44
txt 21
fundamental 1
markdown 1
raw 1

I'm currently using this hack to sanitize all the incoming markdown before converting it to HTML:

markdown = markdown
  .replace(/```txt/gm, '```plaintext')
  .replace(/```text/gm, '```plaintext')
  .replace(/```fundamental/gm, '```plaintext')
  .replace(/```raw/gm, '```plaintext')
  .replace(/```markdown<.*$/gm, '```plaintext')
  .replace(/```plaintext<.*$/gm, '```plaintext')

Here's what the current documentation style guide says about code blocks:

  • For code blocks:
    • Use language aware fences. ("```js")
    • Code need not be complete. Treat code blocks as an illustration or aid to
      your point, not as complete running programs. If a complete running program
      is necessary, include it as an asset in assets/code-examples and link to
      it.

There's no mention of what the canonical language should be for plaintext, but I'd suggest standardizing on one of the following:

  • text is the most commonly used plaintext info string in this project
  • plaintext is what the popular highlight.js syntax highlighting library expects

I don't have a strong opinion about what the info string should be as long as it's used consistently across the docs. I'd be happy to open a pull request to clean these up, but wanted to open an issue first for discussion.

Things that might need to be done to resolve this:

  • Choose an info string for plain text
  • Update docs to all use the same info string
  • Update the style guide
  • Add a linting rule?
  • What else?

cc @nodejs/documentation 👋

@aduh95
Copy link
Contributor

aduh95 commented Apr 21, 2020

If someone feels strong about this and want to open a PR to add a linting rule to enforce one form (preferably text given the data we have), they are welcome.

I think there are 2 instances of markdown I could find:

IMHO both should stay as is, it is valid markdown syntax and therefore should be syntax highlighted as markdown (supported by highlight.js BTW).

Regarding the use of raw and fundamental, I don't know if it brings any more value than just using text. For reference:

node/doc/api/report.md

Lines 506 to 512 in 46ec9ab

```raw
$ node
> process.report.writeReport();
Writing Node.js report to file: report.20181126.091102.8480.0.001.json
Node.js report completed
>
```

node/doc/api/fs.md

Lines 3470 to 3474 in 46ec9ab

```fundamental
- txtDir
-- file.txt
- app.js
```

@DerekNonGeneric
Copy link
Contributor

There's a discussion going about code fences in GitHub Flavored Markdown (GFM) going at #32916. I also have a PR open about adding more GFM over at #32944.

/to @Trott

Progress on this issue would be valuable to inform us on what lint rules (if any) should be added to the Markdown files located in the .github directory of this repo.

Specs:
https://github.github.com/gfm/
https://spec.commonmark.org/0.29/

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented Apr 21, 2020

This issue seems to need label:doc and label:tools. Can someone please add them?

@richardlau richardlau added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels Apr 21, 2020
@Trott
Copy link
Member

Trott commented Apr 23, 2020

I think markdown should stay, but the others should be consolidated into a single label unless someone is able to articulate a display and/or semantic difference. text seems good to me, as it's the most common one.

@DerekNonGeneric
Copy link
Contributor

I think markdown should stay [...]

Agreed. Markdown is a distinct language that benefits from syntax highlighting as opposed to Plaintext, which does not. Here are all the languages supported by highlight.js:

https://github.com/highlightjs/highlight.js/blob/master/SUPPORTED_LANGUAGES.md

Therefore, markdown should not be replaced with plaintext, which is happening in OP's hack.

@DerekNonGeneric
Copy link
Contributor

[...] but the others should be consolidated into a single label unless someone is able to articulate a display and/or semantic difference.

There is a semantic difference and I do not believe they should be consolidated into a single label (info string). Allow me to demonstrate.

Regarding the use of raw and fundamental, I don't know if it brings any more value than just using text. For reference:

Rather than raw, that should be console. Look at the differences in syntax highlighting.


Demo

Using raw as the info string:

 $ node 
 > process.report.writeReport(); 
 Writing Node.js report to file: report.20181126.091102.8480.0.001.json 
 Node.js report completed 
 > 

Using console as the info string:

$ node 
> process.report.writeReport(); 
Writing Node.js report to file: report.20181126.091102.8480.0.001.json 
Node.js report completed 
> 

Therefore, console would be preferable as the syntax highlighting is significant and report.md's use of raw is erroneous. I can provide screenshots to help clarify my point if necessary (let me know).

@zeke
Copy link
Contributor Author

zeke commented Apr 23, 2020

Thanks @DerekNonGeneric. I've incorporated your feedback in #33028

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented Apr 26, 2020

I had commented in the PR w/ my pre-deep-dive thoughts. With a better understanding, it turns out they're more relevant to this thread and we had been overlooking several pertinent details.

  1. We’re not currently targeting highlight.js (unsure why it was mentioned) since that’s not what the API docs are using. The Node.js API docs are presently using SHJS for syntax highlighting. I don’t know what GitHub uses, but ensuring consistency with the rendered Markdown pages that are viewable on the GitHub website would be preferable.
  2. Unless we simply want to achieve consistency, there does not appear to be justification for using text over nothing at all (I still need to check whether SHJS does anything with this). Perhaps it should be an empty (blank) info string instead.
  3. My suggestion about using console was for its benefit to rendered Markdown pages that are viewable on the GitHub website. I’d personally like to know what GitHub uses for the syntax highlighting for these pages to determine if it would be feasible to replace SHJS with it.
  4. Something we had yet to consider is that the Markdown files in the docs/api directory are not GFM, but CommonMark (at least that appears to be what the intention is).

I agree that there is an issue here and that the PR will fix it, although I normally see this type of thing handled in a single commit that also prevents it from re-occurring. These Markdown lint rules aren't defined in this repo though (they're in remark-preset-lint-node), so I imagine this will play out differently than what I'm accustomed to. Looking forward to seeing this come to fruition!

@aduh95
Copy link
Contributor

aduh95 commented Apr 26, 2020

Unless we simply want to achieve consistency, there does not appear to be justification for using text over nothing at all.

My 2 cents on this: I think we should enforce using text, it seems to be an easy way to spot when a contributor forgot to add the language (maybe because a line return slip through, maybe the collaborator is not familiar with this feature of Markdown, etc.), and it makes it explicit when the syntax highlighting just don't make sense.

@Trott
Copy link
Member

Trott commented Apr 26, 2020

Maybe we can start small and do this incrementally? I imagine it would be uncontroversial as a first step to standardize on text or txt rather than having both.

@Trott
Copy link
Member

Trott commented Apr 26, 2020

Maybe we can start small and do this incrementally? I imagine it would be uncontroversial as a first step to standardize on text or txt rather than having both.

(This would also allow us to implement lint restrictions incrementally.)

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented May 10, 2020

Believe it or not …

  • the remark-rehype module converts the info string in the code fences (e.g.,```xxx) to an HTML <pre> tag w/ a nestested <code> tag that has a language- class (e.g., language-xxx in this case)

.use(remark2rehype, { allowDangerousHTML: true })

function highlight(prefix, suffix, tag) {

  • all code fences containing an info string get parsed as JavaScript regardless of language specified

if(!this.sh_languages){this.sh_languages={}}sh_languages.javascript=[[[/\/\/\//g,"sh_comment",1],[/\/\//g,"sh_comment",7],[/\/\*\*/g,"sh_comment",8],[/\/\*/g,"sh_comment",9],[/\b(?:abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with)\b/g,"sh_keyword",-1],[/(\+\+|--|\)|\])(\s*)(\/=?(?![*\/]))/g,["sh_symbol","sh_normal","sh_symbol"],-1],[/(0x[A-Fa-f0-9]+|(?:[\d]*\.)?[\d]+(?:[eE][+-]?[\d]+)?)(\s*)(\/(?![*\/]))/g,["sh_number","sh_normal","sh_symbol"],-1],[/([A-Za-z$_][A-Za-z0-9$_]*\s*)(\/=?(?![*\/]))/g,["sh_normal","sh_symbol"],-1],[/\/(?:\\.|[^*\\\/])(?:\\.|[^\\\/])*\/[gim]*/g,"sh_regexp",-1],[/\b[+-]?(?:(?:0x[A-Fa-f0-9]+)|(?:(?:[\d]*\.)?[\d]+(?:[eE][+-]?[\d]+)?))u?(?:(?:int(?:8|16|32|64))|L)?\b/g,"sh_number",-1],[/"/g,"sh_string",10],[/'/g,"sh_string",11],[/~|!|%|\^|\*|\(|\)|-|\+|=|\[|\]|\\|:|;|,|\.|\/|\?|&|<|>|\|/g,"sh_symbol",-1],[/\{|\}/g,"sh_cbracket",-1],[/\b(?:Math|Infinity|NaN|undefined|arguments)\b/g,"sh_predef_var",-1],[/\b(?:Array|Boolean|Date|Error|EvalError|Function|Number|Object|RangeError|ReferenceError|RegExp|String|SyntaxError|TypeError|URIError|decodeURI|decodeURIComponent|encodeURI|encodeURIComponent|eval|isFinite|isNaN|parseFloat|parseInt)\b/g,"sh_predef_func",-1],[/(?:[A-Za-z]|_)[A-Za-z0-9_]*(?=[ \t]*\()/g,"sh_function",-1]],[[/$/g,null,-2],[/(?:<?)[A-Za-z0-9_\.\/\-_~]+@[A-Za-z0-9_\.\/\-_~]+(?:>?)|(?:<?)[A-Za-z0-9_]+:\/\/[A-Za-z0-9_\.\/\-_~]+(?:>?)/g,"sh_url",-1],[/<\?xml/g,"sh_preproc",2,1],[/<!DOCTYPE/g,"sh_preproc",4,1],[/<!--/g,"sh_comment",5],[/<(?:\/)?[A-Za-z](?:[A-Za-z0-9_:.-]*)(?:\/)?>/g,"sh_keyword",-1],[/<(?:\/)?[A-Za-z](?:[A-Za-z0-9_:.-]*)/g,"sh_keyword",6,1],[/&(?:[A-Za-z0-9]+);/g,"sh_preproc",-1],[/<(?:\/)?[A-Za-z][A-Za-z0-9]*(?:\/)?>/g,"sh_keyword",-1],[/<(?:\/)?[A-Za-z][A-Za-z0-9]*/g,"sh_keyword",6,1],[/@[A-Za-z]+/g,"sh_type",-1],[/(?:TODO|FIXME|BUG)(?:[:]?)/g,"sh_todo",-1]],[[/\?>/g,"sh_preproc",-2],[/([^=" \t>]+)([ \t]*)(=?)/g,["sh_type","sh_normal","sh_symbol"],-1],[/"/g,"sh_string",3]],[[/\\(?:\\|")/g,null,-1],[/"/g,"sh_string",-2]],[[/>/g,"sh_preproc",-2],[/([^=" \t>]+)([ \t]*)(=?)/g,["sh_type","sh_normal","sh_symbol"],-1],[/"/g,"sh_string",3]],[[/-->/g,"sh_comment",-2],[/<!--/g,"sh_comment",5]],[[/(?:\/)?>/g,"sh_keyword",-2],[/([^=" \t>]+)([ \t]*)(=?)/g,["sh_type","sh_normal","sh_symbol"],-1],[/"/g,"sh_string",3]],[[/$/g,null,-2]],[[/\*\//g,"sh_comment",-2],[/(?:<?)[A-Za-z0-9_\.\/\-_~]+@[A-Za-z0-9_\.\/\-_~]+(?:>?)|(?:<?)[A-Za-z0-9_]+:\/\/[A-Za-z0-9_\.\/\-_~]+(?:>?)/g,"sh_url",-1],[/<\?xml/g,"sh_preproc",2,1],[/<!DOCTYPE/g,"sh_preproc",4,1],[/<!--/g,"sh_comment",5],[/<(?:\/)?[A-Za-z](?:[A-Za-z0-9_:.-]*)(?:\/)?>/g,"sh_keyword",-1],[/<(?:\/)?[A-Za-z](?:[A-Za-z0-9_:.-]*)/g,"sh_keyword",6,1],[/&(?:[A-Za-z0-9]+);/g,"sh_preproc",-1],[/<(?:\/)?[A-Za-z][A-Za-z0-9]*(?:\/)?>/g,"sh_keyword",-1],[/<(?:\/)?[A-Za-z][A-Za-z0-9]*/g,"sh_keyword",6,1],[/@[A-Za-z]+/g,"sh_type",-1],[/(?:TODO|FIXME|BUG)(?:[:]?)/g,"sh_todo",-1]],[[/\*\//g,"sh_comment",-2],[/(?:<?)[A-Za-z0-9_\.\/\-_~]+@[A-Za-z0-9_\.\/\-_~]+(?:>?)|(?:<?)[A-Za-z0-9_]+:\/\/[A-Za-z0-9_\.\/\-_~]+(?:>?)/g,"sh_url",-1],[/(?:TODO|FIXME|BUG)(?:[:]?)/g,"sh_todo",-1]],[[/"/g,"sh_string",-2],[/\\./g,"sh_specialchar",-1]],[[/'/g,"sh_string",-2],[/\\./g,"sh_specialchar",-1]]];

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented May 10, 2020

all code fences containing an info string get parsed as JavaScript regardless of language specified

Correction: there is a way to prevent code blocks from being highlighted. 😌

if (htmlClass === 'sh_none') {
donthighlight = true
continue;
}

Assuming …

  1. info string of none
  2. gets converted to pre tag w/ class of language-none by remark-rehype [it DOES]
  3. gets converted to sh_none by SHJS on the frontend [it DOES NOT]

@Trott
Copy link
Member

Trott commented May 19, 2020

(Previous comment deleted. Wrong issue. Obviously. Sorry.)

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented May 20, 2020

Since the Markdown lint rules are in remark-preset-lint-node, we are going to need to normalize the info strings prior to version bumping node-lint-md-cli-rollup to include the latest ruleset.

These version bumps have been happening independently and would not include changes that make the documents comply w/ the newest version of the ruleset. Therefore, the PR to resolve this issue (#33028) should be good to go at this point. We will just need to wait for #33442 to land until code blocks are actually preventable from being highlighted.

@Trott Trott closed this as completed in f4e805c Jun 10, 2020
codebytere pushed a commit that referenced this issue Jun 18, 2020
Prior to this commit, a handful of misc. code blocks were in need
of fixup. Corrections are according to predetermined conventions.

Closes: #32938

PR-URL: #33548
Fixes: #32938
Reviewed-By: Rich Trott <rtrott@gmail.com>
codebytere pushed a commit that referenced this issue Jun 30, 2020
Prior to this commit, a handful of misc. code blocks were in need
of fixup. Corrections are according to predetermined conventions.

Closes: #32938

PR-URL: #33548
Fixes: #32938
Reviewed-By: Rich Trott <rtrott@gmail.com>
codebytere pushed a commit that referenced this issue Jul 10, 2020
Prior to this commit, a handful of misc. code blocks were in need
of fixup. Corrections are according to predetermined conventions.

Closes: #32938

PR-URL: #33548
Fixes: #32938
Reviewed-By: Rich Trott <rtrott@gmail.com>
codebytere pushed a commit that referenced this issue Jul 14, 2020
Prior to this commit, a handful of misc. code blocks were in need
of fixup. Corrections are according to predetermined conventions.

Closes: #32938

PR-URL: #33548
Fixes: #32938
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants