Skip to content

JSDoc comment parsing supports multiline backticks #50677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sandersn
Copy link
Member

@sandersn sandersn commented Sep 7, 2022

Previously, backticks in JSDoc ended at the end of a line. This PR changes that so that backticks continue parsing until another backtick. I'm not sure this is a good change, and I left the code a bit inelegant until we decide.

Note that triple backticks and single backticks are treated the same way: both have to be closed. The code change is already very clunky, and I haven't thought about how much clunkier it would be to track backtick history in the parser.

I would appreciate opinions on whether this is a sensible change in behaviour, as well as ideas on how hard it would be to distinguish triple from single backticks.

Fixes #47679

Previously, backticks in JSDoc ended at the end of a line. This PR
changes that so that backticks continue parsing until another backtick.

Note that triple backticks and single backticks are treated the same
way. The change is already very clunky, and I haven't thought about how
much clunkier it would be to track backtick history in the parser.
@typescript-bot typescript-bot added Author: Team For Milestone Bug PRs that fix a bug with a specific milestone labels Sep 7, 2022
@jakebailey
Copy link
Member

I haven't read the PR yet, but I enabled the "tsdoc" ESLint linter plugin and there are even quite a few comments in our repo that have unclosed backtick blocks, so I'm half worried what we might break out there.

@sandersn
Copy link
Member Author

sandersn commented Sep 12, 2022

I agree. After some discussion with @andrewbranch , this can be summarised as improvement in correctness at the cost of breaking existing code. I'm inclined to close this PR, but other opinions are welcome. @DanielRosenwasser you might want to weigh in.

Edit: Probably we need an estimate of how common unbalanced backticks are.

@trusktr
Copy link
Contributor

trusktr commented Oct 1, 2022

I haven't read the PR yet, but I enabled the "tsdoc" ESLint linter plugin and there are even quite a few comments in our repo that have unclosed backtick blocks, so I'm half worried what we might break out there.

Why are there unclosed back ticks? I've never seen that in my entire life, and it is an interesting coincidence that the TS repo (of all places) has them.

@jakebailey
Copy link
Member

jakebailey commented Oct 1, 2022

Just a typo (which I've already fixed). I'm only worried because JSDoc has semantic meaning in some codebases where an unclosed backtick might eat an @param or @deprecated or something later.

@DanielRosenwasser
Copy link
Member

#52398 is a related issue - any recollection whether \ escaping was discussed around this?

@DanielRosenwasser
Copy link
Member

Let's revisit this and consider whether or not we can give suggestion diagnostics for plain JS users, and errors for checkJs users.

@sandersn
Copy link
Member Author

@DanielRosenwasser Here's a summary of the discussion from the editor meeting. No conclusions, just more issues raised:

  • In addition to quoting tags and decorators, we would like to have a way to quote jsdoc comments so they too could be embedded. The ending token */ is the problem and would need a quoting mechanism, perhaps *\/
  • This PR breaks on incorrectly closed backticks, and the breaks in our codebase indicated that breaks would happen all over.
  • (One thing I forgot to mention: distinguishing single backticks from triple backticks is a difficult change given the current jsdoc tokeniser.)
  • The breaks, while common, might not be too bad since it just results in badly formatted jsdoc for .ts, and some missing information for .js.
  • In .js we could detect and issue an error when the parser ends in a quoted state, although there may be some linters that already do this. (@jakebailey ran some tool to detect unclosed backticks on our code.)

So one possible path forward is to add an "unclosed backtick" error to this PR, for JS files only. We'd need to check what linters issue similar errors, however.

@rbuckton
Copy link
Member

rbuckton commented Feb 3, 2023

I am not sure the JSDoc parsing changes are sufficient to correctly handle embedded markdown. If we want backtick parsing to be reliable, we should probably ensure we support the following cases as well:

Code spans in markdown aren't interpreted as code spans if there is a hard line break (e.g. 2+ newlines) anywhere after the starting backtick, and only include soft line breaks if it can find a trailing backtick.

`b
c`

is interpreted as a code span:

b c

while

`b

c`

is interpreted as literal characters:

`b

c`

Code spans must also be balanced so that they are not terminated improperly:

```abc```

is interpreted as:

abc

while

```abc``

is interpreted as:

```abc``

Code spans can contain backticks as long as the starting and ending backtick runs are greater in number than the number of sequential backticks to include in the code span:

```a `b` c```

is interpreted as:

a `b` c

If the code text of code span needs to include a backtick at the start or end, a space is required:

`` `b``

is interpreted as:

`b

while

`` ` ``

is interpreted as:

`

Code blocks begin with three or more backticks, followed by any non-backtick characters (primarily used for language selection), and end when the parser encounters the same number of backticks on a separate line, or the end of the input:

```abc
def
```

is interpreted as:

def

while

```abc
def```

is interpreted as (note the backticks are included in the code block):

def```

Also, code blocks aren't only detected from backticks. Fenced code blocks can also be written using 3+ sequential tilde characters:

~~~
abc
~~~

is interpreted as:

abc

goTo.marker("x");
verify.quickInfoIs("(parameter) x: string", "hi there `@param`");
goTo.marker("y");
verify.quickInfoIs("(parameter) y: string", "hi there `@ * param\nthis is the margin");
verify.quickInfoIs("(parameter) y: string", "hi there `@ * param\nthis is the margin\n@param {string} z OOPS, unclosed backtick!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct. Markdown won't detect this as a code span because there's no closing backtick, so it will treat the backtick as a literal character.

@rbuckton
Copy link
Member

rbuckton commented Feb 3, 2023

I was experimenting with better markdown parsing for TSDoc a few years ago. The parsing rules I pursued were that any @tag at the start of a line would be considered a markdown block element, following the rest of the markdown rules for everything else. So:

@param x `a
@param y `b

Would treat the code span starting at `a` as unterminated, because blocks are parsed before spans. This also means the code span starting at b would be unterminated.

This gets a bit tricker when JSDoc parameter descriptions flow onto the next line, since JSDoc rules don't align with markdown's block-level indentation rules. Consider an ordered list:

1. ```abc
def
```

is interpreted as:

def


while

2. ```abc
   def
   ```
3. abc
```def
ghi
```
4. abc
   ```def
   ghi
   ```

is interpreted as:

  1. def
    
  2. abc
ghi
  1. abc
    ghi
    

Ordered list indentation expects the text to flow at the same indentation level. So for (1), the ```abc block ends immediately because the content that follows is part of a different block. (2) is properly indented, (3) only contains abc in its description, the code block is a different paragraph, and (4) is properly indented.

This makes the rules around @param a bit weird and open to interpretation. Due to varying commenting styles, we probably want to use the indentation of the 2nd line of the tag (if present) up to the offset where the parameter description would start. If the first line starts with a code fence, we would use the offset of the parameter description as the indentation:

@param a this is a multiline parameter
whose initial indentation is 0.

@param b this is a multiline parameter
         whose initial indentation is 9.

in the case of code blocks whose initial indentation is 0:

@param a this has an unterminated code block at indentation 0
```
the initial indentation here is also 0
@param b since the code block shares the indentation of this param tag,
this content is considered part of the code block.

if the indentation of the code block is greater than the block tag that follows:

@param a this has an unterminated code block at indentation 0
  ```
  the indentation here is 2
@param b this is not considered part of the code block.

if the code block starts the parameter description, its initial indentation should be the offset of the start of the description:

@param a ```
         the code block has an indentation of 9.
@param b this is not considered part of the preceding code block.

@sandersn
Copy link
Member Author

I'd rather avoid parsing real markdown code blocks, since our JSDoc parsing is already quite slow. But as @rbuckton shows, any improvement less than real parsing is unlikely to improve things. I'm going to close this PR.

@sandersn sandersn closed this Jul 25, 2023
@dreamorosi
Copy link

@sandersn is there going to be any alternative to solve the issue?

@sandersn
Copy link
Member Author

@dreamorosi I haven't come up with anything yet. If you have some ideas, please do propose them on the original issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Author: Team For Milestone Bug PRs that fix a bug with a specific milestone
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

TypeScript class method decorator not rendering properly in version >= 1.63.2
7 participants