Anchor links for header with link are broken on pub.dev #563

Open
SandroMaglione opened this issue Nov 1, 2023 · 2 comments
Labels
P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@SandroMaglione

On the fpdart pub.dev page, the anchors to some headers are broken.

The broken headers include a link:

### [Task](/packages/fpdart/lib/src/task.dart)

On the GitHub repository the anchor works correctly.

On pub.dev, however, the anchor id added on click, #task, does not work. The id that actually works there is #taskpackagesfpdartlibsrctaskdart.

Issue on fpdart's repository

@lrhn
Member

lrhn commented Nov 3, 2023

TL;DR: Seems like a bug in the const HeaderWithIdSyntax() extension. It doesn't convert the content to text before creating the ID, so the resulting ID contains text that doesn't appear visibly in the header. And it differs from what GitHub does.

The source code for the link to the header, in packages/fpdart/README.md, is

  - [Task](#task)

In CommonMark syntax, that's an external link to #task, which means clicking it works like navigating to <currentUrl>#task.
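A minimal sketch of that fragment resolution, using Python's standard urljoin. The page URL is an assumption chosen for illustration; only the #task reference comes from the README.

```python
from urllib.parse import urljoin

# Assumed URL of the page that displays the README (illustration only).
current_url = "https://pub.dev/packages/fpdart"

# A bare fragment reference resolves against the current page.
print(urljoin(current_url, "#task"))  # https://pub.dev/packages/fpdart#task
```

Whether the browser then scrolls anywhere depends on the page containing an element whose id is exactly task.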

GitHub flavored markdown (GFM) makes it work as a link to the internal header with text "task", which it gives a name/id of "task".
(Or something to that effect, using scripting.)

The target is

### [Task](/packages/fpdart/lib/src/task.dart)

So, this sounds like a bug in the GFM-web extension of package:markdown, which generates the wrong ID.
The ID should be based on the ASCII text content of the header, not the source. Links should be removed.
(Image links are apparently entirely removed, which can cause the link to contain a -- sequence.)

The generated link of #taskpackagesfpdartlibsrctaskdart has taken all the words of the header source, but it shouldn't have included the words inside the (...).
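To make the difference concrete, here is a small sketch contrasting an ID built from the raw header source with one built from the visible text. Both functions are hypothetical illustrations, not the code in package:markdown; the first merely reproduces the ID reported in the issue.

```python
import re


def slug_from_source(header_source: str) -> str:
    # Hypothetical: build the ID from the raw markdown source, link target included.
    return re.sub(r"[^a-z0-9 ]", "", header_source.lower()).strip().replace(" ", "-")


def slug_from_text(header_source: str) -> str:
    # Hypothetical: replace each [text](target) with its text first, then slug.
    text_only = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", header_source)
    return slug_from_source(text_only)


header = "[Task](/packages/fpdart/lib/src/task.dart)"
print(slug_from_source(header))  # taskpackagesfpdartlibsrctaskdart
print(slug_from_text(header))    # task
```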

Example of what GFM does:

# TOC

* Goto [simple header](#a-simple-header)
* Goto [text header](#a-link-text-header)
* Goto [text and image header](#a-link-text--header)
* Goto [silly header](#a-link---with-1--multiple%CC%81-spaces_and-2--int%C3%A9rnal-punctuation-and-3--html--face-header)

### A simple header

Simple, not?

### A [Link text](http://example.com) header

Text, not?

### A [Link text](http://example.com) ![with image](http://example.com/favicon.gif) header

More text?

###  &#x41; link - __with__ (1)  multiple&#x0301;-**spaces**_and (2)  _int&eacute;rnal_-p/u/nc+tua!tion <sup>*and*</sup><a name="xx"/> ($3)  <span color="red">html 😝 face</span> header   

Silly, yes!

So the algorithm seems to be something like this: for a header line #+ (.*), extract an ID from the (.*) as follows:

  • Remove leading and trailing whitespace.
  • Parse as inline markdown, as normal
    • This expands HTML entities.
  • Convert that inline content to plain text by removing all formatting.
    • A matched _italic_ or __bold__ becomes just italic and bold.
    • <sup>foo</sup> or <span color="red">foo</span> becomes just foo.
    • All links removed, leaving only their link text.
    • All image links ![title text](link) removed entirely (leaves no text).
  • Remove all remaining characters that are not Unicode letters, digits, underscores, or spaces (no emoji "☃"!). Judging by the silly header above, hyphens also survive this step.
  • Convert all spaces to - and all letters to lower-case
  • Encode all non-ASCII characters as UTF-8 and %-escape the bytes.
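The steps above can be sketched in Python roughly as follows. This is an approximation under stated assumptions, not GitHub's or package:markdown's implementation: the function name is made up, regular expressions stand in for real inline parsing, and hyphens and combining marks are kept because the silly-header example only reproduces if they survive the filtering step.

```python
import html
import re
import unicodedata
from urllib.parse import quote


def github_style_anchor(header_source: str) -> str:
    """Approximate the GitHub-style header ID described above (hypothetical helper)."""
    text = header_source.strip()
    # Regex stand-ins for real inline markdown parsing (an assumption of this sketch).
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)      # images vanish entirely
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # links keep only their text
    text = re.sub(r"<[^>]+>", "", text)                   # inline HTML tags drop out
    text = html.unescape(text)                            # expand HTML entities
    # Unwrap matched _italic_ / __bold__; '*' markers are removed by the filter below.
    text = re.sub(r"(?<!\w)(_{1,2})(?=\S)([^_]+?)(?<=\S)\1(?!\w)", r"\2", text)
    # Keep letters, digits, underscores, spaces, plus (assumed) hyphens and combining marks.
    kept = [
        ch for ch in text
        if ch.isalnum() or ch in "_- " or unicodedata.category(ch).startswith("M")
    ]
    slug = "".join(kept).replace(" ", "-").lower()
    # quote() UTF-8 encodes non-ASCII characters and %-escapes the resulting bytes.
    return quote(slug, safe="-_")
```

With this sketch, the header source [Task](/packages/fpdart/lib/src/task.dart) yields task, and the four example headers above yield the four targets listed in the TOC, including the double hyphen left behind by the removed image.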

That's not a universal rule; it's GitHub-specific. But it's what we should assume for the README.md of a pub package, especially if it has a GitHub repo, and probably in general.
(For example, Gerrit seems to use a different algorithm, whose description is also silent on internal markdown.
Some strategies remove accents from letters; GitHub does not.)

@lrhn lrhn transferred this issue from dart-lang/sdk Nov 3, 2023
@lrhn lrhn added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label Nov 3, 2023
@kevmoo
Member

kevmoo commented Nov 5, 2023

@srawlins ?

@kevmoo kevmoo removed their assignment Nov 5, 2023
@srawlins srawlins added the P2 A bug or feature request we're likely to work on label Nov 5, 2023
@jonasfj jonasfj removed their assignment Nov 27, 2023