-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure URL fragments to named anchors also validate #363
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
The failure in the CI build is sbt/sbt#5049. Not sure why it's using different versions in scripted, but I assume that means that if sbt-paradox is published with sbt 1.3.0 it no longer works in sbt 1.2.8 projects? |
Hmm, though that would also prevent projects like Akka gRPC, that themselves produce sbt plugins and might not be ready to require sbt 1.3.0, from upgrading. WDYT? |
Can't we just change the version of sbt that we build against? There's no reason that sbt plugins need to be built against the same version of sbt that they use to build themselves. So we should just set |
You're right of course. Then I'd be fine with just sticking with sbt 1.3.0 for paradox and setting |
Another fragment check that doesn't work is to the datadog docs: https://docs.datadoghq.com/tracing/guide/trace_sampling_and_storage/?tab=java#trace-storage Looks like the id attribute is not quoted in this case: |
And single quotes of id/name attributes would also not work. Should the validation support unquoted, single-quoted, and double-quoted? |
Hmm... so we can keep coming up with edge cases etc and adding support for them, or we could do it properly, by parsing the document and then looking for named anchors or ids. This would mean using HtmlUnit (I'm not aware of any other HTML parser available on the JVM - there are plenty of xml parsers of course, but many web pages, such as the datadog docs example, are not valid XML so can't be parsed by an XML parser). Thoughts? |
I forgot about jsoup. |
Yes, my next thought as well. Switching over to jsoup sounds good to me. |
d4e4cdd
to
6e1c352
Compare
Ok, I added jsoup. Also, I made a small improvement, links to the same page with different fragments only result in that page being downloaded once. Plus the code around grouping common links is simpler now. |
Also, modified link validation so links to the same page with different fragments only load/download that page once.
6e1c352
to
28e907a
Compare
Cool, I'll try it out again. |
private def validateFragments(path: String, content: Document, fragments: List[CapturedLinkFragment], errorContext: ErrorContext): Unit = { | ||
fragments.foreach { | ||
case CapturedLinkFragment(Some(fragment), sources) => | ||
if (content.getElementById(fragment) == null && content.select(s"a[name=$fragment]").isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure what the best way to lookup named a tags was. Could probably lookup all a tags, and then iterate through to see if any have a name of fragment. This is simpler, though with the possibility of escaping issues, for example, it definitely won't work for fragments with ]
in them, there may also be other characters that will cause a problem. People don't usually put special characters that won't work in names since they're meant to appear in URLs, and URLs with percent encoding looks ugly, so it may just not be an issue. I think if we find any actual problems with this, we can switch to something that searches for it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and tried it on Cinnamon docs. There's one fragment that didn't work:
https://kafka.apache.org/documentation.html#producerconfigs
Looks like it's embedded in a handlebars script or something (didn't look closely). We'll just ignore or change that.
Yeah you're right, it is. To validate that we'd have to use htmlunit (which is likely to cause bigger problems because its javascript support is only partial) or selenium/webdriver (slow). I don't think it's worth it, I'd say just add an ignore (paradoxIgnorePaths). |
No description provided.