Ensure URL fragments to named anchors also validate #363

jroper · 2019-09-10T07:14:46Z

No description provided.

pvlugter

LGTM. Thanks!

pvlugter · 2019-09-10T08:00:45Z

The failure in the CI build is sbt/sbt#5049. Not sure why it's using different versions in scripted, but I assume that means that if sbt-paradox is published with sbt 1.3.0 it no longer works in sbt 1.2.8 projects?

raboof · 2019-09-10T08:22:56Z

> sbt-paradox is published with sbt 1.3.0 it no longer works in sbt 1.2.8 projects

Looks like it. Do we think this is fine (then we should apply #364) or do we want to support 1.2.8 for a while (then we should roll back #356)? I think I'd be OK with the former.

raboof · 2019-09-10T09:12:55Z

Hmm, though that would also prevent projects like Akka gRPC, that themselves produce sbt plugins and might not be ready to require sbt 1.3.0, from upgrading. WDYT?

jroper · 2019-09-10T09:38:32Z

Can't we just change the version of sbt that we build against? There's no reason that sbt plugins need to be built against the same version of sbt that they use to build themselves. So we should just set sbtVersion to the lowest version of sbt that we're compatible with, e.g. 1.0.0 if we're compatible with that.

raboof · 2019-09-10T09:45:45Z

> There's no reason that sbt plugins need to be built against the same version of sbt that they use to build themselves

You're right of course. Then I'd be fine with just sticking with sbt 1.3.0 for paradox and setting sbtVersion in any downstream projects that need to.
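
For illustration, a minimal sketch of what building a plugin against an older sbt than the one running the build could look like. It uses sbt's `pluginCrossBuild / sbtVersion` setting; the version `1.2.8` is only the example floor mentioned in this thread, not a tested compatibility claim.

```scala
// build.sbt of an sbt plugin project (sketch; version numbers are assumptions)
// The build itself can run on sbt 1.3.0 (set in project/build.properties),
// while the plugin is compiled against an older sbt API.
sbtPlugin := true

pluginCrossBuild / sbtVersion := {
  scalaBinaryVersion.value match {
    case "2.12" => "1.2.8" // lowest sbt 1.x release the plugin is meant to support
  }
}
```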

pvlugter · 2019-09-10T23:27:07Z

Another fragment check that doesn't work is for a link to the Datadog docs:

https://docs.datadoghq.com/tracing/guide/trace_sampling_and_storage/?tab=java#trace-storage

Looks like the id attribute is not quoted in this case: <h2 id=trace-storage>Trace Storage</h2>

pvlugter · 2019-09-10T23:28:44Z

Single-quoted id/name attributes would not work either. Should the validation support unquoted, single-quoted, and double-quoted values?
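
To make the three quoting styles concrete, here is a hedged sketch of a regex-based extractor. It is not the code in this PR; the object and method names are made up for illustration, and the pattern is an assumption about what such a check might look like.

```scala
import scala.util.matching.Regex

object AnchorRegexSketch {
  // Matches id/name attributes written as "x", 'x', or unquoted x.
  // Caveat: this also matches attributes such as data-id, and text that
  // merely looks like an attribute inside scripts or comments.
  private val AnchorAttr: Regex =
    """(?i)\b(?:id|name)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+))""".r

  def anchors(html: String): Set[String] =
    AnchorAttr.findAllMatchIn(html).flatMap { m =>
      // Exactly one of the three capture groups is non-null per match.
      (1 to 3).flatMap(i => Option(m.group(i))).headOption
    }.toSet
}
```

The caveats in the comment are the kind of edge cases that keep accumulating with a text-based approach.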

jroper · 2019-09-11T00:12:05Z

Hmm... so we can keep coming up with edge cases and adding support for them, or we could do it properly by parsing the document and then looking for named anchors or ids. This would mean using HtmlUnit (I'm not aware of any other HTML parser available on the JVM; there are plenty of XML parsers of course, but many web pages, such as the Datadog docs example, are not valid XML and so can't be parsed by an XML parser). Thoughts?

jroper · 2019-09-11T00:13:18Z

I forgot about jsoup.

pvlugter · 2019-09-11T01:08:39Z

Yes, my next thought as well. Switching over to jsoup sounds good to me.
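
As a rough sketch of why an HTML parser helps here: jsoup accepts markup that is not well-formed XML, including the unquoted id from the Datadog page. The object and method names below are hypothetical, and the snippet assumes jsoup is on the classpath.

```scala
import org.jsoup.Jsoup

object JsoupIdSketch {
  // Returns true if the parsed document contains an element with the given id.
  def hasId(html: String, id: String): Boolean =
    Jsoup.parse(html).getElementById(id) != null

  def main(args: Array[String]): Unit = {
    // Unquoted and single-quoted attributes plus an unclosed <p>: not valid XML.
    val html = "<h2 id=trace-storage>Trace Storage</h2><p><a name='top'>Top</a>"
    println(hasId(html, "trace-storage"))
    println(hasId(html, "missing"))
  }
}
```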

jroper · 2019-09-11T02:46:54Z

Ok, I added jsoup. I also made a small improvement: links to the same page with different fragments now result in that page being downloaded only once. Plus, the code around grouping common links is simpler now.

Also, modified link validation so links to the same page with different fragments only load/download that page once.
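
The grouping idea can be sketched as follows. This is not the PR's implementation; `GroupLinksSketch` and its methods are hypothetical names, and links are assumed to be plain strings.

```scala
object GroupLinksSketch {
  // Splits a link into the page URL and an optional fragment.
  def split(link: String): (String, Option[String]) =
    link.indexOf('#') match {
      case -1 => (link, None)
      case i  => (link.substring(0, i), Some(link.substring(i + 1)))
    }

  // Maps each page URL to the distinct fragments requested on it,
  // so a page needs to be fetched once no matter how many fragments point at it.
  def groupByPage(links: Seq[String]): Map[String, Set[String]] =
    links.map(split).groupBy(_._1).map { case (page, parts) =>
      page -> parts.flatMap(_._2).toSet
    }
}
```

A validator could then download each key once and check every fragment in its set against the parsed document.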

pvlugter · 2019-09-11T06:06:25Z

Cool, I'll try it out again.

jroper · 2019-09-12T00:52:53Z

core/src/main/scala/com/lightbend/paradox/ParadoxProcessor.scala

+  private def validateFragments(path: String, content: Document, fragments: List[CapturedLinkFragment], errorContext: ErrorContext): Unit = {
+    fragments.foreach {
+      case CapturedLinkFragment(Some(fragment), sources) =>
+        if (content.getElementById(fragment) == null && content.select(s"a[name=$fragment]").isEmpty) {


I wasn't sure what the best way to look up named a tags was. We could look up all a tags and then iterate through them to see if any has a name equal to the fragment. Interpolating into the selector is simpler, though it leaves room for escaping issues: it definitely won't work for fragments with ] in them, and other characters may cause problems too. People don't usually put special characters in names, since they're meant to appear in URLs and percent-encoded URLs look ugly, so it may just not be an issue. If we find any actual problems with this, we can switch to a more robust search.
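
One way to do the lookup without interpolating the fragment into a selector is sketched below. It is an alternative for illustration, not what this PR does; `hasAnchor` is a hypothetical helper, and the snippet assumes a jsoup version that provides `Elements.eachAttr`.

```scala
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

object NamedAnchorSketch {
  // True if the fragment matches an element id or the name of an <a> tag.
  // The fragment is compared as a plain string, so characters such as ']'
  // never reach the CSS selector parser.
  def hasAnchor(doc: Document, fragment: String): Boolean =
    doc.getElementById(fragment) != null ||
      doc.select("a[name]").eachAttr("name").contains(fragment)
}
```

The fixed selector `a[name]` only asks for anchors that have a name attribute at all; the exact comparison happens in ordinary code afterwards.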

pvlugter

LGTM, and tried it on Cinnamon docs. There's one fragment that didn't work:

https://kafka.apache.org/documentation.html#producerconfigs

Looks like it's embedded in a handlebars script or something (didn't look closely). We'll just ignore or change that.

jroper · 2019-09-12T06:43:54Z

Yeah, you're right, it is. To validate that we'd have to use HtmlUnit (which is likely to cause bigger problems because its JavaScript support is only partial) or Selenium/WebDriver (slow). I don't think it's worth it; I'd say just add an ignore (paradoxIgnorePaths).
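
A sketch of what such an ignore might look like in a downstream build. The comment above calls the setting paradoxIgnorePaths; to my knowledge the key in sbt-paradox is `paradoxValidationIgnorePaths` and takes regular expressions, but treat both the key name and the pattern as assumptions and check the sbt-paradox documentation.

```scala
// build.sbt (sketch; key name and regex are assumptions, not confirmed by this thread)
// Skips validation for links into the Kafka documentation page, whose anchors
// are rendered by a script and are not visible to a static HTML parser.
paradoxValidationIgnorePaths += "https://kafka\\.apache\\.org/documentation\\.html.*".r
```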

jroper mentioned this pull request Sep 10, 2019

Validation and improved error handling #359

Merged

pvlugter approved these changes Sep 10, 2019

raboof approved these changes Sep 10, 2019

Ensure URL fragments to named anchors also validate

6da4e66

jroper force-pushed the validate-named-fragments branch from d4e4cdd to 6e1c352 Compare September 11, 2019 02:45

Switched to using jsoup to validate fragments

28e907a

Also, modified link validation so links to the same page with different fragments only load/download that page once.

jroper force-pushed the validate-named-fragments branch from 6e1c352 to 28e907a Compare September 11, 2019 04:11

pvlugter approved these changes Sep 12, 2019

pvlugter merged commit dc86f75 into lightbend:master Sep 15, 2019
