Redo path parsing for non-special URLs #212

annevk · 2017-01-20T11:34:25Z

As part of #148 and #185 we noticed that if we want to keep parsing non-special URLs in a way that preserves the ability for them to be base/relative URLs we need to make some changes, that would also bring them closer to the original RFCs.

I tried to do that as part of #185 but failed multiple times, so I'm going to make it a dedicated issue. What I think needs to happen is that we stop eating "/" early on. I think we need to drop the "path start state" as well.

What we want is that for special URLs we ensure that path has at least one item (potentially the empty string) and that for non-special URLs it can be empty if there is no path (e.g., as in foo://x#x having no path, but http://x#x becoming http://x/#x because special).

Currently "relative slash state" goes down "special authority ignore slashes state" even for non-special URLs. That means ///x against foo://y results in foo://x rather than foo:///x. That also needs to be fixed.

The text was updated successfully, but these errors were encountered:

See whatwg/url#212.

annevk · 2017-01-21T10:02:13Z

With these changes I get passing tests again (apart from five tests that require the opaque host changes to land):

diff --git a/lib/URL-impl.js b/lib/URL-impl.js
index 9e7d67c..0a394a3 100644
--- a/lib/URL-impl.js
+++ b/lib/URL-impl.js
@@ -136,6 +136,10 @@ exports.implementation = class URLImpl {
       return this._url.path[0];
     }
 
+    if (this._url.path.length === 0) {
+      return "";
+    }
+
     return "/" + this._url.path.join("/");
   }
 
diff --git a/src/url-state-machine.js b/src/url-state-machine.js
index 4534bd6..85bb11d 100644
--- a/src/url-state-machine.js
+++ b/src/url-state-machine.js
@@ -680,11 +680,13 @@ URLStateMachine.prototype["parse relative"] = function parseRelative(c) {
 };
 
 URLStateMachine.prototype["parse relative slash"] = function parseRelativeSlash(c) {
-  if (c === p("/") || (isSpecial(this.url) && c === p("\\"))) {
+  if (isSpecial(this.url) && (c === p("/") || c === p("\\"))) {
     if (c === p("\\")) {
       this.parseError = true;
     }
     this.state = "special authority ignore slashes";
+  } else if(c === p("/")) {
+    this.state = "authority";
   } else {
     this.url.username = this.base.username;
     this.url.password = this.base.password;
@@ -930,12 +932,27 @@ URLStateMachine.prototype["parse file host"] = function parseFileHost(c, cStr) {
 };
 
 URLStateMachine.prototype["parse path start"] = function parsePathStart(c) {
-  if (isSpecial(this.url) && c === p("\\")) {
-    this.parseError = true;
-  }
-  this.state = "path";
-  if (c !== p("/") && !(isSpecial(this.url) && c === p("\\"))) {
-    --this.pointer;
+  if (!isSpecial(this.url)) {
+    if (!this.stateOverride && c === p("?")) {
+      this.url.query = "";
+      this.state = "query";
+    } else if (!this.stateOverride && c === p("#")) {
+      this.url.fragment = "";
+      this.state = "fragment";
+    } else if (!isNaN(c)) {
+      if (c !== p("/")) {
+        --this.pointer;
+      }
+      this.state = "path";
+    }
+  } else {
+    if (c !== p("/") && c !== p("\\")) {
+      --this.pointer;
+    }
+    if(c === p("\\")) {
+      this.parseError = true;
+    }
+    this.state = "path";
   }
 
   return true;
@@ -1096,7 +1113,7 @@ function serializeURL(url, excludeFragment) {
 
   if (url.cannotBeABaseURL) {
     output += url.path[0];
-  } else {
+  } else if (url.path.length !== 0) {
     output += "/" + url.path.join("/");
   }

annevk · 2017-01-21T10:03:38Z

~~However, that probably means I should write more tests, since that seems like it would fail with state override into effect.~~ Done and updated the patch above.

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

…sh after the host more compatible https://bugs.webkit.org/show_bug.cgi?id=167317 Source/WebCore: <rdar://problem/29526875> Reviewed by Sam Weinig. This is currently being added to the URL spec in whatwg/url#212 Covered by new API tests. * platform/URLParser.cpp: (WebCore::URLParser::parse): Only add a slash if there wasn't one if the URL has a special scheme. This new behavior matches the old behavior of URL::parse. Tools: Reviewed by Sam Weinig. * TestWebKitAPI/Tests/WebCore/URLParser.cpp: (TestWebKitAPI::TEST_F): git-svn-id: http://svn.webkit.org/repository/webkit/trunk@211058 268f45cc-cd09-0410-ab3c-d52691b4dbfc

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

See whatwg/url#212.

Some of these used to be part of #4406. See also whatwg/url#212.

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

Updates the pathname getter to return the empty string when the URL's path is empty, which mirrors the change in whatwg/url@b087fe2 which in turn fixed whatwg/url#212. (This was tested as part of the URL Standard change. It is not tested for the Location object as it is not feasible to do so as discussed in the pull request for this change.)

annevk mentioned this issue Jan 20, 2017

Add opaque hosts #185

Merged

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 20, 2017

Non-special URLs shouldn't eat slashes

3951ebc

See whatwg/url#212.

annevk mentioned this issue Jan 20, 2017

Non-special URL path tests web-platform-tests/wpt#4586

Merged

annevk added the topic: parser label Jan 21, 2017

annevk added a commit that referenced this issue Jan 21, 2017

Change path parsing for non-special URLs

5e6d4e5

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

annevk mentioned this issue Jan 21, 2017

Change path parsing for non-special URLs #213

Merged

annevk added a commit that referenced this issue Jan 24, 2017

Change path parsing for non-special URLs

cea4ff8

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 24, 2017

Non-special URLs shouldn't eat slashes

b077017

See whatwg/url#212.

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 27, 2017

Non-special URLs shouldn't eat slashes

2fde830

See whatwg/url#212.

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 31, 2017

Non-special URL path tests

37a9631

Some of these used to be part of #4406. See also whatwg/url#212.

annevk closed this as completed in #213 Jan 31, 2017

annevk added a commit that referenced this issue Jan 31, 2017

Change path parsing for non-special URLs

b087fe2

This allows paths to be empty for non-special URLs and also takes that into account when serializing. Tests: web-platform-tests/wpt#4586. Fixes #212.

watilde mentioned this issue Mar 21, 2017

url: tracking whatwg url issue #212 nodejs/node#11962

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redo path parsing for non-special URLs #212

Redo path parsing for non-special URLs #212

annevk commented Jan 20, 2017

annevk commented Jan 21, 2017 •

edited

Loading

annevk commented Jan 21, 2017 •

edited

Loading

Redo path parsing for non-special URLs #212

Redo path parsing for non-special URLs #212

Comments

annevk commented Jan 20, 2017

annevk commented Jan 21, 2017 • edited Loading

annevk commented Jan 21, 2017 • edited Loading

annevk commented Jan 21, 2017 •

edited

Loading

annevk commented Jan 21, 2017 •

edited

Loading