Semicolons should be legal in URL #2382

Siskin-Bot · 2020-02-15T18:10:22Z

Submitted by: Hostilefork

Semicolons in URLs are apparently legal:

https://stackoverflow.com/questions/1178024/can-a-url-contain-a-semi-colon

However, Rebol doesn't consider them to be part of a URL! when LOAD-ing, because the semicolon acts as a to-end-of-line comment.

r3-alpha>> u: http://example.com/foo;bar
== http://example.com/foo

The proposal is that, since a URL is delimited at its end by whitespace, all characters up to the next whitespace are considered part of its content. This would match how a string doesn't treat a semicolon as the start of a comment when it is inside the string's delimiters, e.g. {foo ; not a comment}
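
The difference between the reported behavior and the proposed rule can be sketched with a toy tokenizer. This is only an illustration in Python, not Rebol's actual lexer: the function names, the URL-detecting regular expression, and the simplified comment handling (strings and other delimiters are ignored) are all assumptions made for the example.

```python
import re

# Hypothetical pattern for "this token starts like a URL" (scheme followed by ://).
URL_START = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")

def tokenize_current(line: str) -> list[str]:
    """Mimic the reported behavior: ';' starts a comment wherever it appears."""
    code = line.split(";", 1)[0]  # drop everything from the first semicolon on
    return code.split()

def tokenize_proposed(line: str) -> list[str]:
    """Proposed rule: a token that starts like a URL runs until whitespace."""
    tokens = []
    pos = 0
    while pos < len(line):
        if line[pos].isspace():
            pos += 1
            continue
        if line[pos] == ";":
            break  # a semicolon at the start of a token still begins a comment
        is_url = URL_START.match(line, pos) is not None
        end = pos
        while end < len(line) and not line[end].isspace():
            if line[end] == ";" and not is_url:
                break  # non-URL tokens still end at a semicolon
            end += 1
        tokens.append(line[pos:end])
        pos = end
    return tokens

print(tokenize_current("u: http://example.com/foo;bar"))
print(tokenize_proposed("u: http://example.com/foo;bar"))
```

Under these assumptions the first call drops `;bar` as a comment, while the second keeps it as part of the URL token. Ordinary comments such as `x: 1 ; note` are still stripped by both versions.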

Imported from: metaeducation#2381

Comments:

Oldes commented on Jun 14, 2019:

It's not just the semicolon... Rebol also stops at any of the delimiter chars, like [ and (

>> load {http://httpbin.org/get?q=foo()boo}
== [http://httpbin.org/get?q=foo () boo]

But I'm not quite sure that I like this proposal, because the semicolon and the other characters mentioned should be URL-encoded when you want to LOAD the URL, and if you have input from other sources, you should validate it anyway. One can always use this:

>> u: to-url {http://httpbin.org/get?q=foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> u: append http://httpbin.org/get?q= {foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> form u
== "http://httpbin.org/get?q=foo;[()]boo"

On the other hand, the change may not break much existing data or code. But it is still a change in the lexer, which I personally try to avoid.
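
The encode-then-FORM round trip shown above is not specific to Rebol. As a rough illustration only, the same transformation can be reproduced with Python's standard `urllib.parse` module; this is an analogy and not how TO-URL or FORM are implemented, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

base = "http://httpbin.org/get?q="
raw_value = "foo;[()]boo"

# Percent-encode every reserved character in the value (safe="" keeps none of them literal).
encoded = base + quote(raw_value, safe="")

# Decoding restores the readable text, comparable to FORM in the transcript above.
decoded = unquote(encoded)

print(encoded)  # http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo
print(decoded)  # http://httpbin.org/get?q=foo;[()]boo
```

Only the value is encoded here, so the `?` and `=` that structure the query stay intact.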

Hostilefork commented on Jun 14, 2019:

Good point about the brackets...although under the plan that most people working on R3-Alpha code had agreed on, only ] and ) would be able to terminate a token. There would be 4 exceptions: ][, )(, ](, and )[. The idea is that it would provide more possibilities for lexical expansion in the future, if you could someday define xy"abc" to mean something different from xy "abc".

metaeducation#2094

But we don't want to sacrifice [1 2 http://3] as meaning the expected thing.

I feel like the other compromises in Rebol, like saying {a {b} c} is a legal string, may justify something like [1 2 http://httpbin.org/get?q=foo;[()]boo] working as a 3-element block, with a complete URL.

But one thing we can do to punt on the question is to just make the sequence illegal for now. If you see http://foo[ or http://foo; then make that an error. We are planning errors on things like [3()4] anyway.

IngoHohmann commented on Feb 5, 2020:

>> to text! read to url! {https://httpbin.org/anything?a={"x":"y"}} 
== {{
  "args": {
    "a": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13d1-1b00ba3c41d343bcd6626578"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a="
}
}

It works if the URL is run through ENHEX first.

>> to text! read to url! enhex {https://httpbin.org/anything?a={"x":"y"}}
== {{
  "args": {
    "a": "{\"x\":\"y\"}"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13ea-462eb19c983b8d68a906994c"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a={\"x\":\"y\"}"
}
}

According to https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding braces do not need to be encoded.
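
To make the encoded request above concrete, here is a minimal sketch in Python (standard library only, not Rebol's ENHEX) that encodes the JSON-like value into the query string and then decodes it the way a receiving server might. The `build_url` helper is hypothetical and exists only for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_url(base: str, params: dict[str, str]) -> str:
    """Hypothetical helper: append percent-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)

url = build_url("https://httpbin.org/anything", {"a": '{"x":"y"}'})

# Decode the query as a server would; the braces and quotes come back unchanged.
received = parse_qs(urlsplit(url).query)

print(url)       # https://httpbin.org/anything?a=%7B%22x%22%3A%22y%22%7D
print(received)  # {'a': ['{"x":"y"}']}
```

Note that `urlencode` encodes braces even though they may not strictly require it; it simply errs on the side of encoding.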

Copied here from: metaeducation/ren-c#1046
See also #2207, #1644
#2012 also mentions #1327, #1333 and #1644.

IngoHohmann mentioned this issue on Feb 5, 2020:
url!s are cut at curly braces when reading

Oldes commented on Feb 6, 2020:

@IngoHohmann your example seems to be working in my branch:

Oldes commented on Feb 6, 2020:

@IngoHohmann btw... I would use something like this:

read join https://httpbin.org/anything? {a={"x":"y"}}

instead of:

read to url! {https://httpbin.org/anything?a={"x":"y"}}

And when posting issues here, please try to use Rebol code... to text! is a Ren-C feature.


Oldes · 2020-02-17T09:05:21Z

With the recent addition of the new AS native (Oldes/Rebol3@d27e4b1), this is now also possible:

>> as url! "http://example.com/foo;bar"
== http://example.com/foo%3Bbar

Siskin-Bot mentioned this issue Feb 15, 2020

WRITE should accept more datatypes for DATA. #2007

Closed

Oldes closed this as completed Feb 17, 2020

This was referenced Apr 13, 2022

URL scheme characters admitted by DECODE-URL more restrictive than those admitted by TRANSCODE #1327

Open

DECODE-URL ignores restriction on passwords #1333

Closed

Oldes mentioned this issue Jul 15, 2024

ENHEX is not compatible with Red language #2605

Open
