Semicolons should be legal in URL #2382

Closed
Siskin-Bot opened this issue Feb 15, 2020 · 1 comment
Comments

@Siskin-Bot
Collaborator

Submitted by: Hostilefork

Semicolons in URLs are apparently legal:

https://stackoverflow.com/questions/1178024/can-a-url-contain-a-semi-colon

However, Rebol doesn't consider them to be part of a URL! when LOAD-ing, because the semicolon acts as a to-end-of-line comment.

r3-alpha>> u: http://example.com/foo;bar
== http://example.com/foo

The proposal: since a URL is delimited at its end by whitespace, all characters up to the next whitespace should be considered part of its content. This would match how a string doesn't treat a semicolon as a comment when it is inside the string's delimiters, e.g. {foo ; not a comment}
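
To make the difference concrete, here is a minimal Python sketch of the two scanning rules. It is not Rebol's lexer: the function names and the `DELIMITERS` set are hypothetical, chosen only to illustrate "stop at a delimiter" versus "stop only at whitespace".

```python
# Illustrative only: this set is an assumption, not Rebol's real lexer table.
DELIMITERS = set('[](){}";')


def scan_url_current(text, start=0):
    """Stop at whitespace or at any delimiter (roughly the reported behavior)."""
    end = start
    while end < len(text) and not text[end].isspace() and text[end] not in DELIMITERS:
        end += 1
    return text[start:end]


def scan_url_proposed(text, start=0):
    """Stop only at whitespace, so delimiters stay inside the URL."""
    end = start
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]


source = "http://example.com/foo;bar rest"
print(scan_url_current(source))   # the part before the semicolon
print(scan_url_proposed(source))  # everything up to the space
```

Under the second rule the semicolon can no longer start a comment directly after a URL; a comment would need whitespace in front of it.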


Imported from: metaeducation#2381

Comments:

Oldes commented on Jun 14, 2019:

It's not just the semicolon... Rebol also stops at any of the delimiter chars, like [ and (

>> load {http://httpbin.org/get?q=foo()boo}
== [http://httpbin.org/get?q=foo () boo]

But I'm not quite sure I like this proposal, because semicolons and the other mentioned chars should be url-encoded when you want to load them, and if you have input from other sources, you should validate it anyway. One can always use this:

>> u: to-url {http://httpbin.org/get?q=foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> u: append http://httpbin.org/get?q= {foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> form u
== "http://httpbin.org/get?q=foo;[()]boo"
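
For readers outside Rebol, the same percent-encoding round trip can be reproduced with Python's standard library. This is only a cross-check of the escaped form shown above, not a description of how to-url is implemented; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

base = "http://httpbin.org/get?q="
raw_value = "foo;[()]boo"

# safe="" asks quote() to escape every character outside the unreserved set,
# so the semicolon, brackets and parentheses all become %XX sequences.
encoded = base + quote(raw_value, safe="")
print(encoded)

# Decoding restores the readable text, comparable to FORM in the transcript.
print(unquote(encoded))
```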

On the other hand, the change may not break much existing data/code. But it is still a change in the lexer, which I personally try to avoid.


Hostilefork commented on Jun 14, 2019:

Good point about the brackets...although with the plan that most working on R3-Alpha code had agreed on, only ] and ) would be able to terminate a token. There would be 4 exceptions: ][, )(, ](, and )[. The idea is that it would provide more lexical expansion possibilities in the future, if you could someday define xy"abc" to mean something different from xy "abc".

metaeducation#2094

But we don't want to sacrifice [1 2 http://3] as meaning the expected thing.

I feel like the other compromises in Rebol, like saying {a {b} c} is a legal string, may justify something like [1 2 http://httpbin.org/get?q=foo;[()]boo] working as a 3-element block, with a complete URL.

But one thing we can do to punt on the question is to just make the sequence illegal for now. If you see http://foo[ or http://foo; then make that an error. We are planning errors on things like [3()4] anyway.
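
A rough Python sketch of that "make it an error for now" compromise follows. Everything in it is an assumption for illustration: the function name, the exact set of rejected characters, and the error type are hypothetical and do not come from any Rebol or Ren-C source.

```python
TERMINATORS = set("])")  # closing delimiters end the URL, so [1 2 http://3] still works
REJECTED = set("[(;")    # assumption: characters treated as errors inside a URL for now


def scan_url_strict(text, start=0):
    """Scan a URL token, raising instead of guessing on ambiguous characters."""
    end = start
    while end < len(text):
        ch = text[end]
        if ch.isspace() or ch in TERMINATORS:
            break
        if ch in REJECTED:
            raise ValueError(f"ambiguous character {ch!r} in URL at index {end}")
        end += 1
    return text[start:end]


block = "[1 2 http://3]"
print(scan_url_strict(block, 5))  # the URL without the closing bracket

try:
    scan_url_strict("http://foo;bar")
except ValueError as err:
    print("rejected:", err)
```

Rejecting the sequence keeps both interpretations open: a later lexer could either accept the characters as URL content or keep them as delimiters without changing the meaning of code that already loads.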


IngoHohmann commented on Feb 5, 2020:

>> to text! read to url! {https://httpbin.org/anything?a={"x":"y"}} 
== {{
  "args": {
    "a": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13d1-1b00ba3c41d343bcd6626578"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a="
}
}

It works if it is run through ENHEX.

>> to text! read to url! enhex {https://httpbin.org/anything?a={"x":"y"}}
== {{
  "args": {
    "a": "{\"x\":\"y\"}"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13ea-462eb19c983b8d68a906994c"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a={\"x\":\"y\"}"
}
}

According to https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding braces do not need to be encoded.

Copied here from: metaeducation/ren-c#1046
See also #2207, #1644
In #2012 also #1327, #1333 and #1644 are mentioned.


IngoHohmann mentioned this issue on Feb 5, 2020:
url!s are cut at curly braces when reading


Oldes commented on Feb 6, 2020:

@IngoHohmann your example seems to be working in my branch:
image


Oldes commented on Feb 6, 2020:

@IngoHohmann btw... I would use something like this:

read join https://httpbin.org/anything? {a={"x":"y"}}

instead of:

read to url! {https://httpbin.org/anything?a={"x":"y"}}
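
The underlying idea, keeping the base URL literal and encoding only the query value, can be sketched in Python as well. This is an analogy rather than a statement about what join or ENHEX do internally; the variable names are illustrative.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

base = "https://httpbin.org/anything"
params = {"a": '{"x":"y"}'}

# urlencode() percent-encodes the value, so braces, quotes and the colon
# cannot be mistaken for delimiters or cut the URL short.
url = base + "?" + urlencode(params)
print(url)

# Decoding the query string recovers the original JSON text unchanged.
print(parse_qs(urlsplit(url).query)["a"][0])
```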

And when posting issues here, you could try to use Rebol code... to text! is Ren-C's feature.


@Oldes
Owner

Oldes commented Feb 17, 2020

With the recent addition of the new as native (Oldes/Rebol3@d27e4b1), this is now also possible:

>> as url! "http://example.com/foo;bar"
== http://example.com/foo%3Bbar
