Semicolons should be legal in URL #2381

Open
hostilefork opened this issue Jun 14, 2019 · 5 comments

Comments

@hostilefork
Member

Semicolons in URLs are apparently legal:

https://stackoverflow.com/questions/1178024/can-a-url-contain-a-semi-colon

However, Rebol doesn't consider them to be part of a URL! when LOAD-ing, because the semicolon acts as a to-end-of-line comment.

r3-alpha>> u: http://example.com/foo;bar
== http://example.com/foo

The proposal: since a URL is delimited at its end by whitespace, all characters up to the next whitespace would be considered part of its content. This would match how a string doesn't treat a semicolon as a comment when it is inside the string's delimiters, e.g. {foo ; not a comment}
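To make the difference concrete, here is a toy scanner in Python (not Rebol, and not how any Rebol lexer is actually written). It only models the two behaviors described above. The `tokenize` function, the `urls_run_to_whitespace` flag and the short scheme list are assumptions made for this illustration.

```python
# Toy model only: real Rebol lexers handle many more token types.
URL_SCHEMES = ("http://", "https://")  # assumption: just these two schemes, for brevity


def tokenize(line: str, urls_run_to_whitespace: bool) -> list[str]:
    """Split a line into whitespace-delimited tokens, honoring ';' comments.

    With urls_run_to_whitespace=False, ';' ends a token and starts a comment
    (the reported behavior). With True, a token that starts with a URL scheme
    keeps every character up to the next whitespace (the proposed behavior).
    """
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == ";":
            break  # comment runs to the end of the line
        else:
            start = i
            is_url = line.startswith(URL_SCHEMES, i)
            while i < n and not line[i].isspace():
                if line[i] == ";" and not (is_url and urls_run_to_whitespace):
                    break  # ';' cuts the token short
                i += 1
            tokens.append(line[start:i])
    return tokens


print(tokenize("u: http://example.com/foo;bar", urls_run_to_whitespace=False))
print(tokenize("u: http://example.com/foo;bar", urls_run_to_whitespace=True))
```

In this model a semicolon that follows whitespace still starts a comment in both modes, so `x ; note` scans to a single token either way.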

@Oldes

Oldes commented Jun 14, 2019

It's not just a semicolon... Rebol also stops at any of the delimiter chars, like [ and (

>> load {http://httpbin.org/get?q=foo()boo}
== [http://httpbin.org/get?q=foo () boo]

But I'm not quite sure I like this proposal, because semicolons and the other chars mentioned should be URL-encoded when you want to load them, and if you have input from other sources, you should validate it anyway. One can always use this:

>> u: to-url {http://httpbin.org/get?q=foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> u: append http://httpbin.org/get?q= {foo;[()]boo}
== http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

>> form u
== "http://httpbin.org/get?q=foo;[()]boo"

On the other hand, the change might not break much existing data/code. But it is still a change in the lexer, which I personally try to avoid.
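For readers who want to check the percent-encoded form shown above outside of Rebol, the same escaping can be reproduced with Python's standard library. This is only a cross-check of the encoding itself, not a description of how to-url or append are implemented.

```python
from urllib.parse import quote, unquote

base = "http://httpbin.org/get?q="
raw = "foo;[()]boo"

# safe="" asks quote() to escape every reserved character, including ';' and brackets
encoded = base + quote(raw, safe="")
print(encoded)           # http://httpbin.org/get?q=foo%3B%5B%28%29%5Dboo

# Decoding gives back the readable form, like FORM does in the example above
print(unquote(encoded))  # http://httpbin.org/get?q=foo;[()]boo
```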

@hostilefork
Member Author

hostilefork commented Jun 14, 2019

Good point about the brackets... although under the plan that most of those working on R3-Alpha code had agreed on, only ] and ) would be able to terminate a token. There would be 4 exceptions: ][, )(, ](, and )[. The idea is that this would provide more lexical expansion possibilities in the future, if you could someday define what xy"abc" meant as being different from xy "abc".

#2094

But we don't want to sacrifice [1 2 http://3] as meaning the expected thing.

I feel like the other compromises in Rebol, like saying {a {b} c} is a legal string, may justify something like [1 2 http://httpbin.org/get?q=foo;[()]boo] working as a 3-element block, with a complete URL.

But one thing we can do to punt on the question is to just make the sequence illegal for now. If you see http://foo[ or http://foo; then make that an error. We are planning errors on things like [3()4] anyway.
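The "make it an error for now" option could look roughly like the check below. It is written in Python as a sketch and is not taken from any Rebol scanner. The `check_line` name and the regular expression are hypothetical, and the rule covers only the two sequences named above (a URL directly followed by `;` or `[`).

```python
import re

# Hypothetical rule: scheme://, then URL characters, then ';' or '[' with no whitespace between
_URL_THEN_DELIM = re.compile(r"[a-z][a-z0-9+.-]*://[^\s;\[\]()]*[;\[]")


def check_line(line: str) -> None:
    """Raise ValueError if a URL runs straight into ';' or '['."""
    match = _URL_THEN_DELIM.search(line)
    if match:
        raise ValueError(f"ambiguous URL sequence: {match.group(0)!r}")


check_line("[1 2 http://3]")                     # fine: ']' still closes the block
check_line("http://example.com/foo ; a comment")  # fine: whitespace precedes ';'

try:
    check_line("u: http://example.com/foo;bar")
except ValueError as err:
    print(err)
```

Rejecting the sequence keeps both readings available later: code that needs the semicolon has to percent-encode it today, and nothing silently changes meaning if the lexer rule is relaxed in the future.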

@IngoHohmann

>> to text! read to url! {https://httpbin.org/anything?a={"x":"y"}} 
== {{
  "args": {
    "a": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13d1-1b00ba3c41d343bcd6626578"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a="
}
}

It works if it is run through ENHEX.

>> to text! read to url! enhex {https://httpbin.org/anything?a={"x":"y"}}
== {{
  "args": {
    "a": "{\"x\":\"y\"}"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Charset": "utf-8", 
    "Host": "httpbin.org", 
    "User-Agent": "REBOL", 
    "X-Amzn-Trace-Id": "Root=1-5e3b13ea-462eb19c983b8d68a906994c"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "134.101.146.93", 
  "url": "https://httpbin.org/anything?a={\"x\":\"y\"}"
}
}

According to https://developer.mozilla.org/en-US/docs/Glossary/percent-encoding, braces do not need to be encoded.
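As a point of comparison, the snippet below percent-encodes the same query value with Python's standard library and parses it back locally (no network request). It does not claim to match ENHEX character for character. It only shows that a fully encoded value survives the round trip, which is the effect seen in the second response above.

```python
from urllib.parse import quote, urlsplit, parse_qs

value = '{"x":"y"}'

# safe="" escapes braces, quotes and the colon
url = "https://httpbin.org/anything?a=" + quote(value, safe="")
print(url)   # https://httpbin.org/anything?a=%7B%22x%22%3A%22y%22%7D

# Parse the query string the way a server would, without sending anything
args = parse_qs(urlsplit(url).query)
print(args)  # {'a': ['{"x":"y"}']}
```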

Copied here from: metaeducation/ren-c#1046
See also #2207, #1644
#2012 also mentions #1327, #1333 and #1644.

@Oldes

Oldes commented Feb 6, 2020

@IngoHohmann your example seems to be working in my branch:
[screenshot]

@Oldes

Oldes commented Feb 6, 2020

@IngoHohmann btw... I would use something like this:

read join https://httpbin.org/anything? {a={"x":"y"}}

instead of:

read to url! {https://httpbin.org/anything?a={"x":"y"}}

And when posting issues here, you could try to use Rebol code... to text! is a Ren-C feature.
