add timeout support for sbcl & Add support to encode unicode characters in uri path #132
base: master
jingtaozf
commented
May 31, 2023
- add timeout support for sbcl.
- Add support to encode unicode characters in uri path.
Merge latest code from edicl/drakma
(with-output-to-string (*standard-output*)
  (loop for c across uri-string
        if (> (char-code c) 255)
        ;; It's not a latin-1 character, so we need to encode it.
URLs must only contain US-ASCII characters, everything else must be encoded.
Drakma raised an exception when encountering URLs with Unicode characters in the path or the query parameters. To make URLs more accessible for non-English users, many websites incorporate Unicode characters in these parts of the URL, even though the HTTP protocol says a URL may contain only US-ASCII characters.
I wonder whether we need to support this inside Drakma. If not, I'll revert the related code change.
Hello @jingtaozf,
I understand what you're trying to accomplish. What I mean to say is that in the encoded URL, only US-ASCII characters are permitted, but you're checking for (> (char-code c) 255), which would let non-US-ASCII characters (codes 128 to 255) through unencoded. There is also the issue of determining the correct encoding for those characters. Nowadays, UTF-8 can mostly be assumed, but some web servers may actually try to use the Content-Type to determine the encoding. Some experimentation will be needed, I think.
In any case, I'd recommend that you check for (> (char-code c) 126) and percent-encode those characters using UTF-8.
-Hans
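For illustration, the recommendation above could be sketched roughly as follows. This is not the code from the PR and not part of Drakma: the function names `utf-8-octets` and `percent-encode-non-ascii` are hypothetical, and the sketch assumes an implementation such as SBCL where `char-code` returns the Unicode code point. It leaves every character with a code of 126 or below untouched, so spaces, control characters, and reserved characters would still need separate handling.

```lisp
;; Hypothetical helpers for illustration only; not part of Drakma or this PR.
;; Assumes CHAR-CODE returns Unicode code points (true in SBCL).

(defun utf-8-octets (code)
  "Return the UTF-8 octets for the code point CODE as a list of integers."
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun percent-encode-non-ascii (uri-string)
  "Percent-encode, as UTF-8, every character of URI-STRING with a code above 126."
  (with-output-to-string (out)
    (loop for c across uri-string
          for code = (char-code c)
          do (if (> code 126)
                 ;; One %XX triplet per UTF-8 octet, two uppercase hex digits each.
                 (dolist (octet (utf-8-octets code))
                   (format out "%~2,'0X" octet))
                 (write-char c out)))))
```

Under those assumptions, `(percent-encode-non-ascii (format nil "/caf~C" (code-char #xE9)))` would produce `"/caf%C3%A9"`, while a pure US-ASCII path such as `"/index.html"` comes back unchanged. Whether a given server actually expects UTF-8 here is the open question raised above.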