add timeout support for sbcl & Add support to encode unicode characters in uri path #132

Open
wants to merge 6 commits into
base: master

(with-output-to-string (*standard-output*)
  (loop for c across uri-string
        if (> (char-code c) 255)
          ;; It's not a latin-1 character, so we need to encode it.
Member

URLs must contain only US-ASCII characters; everything else must be encoded.

Author

Drakma raised an exception when encountering URLs with Unicode characters in the path or the query parameters. To make URLs more accessible for non-English users, many websites have tried to incorporate Unicode characters in these sections of their URLs, even though the HTTP protocol says a URL may contain only US-ASCII characters.

I wonder whether we need to support this inside Drakma. If not, I'll revert the related code change.

Member

Hello @jingtaozf,

I understand what you're trying to accomplish. What I mean is that only US-ASCII characters are permitted in the encoded URL, but you're checking for (> (char-code c) 255), which lets non-US-ASCII characters with codes 128 through 255 pass through unencoded. There is also the issue of determining the correct encoding for those characters. Nowadays, UTF-8 can mostly be assumed, but some web servers may actually try to use the Content-Type to determine the encoding. Some experimentation will be needed, I think.

In any case, I'd recommend that you check for (> (char-code c) 126) and percent-encode those characters as UTF-8.

-Hans
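For illustration, the suggestion above could be sketched roughly as follows. This is not the code from the pull request: the function names `utf-8-octets` and `percent-encode-non-ascii` are hypothetical, and the sketch assumes an implementation such as SBCL where `char-code` returns Unicode code points. It hand-rolls the UTF-8 step only to stay self-contained; a real patch would more likely reuse an existing encoding library.

```lisp
;; Hypothetical helpers for illustration; these names are not part of Drakma.

(defun utf-8-octets (code)
  "Return the UTF-8 octets for the code point CODE as a list of integers."
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun percent-encode-non-ascii (uri-string)
  "Return URI-STRING with every character whose code exceeds 126 replaced
by the percent-encoded octets of its UTF-8 representation."
  (with-output-to-string (out)
    (loop for c across uri-string
          for code = (char-code c)
          do (if (> code 126)
                 ;; Assumption: CODE is a Unicode code point.
                 (dolist (octet (utf-8-octets code))
                   ;; ~:@( ... ~) forces uppercase hex digits.
                   (format out "%~:@(~2,'0X~)" octet))
                 ;; Codes up to 126 are copied through unchanged.
                 (write-char c out)))))
```

Under those assumptions, `(percent-encode-non-ascii (format nil "/caf~C" (code-char 233)))` should return `"/caf%C3%A9"`. The sketch only handles the threshold discussed here; it does not escape spaces, control characters, or reserved characters, and it does not address which encoding a given server expects.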
