Fix for unexpected decoding errors. #59
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,7 @@ def get_whois_raw(domain, server="", previous=None, rfc3490=True, never_cut=Fals | |
# The following is a bit hacky, but IANA won't return the right answer for example.com because it's a direct registration. | ||
"example.com": "whois.verisign-grs.com" | ||
} | ||
|
||
if rfc3490: | ||
if sys.version_info < (3, 0): | ||
domain = encode( domain if type(domain) is unicode else decode(domain, "utf8"), "idna" ) | ||
|
@@ -71,7 +71,7 @@ def get_whois_raw(domain, server="", previous=None, rfc3490=True, never_cut=Fals | |
return (new_list, server_list) | ||
else: | ||
return new_list | ||
|
||
def get_root_server(domain): | ||
data = whois_request(domain, "whois.iana.org") | ||
for line in [x.strip() for x in data.splitlines()]: | ||
|
@@ -80,7 +80,7 @@ def get_root_server(domain): | |
continue | ||
return match.group(1) | ||
raise shared.WhoisException("No root WHOIS server found for domain.") | ||
|
||
def whois_request(domain, server, port=43): | ||
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
sock.connect((server, port)) | ||
|
@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43): | |
if len(data) == 0: | ||
break | ||
buff += data | ||
- return buff.decode("utf-8") | ||
+ encodings = ("utf-8", "iso-8859-1") | ||
+ for encoding in encodings: # This should probably not be a permanent solution. | ||
Review comment: No need for that comment IMO. This is a perfectly fine solution. The RFC does not specify an encoding (it only notes this as an issue), so the real problem is the original assumption that the data is valid UTF-8.
Reply: Agreed, I also think the comment can be removed.
+ try: | ||
+ return buff.decode(encoding) | ||
+ except ValueError: | ||
+ pass | ||
+ raise ValueError("Could not decode whois response from {server}".format(server=server)) |
Review comment: Maybe we should add Windows-1252 (`cp1252` in Python). It is very similar to iso-8859-1 (Latin-1), with some differences, and is supposedly common as well.
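To make the suggestion above concrete, here is a minimal sketch of the fallback loop with `cp1252` added. The helper name `decode_whois_response`, the server name `whois.example.org`, and the encoding order are assumptions for illustration and are not part of this PR. The order matters: Python's `iso-8859-1` codec maps every byte value, so it never raises, and any encoding listed after it would never be tried. By contrast, `cp1252` leaves a few byte values (such as 0x81) undefined, so it can fail and fall through.

```python
# Hypothetical helper, not part of the PR: the name and the encoding order are assumptions.
def decode_whois_response(buff, server, encodings=("utf-8", "cp1252", "iso-8859-1")):
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return buff.decode(encoding)
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            continue
    # Unreachable while iso-8859-1 is in the list, since that codec accepts any byte.
    raise ValueError("Could not decode whois response from {server}".format(server=server))


# 0xE9 followed by an ASCII byte is invalid UTF-8, so the loop falls back to cp1252.
print(decode_whois_response(b"Soci\xe9t\xe9", "whois.example.org"))

# 0x80 is the euro sign in cp1252 but a C1 control character in iso-8859-1.
print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("iso-8859-1")))

# 0x81 is undefined in Python's cp1252 codec, so iso-8859-1 handles it as a last resort.
print(repr(decode_whois_response(b"\x81", "whois.example.org")))
```

With this ordering, bytes in the 0x80-0x9F range decode to printable characters such as the euro sign where `cp1252` defines them. Whether that is the right guess for a given WHOIS server is a judgment call, since the response itself carries no encoding information.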