Fix for unexpected decoding errors. #59

Open: wants to merge 1 commit into base: master
14 changes: 10 additions & 4 deletions pythonwhois/net.py
@@ -14,7 +14,7 @@ def get_whois_raw(domain, server="", previous=None, rfc3490=True, never_cut=Fals
# The following is a bit hacky, but IANA won't return the right answer for example.com because it's a direct registration.
"example.com": "whois.verisign-grs.com"
}

if rfc3490:
if sys.version_info < (3, 0):
domain = encode( domain if type(domain) is unicode else decode(domain, "utf8"), "idna" )
@@ -71,7 +71,7 @@ def get_whois_raw(domain, server="", previous=None, rfc3490=True, never_cut=Fals
return (new_list, server_list)
else:
return new_list

def get_root_server(domain):
data = whois_request(domain, "whois.iana.org")
for line in [x.strip() for x in data.splitlines()]:
@@ -80,7 +80,7 @@ def get_root_server(domain):
continue
return match.group(1)
raise shared.WhoisException("No root WHOIS server found for domain.")

def whois_request(domain, server, port=43):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((server, port))
@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43):
         if len(data) == 0:
             break
         buff += data
-    return buff.decode("utf-8")
+    encodings = ("utf-8", "iso-8859-1")

Maybe we should add Windows-1252 ('cp1252' in Python). It is very similar to ISO-8859-1 (Latin-1) but differs in the 0x80-0x9F byte range, and it is supposedly common as well.
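
One thing worth checking before extending the tuple: in Python, iso-8859-1 maps every byte value to a code point, so decoding with it never fails, and any codec listed after it would never be tried. A minimal Python 3 sketch of the difference follows; the sample bytes are made up for illustration and do not come from a real WHOIS server.

```python
# Hypothetical sample bytes, chosen to show where the two codecs differ.
sample = b"price: \x80 5"  # 0x80 is the euro sign in cp1252

as_cp1252 = sample.decode("cp1252")
as_latin1 = sample.decode("iso-8859-1")  # 0x80 becomes the control character U+0080

# iso-8859-1 accepts all 256 byte values, so it can never raise.
all_bytes = bytes(range(256))
latin1_ok = len(all_bytes.decode("iso-8859-1")) == 256

# Python's cp1252 codec leaves a few byte values undefined, so it can fail.
try:
    b"\x81".decode("cp1252")
    cp1252_rejects_0x81 = False
except UnicodeDecodeError:
    cp1252_rejects_0x81 = True
```

If cp1252 were added, it would therefore likely need to sit before iso-8859-1 in the tuple to have any effect.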

+    for encoding in encodings: # This should probably not be a permanent solution.

No need for that comment, IMO; this is a perfectly fine solution. The RFC does not specify an encoding (it only notes the lack of one as an issue), so the real problem is the original assumption that the response is valid UTF-8.

Agreed, I also think the comment can be removed.

+        try:
+            return buff.decode(encoding)
+        except ValueError:
+            pass
+    raise ValueError("Could not decode whois response from {server}".format(server=server))
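
The fallback logic in this hunk can be tried in isolation. The sketch below is not the PR's code: `decode_with_fallback` is a hypothetical helper name, the sample strings are invented, and Python 3 is assumed. It catches `UnicodeDecodeError` directly; the PR's `except ValueError` behaves the same for decoding failures because `UnicodeDecodeError` is a subclass of `ValueError`.

```python
def decode_with_fallback(buff, encodings=("utf-8", "iso-8859-1")):
    """Return buff decoded with the first encoding that accepts it.

    Hypothetical helper mirroring the loop in the diff above.
    """
    for encoding in encodings:
        try:
            return buff.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("Could not decode response with any of: " + ", ".join(encodings))


# Valid UTF-8 is decoded by the first entry.
utf8_text = decode_with_fallback("Köln".encode("utf-8"))

# ISO-8859-1 bytes are rejected by UTF-8 and picked up by the second entry.
latin1_text = decode_with_fallback("Köln".encode("iso-8859-1"))

# With only strict codecs in the list, the final raise becomes reachable.
try:
    decode_with_fallback(b"\xff", encodings=("utf-8", "ascii"))
    strict_failed = False
except ValueError:
    strict_failed = True
```

Since iso-8859-1 accepts any byte sequence, the final `raise` is effectively unreachable while that codec is in the tuple; it only matters if the list is later changed to contain strict codecs alone.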