Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for unexpected decoding errors. #59

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

obskyr
Copy link

@obskyr obskyr commented Nov 3, 2014

Now pythonwhois no longer raises a UnicodeDecodeError whenever a response uses a non-UTF-8 encoding. It instead goes through UTF-8 and ISO-8859-1, and is easily extensible to other encodings Python supports. If it goes through the entire list of encodings without managing to decode the page, it raises a ValueError with the server in it for easy error spotting.

This should fix issue #57, but I'll leave that determination to someone else. As noted in that issue's comments, there could be other ways to do this.

Now no longer raises a UnicodeDecodeError when running into invalid
UTF-8 characters, but tries iso-8859-1 first instead.
@obskyr
Copy link
Author

obskyr commented Nov 3, 2014

@joepie91 joepie91 mentioned this pull request Sep 6, 2015
@joepie91
Copy link
Owner

joepie91 commented Sep 6, 2015

Could you have a look at #97 and, if you're still interested in working on this issue, verify that your solution works for all usecases in all supported Python versions? Thanks!

@obskyr
Copy link
Author

obskyr commented Sep 21, 2015

Just ran all the tests, both in my fork and the current latest official version. The tests all pass in Python 2, and some don't in Python 3 - but this PR doesn't break anything; those were broken even in the official version.

What this PR does fix, however, is get_whois for certain domains. pythonwhois.get_whois("sesise.com") (example taken from issue #57), for example, fails in the current latest version with a UnicodeDecodeError, while this version works just fine and gets the whois without problem. Both on Python 2 and 3.

@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43):
if len(data) == 0:
break
buff += data
return buff.decode("utf-8")
encodings = ("utf-8", "iso-8859-1")
for encoding in encodings: # This should probably not be a permanent solution.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for that comment IMO. This is a perfectly fine solution. The RFC does not specify an encoding (rather comments on it as an issue), so the original assumption that it's valid UTF-8 data is where the problem lies.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I also think the comment can be removed

@dbrandt
Copy link

dbrandt commented Sep 30, 2015

Also: as a final catch-all instead of an error, one might normalise the data before conversion.
@joepie91 I know I just jumped into this, but in my eyes this is the best of the proposed solutions (so far).

@obskyr
Copy link
Author

obskyr commented Oct 12, 2015

Any progress on this decision as of yet?

@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43):
if len(data) == 0:
break
buff += data
return buff.decode("utf-8")
encodings = ("utf-8", "iso-8859-1")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe should add WIN-1252 (aka LATIN-1 aka 'cp1252' in Python), very similar to iso-8859-1 with some differences. Supposedly, common as well.

@floer32
Copy link

floer32 commented Apr 6, 2016

I wanted to try and run this, maybe rebased onto latest master, and see if I had same issues with tests failing. Blocked because the python3.3 testenv isn't working, to begin with ... Agree that it doesn't seem to be from the contents of this PR though, appears like it could have been an existing problem ...

@floer32
Copy link

floer32 commented Apr 6, 2016

OK. I got it working with pyenv...

# one-time:
pyenv install 3.3.6

# do this before each working session starts 
pyenv shell 3.3.6

tox

This works on bb57c22 ( @obskyr this is your commit) ... as well as on master. In both cases all pass:

  py26: commands succeeded
  py27: commands succeeded
  py33: commands succeeded
  congratulations :)

So IMHO this would be ready for merge??

@m3nu
Copy link
Contributor

m3nu commented Jul 17, 2016

This could be merged until a better solution is available. Just used it for a few days and seems fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants