Fix for unexpected decoding errors. #59

obskyr · 2014-11-03T07:29:14Z

pythonwhois no longer raises a UnicodeDecodeError when a response uses a non-UTF-8 encoding. Instead, it tries UTF-8 and then ISO-8859-1, and the list is easy to extend with any other encoding Python supports. If it goes through the entire list of encodings without managing to decode the response, it raises a ValueError that names the server, which makes the failure easy to spot.

This should fix issue #57, but I'll leave that determination to someone else. As noted in that issue's comments, there could be other ways to do this.

No longer raises a UnicodeDecodeError when running into invalid UTF-8 characters, but falls back to iso-8859-1 instead.
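The fallback described above can be sketched as a standalone helper. This is a minimal illustration, not the code in the PR: the name `decode_response` and its parameters are hypothetical, and only the encoding tuple and the ValueError behaviour come from the description.

```python
def decode_response(buff, server, encodings=("utf-8", "iso-8859-1")):
    """Return `buff` decoded with the first encoding that accepts it.

    `decode_response` and its signature are hypothetical; the PR does this
    inline at the end of `whois_request`.
    """
    for encoding in encodings:
        try:
            return buff.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes, try the next one
    # Nothing in the list worked: report which server sent the response.
    raise ValueError(
        "Could not decode the response from %s with any of: %s"
        % (server, ", ".join(encodings))
    )
```

Python's ISO-8859-1 codec maps every byte value to a code point, so with the default tuple the ValueError is only reachable once the list is changed.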

obskyr · 2014-11-03T07:43:14Z

List of issues this should fix:

joepie91 · 2015-09-06T17:12:30Z

Could you have a look at #97 and, if you're still interested in working on this issue, verify that your solution works for all use cases in all supported Python versions? Thanks!

obskyr · 2015-09-21T11:41:48Z

Just ran all the tests, both in my fork and in the current latest official version. All tests pass on Python 2 and some fail on Python 3, but this PR doesn't break anything: those tests were already broken in the official version.

What this PR does fix, however, is get_whois for certain domains. For example, pythonwhois.get_whois("sesise.com") (taken from issue #57) fails in the current latest version with a UnicodeDecodeError, while this version retrieves the whois without problems on both Python 2 and 3.
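The failure mode can be reproduced without a network connection. The sketch below uses a made-up response line (not the actual sesise.com record) that contains one ISO-8859-1 byte, and shows that strict UTF-8 decoding rejects it while ISO-8859-1 accepts it.

```python
# Hypothetical WHOIS line. 0xE9 is "é" in ISO-8859-1, but here it is an
# incomplete multi-byte sequence as far as UTF-8 is concerned.
raw = b"Registrant Name: Jos\xe9\r\n"

try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# The same bytes decode cleanly as ISO-8859-1.
text = raw.decode("iso-8859-1")

print("UTF-8 decode failed:", utf8_failed)
print(text.strip())
```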

dbrandt · 2015-09-30T09:42:09Z

pythonwhois/net.py

@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43):
 		if len(data) == 0:
 			break
 		buff += data
-	return buff.decode("utf-8")
+	encodings = ("utf-8", "iso-8859-1")
+	for encoding in encodings: # This should probably not be a permanent solution.


No need for that comment IMO. This is a perfectly fine solution. The RFC does not specify an encoding (rather comments on it as an issue), so the original assumption that it's valid UTF-8 data is where the problem lies.

Agreed, I also think the comment can be removed

dbrandt · 2015-09-30T09:47:21Z

Also: as a final catch-all instead of an error, one might normalise the data before conversion.
@joepie91 I know I just jumped into this, but in my eyes this is the best of the proposed solutions (so far).
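One possible reading of the catch-all suggestion is to decode leniently once every strict attempt has failed, rather than raising. The sketch below is an assumption about what was meant, not something taken from the PR: `decode_lenient` is a hypothetical name, and the lenient step relies on the standard `errors="replace"` handler of `bytes.decode`.

```python
def decode_lenient(buff, encodings=("utf-8",)):
    """Strictly try each encoding, then fall back to a lossy decode.

    Hypothetical helper: the default tuple is deliberately short so that
    the catch-all branch is reachable in a quick experiment.
    """
    for encoding in encodings:
        try:
            return buff.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Catch-all: undecodable bytes become U+FFFD instead of raising.
    return buff.decode(encodings[0], errors="replace")
```

The trade-off is that the caller always gets text back, but can no longer tell a clean decode from a lossy one unless it checks for the replacement character.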

obskyr · 2015-10-12T11:40:46Z

Any progress on this decision as of yet?

floer32 · 2016-04-06T20:15:19Z

pythonwhois/net.py

@@ -91,4 +91,10 @@ def whois_request(domain, server, port=43):
 		if len(data) == 0:
 			break
 		buff += data
-	return buff.decode("utf-8")
+	encodings = ("utf-8", "iso-8859-1")


Maybe we should add Windows-1252 ('cp1252' in Python; often confused with Latin-1, which is iso-8859-1), very similar to iso-8859-1 with some differences. Supposedly, it's common as well.
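To make the difference concrete, here is a small sketch using only Python's built-in codecs. The sample bytes are made up for illustration. It shows where cp1252 and ISO-8859-1 disagree, and why the order in the tuple would matter if cp1252 were added.

```python
# 0x80 is the euro sign in cp1252 but a C1 control character in ISO-8859-1.
raw = b"\x80 100"
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

# Python's cp1252 codec leaves a few byte values (such as 0x81) undefined,
# so it can fail on some inputs.
try:
    b"\x81".decode("cp1252")
    cp1252_rejects_0x81 = False
except UnicodeDecodeError:
    cp1252_rejects_0x81 = True

# ISO-8859-1 accepts all 256 byte values, so it never fails.
all_bytes = bytes(range(256))
latin1_accepts_everything = len(all_bytes.decode("iso-8859-1")) == 256
```

Because ISO-8859-1 never raises, any encoding listed after it would never be tried. If cp1252 were added, it would have to come before iso-8859-1, for example `("utf-8", "cp1252", "iso-8859-1")`. Whether that ordering gives better results for real WHOIS responses is not something this sketch establishes.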

floer32 · 2016-04-06T20:32:42Z

I wanted to try running this, maybe rebased onto latest master, to see whether I'd get the same test failures. I was blocked because the python3.3 testenv isn't working to begin with ... I agree that it doesn't seem to come from the contents of this PR, though; it looks like it could have been an existing problem ...

floer32 · 2016-04-06T20:50:39Z

OK. I got it working with pyenv...

# one-time:
pyenv install 3.3.6

# do this before each working session starts 
pyenv shell 3.3.6

tox

This works on bb57c22 ( @obskyr this is your commit) ... as well as on master. In both cases all pass:

  py26: commands succeeded
  py27: commands succeeded
  py33: commands succeeded
  congratulations :)

So IMHO this would be ready for merge??

m3nu · 2016-07-17T15:14:58Z

This could be merged until a better solution is available. Just used it for a few days and seems fine.

Slightly hot fix for decoding errors. Helps joepie91#57.

bb57c22


obskyr mentioned this pull request Nov 3, 2014

Decoding issues for bidtheatre.com #51

Closed

joepie91 mentioned this pull request Sep 6, 2015

Encoding issues #97

Open
