Unicode parsing problems #28

Closed
fsufitch opened this issue Jul 1, 2014 · 7 comments

@fsufitch

fsufitch commented Jul 1, 2014

Looking up some IP addresses seems to lead to a UnicodeDecodeError:

>>> pythonwhois.get_whois('179.175.242.131')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fsufitch/python-whois/pythonwhois/__init__.py", line 4, in get_whois
    raw_data, server_list = net.get_whois_raw(domain, with_server_list=True)
  File "/home/fsufitch/python-whois/pythonwhois/net.py", line 42, in get_whois_raw
    response = whois_request(request_domain, target_server)
  File "/home/fsufitch/python-whois/pythonwhois/net.py", line 92, in whois_request
    return buff.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 565: invalid continuation byte

Also:

>>> pythonwhois.get_whois('80.148.135.28')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fsufitch/python-whois/pythonwhois/__init__.py", line 8, in get_whois
    return parse.parse_raw_whois(raw_data, normalized=normalized, never_query_handles=False, handle_server=server_list[-1])
  File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 553, in parse_raw_whois
    data["contacts"] = parse_registrants(raw_data, never_query_handles, handle_server)
  File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 894, in parse_registrants
    contact = fetch_nic_contact(data_reference["handle"], handle_server)
  File "/home/fsufitch/python-whois/pythonwhois/parse.py", line 981, in fetch_nic_contact
    response = net.get_whois_raw(handle, lookup_server)
  File "/home/fsufitch/python-whois/pythonwhois/net.py", line 42, in get_whois_raw
    response = whois_request(request_domain, target_server)
  File "/home/fsufitch/python-whois/pythonwhois/net.py", line 92, in whois_request
    return buff.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdf in position 706: invalid continuation byte

More IPs that break in this way:

  • 177.169.129.223
  • 80.148.135.28
  • 181.1.55.141
  • 91.6.97.224
  • 179.246.105.144
  • 94.92.58.77

I ran into this while testing a separate program on completely random IPs. To reproduce:

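import ipaddress
import random
import traceback

import pythonwhois


def randip():
    # Uniformly random IPv4 address from the whole 32-bit space
    # (this includes private and reserved ranges).
    return ipaddress.ip_address(random.randint(0, 256**4 - 1)).compressed


for _ in range(50):
    ip = randip()
    try:
        pythonwhois.get_whois(ip)
    except UnicodeDecodeError:
        # Report only the decoding failures this issue is about
        print('========', ip, '========')
        traceback.print_exc()
    except Exception:
        # Ignore every other lookup error (timeouts, unsupported servers, ...)
        continue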

Thanks!

@fsufitch
Author

fsufitch commented Jul 1, 2014

Addendum: this was run with Python 3.4.0.
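Both tracebacks above end in `buff.decode("utf-8")` inside `whois_request`: the WHOIS server replied with bytes that are not valid UTF-8, and strict decoding raises. The sketch below shows one tolerant approach, assuming the offending servers use a single-byte encoding such as ISO-8859-1. `decode_whois_response` is a hypothetical helper, not pythonwhois's API, and not necessarily how the project ended up fixing this.

```python
def decode_whois_response(buff: bytes) -> str:
    """Decode a raw WHOIS reply without crashing on non-UTF-8 bytes.

    Hypothetical helper, not part of pythonwhois: try UTF-8 first and
    fall back to Latin-1, which maps every byte 0x00-0xFF to a character
    and therefore cannot raise.
    """
    try:
        return buff.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the server sent a single-byte encoding such as
        # ISO-8859-1; other legacy encodings would decode as mojibake.
        return buff.decode("latin-1")


# 0xdf is "ß" in Latin-1; followed by "e" it is not valid UTF-8 and
# raises the same "invalid continuation byte" error as the second traceback.
raw = b"Stra\xdfe"
print(decode_whois_response(raw))  # Straße
```

The Latin-1 fallback trades correctness for robustness: it never raises, but a reply in some other legacy encoding would come back as the wrong characters rather than an error.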

@joepie91
Owner

joepie91 commented Jul 2, 2014

IP range WHOIS is not officially supported yet, see #12. However, it still shouldn't be giving Unicode errors, no matter what you searched for... I'm currently away from home, but I'll have a look at it as soon as I'm either back home or have a laptop at my disposal. That shouldn't be longer than a week and a half.

@fsufitch
Author

fsufitch commented Jul 2, 2014

How is IP range WHOIS relevant? Did I imply IP ranges? If I did, that's my bad. Thanks for looking into this.

@joepie91
Owner

joepie91 commented Jul 3, 2014

Sorry, I should've been more clear - every IP lookup will be an IP range lookup, since IPs are allocated in blocks (even if the size of those blocks may sometimes be 1), and the relevant WHOIS data is assigned in blocks as well. IP WHOIS isn't currently supported at all - only domains are currently parsed reliably.
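The point that WHOIS data is attached to address blocks rather than to single addresses can be illustrated with Python's standard `ipaddress` module. The /16 below is a made-up block chosen for illustration, not the actual allocation containing 179.175.242.131.

```python
import ipaddress

# Hypothetical allocation for illustration only; the real block holding
# this address is whatever the regional registry's WHOIS record says.
block = ipaddress.ip_network("179.175.0.0/16")
addr = ipaddress.ip_address("179.175.242.131")

# The WHOIS record describes the whole block, not the single address.
print(addr in block)          # True
print(block.num_addresses)    # 65536

# A "block" of size 1 is simply a /32 network.
single = ipaddress.ip_network("179.175.242.131/32")
print(single.num_addresses)   # 1
```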

@joepie91 joepie91 added the bug label Jul 11, 2014
@joepie91 joepie91 self-assigned this Jul 11, 2014
@catatonik

Is the error below related to this issue?

Registrant
Traceback (most recent call last):
  File "/usr/local/bin/pwhois", line 115, in <module>
    print("%s %s" % (label, actual_data))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 81-83: ordinal not in range(256)

This happens when running pwhois on 'english.hk'.
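This traceback differs from the ones above: it is a UnicodeEncodeError raised by `print`, which means stdout is using the Latin-1 codec and positions 81-83 of the line hold characters beyond U+00FF. A minimal sketch of a printing wrapper that substitutes `?` for unencodable characters is below. `safe_print` is a hypothetical helper, not pwhois's code, and the CJK sample value is invented for the demonstration.

```python
import io
import sys


def safe_print(label, value, stream=None):
    """Print "label value" without raising on unencodable characters.

    Hypothetical helper, not pwhois's code: characters the stream's
    encoding cannot represent are replaced with "?".
    """
    if stream is None:
        stream = sys.stdout
    text = "%s %s" % (label, value)
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding with errors="replace"
    safe = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(safe + "\n")


# Simulate a terminal whose encoding is Latin-1, like the one in the traceback.
buf = io.BytesIO()
latin1_out = io.TextIOWrapper(buf, encoding="latin-1", newline="\n")
safe_print("Registrant", "\u9999\u6e2f", stream=latin1_out)
latin1_out.flush()
print(buf.getvalue())  # b'Registrant ??\n'
```

Replacing characters loses information; alternatively, running with the environment variable `PYTHONIOENCODING=utf-8` makes Python write stdout as UTF-8, provided the terminal can display it.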

afilipovich added a commit to afilipovich/python-whois that referenced this issue Sep 9, 2014
joepie91 added a commit that referenced this issue Oct 5, 2014
Fix for issue #28 Unicode parsing problems
@joncombe

joncombe commented Apr 7, 2015

FYI, I'm seeing this error for the domain vatanim.com.tr.

@joepie91
Owner

joepie91 commented Sep 6, 2015

I've created a new canonical thread for this in #97.
