1,2,3-octet/hexadecimal hostnames detected as IPv4 addresses #290

elliotwutingfeng · 2023-05-20T23:46:05Z

The following inputs are recognized as IPv4 addresses due to the use of socket.inet_aton().

1.1.1 -> domain parsed as 1.1.1
1.1 -> domain parsed as 1.1
1 -> domain parsed as 1 (output is still correct nonetheless)

The above is legacy behavior from UNIX's inet_aton for classful networks, a network addressing architecture made obsolete in 1993.

01.01.01.01 -> domain parsed as 01.01.01.01
01.01.01 -> domain parsed as 01.01.01
01.01 -> domain parsed as 01.01
01 -> domain parsed as 01 (output is still correct nonetheless)

0x1.0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1.0x1
0x1.0x1.0x1 -> domain parsed as 0x1.0x1.0x1
0x1.0x1 -> domain parsed as 0x1.0x1
0x1 -> domain parsed as 0x1 (output is still correct nonetheless)
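
To reproduce the inputs above, here is a minimal sketch that checks which strings `socket.inet_aton()` accepts. The helper name `accepted_by_inet_aton` is hypothetical and not part of tldextract. The results depend on the platform's C library, so treat them as indicative rather than guaranteed.

```python
import socket


def accepted_by_inet_aton(text: str) -> bool:
    """Return True if socket.inet_aton() parses `text` without raising."""
    try:
        socket.inet_aton(text)
    except OSError:
        return False
    return True


# Short, zero-padded, and hexadecimal forms from the report, plus a plain address.
for candidate in ("1.1.1", "1.1", "1", "01.01.01.01", "0x1.0x1", "1.1.1.1"):
    print(candidate, accepted_by_inet_aton(candidate))
```

On a typical Linux system with glibc, every candidate in this list should be reported as accepted, which is the behavior this issue describes.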

Given that tldextract's regex-based ipv4() function only recognizes IPv4 addresses with 4 decimal octets without zero padding, this is probably a bug.
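
For comparison, a strict check for 4 decimal octets without zero padding could look like the sketch below. The pattern and the names `STRICT_IPV4_RE` and `is_strict_ipv4` are illustrative assumptions, not tldextract's actual regex.

```python
import re

# One octet: 0-255 in decimal, with no leading zeros.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"

# Exactly four octets separated by ASCII dots.
STRICT_IPV4_RE = re.compile(rf"{_OCTET}(?:\.{_OCTET}){{3}}")


def is_strict_ipv4(text: str) -> bool:
    """Return True only for dotted-quad decimal IPv4 addresses."""
    # fullmatch() avoids accepting a trailing newline, which `$` would allow.
    return STRICT_IPV4_RE.fullmatch(text) is not None
```

With this sketch, `is_strict_ipv4("1.1.1.1")` is true, while `1.1.1`, `01.01.01.01`, and `0x1.0x1.0x1.0x1` are all rejected.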

It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, socket.inet_pton() is only supported on Unix, Unix-like, and Windows systems, and some of those systems do not provide it.

A more portable fix would be using ipaddress.IPv4Address, though it is much slower.

If suffix_index == len(labels) == 4, are there any edge cases not covered by IP_RE?


john-kurkowski · 2023-05-24T17:42:11Z

Thank you for the thorough report.

> It can be fixed by using socket.inet_pton() in looks_like_ip() instead of socket.inet_aton(). However, it is only supported on Unix/Unix-Like/Windows systems. Some of these systems do not.
>
> A more portable fix would be using ipaddress.IPv4Address, though it is much slower.

Maybe try socket.inet_pton, and if it's unavailable for the system, fall back to ipaddress.IPv4Address?
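
The suggested fallback could be sketched roughly as follows. The function name `looks_like_ipv4` is a hypothetical stand-in for looks_like_ip(), and this is not necessarily how the fix was implemented in tldextract. How zero-padded octets such as 01.01.01.01 are treated may vary with the platform's inet_pton and with the Python version's ipaddress module, so this sketch only claims to reject the short and hexadecimal forms.

```python
import ipaddress
import socket


def looks_like_ipv4(text: str) -> bool:
    """Return True if `text` parses as a dotted-quad IPv4 address."""
    if hasattr(socket, "inet_pton"):
        # Preferred path: inet_pton requires all four decimal octets.
        try:
            socket.inet_pton(socket.AF_INET, text)
        except (OSError, ValueError):
            # OSError: invalid address; ValueError: e.g. embedded null character.
            return False
        return True

    # Fallback for systems where socket.inet_pton is unavailable.
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return False
    return True
```

Checking `hasattr(socket, "inet_pton")` once per call keeps the sketch simple; a real implementation might choose the parser once at import time instead.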

…th unicode dots. (#292) - IPv4 addresses with unicode dots are now recognized. Closes #287 - IPv4 addresses must have 4 decimal octets. Closes #290 --------- Co-authored-by: John Kurkowski <john.kurkowski@gmail.com>

elliotwutingfeng changed the title ~~1,2,3-octet hostnames detected as IPv4 addresses~~ 1,2,3-octet/hexadecimal hostnames detected as IPv4 addresses May 21, 2023

elliotwutingfeng mentioned this issue May 25, 2023

Accept only 4 decimal octet IPv4 addresses. Support IPv4 addresses with unicode dots. #292

Merged

john-kurkowski closed this as completed in #292 May 26, 2023
