The "validators.url" function fails with paths longer than 64 characters #257

@dgilm

Here is an example:

>>> from validators import url

>>> url("https://www.example-domain.com/us/blog/the-longest-path-i-can-even-write-with-more-than-64-chars")
ValidationFailure(
    func=url,
    args={
        'reason': "encoding with 'idna' codec failed (UnicodeError: label too long)",
        'value': 'https://www.example-domain.com/us/blog/the-longest-path-i-can-even-write-with-more-than-64-chars'
    }
)

>>> url("https://www.example-domain.com/us/blog/the-longest-path-i-can-even-write-with-more-than-64-ch")
True

This happens because the URL path is IDNA-encoded (Internationalized Domain Names in Applications), and IDNA requires each label to be shorter than 64 characters:
https://github.com/python-validators/validators/blob/0.20.9/validators/url.py#L106

The UnicodeError is raised here in the standard library:
https://github.com/python/cpython/blob/main/Lib/encodings/idna.py#L99,L101
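The failure can be reproduced with the standard library alone, without validators. This is a minimal sketch, assuming the whole path reaches the codec as one string; the helper name `idna_encodable` is hypothetical and not part of any library. The two paths are taken from the example above.

```python
def idna_encodable(text: str) -> bool:
    """Return True if Python's built-in 'idna' codec accepts the text.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode("idna")
    except UnicodeError:
        # Raised by the codec when a dot-separated label is empty or too long.
        return False
    return True


# Paths from the two URLs in the example above.
short_path = "/us/blog/the-longest-path-i-can-even-write-with-more-than-64-ch"
long_path = "/us/blog/the-longest-path-i-can-even-write-with-more-than-64-chars"

# Neither path contains a dot, so the codec treats each one as a single label.
print(len(short_path), idna_encodable(short_path))
print(len(long_path), idna_encodable(long_path))
```

Because the codec splits its input on dots only, slashes do not start a new label, so the whole path counts toward the limit.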

Although this limit makes sense for domain names, I'm not sure it is valid for URL paths. The internet is full of URLs with paths this long.
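One possible direction, sketched here only as an assumption about how the check could be scoped and not as the library's implementation, is to apply the IDNA check to the host alone and leave the path out of it. The function name `host_is_idna_encodable` is hypothetical.

```python
from urllib.parse import urlsplit


def host_is_idna_encodable(url: str) -> bool:
    """Check only the host part of a URL against the 'idna' codec.

    Hypothetical helper: the path, query and fragment are ignored on purpose.
    """
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        host.encode("idna")
    except UnicodeError:
        return False
    return True


long_path_url = (
    "https://www.example-domain.com/us/blog/"
    "the-longest-path-i-can-even-write-with-more-than-64-chars"
)
# A host whose first label is 64 characters long, which the codec rejects.
long_label_url = "https://" + "a" * 64 + ".com/short"

print(host_is_idna_encodable(long_path_url))
print(host_is_idna_encodable(long_label_url))
```

With this scoping, a long path no longer trips the label-length check, while an over-long host label is still rejected.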

Labels: bug (Issue: Works not as designed)
