uuid.uuid3 and uuid.uuid5 cannot be used with non-UTF-8 names #97856

Closed
thehatmakesbling opened this issue Oct 4, 2022 · 2 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@thehatmakesbling

Bug report

Consider a name space fc48656f-2196-4866-ad70-0cf68bf80146 which defines a name as the concatenation of the byte representation of two or more UUIDs.

$ python3 -c 'import uuid; print(repr(uuid.uuid5(namespace = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146"), name = uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes)))'

This raises "TypeError: encoding without a string argument" because uuid5 calls bytes(name, "utf-8"), which requires name to be a str. This is a regression from Python 2, which handles this case correctly:

$ python2 -c 'import uuid; print repr(uuid.uuid5(namespace = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146"), name = uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes))'
UUID('ab74c285-20ad-583e-978a-e26b99ef7c9b')
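The failure can be reproduced in isolation, without going through the uuid functions. The following is a minimal sketch, assuming only the bytes(name, "utf-8") call quoted above; the helper name encode_like_report is hypothetical and is not part of the uuid module. It also shows why decoding the name to a str first is not a workaround: the concatenated UUID bytes are not valid UTF-8.

```python
import uuid

# The same 32-byte name used in the report: two UUIDs concatenated.
name = (
    uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes
    + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes
)


def encode_like_report(value):
    # Hypothetical helper: mirrors the conversion quoted in the report.
    # bytes(x, encoding) only accepts a str as its first argument.
    return bytes(value, "utf-8")


try:
    encode_like_report(name)
except TypeError as exc:
    print("TypeError:", exc)

# Decoding first does not help: 0x88 is a stray continuation byte.
try:
    name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```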

The current implementation makes it impossible to use these functions with a name that cannot be decoded as a valid UTF-8 string. RFC 4122 makes it clear in section 4.3 that this restriction should not be imposed:

"The concept of name and name space should be broadly construed, and not limited to textual names."

It goes on to state that the name space may define how the name is converted to bytes, leaving the developer completely out of luck if the name space has been defined by someone else:

"Convert the name to a canonical sequence of octets (as defined by the standards or conventions of its name space)"

This is reinforced by the reference implementation, which takes void * and a length as arguments, rather than any string type, and by the definition of an X.500 DN name space that allows DER-encoded names, which also cannot be guaranteed to be representable as UTF-8. The availability of an X.500 DN name space allowing DER-encoded names is also repeated in the uuid module documentation.
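For interpreters affected by this, one possible workaround is to apply the RFC 4122 section 4.3 steps directly to the octets: hash the name space ID followed by the name with SHA-1, take the first 16 bytes, and set the version and variant bits. This is a sketch only, not the stdlib implementation; the function name uuid5_from_octets is hypothetical, and it relies on uuid.UUID(bytes=..., version=5) to set the version and variant fields.

```python
import hashlib
import uuid


def uuid5_from_octets(namespace: uuid.UUID, name: bytes) -> uuid.UUID:
    """Hypothetical helper: version 5 UUID from a name given as raw octets."""
    # SHA-1 over the name space ID concatenated with the name octets.
    digest = hashlib.sha1(namespace.bytes + name).digest()
    # Keep the first 16 bytes; version=5 overwrites the version and
    # variant bits as required for name-based UUIDs.
    return uuid.UUID(bytes=digest[:16], version=5)


namespace = uuid.UUID("fc48656f-2196-4866-ad70-0cf68bf80146")
name = (
    uuid.UUID("61695a88-5a35-48b0-b8f6-89c2c5a77aa8").bytes
    + uuid.UUID("b1467ea8-0e1c-4e11-9185-e2eaaafc6270").bytes
)
print(repr(uuid5_from_octets(namespace, name)))
```

For textual names the helper agrees with uuid.uuid5 when the name is first encoded as UTF-8, so it can stand in for both cases.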

Your environment

I have encountered this bug in the following Python versions:

Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Python 3.5.3 (default, Apr  5 2021, 09:00:41) [GCC 6.3.0 20170516] on linux

And it appears to be present in the python/cpython GitHub repository as of 2022-10-04.

@thehatmakesbling added the type-bug label on Oct 4, 2022
@thehatmakesbling
Author

This bug appears to have been reported in the following issues:

uuid(3|5) generation does not accept names which are not utf-8 decodable #94684

gh-94684 uuid3/5 support name argument as bytes #94709

These did not appear in my initial search for related issues. Those issues also do not cite the RFC text that explicitly calls out the functionality in question, so they do not appear to have been classified as a bug. The commit attached to #94709 appears to resolve it.

@JelleZijlstra
Member

Duplicate of #94684.

@JelleZijlstra closed this as not planned on Oct 12, 2022

2 participants