allow uuid_str() to take any string or blob #99

Closed
terefang opened this issue Oct 25, 2023 · 8 comments

Comments

@terefang

today this happens:

sqlean> select uuid_str(md5('x'));
9dd4e461-268c-8034-f5c8-564e155c67a6
sqlean> select uuid_str(sha1('x'));

sqlean> select uuid_str(sha3('x'));

sqlean> select uuid_str(sha256('x'));

sqlean> select uuid_str(sha512('x'));

sqlean> 

if the string or blob is at least 16 bytes long uuid_str() could just take the first 16 bytes and ignore the rest.

in addition a shortcut could be dedicated functions like:

  • uuid_str_md5(data)
  • uuid_str_sha1(data)
  • uuid_str_sha3(data)
  • uuid_str_sha256(data)
  • uuid_str_sha512(data)
@nalgeon
Owner

nalgeon commented Nov 1, 2023

uuid_str only works with valid UUIDs. And why would you want to create a UUID from the first 16 bytes of the SHA-256 hash? What's the use case here?

@terefang
Author

terefang commented Nov 2, 2023

let me reference UUIDv3, v5 and v8:

A UUID is generated based on an unspecified name. Names are unique identifiers for an object, resource or similar within an assigned namespace. Starting from a UUID for the namespace, a UUID is generated from the name by forming a byte sequence from the namespace UUID and the name itself and then hashing this byte sequence using MD5 or SHA1. The hash is then distributed among the available UUID bits in a defined manner.

  • UUIDv3 are created from the output of an MD5 hash
  • UUIDv5 are created from the output of a SHA1 hash
  • UUIDv8 are created from arbitrary byte sequences.
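
To make the three variants above concrete, here is a minimal Python sketch using only the standard library. `uuid.uuid3` and `uuid.uuid5` are the stdlib name-based generators; `uuid8_from_digest` is a hypothetical helper written for this illustration (it is not part of sqlean or of Python's `uuid` module) that takes the first 16 bytes of any digest and overwrites the version and variant bits so the result is a well-formed version 8 UUID. The choice of `NAMESPACE_URL` and of SHA-256 is arbitrary.

```python
import hashlib
import uuid


def uuid8_from_digest(digest: bytes) -> uuid.UUID:
    """Hypothetical helper: build a version 8 UUID from the first 16 digest bytes."""
    if len(digest) < 16:
        raise ValueError("need at least 16 bytes")
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x80  # high nibble of byte 6 holds the version (8)
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 8 hold the variant (10)
    return uuid.UUID(bytes=bytes(raw))


namespace = uuid.NAMESPACE_URL  # any namespace UUID works; this one is an assumption
name = "x"

v3 = uuid.uuid3(namespace, name)  # MD5 over namespace bytes + name
v5 = uuid.uuid5(namespace, name)  # SHA-1 over namespace bytes + name
v8 = uuid8_from_digest(hashlib.sha256(namespace.bytes + name.encode("utf-8")).digest())

print(v3, v5, v8, sep="\n")
```

Note that a raw hash formatted as a UUID string, without the version and variant bits set, is not a v3, v5 or v8 UUID in the strict sense; it only has the same textual shape.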

one could call the function more precisely:

  • uuidv3_str_md5(data)
  • uuidv5_str_sha1(data)
  • uuidv8_str_sha3(data)
  • uuidv8_str_sha256(data)
  • uuidv8_str_sha512(data)

@terefang
Author

terefang commented Nov 2, 2023

my particular real-world use case is that i create "stable unique ids" out of the concatenation of various text-fields in the row, which are then easier to join and reference and can also act as safe ids in other protocols (like rest-urls).

today i have to do this outside of sqlite, with a script that rewrites the csv import.

it would be much simpler to do the csv-import and then issue UPDATE table SET XID=uuidv8_str_sha512(f1 || f2 || f3).
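
As a rough sketch of that workflow, the following Python snippet registers an application-defined SQL function and runs the UPDATE after the rows are loaded. Everything here is an assumption made for illustration: `uuidv8_str_sha512` does not exist in sqlean and is defined in Python below, and the table name `items` and the sample rows are made up.

```python
import hashlib
import sqlite3
import uuid


def uuidv8_str_sha512(data):
    """Hypothetical function: SHA-512 the input, keep 16 bytes, format as UUIDv8."""
    if data is None:
        return None  # f1 || f2 || f3 is NULL if any field is NULL
    if isinstance(data, str):
        data = data.encode("utf-8")
    raw = bytearray(hashlib.sha512(data).digest()[:16])
    raw[6] = (raw[6] & 0x0F) | 0x80  # version 8
    raw[8] = (raw[8] & 0x3F) | 0x80  # standard variant bits
    return str(uuid.UUID(bytes=bytes(raw)))


conn = sqlite3.connect(":memory:")
conn.create_function("uuidv8_str_sha512", 1, uuidv8_str_sha512)

# Stand-in for the csv import: a small table with three text fields.
conn.execute("CREATE TABLE items (f1 TEXT, f2 TEXT, f3 TEXT, XID TEXT)")
conn.executemany(
    "INSERT INTO items (f1, f2, f3) VALUES (?, ?, ?)",
    [("a", "b", "c"), ("d", "e", "f")],
)

# The statement from the comment, applied to the hypothetical table.
conn.execute("UPDATE items SET XID = uuidv8_str_sha512(f1 || f2 || f3)")

rows = conn.execute("SELECT f1, f2, f3, XID FROM items ORDER BY f1").fetchall()
for row in rows:
    print(row)
```

One design caveat: plain concatenation is ambiguous, since `'ab' || 'c'` and `'a' || 'bc'` hash to the same id, so a separator between the fields may be worth adding.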

@terefang
Author

terefang commented Nov 6, 2023

i have looked at the code ... a quick win would be to just check if the parameter is "at least" and not "exactly" 16 bytes.

@nalgeon would you agree ?

@nalgeon
Owner

nalgeon commented Nov 12, 2023

Sorry, I don't like the idea of uuid_str truncating its argument. You can always use substr(x, 1, 16) on the hashcode and then call uuid_str on the result.
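
The `substr` workaround can be tried without sqlean using Python's `sqlite3` module. In this sketch, `sha256` and `uuid_str` are Python stand-ins registered under the same names, not the sqlean implementations; the stand-in `uuid_str` only mimics the behaviour shown at the top of this thread (a 16-byte blob is formatted, anything else yields NULL). `substr` is SQLite's built-in function, which counts bytes when given a blob.

```python
import hashlib
import sqlite3
import uuid


def sha256(data):
    """Stand-in for a SQL sha256(): returns the raw 32-byte digest as a blob."""
    if data is None:
        return None
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.sha256(data).digest()


def uuid_str(value):
    """Stand-in for uuid_str(): format exactly 16 bytes, otherwise return NULL."""
    if isinstance(value, bytes) and len(value) == 16:
        return str(uuid.UUID(bytes=value))
    return None


conn = sqlite3.connect(":memory:")
conn.create_function("sha256", 1, sha256)
conn.create_function("uuid_str", 1, uuid_str)

# The full 32-byte digest is rejected, as in the original report.
full = conn.execute("select uuid_str(sha256('x'))").fetchone()[0]

# Truncating explicitly with substr() gives uuid_str() a 16-byte blob.
truncated = conn.execute(
    "select uuid_str(substr(sha256('x'), 1, 16))"
).fetchone()[0]

print(full)
print(truncated)
```

This keeps the truncation visible in the query instead of hiding it inside `uuid_str`.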

@nalgeon closed this as not planned Nov 12, 2023
@terefang
Author

would you accept a pull request for uuid_str_<HASHALGO> ?

@nalgeon
Owner

nalgeon commented Nov 16, 2023

No, I don't think so, sorry.

@terefang
Author

i don't understand

  • i have presented a use-case
  • i have given a standard reference

i was only interested in also making other users' lives easier.

sqlean is a really useful contribution to the sqlite community and i hope you keep up the existing work.
