add `shard_results` and `hashtext` functions #2220

tychoish · 2023-12-07T17:11:57Z

Description

I was working on something with GlareDB cloud and I wished to have these functions in glaredb. We can discus names, but basically the operations would be:

`hashtext`

This could easily be fnv() or fnv1a() wold take the raw data of a field, and return the hash value as an unsigned 64 bit integer. If we differ from the postgres function we shouldn't use the same name. I'm partial to 64bit fnv1a but any non-cryptographic hash is fine.

`shard_results`

shard_results(<data>, <num_shards>, <shard_id>)

Data could be any type (we'll use it's byte sequence, no need to cast), num_shards is a positive non-zero integer, and shard_id is a number that is within the [0,<num_shards>) range. The function would return a boolean, and be used in a WHERE clause.

The operation would be, basically hash(<data>) % <num_shards> == <shard_id>.

This function should be implemented in terms of the first.

As future work, It would be interesting if for parquet data sources, to see if it would end up working so that we'd pull the column in question, do the filtering, and then pull the remaining data out?

Use Case

If you have multiple stateless application servers and you want to divide the output of query (which represents some work), into slices (shards) for each application servers, this function can help push that calculation into the database, and reduce the amount of data that's sent to the application.

The text was updated successfully, but these errors were encountered:

Part of #2220

Adds partitioning (sharding) of result sets using the hashing method. Closes #2220

tychoish added the feat New feature or request label Dec 7, 2023

greyscaled added the cloud Issues that affect cloud label Dec 7, 2023

This was referenced Dec 23, 2023

feat: add non-cryptographic hash function #2298

Merged

feat: partition results helper #2299

Merged

tychoish added a commit that referenced this issue Dec 28, 2023

feat: add non-cryptographic hash function (#2298)

026bacd

Part of #2220

tychoish closed this as completed in #2299 Dec 28, 2023

tychoish added a commit that referenced this issue Dec 28, 2023

feat: partition results helper (#2299)

ee4b440

Adds partitioning (sharding) of result sets using the hashing method. Closes #2220

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add `shard_results` and `hashtext` functions #2220

add `shard_results` and `hashtext` functions #2220

tychoish commented Dec 7, 2023

add shard_results and hashtext functions #2220

add shard_results and hashtext functions #2220

Comments

tychoish commented Dec 7, 2023

Description

hashtext

shard_results

Use Case

add `shard_results` and `hashtext` functions #2220

add `shard_results` and `hashtext` functions #2220

`hashtext`

`shard_results`