I was working on something with GlareDB cloud and wished I had these functions in glaredb. We can discuss names, but basically the operations would be:
hashtext
This could easily be fnv() or fnv1a(). It would take the raw data of a field and return the hash value as an unsigned 64-bit integer. If we differ from the Postgres function, we shouldn't use the same name. I'm partial to 64-bit fnv1a, but any non-cryptographic hash is fine.
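To make the proposed behavior concrete, here is a minimal sketch of 64-bit FNV-1a in Python. It is only an illustration of the hash, not GlareDB code. The function name fnv1a_64 is hypothetical, and it assumes the caller already has the field's raw bytes.

```python
# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF  # keeps the running value within an unsigned 64-bit range


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of data as an unsigned integer.

    Hypothetical helper for illustration; assumes the field's raw bytes
    are passed in directly.
    """
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # FNV-1a: XOR the byte in first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, wrapping at 64 bits
    return h
```

Hashing the byte sequence directly means the same function can serve any column type without a cast, as long as the byte representation of each type is stable.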
shard_results
shard_results(<data>, <num_shards>, <shard_id>)
Data could be any type (we'll use its byte sequence, no need to cast), num_shards is a positive integer, and shard_id is an integer in the [0,<num_shards>) range. The function would return a boolean and be used in a WHERE clause.
The operation would basically be hash(<data>) % <num_shards> == <shard_id>.
This function should be implemented in terms of the first.
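The semantics described above can be sketched in Python as follows. This is an illustration under assumptions rather than a proposed implementation: fnv1a_64 is a hypothetical stand-in for the first function, the data is assumed to arrive as raw bytes, and raising ValueError on bad arguments is my own choice, since the error behavior isn't specified here.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a over raw bytes (hypothetical stand-in for the hash function)."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def shard_results(data: bytes, num_shards: int, shard_id: int) -> bool:
    """Return True when data belongs to shard shard_id out of num_shards."""
    # Assumption: invalid arguments are rejected rather than silently returning False.
    if num_shards <= 0:
        raise ValueError("num_shards must be a positive integer")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return fnv1a_64(data) % num_shards == shard_id


# Usage: each application server keeps only the rows for its own shard.
rows = [b"order-1", b"order-2", b"order-3", b"order-4"]
my_rows = [r for r in rows if shard_results(r, 3, 0)]
```

Because the shard is derived from the hash alone, every row lands in exactly one shard for a given num_shards, so the servers need no coordination beyond agreeing on their shard_id values.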
As future work, it would be interesting to see whether, for Parquet data sources, this could end up working so that we'd pull the column in question, do the filtering, and then pull the remaining data out.
Use Case
If you have multiple stateless application servers and you want to divide the output of a query (which represents some work) into slices (shards), one for each application server, this function can help push that calculation into the database and reduce the amount of data that's sent to the application.