-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vindex varchar support #1649
Vindex varchar support #1649
Conversation
// GetBytes returns bytes for a given value | ||
func GetBytes(key interface{}) ([]byte, error) { | ||
var buf bytes.Buffer | ||
enc := gob.NewEncoder(&buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think gob hashes the input. You can use this function instead: https://github.com/youtube/vitess/blob/master/go/vt/vtgate/vindexes/hash.go#L118. Should look something like this:
func binHash(key interface{}) ([]byte, error) {
source, ok := key.([]byte)
if !ok {
return nil, fmt.Errorf("unexpected data type for binHash: %T", key)
}
dest := make([]byte, len(source))
block3DES.Encrypt(dest, source)
return dest, nil
}
Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions. go/vt/vtgate/vindexes/varchar_hash.go, line 42 [r3] (raw file): go/vt/vtgate/vindexes/varchar_hash.go, line 48 [r3] (raw file): Comments from Reviewable |
go/vt/vtgate/vindexes/varchar_hash.go, line 54 [r5] (raw file): Comments from Reviewable |
Reviewed 2 of 4 files at r4. go/vt/vtgate/vindexes/varchar_hash.go, line 54 [r5] (raw file): So, I think I confused you with my original comment. Basically, the binHash implementation I gave should have just replaced GetBytes. Everything else was fine. The older getHash function for varchar that performed the lower-casing was also fine, except that it would work only if you lower-cased before hashing. Comments from Reviewable |
Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions. go/vt/vtgate/vindexes/utf8cihash.go, line 54 [r5] (raw file): go/vt/vtgate/vindexes/varchar_hash.go, line 42 [r3] (raw file): Comments from Reviewable |
LGTM Reviewed 6 of 6 files at r6. Comments from Reviewable |
I've been doing some reading and testing about unicode collations. The good news is that I've found the perfect Go library to perform the correct normalization. It produces a normalized 'key' that is the lexical equivalent of the string.
I'll provide an initial implementation using option 1. If that's insufficient, we can look at the other options. |
Option 1 seems likely to bite us down the road. It may work as a custom vindex plugin for a given user, but it shouldn't be shipped along with Vitess core in that case because it has misleading properties. I could imagine wanting to shard on a varchar that often starts with the same 4 characters, for example an app-specific entity ID string like "uid-1234". For option 3, how would you combine the resulting encrypted chunks? Would you just concatenate and have variable-length keyspace IDs? Can we just use an irreversible fixed-width hash instead, since normalization is not reversible anyway? |
Hmm. Fixed prefix would cause problems even for option 3 because all keys will start with the hash of the first 4 characters. |
What about option 4: checksum the whole normalized key with something like MD5 or SHA-N? |
MD5 looks good. I also looked up CRC, which they say is more efficient, but less random. |
Since vindex behavior should never be changed (once people start using it), I'm going to rename these to more accurately represent what they do: |
The first option would definitely be a problem for use-cases where-in teams use some prefix, in our case lets say a tracking-id always starts from "FMCP".I tried searching for some other options but i think MD5 looks more promising. |
In my earlier companies, we have used Jenkins Hash which ensures evenness in distribution, but came across this post, it seems like it is not that optimized in GO. |
I'm working on the change. You can see progress here https://github.com/sougou/vitess/commits/v3. |
* Increase retries and timeout for expectation (vitessio#1628) This still flakes with the current setup, but the total wait time today is only a maximum of 50ms. With increasing the retries and the time between them, we now retry for up to 5 seconds. Anecdotally, Actions can see stalls up to like 2 seconds so we need a large retry time window here to improve test reliability. This also means we can remove a lot of other sleeps here since the expectation will retry anyway, improving the happy path test performance. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * boost: Implement logic to mimic global routing in Vitess for Boost (vitessio#1636) * boost: Implement logic to mimic global routing in Vitess for Boost In Vitess if there's no keyspace currently active, it's still possible to route a query to the right keyspace. We don't currently support this in Boost which means there's a mismatch between how it runs directly vs. when Boost is used. We need this for supporting planetscale/boost-internal#21 but also need to relax some constraints on Singularity as well as reverting the changes in planetscale/api-bb#6463 & planetscale/api-bb#6476. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * Improve error message for no keyspace Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * Handle routing cached queries with no keyspace When you install a query with no keyspace configured, we want to be able to match that whether you have a keyspace selected or not. This allows that to work by excluding the keyspace from the hash but matching it after the fact against the given keyspace, or allowing it if it's empty. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> --------- Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * boost: Retry leadership election if it fails (vitessio#1649) When we have an etcd blip right at the moment we also want to do the leadership election here, we don't correctly retry. The change here sets up a separate context we can cancel and then retry so we close the old watcher and spin up a new one in case this happens. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * Collations & Evalengine Backport (vitessio#1646) * sqlparser: Update tooling Signed-off-by: Vicent Marti <vmg@strn.cat> * BugFix: Escaping Percentage and Underscore require special handling (vitessio#11823) * test: add failing end to end test for escaped % and _ Signed-off-by: Manan Gupta <manan@planetscale.com> * feat: preserve escaping of \ and % in the parser Signed-off-by: Manan Gupta <manan@planetscale.com> * feat: add end to end test and planner test to verify that % and _ handling is correct in Vitess on vtgate level Signed-off-by: Manan Gupta <manan@planetscale.com> * feat: add test to evaluation engine to verify that backslash and underscore and handled correctly even when escaped Signed-off-by: Manan Gupta <manan@planetscale.com> * feat: remove unrequired changes Signed-off-by: Manan Gupta <manan@planetscale.com> Signed-off-by: Manan Gupta <manan@planetscale.com> Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: fix sorting in UCA900 collations (vitessio#12555) * collations: fix sorting in UCA900 collations When using the fast iterator to _compare_ two strings with an UCA collation, we need to keep in mind that the weights in the collation are in BIG ENDIAN (this is the output format for the weight strings, so we store the weights this way), so comparing them directly will not result in the proper collation order. They need to be byte-swapped before they can be compared with an arithmetic operation! Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: comment Signed-off-by: Vicent Marti <vmg@strn.cat> --------- Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: Refactorings & fixes (vitessio#12554) * evalengine: add a compiler struct Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: refactor arithmetic ops Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: rename type check APIs Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/json: extract APIs Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/bit: add comment for behavior Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/call: make interface an Expr Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/json: extract conversion API Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/compare: fix multi-comparison behavior Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/convert: extract decimal precision API Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/bit: extract error Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/bytes: extract toNumericHex Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine/integration: move all tests into their own package Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: expose the charset package Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: remove unused file Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: add license headers Signed-off-by: Vicent Marti <vmg@strn.cat> --------- Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: upgrade to MySQL 8.0.32 (vitessio#12557) * collations: update from MySQL 8.0.32 Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: use an array to store collations Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: remove Init from Collation interface Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: perform eager initialiation and simplify APIs Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: apply utf8mb3 <-> utf8 in both directions Signed-off-by: Vicent Marti <vmg@strn.cat> * collations: expose the Binary collation Signed-off-by: Vicent Marti <vmg@strn.cat> --------- Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: More cleanup (vitessio#12573) * evalengine: fix multi-comparisons Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: add a limit for REPEAT Signed-off-by: Vicent Marti <vmg@strn.cat> * evalengine: Additional fixes for edge cases This solves edge cases around decimal promotion which follows a conversion to int64 (not uint64) and then is casted to uint64 for all bitwise operations. Also adds a bunch of tests for the hex() function and implemented conversions for non string like types. It also found more bugs in MySQL CASE handling, yay. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> --------- Signed-off-by: Vicent Marti <vmg@strn.cat> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com> * boost: update collation APIs Signed-off-by: Vicent Marti <vmg@strn.cat> --------- Signed-off-by: Vicent Marti <vmg@strn.cat> Signed-off-by: Manan Gupta <manan@planetscale.com> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com> Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * boost: Improve upquery generation (vitessio#1650) This fixes upquery generation for a union with group bys where there's a literal projection by ensuring we match the generation of the query to what we do without a join. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * [boost planner] Union on top of union (vitessio#1645) * Add tests for broken unions Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * attempt to allow building of unions on top of unions Signed-off-by: Andres Taylor <andres@planetscale.com> * use the right columns even for UNION ALL Signed-off-by: Andres Taylor <andres@planetscale.com> * fix derived tables and projections after TopK Signed-off-by: Andres Taylor <andres@planetscale.com> * use group by with count(*) instead of distinct Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr> * flownode: implement proper ordering trace for Readers Signed-off-by: Vicent Marti <vmg@strn.cat> * check one the right node Signed-off-by: Andres Taylor <andres@planetscale.com> * allow expressions from inside derived table to be handled correctly Signed-off-by: Andres Taylor <andres@planetscale.com> * boost: fix indeterministic union parents Signed-off-by: Vicent Marti <vmg@strn.cat> * Removed unused distinct flownode Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> --------- Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Signed-off-by: Andres Taylor <andres@planetscale.com> Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr> Signed-off-by: Vicent Marti <vmg@strn.cat> Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com> Co-authored-by: Florent Poinsard <florent.poinsard@outlook.fr> Co-authored-by: Vicent Marti <vmg@strn.cat> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> * Fix test expectation Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> --------- Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Signed-off-by: Vicent Marti <vmg@strn.cat> Signed-off-by: Manan Gupta <manan@planetscale.com> Signed-off-by: Andres Taylor <andres@planetscale.com> Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr> Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com> Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com> Co-authored-by: Andres Taylor <andres@planetscale.com> Co-authored-by: Florent Poinsard <florent.poinsard@outlook.fr> Co-authored-by: Vicent Marti <vmg@strn.cat>
This Pull Request contains the following change:
This change is