workload: use same key sequence during KV --insert-count #152979
Conversation
Refactor the KV workload's key generator interface to reduce the interface surface area. The keyGenerator is now a struct, rather than an interface, and a more constrained keyMapper interface encapsulates the logic necessary for mapping the sequential index to the appropriate key. Key generator configuration is held in a keyGeneratorConfig struct, while state that requires mutual exclusion is held in a keyGeneratorState struct.

Epic: none
Release note: none
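For orientation, here is a minimal sketch of how such a split might look. The names keyMapper, keyGeneratorConfig, and keyGeneratorState come from the commit message above; everything else (field names, the writeKey helper, and its locking) is an illustrative assumption rather than the actual implementation in pkg/workload/kv.

```go
// Illustrative sketch only: the struct/interface split mirrors the commit
// message, but the fields and methods are assumptions, not the real code.
package kv

import (
	"math/rand"
	"sync"
)

// keyMapper encapsulates the logic for mapping a sequential index to the
// appropriate key; each generator flavor (sequential, hashed, zipfian, ...)
// would provide its own implementation.
type keyMapper interface {
	mapKey(seq int64) int64
}

// keyGeneratorConfig holds immutable configuration.
type keyGeneratorConfig struct {
	writeSeq    int64 // starting sequence, e.g. derived from --write-seq
	insertCount int64 // rows loaded during init via --insert-count
}

// keyGeneratorState holds the state that requires mutual exclusion.
type keyGeneratorState struct {
	mu  sync.Mutex
	seq int64
}

// keyGenerator is a struct composing the pieces above, rather than a wide
// interface implemented by every generator.
type keyGenerator struct {
	cfg    keyGeneratorConfig
	state  *keyGeneratorState
	mapper keyMapper
	rng    *rand.Rand
}

// writeKey reserves the next sequence number and maps it to a key.
func (g *keyGenerator) writeKey() int64 {
	g.state.mu.Lock()
	seq := g.state.seq
	g.state.seq++
	g.state.mu.Unlock()
	return g.mapper.mapKey(seq)
}
```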
RaduBerinde left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong and @golgeek)
Hrm, I'm realizing there's an issue.

Can we use a pseudorandom way of generating permutations? https://chatgpt.com/share/68b9b11a-18dc-8010-9fe1-990d5c3e731a
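For context on what a pseudorandom permutation buys here: a keyed Feistel network over 64-bit values is a bijection, so distinct sequence indexes can never map to the same key, unlike a plain hash. The sketch below is generic and illustrative; it is not code from the linked discussion or from CockroachDB.

```go
// Generic sketch of a keyed Feistel permutation over 64-bit values.
// Because the construction is a bijection, every sequence index maps to a
// distinct key, avoiding duplicate-key inserts entirely.
package main

import "fmt"

// round is a simple keyed mixing function; the Feistel structure supplies
// invertibility regardless of what this function does.
func round(half uint32, key uint64) uint32 {
	x := uint64(half) ^ key
	x *= 0x9e3779b97f4a7c15
	x ^= x >> 29
	return uint32(x) ^ uint32(x>>32)
}

// permute applies a four-round Feistel network, yielding a permutation of
// the full 64-bit keyspace determined by keys.
func permute(v uint64, keys [4]uint64) uint64 {
	left, right := uint32(v>>32), uint32(v)
	for _, k := range keys {
		left, right = right, left^round(right, k)
	}
	return uint64(left)<<32 | uint64(right)
}

func main() {
	keys := [4]uint64{0xdeadbeef, 0xcafebabe, 0x12345678, 0x9abcdef0}
	for seq := uint64(0); seq < 4; seq++ {
		fmt.Printf("seq %d -> key %#016x\n", seq, permute(seq, keys))
	}
}
```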
Previously, the BatchedTuples type used to load initial data for a workload disallowed duplicate keys that would violate uniqueness constraints. Some of the workload key generators may inevitably generate duplicates when hashing keys because of the relatively small key space of 64-bit integers. This commit updates the BatchedTuples type so that the workload can indicate the possible existence of duplicates, and updates the data loader to use INSERT ... ON CONFLICT DO NOTHING queries to perform the insertion, effectively ignoring the duplicates.
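As a rough illustration of the ON CONFLICT DO NOTHING path, here is a hedged sketch of a batched insert that tolerates duplicate keys. The table and column names (kv, k, v) follow the kv workload's schema; the helper and its signature are illustrative, not the actual data loader.

```go
// Illustrative batched insert that ignores duplicate keys via
// ON CONFLICT DO NOTHING; not the workload's actual data loader.
package loader

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// insertBatch writes one batch of generated rows, silently skipping any key
// that an earlier (colliding) hash already produced.
func insertBatch(ctx context.Context, db *sql.DB, keys []int64, values [][]byte) error {
	var sb strings.Builder
	sb.WriteString("INSERT INTO kv (k, v) VALUES ")
	args := make([]interface{}, 0, len(keys)*2)
	for i := range keys {
		if i > 0 {
			sb.WriteString(", ")
		}
		fmt.Fprintf(&sb, "($%d, $%d)", i*2+1, i*2+2)
		args = append(args, keys[i], values[i])
	}
	// Ignoring conflicts keeps the load from failing on the occasional
	// duplicate produced by hashing into a 64-bit keyspace.
	sb.WriteString(" ON CONFLICT DO NOTHING")
	_, err := db.ExecContext(ctx, sb.String(), args...)
	return err
}
```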
When inserting the initial --insert-count keys, use the configured key schema to insert the keys. This ensures that a subsequent call passing an appropriate --write-seq flag will read the keys inserted through --insert-count.

Fix cockroachdb#107874.

Epic: none
Release note: none
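A toy sketch of why this matters: if the init phase and the run phase map index i through the same key schema, then a run pointed at the right --write-seq deterministically reads back exactly the rows init wrote. The keyForIndex mapping below is an arbitrary stand-in, not the workload's real schema.

```go
// Toy illustration: init and run share one index-to-key mapping, so a run
// resuming at --write-seq equal to --insert-count only reads keys that the
// init phase actually inserted. keyForIndex is an arbitrary stand-in.
package main

import "fmt"

// keyForIndex stands in for whatever the configured key schema does
// (sequential, hashed, zipfian, ...); the important property is that both
// phases call the same function.
func keyForIndex(i int64) int64 {
	return i*2654435761 + 7 // arbitrary illustrative mapping
}

func main() {
	const insertCount = 4

	// "Init" phase: load rows for indexes 0..insertCount-1.
	inserted := map[int64]bool{}
	for i := int64(0); i < insertCount; i++ {
		inserted[keyForIndex(i)] = true
	}

	// "Run" phase started with a write sequence of insertCount: reads over
	// indexes below insertCount hit only keys that init wrote.
	for i := int64(0); i < insertCount; i++ {
		k := keyForIndex(i)
		fmt.Printf("index %d -> key %d, present=%t\n", i, k, inserted[k])
	}
}
```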
Interesting... I think maybe for the uniform generator, yes, but I'm not sure how to adapt the zipfian generator. For now I adapted the data loader to use the insert data loader if there's the possibility of duplicates, and to configure the insert data loader to use INSERT ... ON CONFLICT DO NOTHING.
Indeed... a zipfian write pattern necessarily implies we are overwriting keys.
I guess ideally for a zipfian key generator, …
Looking at the zipfian generator code, that just means keys …
Oof, I didn't even notice that. The ycsb workload will hash the index afterwards, ensuring keys are still randomly distributed across the keyspace.

TFTR! bors r=RaduBerinde
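To make the "hash the index afterwards" step concrete, here is a minimal sketch of hashing a sequential insert index into a 64-bit key. The choice of FNV-1a is an assumption for illustration, not a claim about the hash the ycsb workload actually uses.

```go
// Minimal sketch: hash the sequential insert index so keys spread across the
// keyspace. FNV-1a is used here for illustration only; a 64-bit hash can
// occasionally collide, which is why the loader must tolerate duplicates.
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hashIndex maps the i-th inserted row to a pseudo-randomly distributed key.
func hashIndex(i uint64) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	h := fnv.New64a()
	h.Write(buf[:])
	return h.Sum64()
}

func main() {
	for i := uint64(0); i < 4; i++ {
		fmt.Printf("index %d -> key %#016x\n", i, hashIndex(i))
	}
}
```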
In 1041bf0 (cockroachdb#152979), we converted the ycsb and kv workloads to use INSERT statements for initial data loading when hashing keys. Hashing keys can cause key collisions, and at higher row counts these collisions are inevitable. Since IMPORTs require imported data to be unique, an IMPORT with colliding keys would fail. Before cockroachdb#152979, only the ycsb workload was susceptible to this failure mode, because the kv workload did not use the same key generator for data loaded through --insert-count (cockroachdb#107874). Fixing cockroachdb#107874 made the kv workload susceptible to this type of failure.

The change to use INSERTs instead of IMPORTs ran afoul of KV message size limits (see cockroachdb#153086). For now we revert the change so that IMPORT is used by default, and instead warn about the possibility of collisions.

Epic: none
Release note: none
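To back the claim that collisions become inevitable at higher row counts, here is a back-of-envelope birthday-bound estimate for hashing n rows into a 64-bit keyspace. The numbers come from the standard approximation, not from measuring the workload: at a billion rows the expected number of collisions is only about 0.03, but at tens of billions it exceeds one, which is why the warning matters for large --insert-count values.

```go
// Birthday-bound estimate for hashed-key collisions in a 64-bit keyspace.
// E[collisions] ~= n(n-1)/(2*2^64); P(>=1 collision) ~= 1 - exp(-E).
package main

import (
	"fmt"
	"math"
)

func main() {
	keyspace := math.Pow(2, 64)
	for _, n := range []float64{1e6, 1e8, 1e9, 1e10} {
		expected := n * (n - 1) / (2 * keyspace) // expected number of collisions
		pAtLeastOne := 1 - math.Exp(-expected)   // probability of at least one
		fmt.Printf("n=%.0e  E[collisions]=%.3g  P(>=1 collision)=%.3g\n",
			n, expected, pAtLeastOne)
	}
}
```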
153066: cursor: add user-level CLAUDE.md to rule r=benbardin a=benbardin

Useful for user-level preferences like gh to access GitHub, etc.

Epic: none
Release note: None

153171: workload: use IMPORT by default, but warn against uniqueness violations r=tbg a=jbowens

In 1041bf0 (#152979), we converted the ycsb and kv workloads to use INSERT statements for initial data loading when hashing keys. Hashing keys can cause key collisions, and at higher row counts these collisions are inevitable. Since IMPORTs require imported data to be unique, an IMPORT with colliding keys would fail. Before #152979, only the ycsb workload was susceptible to this failure mode, because the kv workload did not use the same key generator for data loaded through --insert-count (#107874). Fixing #107874 made the kv workload susceptible to this type of failure. The change to use INSERTs instead of IMPORTs ran afoul of KV message size limits (see #153086). For now we revert the change so that IMPORT is used by default, and instead warn about the possibility of collisions.

Epic: none
Release note: none

153206: pkg/cmd/roachtests: fix disk bandwidth passed to cluster setting r=nicktrav a=nicktrav

As of #135019, the disk bandwidth variable has units of bytes, but the string passed to the cluster setting has units of `MiB`. This can result in an effective bandwidth that is orders of magnitude too high. Change the former to match the units of the latter.

Release note: None.
Epic: None.

153293: storage: bump pebble, add new blob rewrite cluster setting r=RaduBerinde a=jbowens

Changes:
* [`342d2e8b`](cockroachdb/pebble@342d2e8b) db: add high-priority blob file rewrite compactions
* [`fff7d3ad`](cockroachdb/pebble@fff7d3ad) db: fix MarkedForCompaction flake
* [`d86f6dab`](cockroachdb/pebble@d86f6dab) scripts: more improvements to stress.sh

Release note: none.
Epic: none.

Add a new cluster setting for configuring high-priority blob file rewrite compactions. One of these compactions is allowed to run ahead of a default compaction if the amount of garbage exceeds the configured percentage.

Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Nick Travers <travers@cockroachlabs.com>