Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: remove uniqueness checks for gen_random_uuid() #61132

Merged
merged 1 commit into from
Feb 26, 2021

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Feb 25, 2021

This commit removes uniqueness checks for UUID columns that are set
to gen_random_uuid(), because uniqueness can be sufficiently guaranteed
by randomness. The probability of collision when using gen_random_uuid()
is extremely low. If users still want the checks, however, they can set
sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled, a new cluster
setting, to true. By default the setting is false.

Fixes #57790

Release justification: This commit is a low-risk update to new functionality.

Release note (sql change): Added a new cluster setting
sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled, which controls
creation of uniqueness checks for UUID columns set to gen_random_uuid().
When enabled, uniqueness checks will be added for the UUID column if it
has a unique constraint that cannot be enforced by an index. When disabled,
no uniqueness checks are planned for these columns, and uniqueness is
assumed due to the near-zero collision probability of gen_random_uuid()

@rytaft rytaft requested review from mgartner, RaduBerinde and a team February 25, 2021 17:58
@rytaft rytaft requested a review from a team as a code owner February 25, 2021 17:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: You are on 🔥 !!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/unique, line 1690 at r1 (raw file):

                  label: buffer 1

# We can also detect gen_random_uuid() when it is a projection.

Can you add a test where gen_random_uuid comes from a DEFAULT column expression?


pkg/sql/opt/optbuilder/mutation_builder_unique.go, line 216 at r1 (raw file):

		// the columns has UUID type and returns gen_random_uuid(), unique check not
		// needed.
		requireChecksForGenRandUUID := UniquenessChecksForGenRandomUUIDClusterMode.Get(&mb.b.evalCtx.Settings.SV)

Super nit, but given that the default is false, we may as well check it after the other checks (so we can avoid the check in the common non-UUID case)

@rytaft rytaft force-pushed the uuid-elim branch 2 times, most recently from 1ef5583 to 67fb0e4 Compare February 25, 2021 23:09
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/exec/execbuilder/testdata/unique, line 1690 at r1 (raw file):

Previously, RaduBerinde wrote…

Can you add a test where gen_random_uuid comes from a DEFAULT column expression?

Done.


pkg/sql/opt/optbuilder/mutation_builder_unique.go, line 216 at r1 (raw file):

Previously, RaduBerinde wrote…

Super nit, but given that the default is false, we may as well check it after the other checks (so we can avoid the check in the common non-UUID case)

Done.

This commit removes uniqueness checks for UUID columns that are set
to gen_random_uuid(), because uniqueness can be sufficiently guaranteed
by randomness. The probability of collision when using gen_random_uuid()
is extremely low. If users still want the checks, however, they can set
sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled, a new cluster
setting, to true. By default the setting is false.

Fixes cockroachdb#57790

Release justification: This commit is a low-risk update to new functionality.

Release note (sql change): Added a new cluster setting
sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled, which controls
creation of uniqueness checks for UUID columns set to gen_random_uuid().
When enabled, uniqueness checks will be added for the UUID column if it
has a unique constraint that cannot be enforced by an index. When disabled,
no uniqueness checks are planned for these columns, and uniqueness is
assumed due to the near-zero collision probability of gen_random_uuid()
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Sorry for the late drive-by, but thought I'd mention that it might be nice for the docs to be clear what the non-obvious consequences of a collision are. For example, I think if you queried for the duplicate rows and the optimizer chose a locality optimized search, you'd only ever see 1 of them. There might be other weird side effects. But I could also not be respecting the low chances of a collision and overly paranoid...

Reviewed 2 of 6 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

@craig
Copy link
Contributor

craig bot commented Feb 26, 2021

Build succeeded:

@craig craig bot merged commit ac25d9f into cockroachdb:master Feb 26, 2021
@rytaft
Copy link
Collaborator Author

rytaft commented Feb 26, 2021

I think you're right that we should be sure to call this out in the docs. Added a docs-todo tag (cc @rmloveland)

@andy-kimball
Copy link
Contributor

From Wikipedia:

For example, the number of random version-4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion...

This number is equivalent to generating 1 billion UUIDs per second for about 85 years. A file containing this many UUIDs, at 16 bytes per UUID, would be about 45 exabytes.

...the probability to find a duplicate within 103 trillion version-4 UUIDs is one in a billion.

The chance of a civilization ending asteroid hitting the earth in your lifetime is about 1 in 10 million. That will happen 100 times before we ever find a duplicate UUID, even after having generated 103 trillion of them in the same table.

@RaduBerinde
Copy link
Member

^assuming a perfect random source, which is a tall order :)

@ajstorm
Copy link
Collaborator

ajstorm commented Feb 26, 2021

^assuming a perfect random source, which is a tall order :)

+1. I've seen enough bugs in random number generation to infer that the asteroid ending event isn't our biggest problem :-D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: eliminate uniqueness checks when uniqueness ensured through randomness
6 participants