Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use gen_random_uuid() for CockroachDB INSERTs #807

Merged
merged 2 commits into from
Jan 24, 2023

Conversation

alnr
Copy link
Contributor

@alnr alnr commented Jan 20, 2023

Using the gen_random_uuid() builtin can significantly speed up inserts because the primary key uniqueness constraint check is elided.

I've also fixed up code where there was a prepared statement being created for just one query, which is super inefficient.

I'm kinda having trouble running the tests locally, so I'd appreciate a workflow approval too. @aeneasr @sio4 👍

@aeneasr
Copy link
Member

aeneasr commented Jan 23, 2023

Approval given :)

You can run the tests locally using ./test.sh which executes tests against all DBs, or using:

go build -v -tags sqlite -o tsoda ./soda

SODA_DIALECT=cockroach

./tsoda drop -e $SODA_DIALECT -c ./database.yml -p ./testdata/migrations
./tsoda create -e $SODA_DIALECT -c ./database.yml -p ./testdata/migrations
./tsoda migrate -e $SODA_DIALECT -c ./database.yml -p ./testdata/migrations

go test -cover -race -tags sqlite -count=1 $args ./...

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice already! Do we have a test showing that the ID is generated correctly for Cockroach? I do believe that such tests exist, but it would be good to have proof that we don't introduce regressions :)

}
txlog(logging.SQL, c, query, model.Value)
stmt, err := c.Store.PrepareNamed(query)
rows, err := c.Store.NamedQueryContext(model.ctx, query, model.Value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it makes no sense to use prepared statements if they're not cached.

@alnr alnr force-pushed the cockroach-uuid branch 4 times, most recently from cf82f4b to 4d9ddfc Compare January 23, 2023 22:08
@alnr
Copy link
Contributor Author

alnr commented Jan 23, 2023

  • A couple of tests were not idempotent. I think I've fixed them all.
  • Fixed missing Close on sql.Rows
  • Added more explicit checks about UUID generation to test

ready for review :)

@alnr alnr marked this pull request as ready for review January 23, 2023 22:10
Copy link
Member

@sio4 sio4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alnr,
Thank you for your PR.

Before reviewing the main topic of this PR, please remove changes regarding testing configurations (database.yml and docker-compose.yml). I don't think they are parts of the topic of this PR. Changing the supported database version should have more consideration around other conditions and may not be a part of this feature PR.

Please keep PR as simple as possible with a single point of interest.

database.yml Outdated Show resolved Hide resolved
docker-compose.yml Outdated Show resolved Hide resolved
@sio4 sio4 self-assigned this Jan 24, 2023
@sio4 sio4 added the enhancement New feature or request label Jan 24, 2023
Copy link
Member

@sio4 sio4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sio4
Copy link
Member

sio4 commented Jan 24, 2023

Hi @alnr,

I am just wondering. What could be the cons and pros of using gen_random_uuid() for generating a random UUID ID on the server side?

Personally, I prefer to use application-side actions (spending the application server's CPU) rather than offloading CPU tasks to the database side since the application is more flexible to spread out (scaling) than controlling database load unless the action should be taken on the database side for integrity or similar things. Also for the main topic of this PR, it is easier to maintain the code if we use the same generic method in the generic dialect.

Could you please explain your idea and/or considerations?

significantly speed up inserts because the primary key uniqueness constraint check is elided

Do you have any data regarding this? Having the real or test data here would be great!

@sio4 sio4 self-requested a review January 24, 2023 16:17
@alnr
Copy link
Contributor Author

alnr commented Jan 24, 2023

I am just wondering. What could be the cons and pros of using gen_random_uuid() for generating a random UUID ID on the server side?

Pros:

  • primary key uniqueness constraint check will be elided on insert. This makes a huge difference in CockroachDB multiregion databases. If you don't use gen_random_uuid(), the uniqueness constraint will be verified prior to insert on all regional shards, which requires contacting them. Across continents, that's multiple hundred milliseconds. I've confirmed this with a US-EU-APAC cluster on a hosted enterprise Cockroach cluster.

Cons:

  • can't think of any. Generating a UUID is free. Doesn't matter if the app or the DB does it.

Personally, I prefer to use application-side actions (spending the application server's CPU) rather than offloading CPU tasks to the database side since the application is more flexible to spread out (scaling) than controlling database load unless the action should be taken on the database side for integrity or similar things. Also for the main topic of this PR, it is easier to maintain the code if we use the same generic method in the generic dialect.

After this change, we burn a lot fewer CPU cycles anyway, because the DB does not have to verify the uniqueness constraint on the primary key. Checking that constraint takes approximately infinity times more time than generating a UUID.

@alnr
Copy link
Contributor Author

alnr commented Jan 24, 2023

Not sure what's going on with that failing test 🤷

@sio4
Copy link
Member

sio4 commented Jan 24, 2023

Not sure what's going on with that failing test shrug

Don't worry. The short test on the Windows machine is failing many times, and I don't think it is related to your PR. We need to make it better but actually, the Windows environment is too hard. :-)

For the main topic, I quickly researched the Cockroach's resources and I found the followings:

It seems like we can take another approach for this, not for right now but for later:

  • improve the generic routine can allow empty UUID id value (with a new option)
  • improve fizz to create a better CREATE statement with the default gen_random_uuid() (or manually?)

I think this approach will make Pop simple without losing the database side performance. Any ideas or suggestions will help us greatly!

Checking that constraint takes approximately infinity times more time than generating a UUID.

Yeah, could be in a global cluster across the universe 😆

@sio4 sio4 added this to the v6.2.0 milestone Jan 24, 2023
@sio4 sio4 merged commit ec9229d into gobuffalo:main Jan 24, 2023
@alnr
Copy link
Contributor Author

alnr commented Jan 24, 2023

Thanks for merging!

Honestly I would keep the code as-is for now. Not everyone uses fizz to create schemas. Regarding the empty UUID value: I don't think allowing null UUIDs makes sense for the ID (primary key) column. It makes a lot of sense for other columns, of course, but that's not where this optimization is most useful anyway.

@alnr alnr mentioned this pull request Jan 24, 2023
@sio4
Copy link
Member

sio4 commented Jan 26, 2023

Honestly I would keep the code as-is for now. Not everyone uses fizz to create schemas. Regarding the empty UUID value: I don't think allowing null UUIDs makes sense for the ID (primary key) column.

Yes, for now. and...
Maybe my explanation is not enough. I don't mean ID can be null UUID. Currently, we remove an integer ID field from the model when executing Create() because we assume auto-increment is enabled on the database side. With the same concept, if we know (with option or anything) the database table was created with a UUID key with DEFAULT like below (example taken from the cockroach's document), we can just remove the ID field so the database will handle it.

CREATE TABLE movr.max_schema.vehicles (
      id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
      type movr.max_schema.vtype,
      creation_time TIMESTAMPTZ,
      available BOOL,
      last_location STRING
  );

The function or method to create an auto key can be different by database engines, but we can make the Create() routine the same.

There are also related issues (opposite actually) for the integer ID that cannot be preserved when if the user sets ID explicitly and I think those issues can be considered together with it. (#382, #586)

@alnr alnr mentioned this pull request May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants