importccl: UUID clash while importing data #61203

Closed
wma1729 opened this issue Feb 26, 2021 · 2 comments · Fixed by #61214
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

wma1729 commented Feb 26, 2021

Describe the problem

I am evaluating CockroachDB (crdb) for our business needs and am migrating data from an existing database into it. Using the approach described below, IMPORT worked seamlessly for smaller tables but produced UUID clashes when importing large tables.

To Reproduce

My existing table schema is:

create table T1
(
id bigint primary key not null default autoincrement,
...
<around 50 other fields>
);

This table has close to 10 million records. For performance reasons, CockroachDB recommends using a UUID (or a combination of other fields) as the primary key, because the primary key is used for partitioning. So I changed the schema to:

create table T1
(
id uuid primary key not null default gen_random_uuid(),
oldid bigint unique not null,
...
);

I then used an IMPORT statement to load the data from a CSV file (unloaded from my existing database) into CockroachDB. The import failed with a duplicate primary index value. I tried splitting my CSV file into 20 smaller chunks, but I still got the collision.

To overcome this, I changed the schema to:

create table T1
(
id uuid null,
oldid bigint unique not null,
...
);

Now the import completes in 5 minutes, which is faster than I can load the data into my existing database. Bravo!
But I still had to populate the id field, so I ran:
UPDATE T1 SET id = gen_random_uuid();
This statement had not finished after about 12 hours, so I cancelled it.

My only workaround is to unload the data from the existing database using a special SELECT statement that generates a unique UUID for each row. I have tried this and it works.
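A related option that avoids writing a custom unload query per table is to post-process the exported CSV and prepend a client-generated UUID to every row before running IMPORT. The sketch below is a minimal illustration under stated assumptions: the CSV has no header row, and the target table's first column is the UUID `id`. The helper name `add_uuid_column` is hypothetical and not part of any CockroachDB tool.

```python
import csv
import io
import uuid


def add_uuid_column(src, dst):
    """Copy CSV rows from src to dst, prepending a fresh random (version 4) UUID.

    Hypothetical helper. Assumes no header row and that the target table's
    first column is the UUID primary key `id`. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        # uuid4() draws from the OS randomness source, so rows processed in
        # separate runs or on separate machines do not share a seed.
        writer.writerow([str(uuid.uuid4())] + row)
        count += 1
    return count


if __name__ == "__main__":
    # Small in-memory stand-in for a real export file.
    source = io.StringIO("1,alice\n2,bob\n")
    target = io.StringIO()
    add_uuid_column(source, target)
    print(target.getvalue(), end="")
```

For real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends. This shifts UUID generation out of the database entirely, so it works regardless of how IMPORT evaluates column defaults.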

Expected behavior

I should be able to load the table and have CockroachDB generate unique primary keys for me. The workaround I am using is not feasible in all the scenarios I need to support.

Additional context
What was the impact?

I have to import 200+ tables, about 20 of which are huge. I wonder how many custom unloads I will have to do.


@wma1729 wma1729 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Feb 26, 2021
blathers-crl bot commented Feb 26, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/bulk-io (found keywords: Import)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area, and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Feb 26, 2021
@pbardea pbardea self-assigned this Feb 26, 2021
pbardea (Contributor) commented Feb 26, 2021

Sorry that you're running into this issue, and thank you for filing it with reproduction steps!

This does look like a bug in IMPORT, which evaluates default expressions differently from how they are evaluated when executed via standard SQL commands. After a bit of investigation, the problem appears to be in how the evaluation of these default expressions interacts with the way IMPORT parallelizes its work. I've sent out a PR that hopefully resolves this issue, and this issue will be updated once the fix is merged.
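To illustrate how parallelism and a random default expression can interact badly, here is a toy model. It is not CockroachDB's code and does not claim to reproduce the actual bug; it only assumes that each parallel chunk evaluates the random default with its own seeded pseudo-random generator. The names `import_chunks`, `count_duplicates`, and the `seed_for_chunk` parameter are hypothetical. If every chunk derives the same seed, every chunk emits the same UUID sequence and the primary key collides; giving each chunk a distinct seed avoids that.

```python
import random
import uuid


def import_chunks(num_chunks, rows_per_chunk, seed_for_chunk):
    """Toy model of a parallel import (hypothetical, not CockroachDB code).

    Each chunk gets its own PRNG, seeded by `seed_for_chunk(chunk_index)`,
    and evaluates a random-UUID default once per row.
    """
    ids = []
    for chunk in range(num_chunks):
        rng = random.Random(seed_for_chunk(chunk))
        for _ in range(rows_per_chunk):
            # Build a version-4 UUID from 128 pseudo-random bits; the
            # constructor overwrites the version and variant bits.
            ids.append(uuid.UUID(int=rng.getrandbits(128), version=4))
    return ids


def count_duplicates(ids):
    """Number of generated values that repeat an earlier value."""
    return len(ids) - len(set(ids))


# Buggy model: every chunk reuses the same seed, so all chunks
# produce identical UUID sequences.
buggy = import_chunks(4, 1000, lambda chunk: 42)

# Fixed model: each chunk derives a distinct seed from its index.
fixed = import_chunks(4, 1000, lambda chunk: chunk)

print(count_duplicates(buggy), count_duplicates(fixed))
```

In this model the shared-seed run reports duplicates for every chunk after the first, while the per-chunk-seed run reports none. Seeding from the chunk index alone would still repeat across separate import jobs, so a real system would also need something job-specific in the seed; the sketch only covers the within-job case.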

@pbardea pbardea changed the title UUID clash while importing data importccl: UUID clash while importing data Feb 26, 2021
@craig craig bot closed this as completed in 0cea9dc Mar 4, 2021