Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge query issues for tables datasets, dataset_versions, and job_versions #2673

Open
ddave09 opened this issue Oct 30, 2023 · 2 comments
Open
Labels
bug Something isn't working db
Milestone

Comments

@ddave09
Copy link

ddave09 commented Oct 30, 2023

Problem

Table: dataset_versions

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetVersionDao.java#L295

ERROR: duplicate key value violates unique constraint "dataset_versions_dataset_uuid_version_key"
DETAIL: Key (dataset_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* DatasetVersionDao.upsert */
INSERT INTO dataset_versions (
uuid, 
created_at, 
dataset_uuid, 
version, 
run_uuid, 
fields, 
namespace_name, 
dataset_name, 
lifecycle_state
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) ON **CONFLICT(version)** DO UPDATE SET run_uuid = EXCLUDED.run_uuid RETURNING *

Broken since version 0.12.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid)
Based on the schema which exists since V1__intial_schema.sql

CREATE TABLE dataset_versions (
  uuid         UUID PRIMARY KEY,
  created_at   TIMESTAMP NOT NULL,
  dataset_uuid UUID REFERENCES datasets(uuid),
  version      UUID NOT NULL,
  run_uuid     UUID,
  **UNIQUE (dataset_uuid, version)**
);

Table: datasets

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/DatasetDao.java#L191

ERROR: duplicate key value violates unique constraint "datasets_namespace_uuid_name_key"
DETAIL: Key (namespace_uuid, name)=(<uuid>, <name>) already exists.
STATEMENT: /* DatasetDao.upsert */ INSERT INTO datasets (
uuid,
type,
created_at,
updated_at,
namespace_uuid,
namespace_name,
source_uuid,
source_name,
name,
physical_name,
description,
is_deleted,
is_hidden
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9,
$10,
$11,
$12,
false
)
**ON CONFLICT (uuid)**
DO UPDATE SET
type = EXCLUDED.type,
updated_at = EXCLUDED.updated_at,
physical_name = EXCLUDED.physical_name,
description = EXCLUDED.description,
is_deleted = EXCLUDED.is_deleted,
is_hidden = EXCLUDED.is_hidden
RETURNING *

Broken since version 0.27.0
PR: #2087
PR DIFF: https://github.com/MarquezProject/marquez/pull/2087/files#diff-c687ca258e023847591c9b8c044adddaae1e7c1f6f9483928bf2871050cf6a63R232

Current constraint marquez.public.datasets.datasets_namespace_uuid_name_key (namespace_uuid, name), but INSERT statement ON CONFLICT only checks for (uuid)

Table: job_versions

https://github.com/MarquezProject/marquez/blob/0.42.0/api/src/main/java/marquez/db/JobVersionDao.java#L169

ERROR: duplicate key value violates unique constraint "job_versions_job_uuid_version_key"
DETAIL: Key (job_uuid, version)=(<uuid>, <uuid>) already exists.
STATEMENT: /* JobVersionDao.upsertJobVersion */ INSERT INTO job_versions (
uuid,
created_at,
updated_at,
job_uuid,
location,
version,
job_name,
namespace_uuid,
namespace_name
) VALUES (
$1,
$2,
$3,
$4,
$5,
$6,
$7,
$8,
$9)
**ON CONFLICT(version)**
DO
UPDATE SET updated_at = EXCLUDED.updated_at
RETURNING *

Broken since version 0.13.0 (It was introduced then and ON CONFLICT only checks for (version) where it should be checking for version and uuid)
Based on the schema which exists since V1__intial_schema.sql

CREATE TABLE job_versions (
  uuid            UUID PRIMARY KEY,
  created_at      TIMESTAMP NOT NULL,
  updated_at      TIMESTAMP NOT NULL,
  job_uuid        UUID REFERENCES jobs(uuid),
  version         UUID NOT NULL,
  location        VARCHAR(255) NOT NULL,
  latest_run_uuid UUID,
  **UNIQUE (job_uuid, version)**
);

Proposal

For the three tables discussed above, either code needs to be adjusted to account for missing columns when looking for conflicts or migration needs to be applied to modify the unique constraint to adjust to the code.

@boring-cyborg
Copy link

boring-cyborg bot commented Oct 30, 2023

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!

@ddave09
Copy link
Author

ddave09 commented Oct 30, 2023

Please direct me to the template, I couldn't find one. I looked that REAME.md or Contributing.md. At the moment I have followed a template used by one of the issues.

@wslulciuc wslulciuc added bug Something isn't working db labels Feb 1, 2024
@wslulciuc wslulciuc added this to the Roadmap milestone Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working db
Projects
None yet
Development

No branches or pull requests

2 participants