[CT-1081] fail on sequence primary key hudi model #438

Closed
cadl opened this issue Aug 25, 2022 · 2 comments

Labels
Stale type:bug Something isn't working

Comments


cadl commented Aug 25, 2022

Describe the bug

I want to combine two columns into the unique key of a hudi model, as in the model below, but I ran into some trouble.

{{ config(
    materialized='incremental',
    file_format='hudi',
    incremental_strategy='merge',
    options={
        'type': 'cow',
        'precombineKey': 'version'
    },
    unique_key="**explain next**"
   )
}}

select 
  1 as a,
  1 as b,
  1 as version
union all
select
  1 as a,
  2 as b,
  1 as version
  • If I set unique_key = 'a,b', the generated merge into SQL is invalid.
  • If I set unique_key = ['a', 'b'] and also set options={"primaryKey": "a,b"}, I get the error: unique_key and options('primaryKey') should be the same column(s).
  • If I set unique_key = ['a', 'b'] only, the hudi option primaryKey receives the invalid value ['a', 'b'].
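
All three cases above come down to the same thing: the key is written as a comma-separated string in one place and as a list in another. A minimal sketch of how both forms could be normalized before building the merge condition and the hudi primaryKey option is shown here. This is not the dbt-spark implementation; the helper names and the default table aliases are hypothetical and chosen only for illustration.

```python
from typing import List, Union

UniqueKey = Union[str, List[str]]


def normalize_unique_key(unique_key: UniqueKey) -> List[str]:
    """Return the key columns as a list, accepting 'a,b' or ['a', 'b']."""
    if isinstance(unique_key, str):
        parts = unique_key.split(",")
    else:
        parts = list(unique_key)
    # Drop surrounding whitespace and empty fragments such as a trailing comma.
    return [p.strip() for p in parts if p.strip()]


def hudi_primary_key(unique_key: UniqueKey) -> str:
    """Render the key as the comma-separated string used for primaryKey."""
    return ",".join(normalize_unique_key(unique_key))


def merge_condition(
    unique_key: UniqueKey,
    source: str = "DBT_INTERNAL_SOURCE",  # illustrative alias
    dest: str = "DBT_INTERNAL_DEST",  # illustrative alias
) -> str:
    """Build one equality predicate per key column, joined with 'and'."""
    cols = normalize_unique_key(unique_key)
    return " and ".join(f"{source}.{c} = {dest}.{c}" for c in cols)


def keys_match(unique_key: UniqueKey, primary_key_option: UniqueKey) -> bool:
    """Compare unique_key and primaryKey after normalizing both."""
    return normalize_unique_key(unique_key) == normalize_unique_key(primary_key_option)
```

With this kind of normalization, unique_key = ['a', 'b'] and primaryKey = 'a,b' compare as equal, and the merge condition contains one predicate per column instead of treating 'a,b' as a single column name.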

Steps To Reproduce

Expected behavior

The model runs successfully.

Screenshots and log output

System information

The output of dbt --version:

Core:
  - installed: 1.2.0
  - latest:    1.2.0 - Up to date!

Plugins:
  - spark: 1.2.0 - Up to date!

The operating system you're using:

The output of python --version: 3.7

@cadl cadl added type:bug Something isn't working triage:product labels Aug 25, 2022
@github-actions github-actions bot changed the title fail on sequence primary key hudi model [CT-1081] fail on sequence primary key hudi model Aug 25, 2022
dbeatty10 (Contributor) commented

Good catch @cadl!

Agreed that unique_key should be able to take a list like unique_key = ['a', 'b'].

This feature was added in dbt Core 1.1 and is described in the docs. The original implementation of this feature for spark was covered in #282 and #291, and it sounds like the hudi file format wasn't fully handled.

Thank you for opening pull request #439 -- I marked it as "ready for review" and someone will take a look at it.

github-actions (Contributor) commented

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
