[Bug]: MAJOR Optimizing is running repeatedly #1924

Closed
2 tasks done
Tracked by #1930
celltobig opened this issue Sep 6, 2023 · 1 comment · Fixed by #1976
Labels
priority:major type:bug Something isn't working

Comments

@celltobig
Contributor

celltobig commented Sep 6, 2023

What happened?

An Iceberg format table keeps undergoing major optimizing repeatedly, even though there is nothing left to optimize.

The table has full optimizing enabled with the following configuration (86400000 ms, i.e. 24 hours):

'self-optimizing.full.trigger.interval'='86400000'

Affects Versions

master

What engines are you seeing the problem on?

AMS, Optimizer

How to reproduce

  • Create an Iceberg v2 partitioned table.
  • Set the table property 'self-optimizing.full.trigger.interval'='86400000'.
  • Insert overwrite some data into one partition.

CREATE TABLE spark_catalog.dl_ods.ods_iceberg_t1 (
channel_id INT,
label STRING,
price_sign BIGINT,
item_id BIGINT,
item_type INT,
is_maintain INT,
cur_name STRING,
adjust_code STRING,
platform_id BIGINT,
business_id BIGINT NOT NULL,
price DECIMAL(20,4),
price_type_name STRING,
price_type_id BIGINT,
price_type_code STRING,
sku_id BIGINT,
goods_no STRING,
spu_id BIGINT,
store_id BIGINT,
gmt_update TIMESTAMP,
gmt_create TIMESTAMP,
id BIGINT NOT NULL,
store_status STRING,
store_no STRING,
com_id STRING,
store_name STRING)
USING iceberg
PARTITIONED BY (business_id)
LOCATION 'hdfs://xxxxx/user/hive/warehouse/datalake/dl_ods/ods_iceberg_t1'
TBLPROPERTIES(
'clean-independent-delete-files.enabled' = 'true',
'clean-orphan-file.enabled' = 'true',
'clean-orphan-file.min-existing-time-minutes' = '1440',
'current-snapshot-id' = '5211867258629833319',
'engine.hive.enabled' = 'true',
'flink.max-continuous-empty-commits' = '2147483647',
'format' = 'iceberg/parquet',
'format-version' = '2',
'identifier-fields' = '[id,business_id]',
'self-optimizing.enabled' = 'true',
'self-optimizing.full.trigger.interval' = '-1',
'self-optimizing.group' = 'external-group',
'self-optimizing.quota' = '0.1',
'snapshot.base.keep.minutes' = '60',
'table-expire.enabled' = 'true',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '1',
'write.upsert.enabled' = 'true')
;
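
For reference, steps 2 and 3 above might look roughly like this in Spark SQL. This is only a sketch, not the exact statements from the report: the staging table `ods_iceberg_t1_staging` and the partition value `1001` are hypothetical, and dynamic partition overwrite mode is assumed so that only the touched partition is replaced.

```sql
-- Step 2: enable time-triggered full optimizing (interval in milliseconds, 24 hours here).
ALTER TABLE spark_catalog.dl_ods.ods_iceberg_t1
SET TBLPROPERTIES ('self-optimizing.full.trigger.interval' = '86400000');

-- Assumption: dynamic overwrite mode, so only partitions that receive rows
-- from the SELECT below are replaced.
SET spark.sql.sources.partitionOverwriteMode = dynamic;

-- Step 3: overwrite a single partition.
-- ods_iceberg_t1_staging (same schema) and business_id = 1001 are hypothetical.
INSERT OVERWRITE spark_catalog.dl_ods.ods_iceberg_t1
SELECT *
FROM spark_catalog.dl_ods.ods_iceberg_t1_staging
WHERE business_id = 1001;
```

Note that the table properties listed above show 'self-optimizing.full.trigger.interval' = '-1', so the ALTER TABLE statement is what actually turns the 24-hour full-optimizing trigger on.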

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
celltobig added the type:bug label on Sep 6, 2023
@wangtaohz
Contributor

Thanks for your report! I will add this issue to the roadmap for version 0.5.1 and look forward to your PR.👍

@wangtaohz wangtaohz mentioned this issue Sep 8, 2023
56 tasks
@wangtaohz wangtaohz changed the title from "[Bug]: MAJOR Optimizing is running" to "[Bug]: MAJOR Optimizing is running repeatedly" on Sep 13, 2023