
Conversation

@RussellSpitzer (Member)

When attempting to insert overwrite with an empty dataset, we would previously throw an error. This patch causes Spark to skip any no-op partition replacement operations.

@github-actions github-actions bot added the spark label Aug 11, 2021
@RussellSpitzer (Member, Author)

Solves #2895

#2895 is caused by attempting to build a dynamic replace operation which contains no files; this is currently not allowed. We can either change this to a no-op in Spark or allow it in Iceberg. This PR changes the operation to a no-op in Spark.
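
For context, a minimal, hypothetical sketch of the scenario behind #2895 using the Spark Java API. The catalog name local and the tables local.db.target and local.db.source are made up for illustration, and the Iceberg catalog configuration is assumed to already be in place; before this patch the empty INSERT OVERWRITE failed, and with it the dynamic replace is simply skipped.

import org.apache.spark.sql.SparkSession;

public class EmptyInsertOverwriteRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("empty-insert-overwrite")
        // dynamic partition overwrite is the mode that builds the replace operation
        .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
        .getOrCreate();

    // The source query produces no rows, so the dynamic partition overwrite has
    // nothing to replace; the commit should now be a no-op instead of an error.
    spark.sql("INSERT OVERWRITE TABLE local.db.target "
        + "SELECT * FROM local.db.source WHERE 1 = 0");
  }
}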

@RussellSpitzer RussellSpitzer requested a review from rdblue August 11, 2021 15:00
@RussellSpitzer (Member, Author)

@binhnv Could you please review as well?

Review comment on the new check in the writer's commit method:

@Override
public void commit(WriterCommitMessage[] messages) {
  Iterable<DataFile> files = files(messages);
  if (Iterables.size(files) == 0) {
@rdblue (Contributor) Aug 11, 2021

What about using !files.hasNext instead? I'm not sure we want to assume that the iterable can be consumed multiple times. Plus there's no need to consume the entire iterable just to check whether it is empty.
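
For illustration only, a self-contained sketch of the check being suggested, with a plain String standing in for DataFile and the method shape loosely mirroring the snippet above (this is not the merged code): probing the iterator avoids materializing the iterable or assuming it can be consumed twice.

import java.util.Collections;
import java.util.List;

public class EmptyReplaceSketch {
  // Skip the commit when the writers produced no files: checking the first
  // element via the iterator neither walks the whole iterable nor assumes it
  // can be iterated more than once.
  static void commit(Iterable<String> files) {
    if (!files.iterator().hasNext()) {
      System.out.println("No data files written, skipping dynamic partition replace");
      return;
    }
    System.out.println("Committing replace with files: " + files);
  }

  public static void main(String[] args) {
    commit(Collections.emptyList());         // skipped
    commit(List.of("data-file-1.parquet"));  // committed
  }
}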

@rdblue (Contributor) left a comment

I left a minor comment, but this looks good to me.

@binhnv commented Aug 11, 2021

Thank you for fixing this. The change looks good to me and it is also consistent with Hive's behavior.

@RussellSpitzer (Member, Author)

Thanks @rdblue + @binhnv for reviews! Will merge

@RussellSpitzer RussellSpitzer merged commit e4df91e into apache:master Aug 11, 2021
@RussellSpitzer RussellSpitzer deleted the AllowEmptyReplace branch August 11, 2021 19:33
hankfanchiu added a commit to hankfanchiu/iceberg that referenced this pull request Aug 27, 2021
chenjunjiedada pushed a commit to chenjunjiedada/incubator-iceberg that referenced this pull request Oct 20, 2021
Merge remote-tracking branch 'upstream/merge-master-20210816' into master
## What does this MR mainly address?

Merges upstream/master to pull in recent bug fixes and optimizations.

## What changes does this MR make?

Key PRs of interest:
> Predicate pushdown support, https://github.com/apache/iceberg/pull/2358, https://github.com/apache/iceberg/pull/2926, https://github.com/apache/iceberg/pull/2777/files
> Spark: writing an empty dataset no longer errors; the operation is simply skipped, apache#2960
> Flink: add uidPrefix to operators to make it easier to track multiple Iceberg sink tasks, apache#288
> Spark: fix nested struct pruning, apache#2877
> Allow creating v2 format tables via table properties, apache#2887
> Add the SortRewriteStrategy framework, gradually supporting different rewrite strategies, apache#2609 (WIP: apache#2829)
> Spark: support configuring Hadoop properties for a catalog, apache#2792
> Spark: read/write support for timestamps without timezone, apache#2757
> Spark MicroBatch: support a property to skip delete snapshots, apache#2752
> Spark: V2 RewriteDatafilesAction support
> Core: Add validation for row-level deletes with rewrites, apache#2865
> Schema time travel support: add schema-id (Core: add schema id to snapshot)
> Spark extensions: support identifier fields operations, apache#2560
> Parquet: Update to 1.12.0, apache#2441
> Hive: Vectorized ORC reads for Hive, apache#2613
> Spark: Add an action to remove all referenced files, apache#2415

## How was this MR tested?

UT