
Conversation

@sshkvar (Contributor) commented Jun 29, 2021

Creating PR from my forked repo as discussed in #2282

public void testWriteTimestampWithoutZoneError() {
  String errorMessage = String.format("Write operation performed on a timestamp without timezone field while " +
      "'%s' set to false should throw exception", SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE);
  Runnable insert = () -> sql("INSERT INTO %s VALUES %s", tableName, rowToSqlValues(values));
Member

You can just put this right into the AssertThrows, we usually do this.

AssertThrows(errorMessage, IllegalArgumentException.class, SparkUtil.TIMESTAMP_WITHOUT_TIMEZONE_ERROR,
    () -> code);

Member

Instead of using the rowToSqlValues function, it may be simpler to just do a spark.createDataset(values).write command. That way we don't have to worry about string building
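For illustration, a rough sketch of that approach (assumption: createDataFrame with an explicit schema as the Java equivalent of createDataset; the schema, sample values, and DataFrameWriter call are illustrative, not the actual test code):

// Assumes the usual Spark SQL imports (Row, RowFactory, DataTypes, StructType),
// java.util.Arrays/List, and java.sql.Timestamp; tableName comes from the test class.
StructType schema = new StructType()
    .add("id", DataTypes.LongType)
    .add("ts", DataTypes.TimestampType);
List<Row> rows = Arrays.asList(
    RowFactory.create(1L, Timestamp.valueOf("2021-06-29 10:00:00")),
    RowFactory.create(2L, Timestamp.valueOf("2021-06-29 11:00:00")));

// Write the rows directly instead of building an INSERT statement string.
spark.createDataFrame(rows, schema)
    .write().format("iceberg").mode("append").save(tableName);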

Contributor Author

I have moved the insert Runnable and errorMessage into the AssertThrows call.

I would like to keep rowToSqlValues if you do not mind. From my point of view it is better to have pure SQL in this test.

@RussellSpitzer (Member) left a comment

I just have a few little clean-up nits; I think once these are done we should be good to go!

@sshkvar (Contributor Author) commented Jul 12, 2021

@RussellSpitzer I have pushed code changes for the requested clean-up nits.

SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE),
IllegalArgumentException.class,
SparkUtil.TIMESTAMP_WITHOUT_TIMEZONE_ERROR,
() -> spark.read().format("iceberg")
Member

Wrong indentation here: .option should be indented as it's a continuation of spark.read().format() and not another arg to assertThrows.
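For example, a sketch of the intended continuation indentation (the option key here is a placeholder and the assert helper is written generically; only the indentation of the chained .option call is the point):

assertThrows(errorMessage,
    IllegalArgumentException.class,
    SparkUtil.TIMESTAMP_WITHOUT_TIMEZONE_ERROR,
    () -> spark.read().format("iceberg")
        // continuation of the spark.read() chain, so indented one level further
        .option("some-read-option", "false")
        .load(tableName));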

Contributor Author

You are right, fixed.

@RussellSpitzer (Member)

Tests running now!

@sshkvar (Contributor Author) commented Jul 12, 2021

> Tests running now!

Added a fix for: Extra separation in import group before 'java.io.File' [ImportOrder]

@sshkvar (Contributor Author) commented Jul 12, 2021

Added an additional formatting fix for: 'lambda arguments' has incorrect indentation level 12, expected level should be one of the following: 6, 8.

@RussellSpitzer (Member)

@sshkvar for future reference you can use g build -x test -x integrationTest -x testSpark31 to run the checkstyle and other static analysis tests

Comment on lines 194 to 197
Option<SparkSession> sparkSession = SparkSession.getActiveSession();
if (sparkSession.isDefined()) {
  sparkSession.get().conf().set(SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE, "true");
}
Contributor

This is in AvroDataTest, but there is some code for handling additional Spark SQL conf options here that might be something worth copying:

protected void withSQLConf(Map<String, String> conf, Action action) {
  SQLConf sqlConf = SQLConf.get();

  Map<String, String> currentConfValues = Maps.newHashMap();
  conf.keySet().forEach(confKey -> {
    if (sqlConf.contains(confKey)) {
      String currentConfValue = sqlConf.getConfString(confKey);
      currentConfValues.put(confKey, currentConfValue);
    }
  });

  conf.forEach((confKey, confValue) -> {
    if (SQLConf.staticConfKeys().contains(confKey)) {
      throw new RuntimeException("Cannot modify the value of a static config: " + confKey);
    }
    sqlConf.setConfString(confKey, confValue);
  });

  try {
    action.invoke();
  } finally {
    conf.forEach((confKey, confValue) -> {
      if (currentConfValues.containsKey(confKey)) {
        sqlConf.setConfString(confKey, currentConfValues.get(confKey));
      } else {
        sqlConf.unsetConf(confKey);
      }
    });
  }
}

@FunctionalInterface
protected interface Action {
  void invoke();
}

Additionally, it might be good to unset this property after its use (so the configuration doesn't get mixed up with other tests now or in the future).
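For illustration, a minimal usage sketch of that helper (assuming it has been copied into the test class, and that ImmutableMap, sql, tableName, values, and rowToSqlValues are available from the surrounding test, as elsewhere in this thread):

// The conf value is applied only for the duration of the action; the finally
// block in withSQLConf restores the previous value (or unsets it), so the flag
// cannot leak into other tests.
withSQLConf(
    ImmutableMap.of(SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE, "true"),
    () -> sql("INSERT INTO %s VALUES %s", tableName, rowToSqlValues(values)));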

Contributor Author

Good point, I changed the code based on your recommendations.

@sshkvar (Contributor Author) commented Jul 13, 2021

> @sshkvar for future reference you can use g build -x test -x integrationTest -x testSpark31 to run the checkstyle and other static analysis tests

OK, thanks.

@sshkvar (Contributor Author) commented Jul 14, 2021

Check failed

org.apache.iceberg.actions.TestDeleteReachableFilesAction24 > dataFilesCleanupWithParallelTasks FAILED
    java.lang.AssertionError: FILE_A should be deleted

Locally this test passed successfully. @RussellSpitzer Can we try to rerun this step?

@RussellSpitzer (Member)

I can trigger a re-run, just pinged @karuppayya to take a look


static Schema fixup(Schema schema) {
return new Schema(TypeUtil.visit(schema,
new SparkFixupTimestampType(schema)).asStructType().fields());
@kbendick (Contributor) Jul 15, 2021

Nit: I think this is overly indented.

When continuing an expression on a second line, we usually indent 4 spaces (as opposed to the normal 2 spaces for changes in scope etc).

So I think that new SparkFixup… on this line should align the n of new with the second r from return above it (4 spaces in from the start of the word return).
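In other words, something like this (a sketch of the suggested formatting only, not the final committed code):

static Schema fixup(Schema schema) {
  return new Schema(TypeUtil.visit(schema,
      new SparkFixupTimestampType(schema)).asStructType().fields());
}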

Contributor Author

I will fix it ASAP. Where can I find the code style guide?

Contributor Author

I have pushed the fix.

Contributor

If you use IntelliJ, you can have it set up to point to the style guide and auto format it for you: http://iceberg.apache.org/community/#setting-up-ide-and-code-style

Otherwise, in general I believe the rules come from here: https://github.com/apache/iceberg/blob/master/.baseline/idea/intellij-java-palantir-style.xml

There are admittedly a number of these cases, so auto-formatting seems like the best idea. If you don’t use IntelliJ, I think this command can be run from the command line somehow, but I’m not 100% sure.

Thanks for all the work on this so far. It’s very close!

Contributor

For your info, this is the plugin that is used in general. Specifically, this one would be the checkstyle one, but we also use Error Prone as well: https://github.com/palantir/gradle-baseline

Contributor Author

Thanks a lot, I am using IntelliJ so this is really helpful.

private StructType lazyType() {
  if (type == null) {
    Preconditions.checkArgument(readTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(lazySchema()),
        SparkUtil.TIMESTAMP_WITHOUT_TIMEZONE_ERROR);
Contributor

Would it make sense to add this check into SparkSchemaUtil.convert?

Contributor Author

I think not, because we need this check in only a few places, while SparkSchemaUtil.convert is used in a lot of places.

Contributor

For reference, this is something just to consider and not a strong mandate or requirement to merge.

@FunctionalInterface
protected interface Action {
  void invoke() throws IOException;
}
Contributor

As a follow-up (in a separate PR, either before or after this one is merged), particularly if you're looking for more work to do to contribute to the project, you might explore whether this combination of withSQLConf and the corresponding @FunctionalInterface can be abstracted into its own interface that one could mix into tests.

I'm not 100% sure how that would look, maybe an interface like ConfigurableTestSQLConf or something?

Again, just copying it for now is fine, but it would be nice to reduce the code duplication and make this easier for others to use in the future. Your exploration might find that it’s better to not do that (I’m more of a Scala developer myself and so to me it feels like a mixin). Something to think about for later though!

Contributor Author

I fully agree with you; it can be moved to a separate interface with a static method and placed in some general package, like:

@FunctionalInterface
public interface ConfigurableTestSQLConf {

  void invoke() throws IOException;

  static void withSQLConf(Map<String, String> conf, ConfigurableTestSQLConf action) throws IOException {
    SQLConf sqlConf = SQLConf.get();

    Map<String, String> currentConfValues = Maps.newHashMap();
    conf.keySet().forEach(confKey -> {
      if (sqlConf.contains(confKey)) {
        String currentConfValue = sqlConf.getConfString(confKey);
        currentConfValues.put(confKey, currentConfValue);
      }
    });

    conf.forEach((confKey, confValue) -> {
      if (SQLConf.staticConfKeys().contains(confKey)) {
        throw new RuntimeException("Cannot modify the value of a static config: " + confKey);
      }
      sqlConf.setConfString(confKey, confValue);
    });

    try {
      action.invoke();
    } finally {
      conf.forEach((confKey, confValue) -> {
        if (currentConfValues.containsKey(confKey)) {
          sqlConf.setConfString(confKey, currentConfValues.get(confKey));
        } else {
          sqlConf.unsetConf(confKey);
        }
      });
    }
  }
}

But this part is better done in a separate PR, because other packages will be affected.

@RussellSpitzer RussellSpitzer merged commit 9a0d154 into apache:master Jul 15, 2021
@kbendick (Contributor) left a comment

This looks good to me. Thank you @sshkvar!

@RussellSpitzer (Member)

Thanks @sshkvar + @bkahloon! This is a great contribution.

Solves #2244

minchowang pushed a commit to minchowang/iceberg that referenced this pull request Aug 2, 2021
…2757)

Previously Spark could not handle Iceberg tables which contained Timestamp.withoutTimeZone. New parameters are introduced to allow Timestamp without TimeZone to be treated as Timestamp with Timezone.  

Co-authored-by: bkahloon <kahlonbakht@gmail.com>
Co-authored-by: shardulm94
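As a rough illustration of the new behavior described in the commit message above (the conf key matches the one set in the test excerpt earlier in this thread; the table name is a placeholder):

// With the flag left at its default, reading a table containing a timestamp
// without timezone column fails with SparkUtil.TIMESTAMP_WITHOUT_TIMEZONE_ERROR;
// with it set to "true", the column is read as a regular Spark timestamp.
spark.conf().set(SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE, "true");
Dataset<Row> df = spark.read().format("iceberg").load("db.table_with_ts_ntz");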
chenjunjiedada pushed a commit to chenjunjiedada/incubator-iceberg that referenced this pull request Oct 20, 2021
Merge remote-tracking branch 'upstream/merge-master-20210816' into master
## What does this MR mainly address?

Merges upstream/master, bringing in recent bug fixes and optimizations.

## What changes does this MR make?

Key PRs of interest:
> Predicate pushdown support, https://github.com/apache/iceberg/pull/2358, https://github.com/apache/iceberg/pull/2926, https://github.com/apache/iceberg/pull/2777/files
> Spark: error when writing an empty dataset; simply skip it, apache#2960
> Flink: add uidPrefix to operators to make it easier to track multiple Iceberg sink jobs in the UI, apache#288
> Spark: fix nested struct pruning issue, apache#2877
> Support creating v2 format tables via table properties, apache#2887
> Add the SortRewriteStrategy framework, gradually supporting different rewrite strategies, apache#2609 (WIP: apache#2829)
> Spark: support configuring Hadoop properties for catalogs, apache#2792
> Spark: read/write support for timestamps without timezone, apache#2757
> Spark MicroBatch: support the skip delete snapshots configuration property, apache#2752
> Spark: V2 RewriteDatafilesAction support
> Core: Add validation for row-level deletes with rewrites, apache#2865
> Schema time travel related: add schema-id; Core: add schema id to snapshot
> Spark extensions: support identifier fields operations, apache#2560
> Parquet: Update to 1.12.0, apache#2441
> Hive: Vectorized ORC reads for Hive, apache#2613
> Spark: Add an action to remove all referenced files, apache#2415

## How was this MR tested?

UT (unit tests)