Allow custom hadoop properties to be loaded in the Spark data source #7
Conversation
Properties that start with iceberg.hadoop are copied into the Hadoop Configuration used in the Spark source. These may be set in table properties or in read and write options passed to the Spark operation. Read and write options take precedence over the table properties. Supporting these custom Hadoop properties should also be done in other Iceberg integrations in subsequent patches.
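As a usage sketch (not taken from this PR's tests): the read below assumes a Spark session already configured for Iceberg and a table reachable as db.table; the property name is only an illustration, and it is assumed that the iceberg.hadoop. prefix is stripped when the value is copied into the Configuration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadWithHadoopProps {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-hadoop-props")
        .getOrCreate();

    // Any read/write option prefixed with iceberg.hadoop. is copied into the
    // Hadoop Configuration used by the source (prefix assumed to be stripped),
    // overriding the same setting from table properties or the session config.
    Dataset<Row> df = spark.read()
        .format("iceberg")
        .option("iceberg.hadoop.fs.s3a.connection.maximum", "100")
        .load("db.table");

    df.show();
  }
}
```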
Addressed the comments; this is ready for another round of review.
Addressed all comments so far.
      }

-     return Optional.of(new Writer(table, lazyConf(), format));
+     return Optional.of(new Writer(table, conf, format));
After #47, we may not need to pass in conf. (Not something we need to change here.)
      }

-     protected SparkSession lazySparkSession() {
+     private SparkSession lazySparkSession() {
This is nice to have in subclasses, which is why it is protected. We use it in findTable to get information about the catalog to use. Not a big deal if it becomes private, since we can make a quick change in our add-on library and keep track of it there.
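For illustration, a minimal sketch of the lazy-init pattern and the kind of subclass use being described here; the class names, the extension point, and the config key are hypothetical, not the actual IcebergSource API.

```java
import org.apache.spark.sql.SparkSession;

abstract class LazySessionSource {
  private SparkSession lazySpark = null;

  // Lazily create (or reuse) the active SparkSession the first time it is needed.
  protected SparkSession lazySparkSession() {
    if (lazySpark == null) {
      this.lazySpark = SparkSession.builder().getOrCreate();
    }
    return lazySpark;
  }
}

class CatalogAwareSource extends LazySessionSource {
  // A subclass can consult the session config to decide which catalog to use;
  // the property name below is hypothetical.
  String catalogName() {
    return lazySparkSession().conf().get("spark.sql.iceberg.catalog", "default");
  }
}
```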
      Table table = findTable(options, conf);
      // Set confs from table properties, but do not overwrite options from the Spark Context with
      // configurations from the table
      mergeIcebergHadoopConfs(conf, table.properties(), false);
I think this still needs to be true, in which case we can remove the option. Table properties still need to override those set in the Hadoop Configuration. Then we re-apply the ones from options to fix up precedence.
Hm, I would think that properties set in the JVM, particularly if set on the Spark Context via spark.hadoop.*, should take precedence over the table properties.
Values set in the Configuration are session-specific, and what we want is to move to table settings instead of Spark settings for configuration, like Parquet row group size, that is tied to the data. Write-specific settings from the write config can override.
Table settings should take priority over session-wide settings because session-wide config would apply to all tables, and that's not usually appropriate, as the row group size example shows.
That's fair enough. I suppose as long as the behavior is well documented, it should be clear to users how to get the final configuration they want.
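To make the precedence settled on here concrete, a minimal sketch, assuming a helper that copies iceberg.hadoop.* entries into a Configuration (the actual helper under review appears in the next snippet); the call order is an illustration, not necessarily the merged implementation.

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

final class ConfPrecedenceSketch {
  // Lowest to highest priority: session Hadoop Configuration, table properties,
  // then the read/write options passed to this specific operation.
  static Configuration mergedConf(Configuration sessionConf,
                                  Map<String, String> tableProps,
                                  Map<String, String> options) {
    Configuration conf = new Configuration(sessionConf);   // start from session-wide settings
    copyIcebergHadoopConfs(conf, tableProps);              // table settings override session settings
    copyIcebergHadoopConfs(conf, options);                 // options re-applied last so they win
    return conf;
  }

  // Hypothetical helper: copies iceberg.hadoop.* entries, stripping the prefix.
  private static void copyIcebergHadoopConfs(Configuration conf, Map<String, String> props) {
    props.forEach((key, value) -> {
      if (key.startsWith("iceberg.hadoop.")) {
        conf.set(key.substring("iceberg.hadoop.".length()), value);
      }
    });
  }
}
```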
          Configuration baseConf, Map<String, String> options, boolean overwrite) {
        options.keySet().stream()
            .filter(key -> key.startsWith("iceberg.hadoop"))
            .filter(key -> overwrite || baseConf.get(key.replaceFirst("iceberg.hadoop", "")) == null)
Doesn't overwrite discard all keys? I don't think it matters now because it isn't needed anymore.
@aokolnychyi, yes. Want to open a PR?

Yeah, will do so.