
Conversation

@mccheah (Contributor) commented Nov 26, 2018

Supporting these custom Hadoop properties should also be done in other Iceberg integrations in subsequent patches.

Closes Netflix/iceberg#91.

@mccheah (Contributor, Author) commented Dec 11, 2018

Addressed the comments; this is ready for another round of review.

@mccheah (Contributor, Author) commented Dec 11, 2018

Addressed all comments so far.

  }

-   return Optional.of(new Writer(table, lazyConf(), format));
+   return Optional.of(new Writer(table, conf, format));
@rdblue (Contributor) commented Dec 11, 2018

After #47, we may not need to pass in conf. (Not something we need to change here.)

  }

- protected SparkSession lazySparkSession() {
+ private SparkSession lazySparkSession() {
Contributor commented:

This is nice to have in subclasses, which is why it is protected. We use it in findTable to get information about the catalog to use. Not a big deal if it becomes private, since we can make a quick change in our add-on library and keep track of it there.

Table table = findTable(options, conf);
// Set confs from table properties, but do not overwrite options from the Spark Context with
// configurations from the table
mergeIcebergHadoopConfs(conf, table.properties(), false);
Contributor commented:

I think this still needs to be true, in which case we can remove the option. Table properties still need to override those set in the Hadoop Configuration. Then we re-apply the ones from options to fix up precedence.

@mccheah (Contributor, Author) replied:

Hm, I would think that properties set in the JVM, particularly if set on the Spark Context via spark.hadoop.*, should take precedence over the table properties.

Contributor replied:

Values set in the Configuration are session-specific, and we want to move configuration that is tied to the data, like Parquet row group size, from Spark settings to table settings. Write-specific settings from the write config can override.

Table settings should take priority over session-wide settings because session-wide config would apply to all tables, and that's not usually appropriate, as in the row group size example.

@mccheah (Contributor, Author) replied:

That's fair enough. As long as the behavior is well documented, it should be clear to users how to arrive at the final configuration they want.
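For readers following the thread, a minimal sketch of the precedence this exchange settled on: table properties override the session Configuration, and per-operation read/write options override both. The helper name comes from the diff below; the two-pass call pattern and the confFor wrapper are illustrative assumptions, not the exact merged code.

import java.util.Map;
import org.apache.hadoop.conf.Configuration;

class IcebergHadoopConfSketch {
  // Copy "iceberg.hadoop."-prefixed entries into the Configuration with the
  // prefix stripped, overwriting any existing values.
  static void mergeIcebergHadoopConfs(Configuration baseConf, Map<String, String> props) {
    props.keySet().stream()
        .filter(key -> key.startsWith("iceberg.hadoop."))
        .forEach(key -> baseConf.set(key.replaceFirst("iceberg.hadoop.", ""), props.get(key)));
  }

  // Hypothetical wrapper (not from the PR) showing the precedence order:
  // session conf < table properties < per-operation read/write options.
  static Configuration confFor(Configuration sessionConf, Map<String, String> tableProps,
      Map<String, String> options) {
    Configuration conf = new Configuration(sessionConf); // start from session settings
    mergeIcebergHadoopConfs(conf, tableProps);           // table settings override the session
    mergeIcebergHadoopConfs(conf, options);              // operation options win over everything
    return conf;
  }
}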

      Configuration baseConf, Map<String, String> options, boolean overwrite) {
    options.keySet().stream()
        .filter(key -> key.startsWith("iceberg.hadoop"))
        .filter(key -> overwrite || baseConf.get(key.replaceFirst("iceberg.hadoop", "")) == null)
Contributor commented:

Doesn't overwrite discard all keys? I don't think it matters now because it isn't needed anymore.

@rdblue merged commit ec31b27 into apache:master on Dec 14, 2018.
rdblue referenced this pull request in rdblue/iceberg on Dec 21, 2018. The referenced commit message reads:

Properties that start with iceberg.hadoop are copied into the Hadoop Configuration used in the Spark source. These may be set in table properties or in read and write options passed to the Spark operation. Read and write options take precedence over the table properties.

Supporting these custom Hadoop properties should also be done in other Iceberg integrations in subsequent patches.
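As a usage illustration of that commit message (the S3A key is just an example property, and the table identifier and load syntax for the Spark source are assumptions that may differ by version):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IcebergReadWithHadoopProps {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();

    // "iceberg.hadoop.fs.s3a.connection.maximum" is copied into the Hadoop
    // Configuration as "fs.s3a.connection.maximum" for this operation,
    // taking precedence over the same key in table properties or session conf.
    Dataset<Row> df = spark.read()
        .format("iceberg")
        .option("iceberg.hadoop.fs.s3a.connection.maximum", "100")
        .load("db.table");

    df.show();
  }
}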
@aokolnychyi (Contributor) commented:
@mccheah @rdblue @rdsr don't we need to discard the "iceberg.hadoop." prefix (with the trailing dot) and not "iceberg.hadoop"?
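(Illustration added for clarity, not part of the original comment; the S3A key is an arbitrary example. Stripping the shorter prefix leaves a stray leading dot, so the resulting Configuration key never matches the real Hadoop property.)

// replaceFirst with the shorter prefix leaves the separator behind:
String stripped = "iceberg.hadoop.fs.s3a.endpoint".replaceFirst("iceberg.hadoop", "");
// stripped == ".fs.s3a.endpoint"  -> not a valid Hadoop key
String correct = "iceberg.hadoop.fs.s3a.endpoint".replaceFirst("iceberg.hadoop.", "");
// correct == "fs.s3a.endpoint"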

@rdblue (Contributor) commented Jun 11, 2019

@aokolnychyi, yes. Want to open a PR?

@aokolnychyi (Contributor) replied:

Yeah, will do so.

prodeezy pushed a commit to rominparekh/incubator-iceberg that referenced this pull request on Dec 17, 2019.
puchengy added a commit to puchengy/iceberg that referenced this pull request on May 11, 2022.
pavibhai added a commit to pavibhai/iceberg that referenced this pull request on Oct 31, 2022.
fabio-rizzo-01 added a commit to fabio-rizzo-01/iceberg that referenced this pull request on May 6, 2025.


Successfully merging this pull request may close: Support Custom Hadoop Properties In The Data Source (Netflix/iceberg#91).