@liancheng
Contributor

When appending Parquet files to a directory that already contains Parquet data along with its summary files, the new data may have a schema that differs from, but is compatible with, the existing one. Ideally, the summary files should be updated to contain the merged schema. However, ParquetOutputCommitter may fail to write summary files if the new and old files contain conflicting user-defined metadata. In that case, we should remove the existing _metadata and _common_metadata files so that stale summary files are not left behind.
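
The intended behavior can be sketched roughly as follows. This is a minimal illustration using plain `java.nio.file`, not the actual ParquetOutputCommitter code: the class `SummaryFileCleanup`, the `SummaryWriter` interface, and the method `writeOrCleanUp` are hypothetical names made up for this example, and the real implementation would go through the Hadoop `FileSystem` API instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SummaryFileCleanup {
    static final String METADATA = "_metadata";
    static final String COMMON_METADATA = "_common_metadata";

    /** Hypothetical stand-in for whatever merges footers and writes the summary files. */
    interface SummaryWriter {
        void write(Path outputDir) throws IOException;
    }

    /**
     * Tries to write the summary files. If that fails (for example because of
     * conflicting user-defined metadata), deletes any existing summary files
     * so they cannot describe only part of the data. Returns true on success.
     */
    static boolean writeOrCleanUp(Path outputDir, SummaryWriter writer) throws IOException {
        try {
            writer.write(outputDir);
            return true;
        } catch (IOException | RuntimeException e) {
            // Assumption: a missing summary file is preferable to a stale one.
            Files.deleteIfExists(outputDir.resolve(METADATA));
            Files.deleteIfExists(outputDir.resolve(COMMON_METADATA));
            return false;
        }
    }
}
```

The sketch treats summary files as optional: if they cannot be written, both are removed rather than left out of date.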

@julienledem
Member

+1

@julienledem
Member

There's a typo in the description. Could you fix it? It should be PARQUET-359 (the T is missing).

@liancheng changed the title from "PARQUE-359: Removes existing _common_metadata when fails to write summary files" to "PARQUET-359: Removes existing _common_metadata when fails to write summary files" on Nov 18, 2015
@liancheng
Contributor Author

@julienledem Oops, thanks. Fixed.

@julienledem
Member

thanks @liancheng. Could you rebase your branch as well?

@liancheng
Contributor Author

@julienledem Actually I just realized that #277 has already addressed this issue. So I'm closing this one. Thanks for the review!

@liancheng closed this on Nov 18, 2015
@liancheng deleted the parquet-359/cleanup-common-metadata branch on November 18, 2015 at 07:53
@julienledem
Member

thanks!
