@InvisibleProgrammer
Contributor

@InvisibleProgrammer InvisibleProgrammer commented Nov 18, 2025

The Orc file format allows metadata field names that differ only in casing. For example, in our manual tests, query-based compaction created Orc files with lowercase field names (the root cause of this is under investigation).

Because OrcInputFormat.isOriginal currently compares the field names case-sensitively, FixAcidKeyIndex can fail if the Orc file footer contains its metadata field names (such as currentTransaction) in lowercase.

What changes were proposed in this pull request?

OrcInputFormat.isOriginal now compares the field names case-insensitively.
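
To illustrate the difference between the strict and the case-insensitive comparison, here is a minimal, JDK-only sketch. It is not the code from this patch: the class AcidFieldNameCheck, its methods, and the field-name list are hypothetical and only stand in for the check described above. Of the field names, only currentTransaction is mentioned in this PR; the others are assumptions.

```java
import java.util.List;

// Hypothetical helper for illustration only; not the actual Hive implementation.
final class AcidFieldNameCheck {

    // Assumed ACID metadata field names. Only "currentTransaction" appears
    // in this PR; the remaining names are assumptions made for the sketch.
    static final List<String> ACID_FIELD_NAMES = List.of(
            "operation", "originalTransaction", "bucket",
            "rowId", "currentTransaction", "row");

    // Strict check: names must match exactly, including casing.
    static boolean hasAcidFieldsStrict(List<String> schemaFieldNames) {
        return ACID_FIELD_NAMES.equals(schemaFieldNames);
    }

    // Relaxed check: same names in the same order, ignoring casing.
    static boolean hasAcidFieldsIgnoreCase(List<String> schemaFieldNames) {
        if (schemaFieldNames.size() != ACID_FIELD_NAMES.size()) {
            return false;
        }
        for (int i = 0; i < ACID_FIELD_NAMES.size(); i++) {
            if (!ACID_FIELD_NAMES.get(i).equalsIgnoreCase(schemaFieldNames.get(i))) {
                return false;
            }
        }
        return true;
    }

    // A file counts as "original" (non-ACID) when the ACID fields are absent.
    static boolean isOriginal(List<String> schemaFieldNames) {
        return !hasAcidFieldsIgnoreCase(schemaFieldNames);
    }

    public static void main(String[] args) {
        List<String> lowerCased = List.of(
                "operation", "originaltransaction", "bucket",
                "rowid", "currenttransaction", "row");
        System.out.println("strict match:  " + hasAcidFieldsStrict(lowerCased));
        System.out.println("relaxed match: " + hasAcidFieldsIgnoreCase(lowerCased));
        System.out.println("isOriginal:    " + isOriginal(lowerCased));
    }
}
```

With the strict check, a footer whose field names are all lowercase would be treated as a non-ACID file, which matches the failure described above; with the relaxed check it is recognized as an ACID file.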

Why are the changes needed?

To be able to repair invalid Orc ACID files even when their metadata field names are lowercase.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new test case:
TestFixAcidKeyIndex#testValidKeyIndex_withAcidMetadataLowerCase

@InvisibleProgrammer
Contributor Author

Please wait an additional day before merging: our downstream test had a result that I want to double-check. There is a chance that collections4 only comes in as a transitive dependency, and our downstream cluster did not find the collections4 jar file. I want to check whether I need to add it as an explicit dependency of hive-exec.

@InvisibleProgrammer
Contributor Author

Double-checked. We don't need to add extra dependencies.

export HADOOP_OPTS="-Dfs.defaultFS=file:///"
$HIVE_HOME/bin/hive --service fixacidkeyindex --recover /mypath/escalations/acidwrongorder/delta_0000001_0000003_v0000005/bucket_00000

@kasakrisz kasakrisz merged commit e9d3cd7 into apache:master Nov 25, 2025
2 checks passed