@the-other-tim-brown the-other-tim-brown commented Dec 24, 2025

Describe the issue this Pull Request addresses

The HFile readers look up the file length from the file system whenever the calling code supplies only a path to read. The caller often already knows the length, so the reader should avoid looking it up again.

Summary and Changelog

  • Constructs StoragePathInfo objects to pass in the relevant info instead of looking it up from the file system
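
To illustrate the idea behind this change, here is a minimal, self-contained sketch. It does not use Hudi's classes: `PathInfo`, `CountingStorage`, `readWithLookup`, and `readWithKnownLength` are hypothetical stand-ins invented for this example, and the real `StoragePathInfo` carries more fields than a path and a length. The sketch only shows that passing a known length lets the reader skip one metadata lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Hudi's actual classes.
final class KnownLengthSketch {

  /** Minimal path-plus-length holder, loosely modeled on the idea of StoragePathInfo. */
  static final class PathInfo {
    final String path;
    final long length;

    PathInfo(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  /** Toy in-memory storage that counts metadata lookups, standing in for a remote store. */
  static final class CountingStorage {
    private final Map<String, byte[]> files = new HashMap<>();
    int statusLookups = 0;

    void put(String path, byte[] content) {
      files.put(path, content);
    }

    PathInfo getPathInfo(String path) {
      statusLookups++; // on S3/GCS this would be a separate remote request
      return new PathInfo(path, files.get(path).length);
    }

    byte[] open(String path) {
      return files.get(path);
    }
  }

  /** Before: the reader receives only a path and must ask the storage for the length. */
  static byte[] readWithLookup(CountingStorage storage, String path) {
    int length = (int) storage.getPathInfo(path).length;
    byte[] buffer = new byte[length];
    System.arraycopy(storage.open(path), 0, buffer, 0, length);
    return buffer;
  }

  /** After: the caller passes the length it already knows, so no lookup is needed. */
  static byte[] readWithKnownLength(CountingStorage storage, PathInfo info) {
    byte[] buffer = new byte[(int) info.length];
    System.arraycopy(storage.open(info.path), 0, buffer, 0, buffer.length);
    return buffer;
  }
}
```

In this toy setup, `readWithLookup` increments `statusLookups` on every call, while `readWithKnownLength` leaves it unchanged as long as the caller already holds a `PathInfo`.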

Impact

Improves performance on remote file systems such as S3 and GCS by avoiding a redundant file-status lookup for HFile readers

Risk Level

Low, functionality is covered by existing tests

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Dec 24, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review December 24, 2025 19:06
@the-other-tim-brown the-other-tim-brown changed the title from "perf: Avoid fetching file status from FS for HFile readers" to "perf: Avoid re-fetching file status from FS for HFile readers" Dec 24, 2025
  byte[] buffer;
  try (SeekableDataInputStream stream = storage.openSeekable(path, false)) {
-   buffer = new byte[(int) storage.getPathInfo(path).getLength()];
+   buffer = new byte[(int) fileSize];
Contributor

For an inline path (inlinefs:), this represents the log block size instead of the full log file size. Will that be a problem?
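
The distinction raised here can be sketched as follows, taking the comment's description at face value: an inline path refers to a block embedded in a larger log file, so the relevant length is the block's length, not the outer file's. The names `InlineRange` and `readInlineBlock` are hypothetical and exist only in this example; the sketch does not reproduce how Hudi resolves `inlinefs:` paths and does not settle whether the change is a problem.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Hudi's inline file system implementation.
final class InlineLengthSketch {

  /** Describes a block embedded in a larger (outer) file by start offset and length. */
  static final class InlineRange {
    final long startOffset;
    final long length;

    InlineRange(long startOffset, long length) {
      this.startOffset = startOffset;
      this.length = length;
    }
  }

  /** Reads exactly the embedded block; the buffer is sized by the block, not the outer file. */
  static byte[] readInlineBlock(byte[] outerFile, InlineRange range) {
    int start = Math.toIntExact(range.startOffset);
    int end = Math.addExact(start, Math.toIntExact(range.length));
    if (start < 0 || end < start || end > outerFile.length) {
      throw new IllegalArgumentException("inline range falls outside the outer file");
    }
    return Arrays.copyOfRange(outerFile, start, end);
  }
}
```

With a 10-byte outer file and a range starting at offset 4 with length 3, `readInlineBlock` returns 3 bytes. A buffer sized from the outer file's length would instead be 10 bytes, which is the mismatch the question is asking about.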

Contributor Author

Is there a code path that exercises this that I can run locally?

@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build
