
@codope (Member) commented Oct 14, 2023

Change Logs

Reverts 2d779fb and fixes #9858

Impact

Fixes partition pruning, and with it query performance, for tables with multiple partition fields.
After the fix, only one partition is listed, as shown in the screenshot below.
[Screenshot 2023-10-14 at 10:33:29 AM]
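To illustrate what the fix should achieve, here is a minimal sketch of pruning on multiple partition fields, assuming Hive-style `field=value` partition paths. `PartitionPruningSketch`, `prune`, and `parsePartitionPath` are hypothetical names and do not reflect Hudi's actual file index. The only point is that equality filters on all partition fields should narrow the listing to a single partition.

```scala
// Illustrative sketch only (hypothetical names, not Hudi's file index).
// Assumption: partition paths are Hive-style "field=value" segments joined by "/".
object PartitionPruningSketch {

  // Parse "year=2023/month=10" into Map("year" -> "2023", "month" -> "10").
  def parsePartitionPath(path: String): Map[String, String] =
    path.split("/").iterator.map { segment =>
      segment.split("=", 2) match {
        case Array(field, value) => field -> value
        case _ => throw new IllegalArgumentException(s"Not a field=value segment: $segment")
      }
    }.toMap

  // Keep a partition only if it satisfies every equality filter on its own fields.
  // A filter on a field the path does not contain cannot rule the partition out,
  // so that partition is kept.
  def prune(partitionPaths: Seq[String], equalityFilters: Map[String, String]): Seq[String] =
    partitionPaths.filter { path =>
      val values = parsePartitionPath(path)
      equalityFilters.forall { case (field, expected) =>
        values.get(field).forall(_ == expected)
      }
    }
}
```

For example, with the paths `year=2023/month=09`, `year=2023/month=10`, and `year=2022/month=10`, the filters `year = 2023` and `month = 10` leave only `year=2023/month=10`, while a filter on `year` alone keeps two partitions.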

Risk level (write none, low, medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change:

  • The config description must be updated if new configs are added or the default values of configs are changed.
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through the contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

}
}

@Disabled("HUDI-6320")
Member Author

This test will be useful when HUDI-6320 is fixed properly.

|| classOf[TimestampBasedAvroKeyGenerator].getName.equalsIgnoreCase(keyGeneratorClassName)
|| classOf[CustomKeyGenerator].getName.equalsIgnoreCase(keyGeneratorClassName)
|| classOf[CustomAvroKeyGenerator].getName.equalsIgnoreCase(keyGeneratorClassName)) {
|| classOf[TimestampBasedAvroKeyGenerator].getName.equalsIgnoreCase(keyGeneratorClassName)) {
Member Author

This should also fix HUDI-6914.

@danny0405 (Contributor) left a comment

+1, thanks for taking care of this.

if (listingModeOverride != null) {
  properties.setProperty(DataSourceReadOptions.FILE_INDEX_LISTING_MODE_OVERRIDE.key, listingModeOverride)
}
val partitionColumns = metaClient.getTableConfig.getPartitionFields
Member

Nice. In our case, only merging this code makes it work well (see #9862); then we would have more tests for it.

@hudi-bot (Collaborator)

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@beyond1920 (Contributor)

After applying this patch, some queries mistakenly choose a broadcast hash join even for big tables. In those cases, relation#sizeInBytes returns 0, which leads the planner to choose a broadcast hash join. It seems partition pruning has not been done at that point, so FileIndex#cachedAllInputFileSlices is still an empty map.
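A minimal sketch of this failure mode, under stated assumptions: `HypotheticalFileIndex` stands in for a file index whose size estimate is computed from a lazily filled `cachedAllInputFileSlices` map, and `wouldBroadcast` is a simplified version of Spark's size check against `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). Neither is actual Hudi or Spark code.

```scala
// Illustrative sketch only: hypothetical names, not Hudi's FileIndex or Spark's planner.
object BroadcastSizeSketch {

  // Spark's default spark.sql.autoBroadcastJoinThreshold is 10 MB (10485760 bytes).
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024

  final case class FileSlice(sizeInBytes: Long)

  // Hypothetical index whose size estimate is derived from a lazily populated cache
  // keyed by partition path.
  final class HypotheticalFileIndex(cachedAllInputFileSlices: Map[String, Seq[FileSlice]]) {

    // Sum of cached slice sizes; this is 0 when the cache has not been populated yet.
    def naiveSizeInBytes: Long =
      cachedAllInputFileSlices.valuesIterator.flatten.map(_.sizeInBytes).sum

    // Defensive variant: treat an unpopulated cache as "size unknown".
    // Long.MaxValue mirrors the default of spark.sql.defaultSizeInBytes.
    def safeSizeInBytes: Long =
      if (cachedAllInputFileSlices.isEmpty) Long.MaxValue else naiveSizeInBytes
  }

  // Simplified planner check: broadcast if the estimate is within the threshold.
  def wouldBroadcast(estimatedSize: Long): Boolean =
    estimatedSize >= 0 && estimatedSize <= autoBroadcastJoinThreshold
}
```

With an unpopulated cache, `wouldBroadcast(naiveSizeInBytes)` is true because 0 bytes looks tiny, whereas `wouldBroadcast(safeSizeInBytes)` is false. Reporting an unknown size errs toward a shuffle join; the trade-off is that a relation that is genuinely empty also loses the broadcast, which is cheap either way.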


Labels

  • area:performance (Performance optimizations)
  • priority:critical (Production degraded; pipelines stalled)
  • release-0.14.1

Projects

Status: 🆕 New

Development

Successfully merging this pull request may close these issues.

[SUPPORT] sparksql query perfermance degrade in hudi 0.14-rc

6 participants