feat: MDT Test framework without writing data files #17693
Draft
PavithranRick wants to merge 55 commits into apache:master from PavithranRick:ENG-35484
+1,992
−117
55 commits
5dbb271 MDT Test framework without writing data files
677aa96 MDT Test framework - using filterFileSlics for colstats
7294abc MDT Test framework - using createCommitMetadata, tagLocation and file…
c688bf1 MDT Test framework - using createCommitMetadata, tagLocation and file…
5ed79d6 MDT Test framework - initializing files partition before commit
68a9fab MDT Test framework - bug fixes with files partition
a8f01d3 MDT Test framework - Added writing colStats to same upsertPreppedReco…
328065f MDT Test framework - Added read path changes using filterFileSlices
3fc0d7c MDT Test framework - create empty parquet data files from commit meta…
718056e MDT Test framework - disable partition stats and reconcile markers
a49beee Add spark context to the HoodieMDTStats class (vamsikarnika)
8ebdb0a Modify colsToIndex config to take column names (vamsikarnika)
4c2a7e6 Fix Partition field config (vamsikarnika)
ef17392 Add md file on how to use the tool (vamsikarnika)
24f41a6 Fix usage file (vamsikarnika)
6807f9c Add config for enabling partition stats (vamsikarnika)
d68824f Creating files using engine context (vamsikarnika)
fe41155 parallelize the empty parquet file creation based on no of partitions (vamsikarnika)
5e05e80 Writes files through multiple commits (vamsikarnika)
0bd440f Revert "fix: Partition stats should be controlled using column stats …
7f7f8e6 MDT Test framework without writing data files
0bef6ae MDT Test framework - using filterFileSlics for colstats
cf39439 MDT Test framework - using createCommitMetadata, tagLocation and file…
3569b72 MDT Test framework - using createCommitMetadata, tagLocation and file…
077667e MDT Test framework - initializing files partition before commit
f0c619f MDT Test framework - bug fixes with files partition
a87ddce MDT Test framework - Added writing colStats to same upsertPreppedReco…
3c37ff2 MDT Test framework - Added read path changes using filterFileSlices
0a8ace9 MDT Test framework - create empty parquet data files from commit meta…
145ea3b MDT Test framework - disable partition stats and reconcile markers
ed53d79 Add spark context to the HoodieMDTStats class (vamsikarnika)
f8556a2 Modify colsToIndex config to take column names (vamsikarnika)
3909f3a Fix Partition field config (vamsikarnika)
d6cdd75 Add md file on how to use the tool (vamsikarnika)
e4ea970 Fix usage file (vamsikarnika)
698e51c Add config for enabling partition stats (vamsikarnika)
9d705bf Creating files using engine context (vamsikarnika)
ac0ee85 parallelize the empty parquet file creation based on no of partitions (vamsikarnika)
e711058 Writes files through multiple commits (vamsikarnika)
c1ef9d2 MDT Test framework - rename files and disabling partition stats
6b2a071 MDT Test framework - rename files and disabling partition stats
8abc18d Bulk Insert Files & Column stats (vamsikarnika)
6295a96 MDT Test framework - removing disabling partition stats code
3f59e97 MDT Test framework - Setting partition stats as false in table config
71c8b36 Merge remote-tracking branch 'vamsi/mdt_stats_tool' into ENG-35484
46270b9 MDT Test framework - delete
01a919e MDT Test framework - rename
7702c4a MDT Test framework - rebased vamsi changes and bug fixes
5bd0a84 Fixing write side benchmarking (nsivabalan)
45c6d48 MDT Test framework - Refactored input params and added tenantId columns
19074bc Add Partition filters to the query (vamsikarnika)
83a4344 refactor code (vamsikarnika)
7d83cdc Add benchmarking mode config (vamsikarnika)
48845ed fix default partition filter config value (vamsikarnika)
65b2771 Add configs for data filters (vamsikarnika)
53 changes: 53 additions & 0 deletions
...nt/hudi-client-common/src/test/java/org/apache/hudi/metadata/MetadataWriterTestUtils.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| | ||
| package org.apache.hudi.metadata; | ||
| | ||
| import org.apache.hudi.common.data.HoodieData; | ||
| import org.apache.hudi.common.model.HoodieFileGroupId; | ||
| import org.apache.hudi.common.model.HoodieRecord; | ||
| import org.apache.hudi.common.util.collection.Pair; | ||
| | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| | ||
| /** | ||
| * Test utility class to access protected methods from HoodieBackedTableMetadataWriter. | ||
| * This class is in the same package to access protected methods without duplication. | ||
| */ | ||
| public class MetadataWriterTestUtils { | ||
| | ||
| /** | ||
| * Tag records with location using the metadata writer's tagRecordsWithLocation method. | ||
| * This is a wrapper around the protected method to make it accessible from tests. | ||
| * | ||
| * @param metadataWriter The metadata writer instance | ||
| * @param partitionRecordsMap Map of partition path to records | ||
| * @param isInitializing Whether this is during initialization | ||
| * @return Pair of tagged records and file group IDs | ||
| */ | ||
| @SuppressWarnings("rawtypes") | ||
| public static <I, O> Pair<HoodieData<HoodieRecord>, List<HoodieFileGroupId>> tagRecordsWithLocation( | ||
| HoodieBackedTableMetadataWriter<I, O> metadataWriter, | ||
| Map<String, HoodieData<HoodieRecord>> partitionRecordsMap, | ||
| boolean isInitializing) { | ||
| // Access the protected method - this works because we're in the same package | ||
| return metadataWriter.tagRecordsWithLocation(partitionRecordsMap, isInitializing); | ||
| } | ||
| } | ||
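
For context, the wrapper in this file relies on Java's access rule that a `protected` member is visible to any class in the same package, not only to subclasses. A minimal standalone sketch of that pattern follows. `SketchWriter`, `SketchWriterTestUtils`, and `SamePackageAccessDemo` are hypothetical names used for illustration only; they are not Hudi classes and do not reproduce the behaviour of `tagRecordsWithLocation`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a writer whose tagging method is protected (not a Hudi class).
class SketchWriter {
  protected List<String> tagRecords(Map<String, List<String>> recordsByPartition,
                                    boolean isInitializing) {
    List<String> tagged = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : recordsByPartition.entrySet()) {
      for (String record : entry.getValue()) {
        // Illustrative "tagging": prefix each record with its partition path.
        tagged.add((isInitializing ? "init:" : "") + entry.getKey() + "/" + record);
      }
    }
    return tagged;
  }
}

// Hypothetical test helper declared in the same package as SketchWriter.
final class SketchWriterTestUtils {
  private SketchWriterTestUtils() {
  }

  static List<String> tagRecords(SketchWriter writer,
                                 Map<String, List<String>> recordsByPartition,
                                 boolean isInitializing) {
    // Compiles only because this class shares a package with SketchWriter.
    return writer.tagRecords(recordsByPartition, isInitializing);
  }
}

class SamePackageAccessDemo {
  public static void main(String[] args) {
    // TreeMap keeps partition iteration order deterministic.
    Map<String, List<String>> records = new TreeMap<>();
    records.put("files", Arrays.asList("r1", "r2"));
    records.put("column_stats", Arrays.asList("r3"));
    System.out.println(SketchWriterTestUtils.tagRecords(new SketchWriter(), records, false));
  }
}
```

The trade-off of this pattern is that the helper must live under the same package name as the class it wraps, which is why the file in this diff sits under `org.apache.hudi.metadata` in the test source tree.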
@PavithranRick: do we still need this, or can we remove it?
This shouldn't be required now.