-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 2) #17581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 2) #17581
Conversation
64ac0dd to
1d7fdbc
Compare
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Outdated
Show resolved
Hide resolved
311cfd7 to
4b9814f
Compare
4b9814f to
91797f9
Compare
|
Rebased |
f395ea6 to
2ca8f9f
Compare
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaUtils.java
Outdated
Show resolved
Hide resolved
| .orElse(null); | ||
|
|
||
| if (nonNullType == null) { | ||
| throw new org.apache.hudi.internal.schema.HoodieSchemaException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here, let's import the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
hudi-common/src/test/java/org/apache/hudi/common/schema/TestHoodieSchemaUtils.java
Outdated
Show resolved
Hide resolved
| HoodieSchema hoodieResult = HoodieSchemaUtils.resolveUnionSchema(hoodieFieldSchema, "TypeA"); | ||
|
|
||
| // Should produce equivalent schemas | ||
| assertEquals(avroResult.toString(), hoodieResult.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert on the objects being equal instead of the string representation by calling .toAvroSchema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump on this, it looks like it is not in the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java
Outdated
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Outdated
Show resolved
Hide resolved
2ca8f9f to
41163c4
Compare
the-other-tim-brown
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. Just one minor item remaining on the tests
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaUtils.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveAvroSerializer.java
Show resolved
Hide resolved
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveTypeUtils.java
Show resolved
Hide resolved
c16ca0a to
13ac494
Compare
000b93c to
b9e691a
Compare
| HoodieSchema schema = HoodieSchema.parse(schemaWithTimestampMicros); | ||
|
|
||
| // HiveTypeUtils.generateColumnTypes throws an exception for timestamp-micros since it's not supported by AvroSerDe | ||
| assertThrows(Exception.class, () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this case it will currently fall back to the underlying primitive and return a long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testGenerateColumnTypesForTimeMicros()
testGenerateColumnTypesForTimeMillis()
testGenerateColumnTypesForTimestampMicros()
These 3 tests are failing on master without the HiveTypeUtils migration. Let me fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the test + fixed the HiveTypeUtils#generateTypeInfo.
|
|
||
| HoodieSchemaType type = schema.getType(); | ||
| if (type == DECIMAL && AvroSerDe.DECIMAL_TYPE_NAME | ||
| .equalsIgnoreCase((String) schema.getProp(AvroSerDe.AVRO_PROP_LOGICAL_TYPE))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These logical type checks seem to be very avro specific, can we just rely on the HoodieSchemaType now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, it's doable for all the types in this function except VARCHAR and CHAR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
badeef9 to
82775b6
Compare
82775b6 to
03b8fb4
Compare
|
Gosh, the CI is timing out, I'm going to increase the timeout to check if we the Azure CI can succeed without throwing any errors NOTE: Before we merge this in, we will need to undo the last commit on this PR to revert the Azure CI timeout. Edit to add: Not reverting Azure CI timeout increase here as increasing it to 3 hours is better than trying to get it to pass for 8 hours. |
…e#17581) * Remove AvroSchemaUtils from HiveAvroSerializer and HiveTypeUtils * Address comments * Address comments * Remove AvroSchemaUtils#resolveUnionSchema * Add more test to cover generateColumnTypes * Fix HiveTypeUtils#generateTypeInfo behaviour and ensure that it is not changed * Address comments * Increase Azure CI timeout



Describe the issue this Pull Request addresses
Reference issue: #14282
This PR focuses on migrating usages of
AvroSchemaUtilsto the internalHoodieSchemaabstraction andHoodieSchemaUtils.Key Changes:
HoodieTable,HoodieWriteHandle,HoodieRowParquetWriteSupport) to useHoodieSchemaandHoodieSchemaCompatibility.AvroSchemaUtilsstatic functions have been fully qualified. This explicitly marks technical debt and makes these usages easily searchable for future refactoring.HoodieSchemaUtilsandHoodieSchemaCompatibilityto supportHoodieSchemaobjects directly.Classes NOT included in migration (Fully Qualified)
The following classes retain
AvroSchemaUtilsusage but are now fully qualified:TestAvroSchemaUtilsAvroSchemaUtils(Self)AvroSchemaComparatorForRecordProjectionHoodieAvroUtilsHoodieSchemaCompatibilityMissingSchemaFieldExceptionHoodieSchemaUtilsSchemaBackwardsCompatibilityExceptionAvroSchemaRepairTestAvroSchemaRepairTestHoodieSchemaCompatibility(For consistency testing)Specific Ignored Usages (getAvroRecordQualifiedName)
The following classes ignore
AvroSchemaUtils.getAvroRecordQualifiedName:BaseHoodieWriteClientHoodieCatalogHoodieHiveCatalogHoodieTableFactoryStreamSyncTestHoodieTableFactorySummary and Changelog
This PR is a refactoring effort to improve schema abstraction within Hudi. By moving away from raw Avro utils, we pave the way for better type safety and cleaner internal APIs.
Impact
HoodieSchemaUtilsandHoodieSchemaCompatibilitynow play a larger role in schema validation and evolution logic.Risk Level
Low
Documentation Update
none
Contributor's checklist