[Kernel][Writes] Add schema validation utils #3003
}

/**
 * Returns all column names in this schema as a flat list. For example, a schema like:
 * | - a
I can't easily read or understand `| - a | | - 1 | | - 2 | - b | - c | | - nest | | - 3`.
Can you make this multi-line and in a tree format?
This is caused by IntelliJ's auto-formatter. I added a <pre> tag so the formatter leaves this block alone.
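A minimal sketch of what a `<pre>`-wrapped Javadoc tree and the flattening it describes could look like. It deliberately does not use the Kernel's schema types: `SchemaFlattenSketch`, `Column`, `flattenColumnNames`, and the dotted-path output (parent names included) are hypothetical choices for illustration, and the example tree is assumed from the reviewer's one-line rendering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaFlattenSketch {

    /** Hypothetical stand-in for a struct field: a name plus optional nested children. */
    static final class Column {
        final String name;
        final List<Column> children;

        Column(String name, Column... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Returns all column names in this schema as a flat list. For example, a schema like:
     * <pre>
     *   | - a
     *   |   | - 1
     *   |   | - 2
     *   | - b
     *   | - c
     *   |   | - nest
     *   |       | - 3
     * </pre>
     * is flattened to {@code [a, a.1, a.2, b, c, c.nest, c.nest.3]} (assumed format).
     */
    static List<String> flattenColumnNames(List<Column> schema) {
        List<String> result = new ArrayList<>();
        for (Column col : schema) {
            collect(col, "", result);
        }
        return result;
    }

    // Depth-first walk: emit the column's dotted path, then recurse into its children.
    private static void collect(Column col, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? col.name : prefix + "." + col.name;
        out.add(path);
        for (Column child : col.children) {
            collect(child, path, out);
        }
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
            new Column("a", new Column("1"), new Column("2")),
            new Column("b"),
            new Column("c", new Column("nest", new Column("3"))));
        System.out.println(flattenColumnNames(schema)); // [a, a.1, a.2, b, c, c.nest, c.nest.3]
    }
}
```

Because the tree sits inside `<pre>`, the Javadoc tool and most IDE formatters keep its line breaks and indentation instead of reflowing it into a single line.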
if (uniqueColNames.size() != flattenColNames.size()) {
    String duplicateColumns = flattenColNames.stream()
        .map(String::toLowerCase)
        .filter(n -> !uniqueColNames.add(n))
Isn't every element of `flattenColNames` already in `uniqueColNames`? If so, won't `uniqueColNames.add(n)` return false every time?
You're right; this should be a new set. I also added a test that matches the error message exactly, so it must contain only the duplicate column names.
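A sketch of the fix described above: scan the flattened names with a fresh, initially empty set, so that `add` returns false only for names that were already seen. `DuplicateColumnSketch`, `findDuplicateColumns`, `validateNoDuplicateColumns`, and the error message text are hypothetical; the case-insensitive comparison follows the `.map(String::toLowerCase)` step in the reviewed snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DuplicateColumnSketch {

    /** Returns lower-cased names that occur more than once (case-insensitively), in order of first repeat. */
    static List<String> findDuplicateColumns(List<String> flattenColNames) {
        Set<String> seen = new HashSet<>();             // starts empty, filled while scanning
        Set<String> duplicates = new LinkedHashSet<>(); // keeps order, reports each duplicate once
        for (String name : flattenColNames) {
            String normalized = name.toLowerCase(Locale.ROOT);
            if (!seen.add(normalized)) { // add() returns false only if already present
                duplicates.add(normalized);
            }
        }
        return new ArrayList<>(duplicates);
    }

    /** Throws if any column name is duplicated; the message lists only the duplicates. */
    static void validateNoDuplicateColumns(List<String> flattenColNames) {
        List<String> duplicates = findDuplicateColumns(flattenColNames);
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                "Schema contains duplicate columns: " + String.join(", ", duplicates));
        }
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("a", "b", "A", "c.x", "C.X", "b");
        System.out.println(findDuplicateColumns(cols)); // [a, c.x, b]
    }
}
```

A plain loop is used instead of `filter(n -> !seen.add(n))` because the `java.util.stream` documentation discourages stateful behavioral parameters in stream pipelines. `Locale.ROOT` avoids locale-sensitive lower-casing surprises (for example, the Turkish dotless i).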
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/SchemaUtils.java
Description
(Split from the larger PR #2944)
These are utilities to make sure that the schema given when creating a table is valid (i.e., it has no duplicate column names and no invalid characters). The code and logic are similar to Delta-Spark/Standalone.
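For the "invalid characters" half, a sketch of a per-name character check. The disallowed set below (space, `,`, `;`, `{`, `}`, `(`, `)`, newline, tab, `=`) is an assumption modeled on Delta-Spark's column-name check; verify it against the real code before relying on it. `ColumnNameCharsSketch`, `hasInvalidChars`, and `validateColumnNames` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnNameCharsSketch {

    // ASSUMPTION: disallowed characters, modeled on Delta-Spark's check; not taken from this PR.
    private static final String INVALID_CHARS = " ,;{}()\n\t=";

    /** Returns true if the name contains any character from INVALID_CHARS. */
    static boolean hasInvalidChars(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (INVALID_CHARS.indexOf(name.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Throws if any of the given column names contains an invalid character. */
    static void validateColumnNames(List<String> names) {
        List<String> invalid = new ArrayList<>();
        for (String name : names) {
            if (hasInvalidChars(name)) {
                invalid.add(name);
            }
        }
        if (!invalid.isEmpty()) {
            throw new IllegalArgumentException("Column names contain invalid characters: " + invalid);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasInvalidChars("a b")); // true
        System.out.println(hasInvalidChars("a_b")); // false
        validateColumnNames(Arrays.asList("id", "event_time"));
    }
}
```

Checking each field name separately (rather than a dotted path) keeps nested names and the `.` separator out of the check.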
How was this patch tested?
Unit tests.