[Kernel][Writes] Add schema validation utils #3003

Merged: 2 commits, May 1, 2024

Conversation

vkorukanti
Collaborator

Description

(Split from the larger PR #2944)

These are utilities that check that the schema supplied when creating a table is valid, meaning it has no duplicate column names and no invalid characters in column names. The code/logic is similar to Delta-Spark/Standalone.
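As a rough illustration of the invalid-character half of that check, here is a minimal sketch. The class and method names are hypothetical and are not the Kernel API. The rejected character set is an assumption modeled on what Delta-Spark is understood to reject for column names; the PR description does not list the characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not the Kernel API. */
final class ColumnNameChecks {
    // Assumption: space, comma, semicolon, braces, parentheses, newline, tab, equals.
    private static final String INVALID_CHARS = " ,;{}()\n\t=";

    /** Returns, in input order, the names containing at least one invalid character. */
    static List<String> findInvalidNames(List<String> columnNames) {
        List<String> invalid = new ArrayList<>();
        for (String name : columnNames) {
            for (int i = 0; i < name.length(); i++) {
                if (INVALID_CHARS.indexOf(name.charAt(i)) >= 0) {
                    invalid.add(name);
                    break; // one offending character is enough to reject the name
                }
            }
        }
        return invalid;
    }
}
```

A caller would typically pass the flattened column names and raise an error if the returned list is non-empty.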

How was this patch tested?

Unit tests

vkorukanti added this to the 3.2.0 milestone on May 1, 2024
}

/**
* Returns all column names in this schema as a flat list. For example, a schema like: | - a | |
Collaborator

I can't really easily read or understand | - a | | - 1 | | - 2 | - b | - c | | - nest | | - 3

Can you make this multi line and in a tree format?

Collaborator Author

This is caused by IntelliJ auto-formatting. I added a <pre> tag so the formatter leaves this block alone.
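For context, the kind of flattening the Javadoc describes can be sketched as follows. This is only an illustration: the `Field` type and `flattenNames` method are hypothetical stand-ins, not the Kernel classes, and the dot-joined path format is an assumption based on how Delta-Spark names nested columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the Kernel implementation. */
final class SchemaFlattener {
    /** Stand-in for a schema field: a name plus nested children (none for leaves). */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns every column path in pre-order, joining nested names with '.'. */
    static List<String> flattenNames(List<Field> fields) {
        List<String> out = new ArrayList<>();
        collect("", fields, out);
        return out;
    }

    private static void collect(String prefix, List<Field> fields, List<String> out) {
        for (Field f : fields) {
            String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
            out.add(path);                   // the parent itself is listed
            collect(path, f.children, out);  // followed by all of its descendants
        }
    }
}
```

With a top-level field `a` that has children `1` and `2`, this sketch yields `a`, `a.1`, `a.2`, so both struct columns and their nested leaves end up in the list that is later checked for duplicates.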

if (uniqueColNames.size() != flattenColNames.size()) {
    String duplicateColumns = flattenColNames.stream()
        .map(String::toLowerCase)
        .filter(n -> !uniqueColNames.add(n))
Collaborator

Isn't every element of flattenColNames already in uniqueColNames ?

so won't uniqueColNames.add(n) return false every time?

Collaborator Author

You are right. This should be a new set. I added a test that checks the error message lists exactly the duplicate column names and nothing else.
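The point raised in this thread can be sketched as below. This is not the merged code; the class and method names are hypothetical. The idea is that `Set.add` returns false only for an element already present, so the set used to detect repeats has to start empty rather than be the set that already holds every name.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of case-insensitive duplicate detection. */
final class DuplicateColumns {
    /** Returns the lower-cased names that occur more than once, in first-repeat order. */
    static Set<String> findDuplicates(List<String> flattenColNames) {
        Set<String> seen = new HashSet<>();             // fresh set: starts empty
        Set<String> duplicates = new LinkedHashSet<>(); // each duplicate reported once
        for (String name : flattenColNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (!seen.add(lower)) {
                // add() returned false, so this name was already seen earlier
                duplicates.add(lower);
            }
        }
        return duplicates;
    }
}
```

If the pre-populated set were reused instead, `add` would return false for every element and every column would be reported as a duplicate, which is the behavior the reviewer pointed out.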

vkorukanti merged commit bc4fd23 into delta-io:master on May 1, 2024
8 checks passed
vkorukanti added a commit to vkorukanti/delta that referenced this pull request May 1, 2024
vkorukanti deleted the schemaUtils branch on May 9, 2024 02:43