-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type name case sensitivity #10085
Comments
I would say the case sensitive matching is a reasonable default. Making it configurable without performance hit is also complicated. After all you can maintain a mapping from the user-facing names to normalized names in planner, and worker (thus Velox) only need to deal with normalized names thus case sensitivity should not be a problem here. |
My understanding that whether a column name is case sensitive depends on the database implementation and there is no rule to say if it should be case insensitive. https://www.scaler.com/topics/is-sql-case-sensitive/. So if Velox chooses to allow it to be case sensitive then it is what it is. @Yuhta Is this a common consensus in Meta? |
@yingsu00 It's more about design the library to support more cases. Velox is at the bottom level so it has to be in the most strict mode, and engine authors can relax the criteria by normalizing the names. If Velox chose to match name case-insensitively, there would be nothing engine authors can do to make it case sensitive at higher level. |
I consulted more people on this matter, and as @tdcmeehan pointed out, "Case sensitivity is a property of storage. For example, Hive and Iceberg are case insensitive, but some SQL databases are case sensitive. It is not a property of the engine--the engine needs to respect the convention adopted by storage. " If this is correct, implementing sense insensitiviy in just one single Engine probably won't work. "Suppose Spark wrote fooBar, and Presto wrote FooBar. In the Presto planner, we say, OK, the normalized name is foobar. We plan out the query and one worker is reading fooBar, another is reading FooBar--how can we accommodate case insensitivity without knowing in advance that we've written out the column in two different formats" It seems to me that we may want to normalize the column names to lower case in the Hive connector which supports both Hive table format and Iceberg table format. One possible place for it to happen is in |
No by storage he means metastore, not Velox. Files do not keep public facing names, that's the job of metastore. The names in files are just data stream identifiers. Engines (or the storage submodule in engines) need to match user facing names to these identifiers (with the help of metastore), and different engines can have different ways to match them (by name, case sensitive/insensitive, by position, by other unique id than name). Different engines using the same storage do need to keep consistent though, otherwise there will be confusion. |
We do have an option to normalize all names in file to lower case to make matching by name case insensitive easier for now, but we do have plan to have some configurable mechanism to make the column matching more generic. The implementation would likely to be replace the names in file with some changed names, like what DWRF reader is doing when |
@Yuhta We met similar issue when the column name contains spaces. I created #10388. I think the say to solve it is to
Our team is already working on 1) |
Description
Currently when checking type children here and here, it is using string equal which is case sensitive.
There is also a [test|https://github.com/facebookincubator/velox/blob/main/velox/type/tests/TypeTest.cpp#L405] which has a todo comment regarding case behavior.
Internally we are running into cases like
Field not found: foobar. Available fields are: x, y, fooBar
So I wonder is there any particular reason/use case to be case sensitive? Or should we make it case insensitive? If yes, I can file a PR.
The text was updated successfully, but these errors were encountered: