Discuss data key category for properties resources #1196
Comments
CC @iainireland I could go either way on this one. I think I lean toward putting all the data in the |
Right now, we distinguish between enumerated property sets and binary property sets by checking whether the key contains an '='. Will putting additional keys in the same namespace interfere with that? It seems fine in the specialized case (eg If it doesn't cause any problems, I'm fine either way. |
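A minimal sketch of the '=' check described above, for illustration only. The names `PropertyKeyKind` and `classify` are hypothetical and not taken from the ICU4X codebase; the sketch assumes the sub-category part of the key has already been separated from the category and version.

```rust
/// Kind of property data that a key sub-category refers to.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq, Eq)]
enum PropertyKeyKind<'a> {
    /// No '=' in the sub-category: treated as a binary property set.
    Binary { name: &'a str },
    /// '=' present: treated as an enumerated property name=value set.
    Enumerated { name: &'a str, value: &'a str },
}

/// Classifies a sub-category string by checking whether it contains '='.
/// (Hypothetical helper; the real check may live elsewhere and look different.)
fn classify(sub_category: &str) -> PropertyKeyKind<'_> {
    match sub_category.split_once('=') {
        Some((name, value)) => PropertyKeyKind::Enumerated { name, value },
        None => PropertyKeyKind::Binary { name: sub_category },
    }
}
```

Under this check, a sub-category such as "gc" with no '=' (as in a whole-property CodePointTrie key) would be classified the same way as a binary property, which is the kind of interference the question above is asking about.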
I lean towards having different category names because the type/shape of the return data is different. It is analogous to what we did for data struct naming -- even though we have the From the data struct naming ambiguity, I conclude that it is simpler to name data structs after the shape of the data they return, not how they are used or what functionality they represent. A further example: in the hypothetical scenario that Iain wants to generate UnicodeSets from enumerated property key=val pairings using CodePointTrie (for overall memory savings at cost of some perf hit), he will care about the shape of data, not just the purpose or code component. |
We should consider the namespacing to have four types:
How about
|
Okay, after thinking about this again, I've gone back to preferring a single
|
Shane's position seems reasonable. I'm in favour of lumping everything into the same namespace, unless we need to split up namespaces to avoid ambiguities. Is there any risk of ambiguity between Shane's categories 3+4 above (enumerated properties of code points vs enumerated properties of code points or strings)? Will we ever want to provide two versions of the same property, where one supports both strings and code points, and the other supports only code points? |
As I understand, Unicode will never change a property of code points to a property of code points or strings, so category 4 would only be used for new properties that don't exist yet. It could be used for a theoretical property like "emoji emotional state": given an emoji sequence, tell whether it is happy, sad, or neutral (enumerated property of strings). |
I don't have a strong opinion on either approach, but going for simplicity appeals to me more.
This scenario is similar to my comment in #1273 (comment). |
I was imagining a scenario where there's a property that is defined for strings and code points, but there's a compelling use case that only needs code points. In theory we might want to be able to provide only the code point data for that property, in which case the unified namespace might hypothetically be an obstacle. |
Interesting scenario. I think in such a case, we would just invent a new key in the same namespace; something like |
I can see the argument for consistency in having the ResourceKey category be "properties" to directly correspond with its source code component. Let me sketch this out and try to organize it for my own understanding... We have 2 dimensions:
I wonder if we will have enough characters in the TinyStr16 to hold all this info based on the above discussion. Let me see if I can sketch out examples to answer that question one way or another:
The "Script Extensions" row has to be different from enumerated properties because the return value for a code point is an array (not a single value representable as an integer), so we need to have unique getter fns for it, similar to what ICU does. |
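One rough way to answer the TinyStr16 question is to check candidate sub-category strings against the length limit. This sketch assumes a TinyStr16 holds at most 16 ASCII bytes and does not use the tinystr crate itself. `TINYSTR16_CAPACITY` and `fits_in_tinystr16` are hypothetical names, and any candidate strings passed in are illustrative rather than proposed keys.

```rust
/// Assumed capacity of a TinyStr16, in ASCII bytes (assumption for this sketch).
const TINYSTR16_CAPACITY: usize = 16;

/// Returns true if the sub-category is ASCII and no longer than the assumed
/// TinyStr16 capacity. Only length and ASCII-ness are checked here; any other
/// constraints the real type imposes are not modeled.
fn fits_in_tinystr16(sub_category: &str) -> bool {
    sub_category.is_ascii() && sub_category.len() <= TINYSTR16_CAPACITY
}
```

Short abbreviated forms like "gc=Lu" fit comfortably, while a spelled-out form such as "Script_Extensions=Latin" (23 bytes) would not, so any scheme that encodes extra information in the sub-category would likely need to rely on short property aliases.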
Currently, the ResourceKey category name is uniset for the UnicodeSet data (for binary and enumerated properties) and for CodePointTrie data. For more context, see this comment. Briefly, that means we have keys like

We could change the ResourceKey category from uniset to properties. That would keep all of the keys in the same category, as they currently are. But it does not reflect the type of data in the payload being returned.

Alternatively, we could change the category for CodePointTrie keys from uniset to codepointtrie or cpt. Then, "uniset/gc@1" becomes "codepointtrie/gc@1". This would make the category name reflect the type of data in the data payload / data struct, but some properties will "appear" in these 2 different keys (albeit differently) -- ex: "uniset/gc=Lu@1" vs. "codepointtrie/gc@1".

Need approval: