Move error_value (and optionally default_value) to the CPT header? #1879
Comments
This makes sense to me.
It seems fine to me; it looks like an incremental improvement in code clarity and safety without actually changing any behavior. If you're so inclined, since this value should be the last element in the CPT's data array (according to this constant), you could truncate the data array by 1 element. But I don't know if that's beneficial.
@pandusonu2 is working on this. The bug is to add an error_value field to this type, of type
And we can parse it in from the last element of the data array here: icu4x/utils/codepointtrie/src/toml.rs Lines 103 to 122 in b6774e2
And then replace all uses of the following constant with it:
We also probably need a
@pandusonu2 What is the status of this issue?
See previous discussion in #1183.
The thread #1183 got derailed into a discussion on trust boundaries. I moved that conversation over to #1290. Our work on CrabBake is our solution to that side of the problem and allows us to reason about "invalid serialized data" in the serde case.
I increasingly feel that the option most consistent with how our data model works is to move these two fields, especially error_value, into the CPT header. This would help us eliminate the problematic DATA_GET_ERROR_VALUE.
This should be an easy change: the error value is always the last value in the data array, so at datagen time, we just remove the last value from the array and save it into the header.
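The datagen step described above could look roughly like the following. This is a minimal sketch, not the ICU4X implementation: `HeaderSketch` and `split_error_value` are hypothetical names, and the real header has other fields that are omitted here.

```rust
/// Hypothetical stand-in for the CPT header. The real CodePointTrieHeader
/// has other fields (omitted); only the proposed error_value is shown.
#[derive(Debug, PartialEq)]
pub struct HeaderSketch<T> {
    pub error_value: T,
}

/// Removes the trailing element of `data` and stores it in the header.
/// Returns `None` when `data` is empty, since there is no error value to move.
pub fn split_error_value<T>(mut data: Vec<T>) -> Option<(HeaderSketch<T>, Vec<T>)> {
    // `Vec::pop` takes the last element, which is where the error value lives.
    let error_value = data.pop()?;
    Some((HeaderSketch { error_value }, data))
}
```

The remaining data array is one element shorter, which matches the truncation suggested in the comments.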
Responses to the pushback on this idea from the previous thread:
In Rust, I do not believe that this will increase code complexity. It would involve adding a type parameter to CodePointTrieHeader, but this parameter would just be bubbled down from CodePointTrie, which already has a type parameter.
It may require adding #[derive(Deserialize)] on TrieType types that don't already have it.
We are moving from one type of GIGO to another type of GIGO. The new type of GIGO is in fact slightly more robust, because it allows us to automatically fail in the constructor if the error value field is not present in the header, something we can't as easily do when it is stored in the data array. Discussion of when or when not to rely on GIGO is the topic of #1290 and I do not want to derail this thread again on that separate, very important issue.
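To illustrate the first and last points, here is a rough sketch of a header that shares the trie's type parameter and a constructor that fails when the error value is absent. All names (`HeaderSketch`, `TrieSketch`, `SketchError`, `try_new`, `get_or_error_value`) are hypothetical. The `Option` argument stands in for a header field that may be missing from deserialized data, and the flat index lookup stands in for the real trie lookup.

```rust
/// Hypothetical header: its type parameter is bubbled down from the trie.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HeaderSketch<T> {
    pub error_value: T,
}

/// Hypothetical trie: it already carries `T`, so the header adds no new
/// parameter at this level.
#[derive(Debug, PartialEq)]
pub struct TrieSketch<T> {
    header: HeaderSketch<T>,
    data: Vec<T>,
}

#[derive(Debug, PartialEq)]
pub enum SketchError {
    MissingErrorValue,
}

impl<T: Copy> TrieSketch<T> {
    /// Fails up front if the header carries no error value, instead of
    /// discovering the problem later during a lookup.
    pub fn try_new(error_value: Option<T>, data: Vec<T>) -> Result<Self, SketchError> {
        let error_value = error_value.ok_or(SketchError::MissingErrorValue)?;
        Ok(Self { header: HeaderSketch { error_value }, data })
    }

    /// Simplified lookup: an out-of-range index yields the header's error value.
    pub fn get_or_error_value(&self, index: usize) -> T {
        self.data.get(index).copied().unwrap_or(self.header.error_value)
    }
}
```

With the value in the header, the lookup fallback no longer depends on a position inside the data array.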
Needs approval from: