-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-write ResourceKey #1511
Re-write ResourceKey #1511
Conversation
A few more notes:
|
provider/core/src/resource.rs
Outdated
/// | ||
/// Panics if the syntax of the path is not valid. | ||
#[inline] | ||
pub const fn new(path: &'static str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that the rust compiler will happily generate (potentially expensive) runtime code for this when this is called in a non const context: i recommend using a macro the way my tinystr PR does to force it to be a static. That way you can also guarantee that the hash will be in the source and not generated at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I agree.
- We cannot prevent people from using the function, because the macro is just codegen and it needs to wrap a public function (although we could rename or hide the function).
- The type name is included in the prelude and it's more clear what the function is for (it creates a new ResourceKey, by definition); macros are relatively less friendly for documentation and tooling.
- My next step is going to be adding a
const ResourceKey
field toDataMarker
, which will force these into a const context. - The problem of const functions "accidentally" resolving at runtime instead of build time is not unique to us. I'd rather not introduce the concept of using macros to solve this problem unless you can point to something that says that this is the "official" way to solve the footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case because it's going to be rare to construct these on your own I'm less worried about preventing people from using this, and more about giving people a convenient way to instantiate a const. There is no such convenient way currently.
But given that most users will just use the exported const, I guess it's fine.
I don't have anything saying this is the "official" way to solve the footgun but it's basically the only way and I asked other community members: the const thing happens because rust doesn't really pick whether to const a function based on whether the argument is a literal, and the only way to say "my arguments must be literals" is via a macro. It's the right tool for the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to disallow runtime key creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the typical way to do that is to remove all constructors (except often Default::default()
), leave in a #[doc(hidden)]
constructor with a name that has underscores in it (ResourceKey::__internal_construct()
or something), and use it from the macro.
I don't think we need to here though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added back the macro, and kept the try_new
constructor but not the new
constructor, with appropriate documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I anticipate adding a global hash-to-ResourceKey map for all ICU4X components in the tools directory so that we can reproduce the ResourceKey from a ResourceKeyHash. I think this is the best path forward.
I think requiring a central directory of keys is a big downside of tagging the hashes instead of the strings. I've updated #1480 to have a similar API to this PR, but tag the path instead.
provider/core/src/resource.rs
Outdated
path: &'static str, | ||
_tag0: [u8; 8], | ||
hash: ResourceKeyHash, | ||
_tag1: [u8; 2], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceKeyHash
has a fixed size, so we don't need the closing tag.
You could also use a single array for tag and key instead of making them separate fields and forcing them to be consecutive with repr(C)
:
#[inline]
pub const fn try_new(path: &'static str) -> Result<Self, ()> {
match Self::check_path_syntax(path) {
Ok(_) => {
let mut bytes = *b"ICU4XK[\x02XXXX\x03]";
bytes[9..=12] = helpers::fxhash_32(path.as_bytes()).to_le_bytes();
Ok(Self { path, bytes })
},
Err(_) => Err(())
}
}
#[inline]
pub const fn get_hash(&self) -> ResourceKeyHash {
self.bytes[9..=12] // modulo wrapping
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the closing tag to make it less likely to have false positives (based on the assumption that one instance of a tag is more likely to occur than two tags separated by a specific number of bytes).
I'll need to check to make sure that self.bytes[9..=12]
doesn't generate panicky code. Since it's an array, theoretically it should figure out that the operation can't fail, but in general the index operator []
is panicky.
What's wrong with repr(C)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slicing is actually done by a trait, so self.bytes[9..=12]
isn't const
.
Regarding repr(C)
it just seems like it's the wrong hammer. We're not interacting with C code, we're just trying to put data in a specific order, which I think is clearer by using a single array if the array types match like they do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on the assumption that one instance of a tag is more likely to occur than two tags separated by a specific number of bytes
I don't think that holds if the single tag has the same length as the two tags combined.
provider/core/src/resource.rs
Outdated
match Self::check_path_syntax(path) { | ||
Ok(_) => Ok(Self { | ||
path, | ||
_tag0: *b"ICU4XK[\x02", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're making tags human readable (instead of just being magic numbers), why use control characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They may as well be magic numbers. I have no scientific reason for doing what I did here other than to make it less likely to have collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a 2 any less likely than a 57? I don't see how any string is less or more likely than any other (maybe other than all NUL
), so I'd prefer if these were all printable bytes.
If I'm not mistaken the choice of tag length came from the previous PR because it fits exactly into a tinystr8
and not any architectural or probabilistic arguments. If the bytes in a binary are uniformly distributed then it's very unlikely that 64 consecutive bits match this.
In any case, I'd appreciate removing _tag1
and making the tag printable (alternatively, use arbitrary numbers).
provider/core/src/resource.rs
Outdated
/// | ||
/// Panics if the syntax of the path is not valid. | ||
#[inline] | ||
pub const fn new(path: &'static str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep category
, sub_category
, version
as the public API, rather than having the client assemble the string.
This implies using a macro that concat!
s the literals, and then delegates to this const fn
(as it cannot concat).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way because:
- This new model allows multiple namespaces in key strings, rather than just category/subcategory, which was an artificial limitation
- I find that writing the string in code is actually shorter and more readable than using an opaque concatenation algorithm
provider/core/src/resource.rs
Outdated
/// | ||
/// Panics if the syntax of the path is not valid. | ||
#[inline] | ||
pub const fn new(path: &'static str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to disallow runtime key creation?
provider/core/src/resource.rs
Outdated
/// ); | ||
/// ``` | ||
pub fn get_components(&self) -> ResourceKeyComponents { | ||
self.into() | ||
pub fn iter_components(&self) -> impl Iterator<Item = Cow<str>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be &str
now as it's always borrowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
I would like to decouple this PR from the static data slicing / keyextract questions. I will remove the tags and the |
(cherry picked from commit 04b6acf)
This reverts commit a95f84a.
(cherry picked from commit fa2a295)
Note: This PR comes with a code size win, most likely because the key string is computed at compile time. Old:
New:
|
experimental/tinystr_neo/Cargo.toml
Outdated
@@ -32,3 +32,7 @@ serde = { version = "1.0.123", optional = true, default-features = false, featur | |||
serde_json = { version = "1.0", default-features = false, features = ["alloc"] } | |||
bincode = "1.3" | |||
postcard = { version = "0.7", features = ["use-std"] } | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was from #1514 which is now merged.
provider/core/src/resource.rs
Outdated
/// Therefore, users should not generally create ResourceKey instances; they should instead use | ||
/// the ones exported by a component. | ||
#[derive(PartialEq, Eq, Copy, Clone)] | ||
#[repr(C)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
} | ||
/// A compact hash of a [`ResourceKey`]. Useful for keys in maps. | ||
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Copy, Clone, Hash)] | ||
#[repr(transparent)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the repr
s here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the repr(transparent)
on ResourceKeyHash because that one will be used as a ULE in the BlobProvider. But I removed the repr(C)
on ResourceKey.
/// ); | ||
/// ``` | ||
pub fn get_components(&self) -> ResourceKeyComponents { | ||
self.into() | ||
pub fn iter_components(&self) -> impl Iterator<Item = &str> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have get_path
, I don't see why we still (publicly) need this. Before this was the cheapest way to return the path, but now we can just return the static ref. So instead of doing path_buf.extend(key.iter_components())
we can do path_buf.push(key.get_path())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provider/core/src/resource.rs
Outdated
pub fn get_component_0(&self) -> &str { | ||
// This cannot fail because of the preconditions on path (at least one '/') | ||
self.iter_components().next().unwrap() | ||
} | ||
|
||
/// Gets the second path component of a [`ResourceKey`]. | ||
/// | ||
/// # Examples | ||
/// | ||
/// ``` | ||
/// use icu_provider::prelude::*; | ||
/// | ||
/// let resc_key = icu_provider::hello_world::key::HELLO_WORLD_V1; | ||
/// assert_eq!("helloworld@1", resc_key.get_component_1()); | ||
/// ``` | ||
pub fn get_component_1(&self) -> &str { | ||
// This cannot fail because of the preconditions on path (at least one '/') | ||
self.iter_components().nth(1).unwrap() | ||
} | ||
|
||
/// Gets the last path component of a [`ResourceKey`] without the version suffix. | ||
/// | ||
/// # Examples | ||
/// | ||
/// ``` | ||
/// use icu_provider::prelude::*; | ||
/// | ||
/// let resc_key = icu_provider::hello_world::key::HELLO_WORLD_V1; | ||
/// assert_eq!("helloworld", resc_key.get_last_component_no_version()); | ||
/// ``` | ||
pub fn get_last_component_no_version(&self) -> &str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like these names at all...
Also, you just told me that you prefer the constructor to take a string instead of (category, subcategory, version) so there's more flexibility in how many namespaces a client uses, but this API very rigidly only supports the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted get_component_0
and get_component_1
. I need get_last_component_no_version
for compatibility with uprops provider. I filed #1515 to follow up on that dependency.
/// ); | ||
/// ``` | ||
pub fn get_components(&self) -> ResourceOptionsComponents { | ||
self.into() | ||
pub fn iter_components(&self) -> impl Iterator<Item = Cow<str>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only ever used to extend a PathBuf
. How about not constructing iterators and cows and instead doing something like:
pub fn push_to(&self, path_buf: &mut PathBuf) {
match self.variant {
Some(variant) => path_buf.push(*variant),
_ => {}
}
match self.langid {
Some(langid) => path_buf.push(langid.to_string()),
_ => {}
}
}
We could do something similar for ResourceKey
(instead of exposing get_path
and iter_components
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Good observation. I agree that we should clean this up.
One issue is that std::path::PathBuf
is not no_std
. Currently these APIs are no_std
and are used indirectly by other no_std
code.
I filed #1516 to follow up.
🎉 All dependencies have been resolved ! |
Fixes #1148
Depends on #1504
Depends on #1514
I implemented it as
When we search the executables, we can find the hash, which is enough for what we need. I anticipate adding a global hash-to-ResourceKey map for all ICU4X components in the tools directory so that we can reproduce the ResourceKey from a ResourceKeyHash. I think this is the best path forward.
I have not yet updated all the components to the new ResourceKey because I would like approval on the approach first.