Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract Numerator and Denominator Constants to Byte Arrays #3922

Merged
merged 110 commits into from
Oct 16, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
110 commits
Select commit Hold shift + click to select a range
1bda4d4
Add the needed helper and the structure
younies Aug 23, 2023
8b97948
fix clippy
younies Aug 23, 2023
1ac078a
change signature
younies Aug 23, 2023
ce27339
Add all the helpers functions
younies Aug 24, 2023
f785251
Use the helpers to generate the constant value
younies Aug 24, 2023
2114087
fix
younies Aug 24, 2023
acedc32
first stage in data ( the data is not complete yet!)
younies Aug 24, 2023
b80be70
fix clippy
younies Aug 24, 2023
bf273b7
Merge branch 'main' of github.com:unicode-org/icu4x into units-conver…
younies Sep 5, 2023
40a412c
fix fingerprint
younies Sep 5, 2023
e376ec6
Units conversion skeleton.
younies Sep 13, 2023
8cd825b
finalize the skeleton
younies Sep 15, 2023
0ab1091
update the data
younies Sep 15, 2023
7fefed3
fix
younies Sep 15, 2023
4c99aab
fix
younies Sep 15, 2023
ecd0df2
fix ule issue
younies Sep 19, 2023
40f541e
Fix ULE issue and get the correct results
younies Sep 20, 2023
51fb8a3
remove unnecessary print
younies Sep 20, 2023
3330f09
fix assign
younies Sep 20, 2023
10883bc
fix clippy
younies Sep 20, 2023
5cbd0f2
improvement
younies Sep 20, 2023
b809232
improve
younies Sep 20, 2023
27bbcc9
improve the code
younies Sep 20, 2023
631b598
move to the correct place
younies Sep 20, 2023
4a93af5
improve naming
younies Sep 20, 2023
748d8e0
fix convert fractional
younies Sep 20, 2023
0415d57
improve
younies Sep 20, 2023
6c62c40
Fix comments and add tests
younies Sep 20, 2023
4291b40
fix format
younies Sep 20, 2023
3f8503e
improve and add test cases
younies Sep 20, 2023
a6248f3
fix clippy
younies Sep 20, 2023
eef1612
fix the comments
younies Sep 20, 2023
50c03b0
rename
younies Sep 20, 2023
42a7817
fix duplication
younies Sep 20, 2023
346bd9a
fix the code
younies Sep 20, 2023
b5790d6
fix data bake
younies Sep 20, 2023
8efed21
fix
younies Sep 20, 2023
c2ff685
fix
younies Sep 20, 2023
c7fb8da
fix clippy
younies Sep 20, 2023
41beda1
Merge branch 'main' of github.com:unicode-org/icu4x into units-conver…
younies Sep 25, 2023
a52ca57
fix after merge
younies Sep 25, 2023
754f33e
Update provider/datagen/src/transform/cldr/units/helpers.rs
younies Sep 25, 2023
8cba11b
Add TODOs
younies Sep 25, 2023
c046110
fix the places of dependencies
younies Sep 25, 2023
f7467e6
fix uses
younies Sep 25, 2023
d098dfe
fix
younies Sep 25, 2023
6c7cda6
fix tidy
younies Sep 25, 2023
edd6035
fix order
younies Sep 25, 2023
ea7e1bf
fix : cargo make ci-job-msrv-features-3
younies Sep 25, 2023
408435d
Update experimental/unitsconversion/src/provider.rs
younies Sep 26, 2023
bcf11b1
make it more concise.
younies Sep 26, 2023
9b2926a
rename a fn
younies Sep 26, 2023
eea4da0
make the removal more clear
younies Sep 26, 2023
242fbf0
fix name
younies Sep 26, 2023
9c27942
fix comments
younies Sep 26, 2023
a45cd8c
adjust function
younies Sep 26, 2023
e339d63
fix loop breaker
younies Sep 26, 2023
20fac57
make the code concise
younies Sep 26, 2023
1860cbd
make test concise
younies Sep 26, 2023
4a10967
fix clippy
younies Sep 26, 2023
940909e
fix naming
younies Sep 26, 2023
8248b96
fix convert_constant_to_num_denom_strings
younies Sep 26, 2023
e965ce5
use BigRational
younies Sep 26, 2023
ac9daf5
fix fn name
younies Sep 26, 2023
9af5b98
rename
younies Sep 26, 2023
37bae97
fix
younies Sep 26, 2023
a107b75
fix the maximum depth
younies Sep 26, 2023
9483c8f
fix
younies Sep 26, 2023
d1b022c
fix constant name
younies Sep 26, 2023
8cc2a11
fix clibby
younies Sep 26, 2023
e781cf5
make the code more concise
younies Sep 27, 2023
35ef822
fix fmt
younies Sep 27, 2023
21ce45d
fix
younies Sep 27, 2023
e557ed2
remove the remove white space fn
younies Sep 27, 2023
7e34f1f
improve
younies Sep 27, 2023
9a60a86
fix
younies Sep 27, 2023
364da65
fix clibby
younies Sep 27, 2023
bbed484
fix
younies Sep 28, 2023
9553ef2
Update provider/datagen/src/transform/cldr/units/helpers.rs
younies Sep 28, 2023
c7365e4
fix
younies Sep 28, 2023
eea8a3e
make the function more concise
younies Sep 28, 2023
fe40b0e
add comments
younies Sep 28, 2023
84fa812
fix
younies Sep 28, 2023
010afc7
fix
younies Sep 28, 2023
33fea6e
done fixing
younies Sep 28, 2023
0fb142f
fix clibby
younies Sep 28, 2023
63647c9
add issue
younies Sep 28, 2023
e734e14
remove dependency of "serde"
younies Oct 2, 2023
45b3a1e
Merge branch 'main' of github.com:unicode-org/icu4x into units-conver…
younies Oct 2, 2023
e9c1a7c
fix after merge.
younies Oct 2, 2023
bcb706f
Merge branch 'main' of github.com:unicode-org/icu4x into units-conver…
younies Oct 9, 2023
2398f8d
Use GenericFraction
younies Oct 10, 2023
7324663
remove using vecs.
younies Oct 10, 2023
78f03d5
fix clippy
younies Oct 10, 2023
e18ba12
detect the loop
younies Oct 10, 2023
ebfe084
detect the loop 2
younies Oct 11, 2023
a64515f
fix the loop
younies Oct 11, 2023
1ca05fe
detect the loop 3
younies Oct 11, 2023
36380c3
fix naming
younies Oct 11, 2023
832aeeb
fix clippy
younies Oct 11, 2023
a5aa753
fix tidy
younies Oct 11, 2023
bb6b786
fix comment
younies Oct 11, 2023
d98c818
Merge branch 'main' of github.com:unicode-org/icu4x into units-conver…
younies Oct 11, 2023
afb8d8f
use &str instead
younies Oct 16, 2023
1900832
c
younies Oct 16, 2023
ba1dde2
fix build after merge.
younies Oct 16, 2023
f5e2e79
rename count
younies Oct 16, 2023
61796c6
Update provider/datagen/src/transform/cldr/units/mod.rs
younies Oct 16, 2023
4a6e2bb
remove "num"
younies Oct 16, 2023
70f48c7
add num
younies Oct 16, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 57 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 0 additions & 2 deletions experimental/unitsconversion/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,7 @@ displaydoc = { version = "0.2.3", default-features = false }
icu_locid = { workspace = true }
icu_provider = { workspace = true, features = ["macros"] }
serde = { version = "1.0", default-features = false, features = ["derive", "alloc"], optional = true }
tinystr = { workspace = true, features = ["zerovec"], default-features = false }
zerovec = { workspace = true, features = ["yoke"] }

icu_unitsconversion_data = { workspace = true, optional = true }

[features]
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

80 changes: 76 additions & 4 deletions experimental/unitsconversion/src/provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
//! Read more about data providers: [`icu_provider`]

use icu_provider::prelude::*;
use zerovec::ZeroMap;
use zerovec::{ZeroMap, ZeroVec};

#[cfg(feature = "compiled_data")]
#[derive(Debug)]
Expand All @@ -36,15 +36,15 @@ const _: () = {
/// The latest minimum set of keys required by this component.
pub const KEYS: &[DataKey] = &[UnitsConstantsV1Marker::KEY];

/// This type contains all of the constants data for units conversion.
/// This type encapsulates all the constant data required for unit conversions.
///
/// <div class="stab unstable">
/// 🚧 This code is considered unstable; it may change at any time, in breaking or non-breaking ways,
/// including in SemVer minor releases. While the serde representation of data structs is guaranteed
/// to be stable, their Rust representation might not be. Use with caution.
/// </div>
#[icu_provider::data_struct(marker(UnitsConstantsV1Marker, "units/constants@1", singleton))]
#[derive(Default, Clone, PartialEq, Debug)]
#[derive(Clone, PartialEq, Debug)]
#[cfg_attr(
feature = "datagen",
derive(serde::Serialize, databake::Bake),
Expand All @@ -57,5 +57,77 @@ pub struct UnitsConstantsV1<'data> {
// Also, the constant types.
/// Maps from constant name (e.g. ft_to_m) to the value of the constant (e.g. 0.3048).
#[cfg_attr(feature = "serde", serde(borrow))]
pub constants_map: ZeroMap<'data, str, str>,
pub constants_map: ZeroMap<'data, str, ConstantValueULE>,
}

/// This enum is used to represent the type of a constant value.
/// It can be either `ConstantType::Actual` or `ConstantType::Approximate`.
/// If the constant type is `ConstantType::Approximate`, it indicates that the value is not numerically accurate.
#[zerovec::make_ule(ConstantExactnessULE)]
#[cfg_attr(
feature = "datagen",
derive(serde::Serialize, databake::Bake),
databake(path = icu_unitsconversion::provider),
)]
#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
#[derive(Copy, Clone, Debug, PartialOrd, Ord, PartialEq, Eq, Default)]
#[repr(u8)]
pub enum ConstantExactness {
#[default]
Exact = 0,
Approximate = 1,
}

/// This enum is used to represent the sign of a constant value.
#[zerovec::make_ule(SignULE)]
#[cfg_attr(
feature = "datagen",
derive(serde::Serialize, databake::Bake),
databake(path = icu_unitsconversion::provider),
)]
#[cfg_attr(feature = "serde", derive(serde::Deserialize))]
#[derive(Copy, Clone, Debug, PartialOrd, Ord, PartialEq, Eq, Default)]
#[repr(u8)]
pub enum Sign {
#[default]
Positive = 0,
Negative = 1,
}

// TODO(#4098): Improve the ULE representation. Consider using a single byte for sign and type representation.
/// This struct encapsulates a constant value, comprising a numerator, denominator, sign, and type.
#[zerovec::make_varule(ConstantValueULE)]
#[derive(Clone, Debug, PartialOrd, Ord, PartialEq, Eq, Default)]
#[cfg_attr(
feature = "datagen",
derive(databake::Bake),
databake(path = icu_unitsconversion::provider),
)]
#[cfg_attr(
feature = "datagen",
derive(serde::Serialize),
zerovec::derive(Serialize)
)]
#[cfg_attr(
feature = "serde",
derive(serde::Deserialize),
zerovec::derive(Deserialize)
)]
#[zerovec::derive(Debug)]
pub struct ConstantValue<'data> {
// TODO(https://github.com/unicode-org/icu4x/issues/4092).
/// The numerator of the constant value in bytes starting with the least significant byte.
#[cfg_attr(feature = "serde", serde(borrow))]
pub numerator: ZeroVec<'data, u8>,

// TODO(https://github.com/unicode-org/icu4x/issues/4092).
/// The denominator of the constant value in bytes starting with the least significant byte.
#[cfg_attr(feature = "serde", serde(borrow))]
pub denominator: ZeroVec<'data, u8>,

/// Determines whether the constant value is positive or negative.
pub sign: Sign,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: if you make one of the denominator/numerator signed, you only need a bit instead of an extra byte to store the sign.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only the sign, it is still the ConstantExactness. Therefore, I prefer to add just a todo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exactness will be harder to optimise away, whereas the sign has an obvious solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a signed numerator and an unsigned denominator is not a good idea. Consistency is beneficial for both maintenance and readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know when deserializing whether the number is positive or negative?

Like, how do you know if 0xFF is +255 or -1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of that function seems nontrivial; it involves copying the bytes to a new vector and complementing them

https://docs.rs/num-bigint/latest/src/num_bigint/bigint/convert.rs.html#381

Copy link
Member

@sffc sffc Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think how the library serializes negatives is just by making the high bit in the highest nonzero byte be 1. But this means we lose capacity. For example, I imagine that you need to store 9 bytes for 255 (8 for the mantissa and 1 for the sign). You can't get out of storing the sign bit somewhere.

Copy link
Member

@robertbastian robertbastian Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I assumed BigInt would use 2s complement internally, but it uses a sign bit as well. I guess this aligns with what Younies is currently doing.

Note that from_bytes_le also allocates, it's not possible to create a borrowed BigUint.


/// Determines whether the constant value is actual or approximate.
pub constant_exactness: ConstantExactness,
}
5 changes: 4 additions & 1 deletion provider/datagen/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,6 @@ version.workspace = true
all-features = true

[dependencies]

# ICU components
icu_calendar = { workspace = true, features = ["datagen"] }
icu_casemap = { workspace = true, features = ["datagen"] }
Expand Down Expand Up @@ -90,6 +89,10 @@ zip = { version = ">=0.5, <0.7", default-features = false, features = ["deflate"
rayon = { version = "1.5", optional = true }
ureq = { version = "2", optional = true }

fraction = {version = "0.14.0", default-features = false }
num-bigint = { version = "0.4.4", default-features = false }


# Dependencies for "bin" feature
clap = { version = "4", optional = true, features = ["derive"] }
eyre = { version = "0.6", optional = true }
Expand Down
Loading
Loading