Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250
base: version3.0
…ration
- Replaced `getrandom` crate with `rand` to simplify and modernize random number generation.
- Updated `gen_random_*` functions to use `rand::thread_rng().fill()` for generating random bytes of specified lengths.
- Adjusted encryption functions to use `gen_random_bytes` for generating random values.
…to rows, and column_dimensions to columns
- Renamed `cell_collection` to `cells` for consistency and clarity.
- Renamed `row_dimensions` to `rows` for improved readability.
- Renamed `column_dimensions` to `columns` for better understanding.
- Updated all method calls and references accordingly.
This commit removes the `ahash` dependency and replaces its usage with the standard library's `DefaultHasher`. This change simplifies the dependency tree and reduces external dependencies while maintaining the same hashing functionality.
- Removed the `ahash` crate from `Cargo.toml`.
- Updated the `SharedStringItem` struct to use `std::hash::DefaultHasher` instead of `AHasher` from `ahash`.
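A minimal sketch of hashing with the standard library's `DefaultHasher`, assuming nothing about how `SharedStringItem` computes its hash internally. The helper name `hash_of` is hypothetical and only used for illustration; note that the algorithm behind `DefaultHasher` is unspecified, so the resulting values should not be persisted.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

/// Hashes any `Hash` value with the standard library's default hasher.
/// `hash_of` is a hypothetical helper, not a function from the crate.
pub fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Calling `hash_of("shared string")` twice yields the same value within one build, which is all a hash-code comparison needs.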
…tion
This commit refactors the random byte generation in the `gen_random_bytes` function by replacing the use of `OsRng` with `thread_rng`. This change simplifies the code and improves performance by using the thread-local random number generator.
- Updated the `gen_random_bytes` function to utilize `thread_rng()` for filling the byte vector.
- Removed the `OsRng` import as it is no longer needed.
This commit simplifies the test code by replacing custom hex decoding and encoding functions with the `hex-literal` crate. This change improves readability and reduces boilerplate code in the tests.
- Removed `decode_hex` and `encode_hex` helper functions.
- Updated test cases to use the `hex!` macro for byte literals.
This commit sorts the dependencies in `Cargo.toml` for improved readability and organization. Sorting helps maintain a consistent format and makes it easier to locate specific dependencies.
This commit removes the lint configurations from `Cargo.toml` and transfers them to `lib.rs` using attribute macros. This change centralizes lint settings within the codebase, making it easier to manage and understand the linting rules applied to the project.
- Removed lint settings from `Cargo.toml`.
- Added lint attributes in `lib.rs` for better backwards compatibility.
This commit refactors the code to replace the use of the `is_none_or` method with match statements. This change improves compatibility with a lower minimum supported Rust version, since `is_none_or` has only been available on stable since Rust 1.82.0.
- Updated `is_support` method in `TwoCellAnchor` to use match.
- Refactored `_has_vertical` and `has_horizontal` methods in `MergeCells` to use match for start and end row/column checks.
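The replacement can be sketched as a small free function. `none_or` is a hypothetical name chosen for this example; the actual commit inlines the `match` in the methods listed above.

```rust
/// Returns `true` when `value` is `None` or when it satisfies `pred`.
/// Behaves like `Option::is_none_or`, but compiles on toolchains older than 1.82.0.
pub fn none_or<T>(value: Option<T>, pred: impl FnOnce(T) -> bool) -> bool {
    match value {
        None => true,
        Some(inner) => pred(inner),
    }
}
```

A range check such as "no start row, or start row at most `row`" then reads `none_or(start_row, |start| start <= row)`.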
This commit sets the minimum supported Rust version in `Cargo.toml` to 1.79.0, as determined by the `cargo-msrv` tool. This version is dictated by the `bitstream-io` dependency, which is a transitive dependency of the `image` crate. Setting the minimum version ensures compatibility with the required features and functionality.
- Added `rust-version = "1.79.0"` to `Cargo.toml`.
This commit removes the unused `js` feature from the `[features]` section in `Cargo.toml`. The `js` feature is no longer utilized in the project, and its removal helps to maintain a cleaner configuration. Previously, this feature led to the inclusion of the `js` feature from the `getrandom` dependency, which was removed in a prior commit.
- Deleted the `js` feature entry from `Cargo.toml`.
… calendar systems
* add `num-traits` dependency for numeric casts
* introduce `DEFAULT_TIMEZONE` constant for consistency
* implement `excel_to_date_time_object` for converting Excel timestamps to `NaiveDateTime`
* add `convert_date_windows_1900` and `convert_date_mac_1904` for specific calendar systems
* enhance `convert_date_crate` to support both Windows 1900 and Mac 1904 systems
* include detailed documentation for new functions with examples and panic behavior
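To illustrate the difference between the two calendar systems, here is a standard-library-only sketch that converts an Excel serial number to Unix seconds. The function name and constants are assumptions made for this example and are not the crate's `convert_date_*` API; the sketch also ignores the Excel 1900 leap-year quirk, so it is only meaningful for 1900-system serials of 61 and above.

```rust
/// Days from the effective Windows 1900 epoch (1899-12-30) to 1970-01-01.
const UNIX_OFFSET_1900: f64 = 25_569.0;
/// Days from the Mac 1904 epoch (1904-01-01) to 1970-01-01.
const UNIX_OFFSET_1904: f64 = 24_107.0;

/// Converts an Excel serial date to seconds since the Unix epoch.
/// Hypothetical helper for illustration; not part of the crate.
pub fn excel_serial_to_unix_seconds(serial: f64, mac_1904: bool) -> i64 {
    let offset = if mac_1904 {
        UNIX_OFFSET_1904
    } else {
        UNIX_OFFSET_1900
    };
    // The fractional part of the serial is the time of day.
    ((serial - offset) * 86_400.0).round() as i64
}
```

The two offsets differ by 1462 days, which is why the same serial number denotes different dates depending on the workbook's date system.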
…d enhance RNG security
- Introduced the `generate_random_bytes!` macro to eliminate repetitive code for generating random bytes.
- Replaced `rand::thread_rng` with `rand::rngs::OsRng` to utilize a cryptographically secure random number generator.
- Updated cryptographic key constants to use fixed-size arrays instead of slices for improved type safety.
- Enhanced error handling in random byte generation by adding `expect` messages.
- Removed unused imports and commented-out code to clean up the codebase.
- Improved consistency in hash function implementations and key management.

This refactoring improves code maintainability, readability, and security by centralizing random byte generation and ensuring the use of secure RNG sources.
Extract multiple static variables into the `constants` module, eliminating the need to pass them as parameters in function calls. This refactor simplifies function signatures, centralizes configuration values, and enhances maintainability. Future updates may revert this approach to accommodate support for additional algorithms as the crate evolves.
…_code at crate level
- Rename private methods across multiple modules by removing leading underscores to adhere to Rust naming conventions.
- Additionally, add `#![allow(dead_code)]` at the crate level to silence dead code warnings.
…sheet processing
- Reorganized the `make_buffer` function for better readability and maintainability.
- Replaced manual iteration with `try_for_each` for processing worksheets.
- Consolidated worksheet processing logic to handle both deserialized and raw data more cleanly.
- Improved comments for clarity on each processing step.
- Streamlined the addition of various worksheet components (charts, drawings, comments, etc.).
- Ensured proper handling of printer settings and other worksheet relationships.
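The `try_for_each` pattern mentioned above can be sketched as follows. `Sheet` and `write_sheets` are hypothetical stand-ins, not the crate's `make_buffer` internals; the point is only that the first `Err` stops the iteration and is returned to the caller.

```rust
/// Hypothetical stand-in for a worksheet; only the name matters here.
pub struct Sheet {
    pub name: String,
}

/// Appends every sheet name to `out`, stopping at the first sheet without a name.
pub fn write_sheets(sheets: &[Sheet], out: &mut Vec<String>) -> Result<(), String> {
    sheets.iter().enumerate().try_for_each(|(index, sheet)| {
        if sheet.name.is_empty() {
            // Returning `Err` short-circuits the remaining sheets.
            return Err(format!("sheet {index} has no name"));
        }
        out.push(sheet.name.clone());
        Ok(())
    })
}
```

Compared with a manual `for` loop plus `?`, the behavior is the same; the closure form mainly keeps the per-sheet logic in one expression.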
…tions
* Add detailed parameter descriptions and return values for all public functions
* Improve documentation formatting and clarity for encryption methods
* Add specific documentation for AES-256-CBC implementation details
* Include clippy allow attributes for possible truncation warnings
* Document HMAC and IV generation processes
* Add comprehensive documentation for password-to-key conversion logic
This fixes the following `clippy` warning:

> multiple versions for dependency `thiserror`: 1.0.69, 2.0.8
> for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#multiple_crate_versions

`thiserror` and `thiserror-impl` are transitive dependencies:

```sh
cargo tree --invert --package thiserror-impl@1.0.69
thiserror-impl v1.0.69 (proc-macro)
└── thiserror v1.0.69
    ├── html_parser v0.7.0
    │   └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
    └── rav1e v0.7.1
        └── ravif v0.11.11
            └── image v0.25.5
                └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
cargo tree --invert --package thiserror-impl@2.0.8
thiserror-impl v2.0.8 (proc-macro)
└── thiserror v2.0.8
    ├── pest v2.7.15
    │   ├── html_parser v0.7.0
    │   │   └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
    │   ├── pest_derive v2.7.15 (proc-macro)
    │   │   └── html_parser v0.7.0 (*)
    │   ├── pest_generator v2.7.15
    │   │   └── pest_derive v2.7.15 (proc-macro) (*)
    │   └── pest_meta v2.7.15
    │       └── pest_generator v2.7.15 (*)
    └── zip v2.2.2
        └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
```
…lization
- Refactored the `is_address` function to use `OnceLock` for lazy initialization of the regex pattern, improving performance by compiling the regex only once.
- Enhanced documentation for `is_address` to clarify its parameters, return values, and potential panics.
- Removed the previous implementation of `is_address` to streamline the code.
- Added examples in the documentation to demonstrate usage.
- Corrected the documentation comment for `get_sheet_by_name` to fix the syntax error.
- Refactored the `check_sheet_name` method to use a more concise if-else structure for clarity.
- Improved readability of the code while maintaining the same functionality.
You'd need to benchmark to check if using |
- Added `assert_sha256` macro for asserting SHA-256 hash equivalence.
- Added `print_sha256_hex` macro for easy printing of SHA-256 hashes.
- Moved test helper macros to `helper/utils`.
- Updated the `crypt` module tests to use the new `assert_sha256` macro.
- Adjusted linting configurations in `lib.rs` to allow unused macros.
- Replaces the use of `once_cell::sync::OnceCell` with `std::sync::OnceLock` in `compile_regex` macro to improve compatibility and adhere to Rust's standard library practices.
- Removes now unused dependency on `once_cell`.

This change ensures that the codebase uses the latest and most idiomatic way to handle lazy initialization of static variables, enhancing both performance and safety.
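As a sketch of the lazy-initialization pattern with `std::sync::OnceLock`, the following uses a plain lookup table instead of a compiled regex so that it needs no external crate. `column_letters` and `INIT_CALLS` are hypothetical names for this example; the crate's `compile_regex` macro is not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the initializer ran (for demonstration only).
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Returns a lazily built table of column letters.
/// The closure passed to `get_or_init` runs at most once per process.
pub fn column_letters() -> &'static [char] {
    static LETTERS: OnceLock<Vec<char>> = OnceLock::new();
    LETTERS.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        ('A'..='Z').collect()
    })
}
```

Repeated calls return the same slice without re-running the initializer, which is the property that makes the pattern attractive for an expensive value such as a compiled regex.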
A bit more details on the two different crates.
My actual suggestion is to look into the API itself and remove any mutable direct access to |
Update the library documentation to be more informative and user-friendly. The changes include:
- Improved descriptions and example code snippets for common use cases like reading/writing files, adding sheets, changing cell values, styling and inserting/removing rows/columns.
- Improved wording to enhance clarity and correct minor grammatical issues.
- Added a more detailed description of the lazy reader.
- Improved code example explanations.
- Added doc comment links to relevant modules/structs (e.g. [Style](crate::structs::style)).
This commit introduces a new macro `pub_mod_use!` to streamline module declarations and exports in `structs.rs`. The macro replaces individual module declarations with a single macro call, reducing code duplication and improving readability:
- Modules are now declared and exported using the `pub_mod_use!` macro.
- Each module is made private within its own scope and then re-exported with the desired visibility.
- The change reduces the number of lines significantly, from 494 to 170, enhancing maintainability.

This refactoring will make it easier to manage the visibility of modules and their contents in future updates.
If you want to substantially change the API, you would end up with a near-total rewrite of the whole crate. If you do that, you could implement many things more efficiently, e.g. you could abstract the whole Unit types into a single generic one using:
use std::{
borrow::Cow,
fmt::Display,
str::FromStr,
};
#[derive(Clone, Default, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Value<T> {
value: Option<T>,
}
impl<T> Value<T> {
pub(crate) fn get_value(&self) -> Option<&T> {
self.value.as_ref()
}
pub(crate) fn set_value(&mut self, value: T) -> &mut Self {
self.value = Some(value);
self
}
pub(crate) fn remove_value(&mut self) -> &mut Self {
self.value = None;
self
}
pub(crate) fn has_value(&self) -> bool {
self.value.is_some()
}
}
impl<T: Display + Default + Clone + FromStr> Value<T> {
pub(crate) fn set_value_string<S: Into<String>>(&mut self, value: S) -> &mut Self {
match value.into().parse::<T>() {
Ok(parsed) => self.set_value(parsed),
Err(_) => self.remove_value(), // Or handle the error differently, e.g., log it
}
}
pub(crate) fn get_value_or_default(&self) -> T {
self.value.clone().unwrap_or_default()
}
}
macro_rules! create_and_export_ValueType {
($t:ty, $i:ident) => {
pub type $i = Value<$t>;
impl $i {
// The macro has no lifetime parameter in scope, so return a `'static` Cow.
pub(crate) fn get_value_string(&self) -> Cow<'static, str> {
self.value
.as_ref()
.map_or_else(|| Cow::Borrowed(""), |v| Cow::Owned(v.to_string()))
}
}
};
}
create_and_export_ValueType!(bool, BoolValue);
create_and_export_ValueType!(u8, ByteValue);
create_and_export_ValueType!(i8, SignedByteValue);
create_and_export_ValueType!(f64, DoubleValue);
create_and_export_ValueType!(i16, I16Value);
create_and_export_ValueType!(i32, I32Value);
create_and_export_ValueType!(i64, I64Value);
create_and_export_ValueType!(u16, U16Value);
create_and_export_ValueType!(u32, U32Value);
create_and_export_ValueType!(u64, U64Value);
pub type StringValue<'a> = Value<Cow<'a, str>>;
impl<'a> StringValue<'a> {
pub(crate) fn get_value_string(&self) -> Cow<'a, str> {
// Fall back to an empty string instead of panicking when no value is set.
self.value.clone().unwrap_or_default()
}
}
Or you could ditch the whole |
This commit introduces the `AttrCollection` type alias, which is a vector of `AttrPair` structs. `AttrPair` holds a key-value pair for XML attributes, where the value is a `Cow<str>`. This allows for more efficient handling of string attributes, avoiding unnecessary allocations. The commit also updates all structs to use `AttrCollection` instead of `Vec<(&str, &str)>` for XML attributes. This change improves code readability and maintainability. Additionally, the `AttrPair` struct implements `From` traits for various types, making it easier to create attribute pairs.
… update writer This commit introduces a `From` implementation for `AttrPair` to convert it into a tuple of `(&str, Cow<str>)`. This allows for seamless conversion of `AttrPair` instances into a format that `quick_xml` can handle. Additionally, the commit updates the `write_start_tag` function in `src/writer/driver.rs` to use the new `From` implementation. This change simplifies the code and makes it more efficient by avoiding manual tuple creation.
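A minimal sketch of what such a type could look like, assuming a struct with a borrowed key and a `Cow<str>` value. The field names and the exact set of `From` implementations are assumptions for illustration; the PR's real definitions may differ.

```rust
use std::borrow::Cow;

/// Hypothetical layout: one XML attribute with a borrowed key and a
/// borrowed-or-owned value.
#[derive(Debug, Clone, PartialEq)]
pub struct AttrPair<'a> {
    pub key: &'a str,
    pub value: Cow<'a, str>,
}

/// A list of attributes, as described in the commit message.
pub type AttrCollection<'a> = Vec<AttrPair<'a>>;

impl<'a> From<(&'a str, &'a str)> for AttrPair<'a> {
    fn from((key, value): (&'a str, &'a str)) -> Self {
        // Borrowed values need no allocation.
        Self { key, value: Cow::Borrowed(value) }
    }
}

impl<'a> From<(&'a str, String)> for AttrPair<'a> {
    fn from((key, value): (&'a str, String)) -> Self {
        // Owned values are moved in, not copied.
        Self { key, value: Cow::Owned(value) }
    }
}

impl<'a> From<AttrPair<'a>> for (&'a str, Cow<'a, str>) {
    fn from(pair: AttrPair<'a>) -> Self {
        (pair.key, pair.value)
    }
}
```

With these conversions a caller can mix static and computed values, for example `vec![("r", "A1").into(), ("s", 3.to_string()).into()]`, and hand each pair to a writer as a `(&str, Cow<str>)` tuple.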
Agreed. I would not recommend changing the current API too much, as it would require too many rewrites of existing code.
- Changed the `argb` field in the `Color` struct from `Option<String>` to `Option<ARGB8>`.
- Added helper methods to convert between hex strings and ARGB8.
- Updated methods to handle the new ARGB8 type, including `set_argb`, `get_argb`, and `set_attributes`.
- Modified constants to use ARGB8 for predefined colors.
- Enhanced the `get_argb_with_theme` method to return ARGB8 values correctly.
- Added tests for hex conversion and color setting functionality.
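The hex conversion helpers could look roughly like the sketch below. `Argb8`, `argb_from_hex`, and `argb_to_hex` are hypothetical names defined locally for this example, standing in for the `ARGB8` type and the helper methods mentioned in the commit.

```rust
/// Hypothetical stand-in for the `ARGB8` type: four 8-bit channels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Argb8 {
    pub a: u8,
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Parses an eight-digit `AARRGGBB` hex string, with or without a leading `#`.
pub fn argb_from_hex(hex: &str) -> Option<Argb8> {
    let hex = hex.strip_prefix('#').unwrap_or(hex);
    if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // All bytes are ASCII here, so slicing at byte offsets is safe.
    let byte = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    Some(Argb8 {
        a: byte(0)?,
        r: byte(2)?,
        g: byte(4)?,
        b: byte(6)?,
    })
}

/// Formats the color back into an uppercase `AARRGGBB` string.
pub fn argb_to_hex(color: Argb8) -> String {
    format!(
        "{:02X}{:02X}{:02X}{:02X}",
        color.a, color.r, color.g, color.b
    )
}
```

Storing four bytes instead of an `Option<String>` avoids a heap allocation per color, which fits the memory discussion elsewhere in this thread.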
- Modified the `Border` struct to use `Option<Box<Color>>` for the `color` field.
- Updated methods to handle the optional color, including `get_color`, `set_color`, and `set_attributes`.
- Adjusted the `get_hash_code` method to handle the optional color.
- Updated the `write_to` method to handle the optional color.
- Improved consistency in color handling within the `Border` struct.

This change drastically reduces the number of heap allocations when cloning Worksheets.
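The effect of `Option<Box<Color>>` on struct size can be sketched with hypothetical types. The `Color`, `BorderInline`, and `BorderBoxed` definitions below are assumptions for illustration and do not mirror the crate's real fields; the sketch also does not claim what the `color` field's type was before this change.

```rust
use std::mem::size_of;

/// Hypothetical color with a few optional fields.
#[derive(Clone, Default)]
pub struct Color {
    pub argb: Option<String>,
    pub theme: Option<u32>,
    pub tint: Option<f64>,
}

/// Border that stores its color inline.
#[derive(Clone, Default)]
pub struct BorderInline {
    pub style: u8,
    pub color: Color,
}

/// Border that stores an optional, heap-allocated color.
#[derive(Clone, Default)]
pub struct BorderBoxed {
    pub style: u8,
    pub color: Option<Box<Color>>,
}

/// Returns `(inline_size, boxed_size)` in bytes for the two layouts.
pub fn sizes() -> (usize, usize) {
    (size_of::<BorderInline>(), size_of::<BorderBoxed>())
}
```

`Option<Box<T>>` is guaranteed to be pointer-sized, and cloning a `None` allocates nothing, so borders without a color stay cheap to store and to clone.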
In terms of memory efficiency (@schungx): I did some profiling with Valgrind, and the memory consumption upon cloning a Worksheet is huge because of the way
This is the memory layout after re-implementing the structs, using the
I personally use umya to parse a plan from work, iterate over it, and color cells that contain specific names (to create personalized versions for many people at once).
Here is the massif data from Valgrind:
- Master (v2.2.1)
- This PR @ 89016c7
- This PR @ 3c26246

We could probably do a lot more optimization, but this change was minimally invasive and did not require changing a lot of internal APIs. @schungx
It is interesting to see what happens if you remove all the No
|
The compiler knows best which functions to inline. If the end user wants optimization, they should enable `lto = true` in their Cargo.toml. See MathNya#250 (comment)
// To pass an integration test where the hex string is "#333".
// https://github.com/MathNya/umya-spreadsheet/pull/113
I am not sure if this should be valid behaviour. Is #333 really a valid color? I added padding so the previous integration test added in PR #113 still passes.
This is the code.
@mxsrm
Hmmm, this seems to be a mistake for #333333.
I don't know if it's valid there but in CSS it's allowed. #333 is short for #333333 As per: https://www.w3schools.com/css/css_colors_hex.asp
|
This is wrong. The compiler does NOT decide. It only decides within the same crate. Without
I would strongly not recommend that. My own benchmarks have If LTO is turned off, any function without Therefore, the standard for libraries is to be liberal with
Check out this: https://matklad.github.io/2021/07/09/inline-in-rust.html
|
This reverts commit dd0d524.
Support three-digit hex color codes with and without leading hash. This change addresses issues where short hex codes were not properly expanded, ensuring compatibility with CSS standards and improving user experience in color specification.
- Expand three-digit hex codes by repeating each digit.
- Add integration tests for both `#RGB` and `RGB` formats.
- Update existing tests to reflect new behavior.

See MathNya#250 (comment)
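The expansion rule ("repeat each digit") can be sketched as below. `expand_short_hex` is a hypothetical helper name; how the crate handles the alpha channel or invalid input afterwards is not shown and not assumed.

```rust
/// Expands a three-digit hex color (`#RGB` or `RGB`) to six digits by
/// repeating each digit, as in CSS. Other inputs are returned without the
/// leading `#` but otherwise unchanged.
pub fn expand_short_hex(input: &str) -> String {
    let hex = input.strip_prefix('#').unwrap_or(input);
    if hex.len() == 3 && hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        // "333" becomes "333333", "abc" becomes "aabbcc".
        hex.chars().flat_map(|c| [c, c]).collect()
    } else {
        hex.to_string()
    }
}
```

This matches the CSS shorthand discussed in the review thread, where `#333` is short for `#333333`.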
I implemented it that way. See a496cbe.
Not my area of expertise either. In daily life I am an interventional neuroradiologist who enjoys coding in his free time. @schungx is more knowledgeable in that regard, I think.
- Changed the file paths in `integration_test.rs` to reflect the actual names of the test result files for better clarity and consistency in testing procedures.
- Updated paths from 'bbb_new_sheet_value.xlsx' to 'three_digit_hex_color_with_hash.xlsx' and 'three_digit_hex_color_without_hash.xlsx' for the respective integration tests.
- Changed the return type of `get_sheet_collection_mut` from `&mut Vec<Worksheet>` to `&mut [Worksheet]` to improve type safety and flexibility in how the sheets can be manipulated.
- This adjustment ensures that users of the API cannot directly resize the vector, which aligns with the intended use of this method for accessing existing sheets only.
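A small sketch of why returning a slice prevents resizing, using hypothetical `Workbook` and `Worksheet` types rather than the crate's own: a `&mut [T]` allows mutating (and reordering) existing elements, but it has no `push`, `remove`, or `clear`.

```rust
/// Hypothetical worksheet with just a name.
#[derive(Debug, Default)]
pub struct Worksheet {
    pub name: String,
}

/// Hypothetical workbook that owns its sheets privately.
#[derive(Debug, Default)]
pub struct Workbook {
    sheets: Vec<Worksheet>,
}

impl Workbook {
    /// Adding sheets goes through a dedicated method.
    pub fn add_sheet(&mut self, name: &str) {
        self.sheets.push(Worksheet {
            name: name.to_string(),
        });
    }

    /// Callers can edit existing sheets but cannot change how many there are.
    pub fn sheets_mut(&mut self) -> &mut [Worksheet] {
        &mut self.sheets
    }

    pub fn sheet_count(&self) -> usize {
        self.sheets.len()
    }
}
```

With this signature, `workbook.sheets_mut().push(...)` does not compile, while `workbook.sheets_mut()[0].name = ...` still works.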
Just different experiences, that's all. I happen to be maintaining an LOB application running on limited hardware, so memory and CPU count a lot. For most applications I'd agree there may not be a big impact, unless we're dealing with massive spreadsheets with tens of thousands of rows, like some users who complained in issues. In my own system, the standard handler times out after 30 seconds of processing and it'll be hell trying to get around it. I can get 2000 rows within that time limit on an LTO release build, so it keeps choking on spreadsheets with 2500 rows. After the memory shrink it runs much faster, probably due to better cache locality, and it shows. I can now process 5000 rows without exceeding the limit, which makes it work just fine for my case. So yeah, you have to be there to know something is amiss.
So, I have the first big chunk of changes for you. I mainly did formatting (those are separate commits) and clippy changes, improving a lot of things efficiency-wise. I also fixed an MSRV (minimum supported Rust version), which is dictated to us by our dependencies anyway.
I compressed the `helper/crypt.rs` part and pulled out all the static values we currently do not allow to be changed from the user side anyway. This makes the code a lot more readable.
Chore
- update zip dependency to version 2.2.2
- add categories to Cargo.toml
- allow multiple versions of `thiserror`
  This fixes the `clippy` warning about multiple versions of the `thiserror` dependency (1.0.69 and 2.0.8). `thiserror` and `thiserror-impl` are transitive dependencies.
- remove unused `js` feature from Cargo.toml
  This commit removes the unused `js` feature from the `[features]` section in `Cargo.toml`. The `js` feature is no longer utilized in the project, and its removal helps to maintain a cleaner configuration. Previously, this feature led to the inclusion of the `js` feature from the `getrandom` dependency, which was removed in a prior commit.
  - Deleted the `js` feature entry from `Cargo.toml`.
- set minimum supported Rust version to 1.79.0
  This commit sets the minimum supported Rust version in `Cargo.toml` to 1.79.0, as determined by the `cargo-msrv` tool. This version is dictated by the `bitstream-io` dependency, which is a transitive dependency of the `image` crate. Setting the minimum version ensures compatibility with the required features and functionality.
  - Added `rust-version = "1.79.0"` to `Cargo.toml`.
- move lint configurations to lib.rs
  This commit removes the lint configurations from `Cargo.toml` and transfers them to `lib.rs` using attribute macros. This change centralizes lint settings within the codebase, making it easier to manage and understand the linting rules applied to the project.
  - Removed lint settings from `Cargo.toml`.
  - Added lint attributes in `lib.rs` for better backwards compatibility.
- sort dependencies in Cargo.toml
  This commit sorts the dependencies in `Cargo.toml` for improved readability and organization. Sorting helps maintain a consistent format and makes it easier to locate specific dependencies.
Documentation
New Features
- add `num-traits` dependency for numeric casts
- introduce `DEFAULT_TIMEZONE` constant for consistency
- implement `excel_to_date_time_object` for converting Excel timestamps to `NaiveDateTime`
- add `convert_date_windows_1900` and `convert_date_mac_1904` for specific calendar systems
- enhance `convert_date_crate` to support both Windows 1900 and Mac 1904 systems
Refactor
- update is_address function for regex initialization
  - Refactored the `is_address` function to use `OnceLock` for lazy initialization of the regex pattern, improving performance by compiling the regex only once.
  - Enhanced documentation for `is_address` to clarify its parameters, return values, and potential panics.
  - Removed the previous implementation of `is_address` to streamline the code.
- remove unnecessary qualification
- clippy lints
- simplify make_buffer function and improve worksheet processing
  - Reorganized the `make_buffer` function for better readability and maintainability.
  - Replaced manual iteration with `try_for_each` for processing worksheets.
- remove leading underscores from method names and allow dead_code at crate level
  - Additionally, add `#![allow(dead_code)]` at the crate level to silence dead code warnings.
- centralize constants into a dedicated module
  Extract multiple static variables into the `constants` module, eliminating the need to pass them as parameters in function calls. This refactor simplifies function signatures, centralizes configuration values, and enhances maintainability. Future updates may revert this approach to accommodate support for additional algorithms as the crate evolves.
- abstract random byte generation with macro and enhance RNG security
  - Introduced the `generate_random_bytes!` macro to eliminate repetitive code for generating random bytes.
  - Replaced `rand::thread_rng` with `rand::rngs::OsRng` to utilize a cryptographically secure random number generator.
  - Enhanced error handling in random byte generation by adding `expect` messages.

  This refactoring improves code maintainability, readability, and security by centralizing random byte generation and ensuring the use of secure RNG sources.
- replace `is_none_or` with match statements
  This commit refactors the code to replace the use of the `is_none_or` method with match statements. This change improves compatibility with a lower minimum supported Rust version, since `is_none_or` has only been available on stable since Rust 1.82.0.
  - Updated `is_support` method in `TwoCellAnchor` to use match.
  - Refactored `_has_vertical` and `has_horizontal` methods in `MergeCells` to use match for start and end row/column checks.
- replace hex decoding/encoding with hex-literal
  This commit simplifies the test code by replacing custom hex decoding and encoding functions with the `hex-literal` crate. This change improves readability and reduces boilerplate code in the tests.
  - Removed `decode_hex` and `encode_hex` helper functions.
  - Updated test cases to use the `hex!` macro for byte literals.
- replace OsRng with thread_rng for random byte generation
  This commit refactors the random byte generation in the `gen_random_bytes` function by replacing the use of `OsRng` with `thread_rng`. This change simplifies the code and improves performance by using the thread-local random number generator.
  - Updated the `gen_random_bytes` function to utilize `thread_rng()` for filling the byte vector.
  - Removed the `OsRng` import as it is no longer needed.
- replace ahash with std::hash::DefaultHasher
  This commit removes the `ahash` dependency and replaces its usage with the standard library's `DefaultHasher`. This change simplifies the dependency tree and reduces external dependencies while maintaining the same hashing functionality.
  - Removed the `ahash` crate from `Cargo.toml`.
  - Updated the `SharedStringItem` struct to use `std::hash::DefaultHasher` instead of `AHasher` from `ahash`.
- rename cell_collection to cells, row_dimensions to rows, and column_dimensions to columns
  - Renamed `cell_collection` to `cells` for consistency and clarity.
  - Renamed `row_dimensions` to `rows` for improved readability.
  - Renamed `column_dimensions` to `columns` for better understanding.
- replace `ThinVec` with rust-lang `Vec`
- replace `lazy_static` with built-in `std::sync::OnceLock`
- remove lazy_static, add comments
- fix typos, clippy lints
- reimplement and cleanup `helper/crypt.rs`
- replace `getrandom` with `rand` for random byte generation
  - Replaced `getrandom` crate with `rand` to simplify and modernize random number generation.
  - Updated `gen_random_*` functions to use `rand::thread_rng().fill()` for generating random bytes of specified lengths.
  - Adjusted encryption functions to use `gen_random_bytes` for generating random values.
Style
- Corrected the documentation comment for `get_sheet_by_name` to fix the syntax error.
- Refactored the `check_sheet_name` method to use a more concise if-else structure for clarity.
Commit Statistics
Commit Details
- 1f81738
- e60fa40
- 6b3c774
- ea1d8a2
- 4eab5d7
- 505d766
- 37d1139
- allow multiple versions of `thiserror` (de9f38f)
- 8d41bc1
- f8d9659
- 72faf4f
- 4dcea0e
- 26e7021
- 98fcf7f
- b1916f7
- 92a1af9
- remove unused `js` feature from Cargo.toml (995f40d)
- f76be7a
- replace `is_none_or` with match statements (9511418)
- 6b2c88e
- 517cf8c
- 739446e
- b65b035
- 12926d6
- cb80120
- replace `ThinVec` with rust-lang `Vec` (714bf07)
- replace `lazy_static` with built-in `std::sync::OnceLock` (aaaec5f)
- f3de769
- 71f0572
- reimplement and cleanup `helper/crypt.rs` (30e3e9a)
- replace `getrandom` with `rand` for random byte generation (a317cce)
- 48b864b
- 5e20000