Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250

mxsrm · 2024-12-22T20:40:02Z

So, I have the first big truck of changes for you. I mainly did formatting (those are separate commits) and clippy changes, improving alot of things efficiency-wise. I also fixed a MRSV of minimum supported Rust version, dictated to us by our dependencies anyway.

I compressed the helper/crypt.rs part and pulled away all the static values we currently do not allow to be changed from user side anyway. Makes the code alot more readable.

Chore

update zip dependency to version 2.2.2
add categories to Cargo.toml

allow multiple versions of thiserror
This fixes the following clippy warning:

multiple versions for dependency thiserror: 1.0.69, 2.0.8
for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#multiple_crate_versions

thiserror and thiserror-impl are transitive dependencies:

cargo tree --invert --package thiserror-impl@1.0.69

thiserror-impl v1.0.69 (proc-macro)
└── thiserror v1.0.69
    ├── html_parser v0.7.0
    │   └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
    └── rav1e v0.7.1
        └── ravif v0.11.11
            └── image v0.25.5
                └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)

cargo tree --invert --package thiserror-impl@2.0.8

thiserror-impl v2.0.8 (proc-macro)
└── thiserror v2.0.8
    ├── pest v2.7.15
    │   ├── html_parser v0.7.0
    │   │   └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)
    │   ├── pest_derive v2.7.15 (proc-macro)
    │   │   └── html_parser v0.7.0 (*)
    │   ├── pest_generator v2.7.15
    │   │   └── pest_derive v2.7.15 (proc-macro) (*)
    │   └── pest_meta v2.7.15
    │       └── pest_generator v2.7.15 (*)
    └── zip v2.2.2
        └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet)

remove unused js feature from Cargo.toml
This commit removes the unused js feature from the [features]
section in Cargo.toml. The js feature is no longer utilized in
the project, and its removal helps to maintain a cleaner configuration.
Previously, this feature led to the inclusion of the js feature from
the getrandom dependency, which was removed in a prior commit.
- Deleted the js feature entry from Cargo.toml.
set minimum supported Rust version to 1.79.0
This commit sets the minimum supported Rust version in Cargo.toml
to 1.79.0, as determined by the cargo-msrv tool. This version is
dictated by the bitstream-io dependency, which is a transitive
dependency of the image crate. Setting the minimum version ensures
compatibility with the required features and functionality.
- Added rust-version = "1.79.0" to Cargo.toml.
move lint configurations to lib.rs
This commit removes the lint configurations from Cargo.toml and
transfers them to lib.rs using attribute macros. This change
centralizes lint settings within the codebase, making it easier to
manage and understand the linting rules applied to the project.
- Removed lint settings from Cargo.toml.
- Added lint attributes in lib.rs for better backwards compatibility.
sort dependencies in Cargo.toml
This commit sorts the dependencies in Cargo.toml for improved
readability and organization. Sorting helps maintain a consistent
format and makes it easier to locate specific dependencies.

Documentation

enhance documentation for encryption-related functions
- Add detailed parameter descriptions and return values for all public functions
- Improve documentation formatting and clarity for encryption methods
- Add specific documentation for AES-256-CBC implementation details
- Include clippy allow attributes for possible truncation warnings
- Document HMAC and IV generation processes
- Add comprehensive documentation for password-to-key conversion logic

New Features

enhance Excel timestamp conversion with additional calendar systems
- add num-traits dependency for numeric casts
- introduce DEFAULT_TIMEZONE constant for consistency
- implement excel_to_date_time_object for converting Excel timestamps to NaiveDateTime
- add convert_date_windows_1900 and convert_date_mac_1904 for specific calendar systems
- enhance convert_date_crate to support both Windows 1900 and Mac 1904 systems
- include detailed documentation for new functions with examples and panic behavior

Refactor

update is_address function for regex initialization
- Refactored the is_address function to use OnceLock for lazy initialization of the regex pattern, improving performance by compiling the regex only once.
- Enhanced documentation for is_address to clarify its parameters, return values, and potential panics.
- Removed the previous implementation of is_address to streamline the code.
- Added examples in the documentation to demonstrate usage.
remove unnecessary qualification
clippy lints
simplify make_buffer function and improve worksheet processing
- Reorganized the make_buffer function for better readability and maintainability.
- Replaced manual iteration with try_for_each for processing worksheets.
- Consolidated worksheet processing logic to handle both deserialized and raw data more cleanly.
- Improved comments for clarity on each processing step.
- Streamlined the addition of various worksheet components (charts, drawings, comments, etc.).
- Ensured proper handling of printer settings and other worksheet relationships.
remove leading underscores from method names and allow dead_code at crate level
- Rename private methods across multiple modules by removing leading underscores to adhere to Rust naming conventions.
- Additionally, add #![allow(dead_code)] at the crate level to silence dead code warnings.
centralize constants into a dedicated module
Extract multiple static variables into the constants module, eliminating the need to pass them as parameters in function calls. This refactor simplifies function signatures, centralizes configuration values, and enhances maintainability. Future updates may revert this approach to accommodate support for additional algorithms as the crate evolves.
abstract random byte generation with macro and enhance RNG security
- Introduced the generate_random_bytes! macro to eliminate repetitive code for generating random bytes.
- Replaced rand::thread_rng with rand::rngs::OsRng to utilize a cryptographically secure random number generator.
- Updated cryptographic key constants to use fixed-size arrays instead of slices for improved type safety.
- Enhanced error handling in random byte generation by adding expect messages.
- Removed unused imports and commented-out code to clean up the codebase.
- Improved consistency in hash function implementations and key management.
This refactoring improves code maintainability, readability, and security by centralizing random byte generation and ensuring the use of secure RNG sources.
replace is_none_or with match statements
This commit refactors the code to replace the use of the is_none_or
method with match statements. This change improves compatibility with
a lower minimum supported Rust version, since is_none_or is only
available in stable since Rust 1.82.0.
- Updated is_support method in TwoCellAnchor to use match.
- Refactored _has_vertical and has_horizontal methods in MergeCells
  to use match for start and end row/column checks.
replace hex decoding/encoding with hex-literal
This commit simplifies the test code by replacing custom hex decoding
and encoding functions with the hex-literal crate. This change
improves readability and reduces boilerplate code in the tests.
- Removed decode_hex and encode_hex helper functions.
- Updated test cases to use the hex! macro for byte literals.
replace OsRng with thread_rng for random byte generation
This commit refactors the random byte generation in the gen_random_bytes function by replacing the use of OsRng with thread_rng. This change simplifies the code and improves performance by using the thread-local random number generator.
- Updated the gen_random_bytes function to utilize thread_rng() for filling the byte vector.
- Removed the OsRng import as it is no longer needed.
replace ahash with std::hash::DefaultHasher
This commit removes the ahash dependency and replaces its usage with the standard library's DefaultHasher. This change simplifies the dependency tree and reduces external dependencies while maintaining the same hashing functionality.
- Removed the ahash crate from Cargo.toml.
- Updated the SharedStringItem struct to use std::hash::DefaultHasher instead of AHasher from ahash.
rename cell_collection to cells, row_dimensions to rows, and column_dimensions to columns
- Renamed cell_collection to cells for consistency and clarity.
- Renamed row_dimensions to rows for improved readability.
- Renamed column_dimensions to columns for better understanding.
- Updated all method calls and references accordingly.
replace ThinVec with rust-lang Vec
replace lazy_static with built-in std::sync::OnceLock
remove lazy_static, add comments
fix typos, clippy lints
reimplement and cleanup helper/crypt.rs
replace getrandom with rand for random byte generation
- Replaced getrandom crate with rand to simplify and modernize random number generation.
- Updated gen_random_* functions to use rand::thread_rng().fill() for generating random bytes of specified lengths.
- Adjusted encryption functions to use gen_random_bytes for generating random values.

Style

fix many clippy lints
fix formatting and improve check_sheet_name logic
- Corrected the documentation comment for get_sheet_by_name to fix the syntax error.
- Refactored the check_sheet_name method to use a more concise if-else structure for clarity.
- Improved readability of the code while maintaining the same functionality.
add modern rustfmt.toml and apply it
enable more lints

Commit Statistics

33 commits contributed to the release.
11 days passed between releases.
31 commits were understood as conventional.
1 unique issue was worked on: #249

Commit Details

view details

#249
- Optimize code (1f81738)
Uncategorized
- Fix many clippy lints (e60fa40)
- Fix formatting and improve check_sheet_name logic (6b3c774)
- Update zip dependency to version 2.2.2 (ea1d8a2)
- Update is_address function for regex initialization (4eab5d7)
- Remove unnecessary qualification (505d766)
- Add categories to Cargo.toml (37d1139)
- Allow multiple versions of thiserror (de9f38f)
- Clippy lints (8d41bc1)
- Enhance documentation for encryption-related functions (f8d9659)
- Simplify make_buffer function and improve worksheet processing (72faf4f)
- Remove leading underscores from method names and allow dead_code at crate level (4dcea0e)
- Centralize constants into a dedicated module (26e7021)
- Add modern rustfmt.toml and apply it (98fcf7f)
- Abstract random byte generation with macro and enhance RNG security (b1916f7)
- Enhance Excel timestamp conversion with additional calendar systems (92a1af9)
- Remove unused js feature from Cargo.toml (995f40d)
- Set minimum supported Rust version to 1.79.0 (f76be7a)
- Replace is_none_or with match statements (9511418)
- Move lint configurations to lib.rs (6b2c88e)
- Sort dependencies in Cargo.toml (517cf8c)
- Replace hex decoding/encoding with hex-literal (739446e)
- Replace OsRng with thread_rng for random byte generation (b65b035)
- Replace ahash with std::hash::DefaultHasher (12926d6)
- Rename cell_collection to cells, row_dimensions to rows, and column_dimensions to columns (cb80120)
- Replace ThinVec with rust-lang Vec (714bf07)
- Replace lazy_static with built-in std::sync::OnceLock (aaaec5f)
- Remove lazy_static, add comments (f3de769)
- Fix typos, clippy lints (71f0572)
- Reimplement and cleanup helper/crypt.rs (30e3e9a)
- Replace getrandom with rand for random byte generation (a317cce)
- Enable more lints (48b864b)
- Feature Request - Set Worksheet Gridlines #248 (5e20000)

…ration - Replaced `getrandom` crate with `rand` to simplify and modernize random number generation. - Updated `gen_random_*` functions to use `rand::thread_rng().fill()` for generating random bytes of specified lengths. - Adjusted encryption functions to use `gen_random_bytes` for generating random values.

…to rows, and column_dimensions to columns - Renamed `cell_collection` to `cells` for consistency and clarity. - Renamed `row_dimensions` to `rows` for improved readability. - Renamed `column_dimensions` to `columns` for better understanding. - Updated all method calls and references accordingly.

This commit removes the `ahash` dependency and replaces its usage with the standard library's `DefaultHasher`. This change simplifies the dependency tree and reduces external dependencies while maintaining the same hashing functionality. - Removed the `ahash` crate from `Cargo.toml`. - Updated the `SharedStringItem` struct to use `std::hash::DefaultHasher` instead of `AHasher` from `ahash`.

…tion This commit refactors the random byte generation in the `gen_random_bytes` function by replacing the use of `OsRng` with `thread_rng`. This change simplifies the code and improves performance by using the thread-local random number generator. - Updated the `gen_random_bytes` function to utilize `thread_rng()` for filling the byte vector. - Removed the `OsRng` import as it is no longer needed.

This commit simplifies the test code by replacing custom hex decoding and encoding functions with the `hex-literal` crate. This change improves readability and reduces boilerplate code in the tests. - Removed `decode_hex` and `encode_hex` helper functions. - Updated test cases to use the `hex!` macro for byte literals.

This commit sorts the dependencies in `Cargo.toml` for improved readability and organization. Sorting helps maintain a consistent format and makes it easier to locate specific dependencies.

This commit removes the lint configurations from `Cargo.toml` and transfers them to `lib.rs` using attribute macros. This change centralizes lint settings within the codebase, making it easier to manage and understand the linting rules applied to the project. - Removed lint settings from `Cargo.toml`. - Added lint attributes in `lib.rs` for better backwards compatibility.

This commit refactors the code to replace the use of the `is_none_or` method with match statements. This change improves compatibility with a lower minimum supported Rust version, since `is_none_or` is only available in stable since Rust 1.82.0. - Updated `is_support` method in `TwoCellAnchor` to use match. - Refactored `_has_vertical` and `has_horizontal` methods in `MergeCells` to use match for start and end row/column checks.

This commit sets the minimum supported Rust version in `Cargo.toml` to 1.79.0, as determined by the `cargo-msrv` tool. This version is dictated by the `bitstream-io` dependency, which is a transitive dependency of the `image` crate. Setting the minimum version ensures compatibility with the required features and functionality. - Added `rust-version = "1.79.0"` to `Cargo.toml`.

This commit removes the unused `js` feature from the `[features]` section in `Cargo.toml`. The `js` feature is no longer utilized in the project, and its removal helps to maintain a cleaner configuration. Previously, this feature led to the inclusion of the `js` feature from the `getrandom` dependency, which was removed in a prior commit. - Deleted the `js` feature entry from `Cargo.toml`.

… calendar systems * add `num-traits` dependency for numeric casts * introduce `DEFAULT_TIMEZONE` constant for consistency * implement `excel_to_date_time_object` for converting Excel timestamps to `NaiveDateTime` * add `convert_date_windows_1900` and `convert_date_mac_1904` for specific calendar systems * enhance `convert_date_crate` to support both Windows 1900 and Mac 1904 systems * include detailed documentation for new functions with examples and panic behavior

…d enhance RNG security - Introduced the `generate_random_bytes!` macro to eliminate repetitive code for generating random bytes. - Replaced `rand::thread_rng` with `rand::rngs::OsRng` to utilize a cryptographically secure random number generator. - Updated cryptographic key constants to use fixed-size arrays instead of slices for improved type safety. - Enhanced error handling in random byte generation by adding `expect` messages. - Removed unused imports and commented-out code to clean up the codebase. - Improved consistency in hash function implementations and key management. This refactoring improves code maintainability, readability, and security by centralizing random byte generation and ensuring the use of secure RNG sources.

Extract multiple static variables into the `constants` module, eliminating the need to pass them as parameters in function calls. This refactor simplifies function signatures, centralizes configuration values, and enhances maintainability. Future updates may revert this approach to accommodate support for additional algorithms as the crate evolves.

…_code at crate level - Rename private methods across multiple modules by removing leading underscores to adhere to Rust naming conventions. - Additionally, add `#![allow(dead_code)]` at the crate level to silence dead code warnings.

…sheet processing - Reorganized the `make_buffer` function for better readability and maintainability. - Replaced manual iteration with `try_for_each` for processing worksheets. - Consolidated worksheet processing logic to handle both deserialized and raw data more cleanly. - Improved comments for clarity on each processing step. - Streamlined the addition of various worksheet components (charts, drawings, comments, etc.). - Ensured proper handling of printer settings and other worksheet relationships.

…tions * Add detailed parameter descriptions and return values for all public functions * Improve documentation formatting and clarity for encryption methods * Add specific documentation for AES-256-CBC implementation details * Include clippy allow attributes for possible truncation warnings * Document HMAC and IV generation processes * Add comprehensive documentation for password-to-key conversion logic

This fixes the following `clippy` warning: > multiple versions for dependency `thiserror`: 1.0.69, 2.0.8 > for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#multiple_crate_versions `thiserror` and `thiserror-impl` are transitive dependencies: ```sh cargo tree --invert --package thiserror-impl@1.0.69 thiserror-impl v1.0.69 (proc-macro) └── thiserror v1.0.69 ├── html_parser v0.7.0 │ └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet) └── rav1e v0.7.1 └── ravif v0.11.11 └── image v0.25.5 └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet) cargo tree --invert --package thiserror-impl@2.0.8 thiserror-impl v2.0.8 (proc-macro) └── thiserror v2.0.8 ├── pest v2.7.15 │ ├── html_parser v0.7.0 │ │ └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet) │ ├── pest_derive v2.7.15 (proc-macro) │ │ └── html_parser v0.7.0 (*) │ ├── pest_generator v2.7.15 │ │ └── pest_derive v2.7.15 (proc-macro) (*) │ └── pest_meta v2.7.15 │ └── pest_generator v2.7.15 (*) └── zip v2.2.2 └── umya-spreadsheet v2.2.0 (/home/mxsrm/code/umya-spreadsheet) ```

…lization - Refactored the `is_address` function to use `OnceLock` for lazy initialization of the regex pattern, improving performance by compiling the regex only once. - Enhanced documentation for `is_address` to clarify its parameters, return values, and potential panics. - Removed the previous implementation of `is_address` to streamline the code. - Added examples in the documentation to demonstrate usage.

- Corrected the documentation comment for `get_sheet_by_name` to fix the syntax error. - Refactored the `check_sheet_name` method to use a more concise if-else structure for clarity. - Improved readability of the code while maintaining the same functionality.

schungx · 2024-12-26T14:22:55Z

I'm happy to reimplement it if @MathNya thinks it useful. Or implement smallvec instead.

SmallVec solves a different problem: avoiding allocations. It does not save memory. A SmallVec can be larger than a Vec depending on your config.

You'd need to bemchmark to check if using SmallVec is worth it as it stores small number of items inline to avoid allocations, at the expense of larger memory footprint.

- Added `assert_sha256` macro for asserting SHA-256 hash equivalence. - Added `print_sha256_hex` mmacro for easy printing of SHA-256 hashes. - Move test helper macros to `helper/utils`. - Updated the `crypt` module tests to use the new `assert_sha256` macro. - Adjusted linting configurations in `lib.rs` to allow unused macros.

- Replaces the use of `once_cell::sync::OnceCell` with `std::sync::OnceLock` in `compile_regex` macro to improve compatibility and adhere to Rust's standard library practices. - Removes now unused dependency on `once_cell`. This change ensures that the codebase uses the latest and most idiomatic way to handle lazy initialization of static variables, enhancing both performance and safety.

schungx · 2024-12-27T00:57:35Z

A bit more details on the two different crates.

SmallVec I suggest we use on transitory purposes, for example preparing a list for processing inside a function. The list is destroyed after use and never goes outside of the function. Using SmallVec this way keeps everything on the stack for direct access and cache locality, and avoids all allocations if the number of items is often small. I have benchmarked a lot of such usages and it usually yields 10-15% speed improvements alone.

ThinVec should be used where every byte counts. If using a Vec does not affect a type's size (e.g. when another variant is larger) or the Vec is usually empty (where you can use Option<Box<Vec>>), then there is no need. I tend to use it in cases where having a Vec automatically makes a LOT of types larger, for example in a fundamental type which is embedded everywhere inside a lot of other data structures.

My actual suggestion is to look into the API itself and remove any mutable direct access to Vec's. For example, all those direct get_xxx_mut(&mut self) -> &mut Vec<T> should be replaced by some indirect write API. In that case, if there is no more need for the outside program to know what type is inside, you can then change everything to Box<[T]> which is two words and saves 8 bytes from a Vec since its length is immutable.

Update the library documentation to be more informative and user-friendly. The changes include: - Improved descriptions and example code snippets for common use cases like reading/writing files, adding sheets, changing cell values, styling and inserting/removing rows/ columns. - Improved wording to enhance clarity and correct minor grammatical issues. - Added a more detailed description of the lazy reader. - Improved code example explanations. - Added doc comment links to relevant modules/structs (e.g. [Style](crate::structs::style)).

This commit introduces a new macro `pub_mod_use!` to streamline module declarations and exports in `structs.rs`. The macro replaces individual module declarations with a single macro call, reducing code duplication and improving readability: - Modules are now declared and exported using the `pub_mod_use!` macro. - Each module is made private within its own scope and then re-exported with the desired visibility. - The change reduces the number of lines significantly, from 494 to 170, enhancing maintainability. This refactoring will make it easier to manage the visibility of modules and their contents in future updates.

mxsrm · 2024-12-28T17:10:37Z

If you want to substantially change the API you would end in a near-total re-write of the whole crate. If you do that, you could implement many things more efficiently, e.g. you could abstract the whole Unit types into a single generic one using Cow<'_, str>, e.g.:

use std::{
    borrow::Cow,
    fmt::Display,
    str::FromStr,
};

#[derive(Clone, Default, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Value<T> {
    value: Option<T>,
}

impl<T> Value<T> {
    pub(crate) fn get_value(&self) -> Option<&T> {
        self.value.as_ref()
    }

    pub(crate) fn set_value(&mut self, value: T) -> &mut Self {
        self.value = Some(value);
        self
    }

    pub(crate) fn remove_value(&mut self) -> &mut Self {
        self.value = None;
        self
    }

    pub(crate) fn has_value(&self) -> bool {
        self.value.is_some()
    }
}

impl<T: Display + Default + Clone + FromStr> Value<T> {
    pub(crate) fn set_value_string<S: Into<String>>(&mut self, value: S) -> &mut Self {
        match value.into().parse::<T>() {
            Ok(parsed) => self.set_value(parsed),
            Err(_) => self.remove_value(), // Or handle the error differently, e.g., log it
        }
    }

    pub(crate) fn get_value_or_default(&self) -> T {
        self.value.clone().unwrap_or_default()
    }
}

macro_rules! create_and_export_ValueType {
    ($t:ty, $i:ident) => {
        pub type $i = Value<$t>;

        impl $i {
            pub(crate) fn get_value_string(&self) -> Cow<'a, str> {
                self.value
                    .as_ref()
                    .map_or_else(|| Cow::Borrowed(""), |v| Cow::Owned(v.to_string()))
            }
        }
    };
}

create_and_export_ValueType!(bool, BoolValue);
create_and_export_ValueType!(u8, ByteValue);
create_and_export_ValueType!(i8, SignedByteValue);
create_and_export_ValueType!(f64, DoubleValue);
create_and_export_ValueType!(i16, I16Value);
create_and_export_ValueType!(i32, I32Value);
create_and_export_ValueType!(i64, I64Value);
create_and_export_ValueType!(u16, U16Value);
create_and_export_ValueType!(u32, U32Value);
create_and_export_ValueType!(u64, U64Value);

pub type StringValue<'a> = Value<Cow<'a, str>>;
impl<'a> StringValue<'a> {
    pub(crate) fn get_value_string(&self) -> Cow<'a, str> {
        self.value.clone().unwrap()
    }
}

Or you could ditch the whole Value<T> idea and work using Traits that you implement directly onto Option<T>. I started on that adventure once or twice, but you just need to change too much to do it incrementally. Many functions want &str from the get_value_string method, others expect String etc.

This commit introduces the `AttrCollection` type alias, which is a vector of `AttrPair` structs. `AttrPair` holds a key-value pair for XML attributes, where the value is a `Cow<str>`. This allows for more efficient handling of string attributes, avoiding unnecessary allocations. The commit also updates all structs to use `AttrCollection` instead of `Vec<(&str, &str)>` for XML attributes. This change improves code readability and maintainability. Additionally, the `AttrPair` struct implements `From` traits for various types, making it easier to create attribute pairs.

… update writer This commit introduces a `From` implementation for `AttrPair` to convert it into a tuple of `(&str, Cow<str>)`. This allows for seamless conversion of `AttrPair` instances into a format that `quick_xml` can handle. Additionally, the commit updates the `write_start_tag` function in `src/writer/driver.rs` to use the new `From` implementation. This change simplifies the code and makes it more efficient by avoiding manual tuple creation.

schungx · 2024-12-29T00:55:13Z

Agree. Not recommended to change the current API too much as it'll be too much rewrites for existing code.

- Changed the `argb` field in the `Color` struct from `Option<String>` to `Option<ARGB8>`. - Added helper methods to convert between hex strings and ARGB8. - Updated methods to handle the new ARGB8 type, including `set_argb`, `get_argb`, and `set_attributes`. - Modified constants to use ARGB8 for predefined colors. - Enhanced the `get_argb_with_theme` method to return ARGB8 values correctly. - Added tests for hex conversion and color setting functionality.

- Modified the `Border` struct to use `Option<Box<Color>>` for the `color` field. - Updated methods to handle the optional color, including `get_color`, `set_color`, and `set_attributes`. - Adjusted the `get_hash_code` method to handle the optional color. - Updated the `write_to` method to handle the optional color. - Improved consistency in color handling within the `Border` struct. This change drastically reduce the amount of heap allocations when cloning Worksheets.

mxsrm · 2024-12-29T17:32:08Z

In terms of memory efficiency (@schungx) I did some profiling with valgrind and the memory consumption upon cloning a Worksheet is huge because of the way Color, Border and Borders are represented. This is v2.2.1:

This is the memory layout after re-implementing the structs, using the rgb crate as basis to represent colors:

I personally use umya to parse a plan from work, iterate over it, and color cells that contain specific names (to create personalized versions for many people at once).

Here massif data from Valgrind:

Master (v2.2.1)

This PR @ `89016c7`

This PR @ 3c26246

We could probably do alot more optimization, but this change was minimal-invasive and did not require changing alot of internal APIs.

@schungx
If you look at the first and last screenshot you see that removing thin_vec and using Vec did not matter much in Terms of Heap allocations. 122,6MB+23,9MB and some change @ thinvec versus 122.6MB and some change @ std_vec.

mxsrm · 2024-12-29T17:56:50Z

It is interesting to see what happens if you remove all the #[inline] instructions and let the compiler decide for himself (otherwise same code as this PR @ 3c26246).

No `#[inline]`, lto = false

No `#[inline]`, lto = true

So letting the compiler decide for itself and using lto if possible in the final application is better than force-inlining that many functions for speed and memory-allocations. I will push a commit to this PR to remove all #[inline] instructions.

The compiler knows best which functions to inline. If the end user wants optimization, they should enable `lto = true` in their Cargo.toml. See MathNya#250 (comment)

mxsrm · 2024-12-29T18:02:45Z

src/structs/color.rs

+            // To pass an integration test where the hex string is "#333".
+            // https://github.com/MathNya/umya-spreadsheet/pull/113


I am not sure if this should be valid behaviour? Is #333 really a valid color? I added padding so the previous integration test added in PR #113 still passes.

This is the code.

@mxsrm
Hmmm, this seems to be a mistake for #333333.

MathNya · 2024-12-31T08:45:04Z

@schungx
@mxsrm
Thank you for your support through the end of the year.
I am not very knowledgeable about performance enhancement and have left it to you.
I would be grateful if you could continue to support us if you would like.
I look forward to working with you again next year.

c-git · 2024-12-31T08:58:16Z

I don't know if it's valid there but in CSS it's allowed. #333 is short for #333333

As per: https://www.w3schools.com/css/css_colors_hex.asp

3 Digit HEX Value

Sometimes you will see a 3-digit hex code in the CSS source.
The 3-digit hex code is a shorthand for some 6-digit hex codes.
The 3-digit hex code has the following form:
#rgb
Where r, g, and b represent the red, green, and blue components with values between 0 and f.

schungx · 2024-12-31T13:51:16Z

It is interesting to see what happens if you remove all the #[inline] instructions and let the compiler decide for himself

This is wrong. The compiler does NOT decide. It only decides within the same crate. Without #[inline], the compiler will NEVER inline across crate boundaries. You have to turn on LTO for those to inline.

I will push a commit to this PR to remove all #[inline] instructions.

I would strongly not recommend that. My own benchmarks have #[inline] substantially speeding up the linking phase when LTO is turned on, otherwise LLVM needs to inline tons of simple accessor functions where they would have been inlined by Rust much earlier on.

If LTO is turned off, any function without #[inline] will never be inlined, and there are a LOT of simple accessor functions. It makes things much slower. Try timing a tight loop iterating over thousands of cells.

Therefore, the standard for libraries is to be liberal with #[inline]. Look at the standard library -- it is practically littered with #[inline].

#[inline] is never for memory, it is intended for speed. Without #[inline], my release build links in 4 hours with LTO, while with #[inline] it is less than 1 hour. Without LTO, the program runs much more slowly. It is a no-win situation.

Check out this: https://matklad.github.io/2021/07/09/inline-in-rust.html

Fourth, when building libraries, proactively add #[inline] to small non-generic functions. Pay special attention to impls: Deref, AsRef and the like often benefit from inlining. A library can’t anticipate all usages upfront, it makes sense to not prematurely pessimize future users. Note that #[inline] is not transitive: if a trivial public function calls a trivial private function, you need to #[inline] both. See this benchmark for details.

This reverts commit dd0d524.

Support three-digit hex color codes with and without leading hash. This change addresses issues where short hex codes were not properly expanded, ensuring compatibility with CSS standards and improving user experience in color specification. - Expand three-digit hex codes by repeating each digit. - Add integration tests for both `#RGB` and `RGB` formats. - Update existing tests to reflect new behavior. See MathNya#250 (comment)

mxsrm · 2024-12-31T15:29:07Z

I don't know if it's valid there but in CSS it's allowed. #333 is short for #333333

As per: https://www.w3schools.com/css/css_colors_hex.asp

3 Digit HEX Value

Sometimes you will see a 3-digit hex code in the CSS source.
The 3-digit hex code is a shorthand for some 6-digit hex codes.
The 3-digit hex code has the following form:
#rgb
Where r, g, and b represent the red, green, and blue components with values between 0 and f.

I implented it in that way. See a496cbe.

mxsrm · 2024-12-31T15:32:56Z

@schungx @mxsrm Thank you for your support through the end of the year. I am not very knowledgeable about performance enhancement and have left it to you.

Not my area of expertise either. In daily life I am an interventional neuroradiologist that enjoys coding in his free time. @schungx is more knowledgable in that regard I think.

- Changed the file paths in `integration_test.rs` to reflect the actual names of the test result files for better clarity and consistency in testing procedures. - Updated paths from 'bbb_new_sheet_value.xlsx' to 'three_digit_hex_color_with_hash.xlsx' and 'three_digit_hex_color_without_hash.xlsx' for the respective integration tests.

- Changed the return type of `get_sheet_collection_mut` from `&mut Vec<Worksheet>` to `&mut [Worksheet]` to improve type safety and flexibility in how the sheets can be manipulated. - This adjustment ensures that users of the API cannot directly resize the vector, which aligns with the intended use of this method for accessing existing sheets only.

schungx · 2025-01-01T02:25:34Z

Not my area of expertise either. In daily life I am an interventional neuroradiologist that enjoys coding in his free time. @schungx is more knowledgable in that regard I think.

Just different experiences that's all.

I happen to be maintaining an LOB application running on limitrd hardware, so memory and CPU count a lot. For most applications I'd agree there may not be a big impact, unless we're dealing with massive spreadsheets with tens of thousands of rows, like some users who complained in issues.

In my own system, the standard handler times out after 30 seconds of processing and it'll be hell trying to get around it. I can get 2000 rows within that time limit on an LTO release build. So it keeps choking on spreadsheets with 2500 rows.

After the memory shrink it runs much faster probably due to better cache locality and it shows. I can now process 5000 rows without exceeding the limit, that makes it work just fine for my case.

So yeah, you have to be there to know something is amiss.

mxsrm added 30 commits December 21, 2024 17:36

style: enable more lints

48b864b

refactor: reimplement and cleanup helper/crypt.rs

30e3e9a

refactor: fix typos, clippy lints

71f0572

refactor(helper/coordinate.rs): remove lazy_static, add comments

f3de769

refactor: replace lazy_static with built-in std::sync::OnceLock

aaaec5f

refactor: replace ThinVec with rust-lang Vec

714bf07

chore(deps): sort dependencies in Cargo.toml

517cf8c

This commit sorts the dependencies in `Cargo.toml` for improved readability and organization. Sorting helps maintain a consistent format and makes it easier to locate specific dependencies.

style: add modern rustfmt.toml and apply it

98fcf7f

refactor: clippy lints

8d41bc1

chore(clippy): add categories to Cargo.toml

37d1139

refactor(writer/xlsx): remove unnecessary qualification

505d766

chore(deps): update zip dependency to version 2.2.2

ea1d8a2

mxsrm added 2 commits December 26, 2024 22:20

mxsrm added 2 commits December 27, 2024 18:50

mxsrm added 2 commits December 29, 2024 00:07

mxsrm added 2 commits December 29, 2024 18:20

perf: remove manual inline markers

dd0d524

The compiler knows best which functions to inline. If the end user wants optimization, they should enable `lto = true` in their Cargo.toml. See MathNya#250 (comment)

mxsrm commented Dec 29, 2024

View reviewed changes

refactor: use compile_time HashMap phf::Map

d9f9d7a

mxsrm added 2 commits December 31, 2024 16:14

Revert "perf: remove manual inline markers"

694c1d6

This reverts commit dd0d524.

refactor(structs/attributes): inline short functions

80d99ec

mxsrm added 2 commits December 31, 2024 18:49

build: add justfile

8b8945c

mxsrm force-pushed the nextgen branch from 0e551f6 to 8b8945c Compare December 31, 2024 17:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250

Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250

mxsrm commented Dec 22, 2024

schungx commented Dec 26, 2024

schungx commented Dec 27, 2024 •

edited

Loading

mxsrm commented Dec 28, 2024 •

edited

Loading

schungx commented Dec 29, 2024

mxsrm commented Dec 29, 2024 •

edited

Loading

mxsrm commented Dec 29, 2024 •

edited

Loading

mxsrm Dec 29, 2024

mxsrm Dec 29, 2024

MathNya Dec 31, 2024

MathNya commented Dec 31, 2024

c-git commented Dec 31, 2024

3 Digit HEX Value

schungx commented Dec 31, 2024 •

edited

Loading

mxsrm commented Dec 31, 2024

3 Digit HEX Value

mxsrm commented Dec 31, 2024

schungx commented Jan 1, 2025 •

edited

Loading

		// To pass an integration test where the hex string is "#333".
		// https://github.com/MathNya/umya-spreadsheet/pull/113

Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250

Are you sure you want to change the base?

Dependency Updates, Documentation Enhancements, New Features, and Code Refactoring #250

Conversation

mxsrm commented Dec 22, 2024

Chore

Documentation

New Features

Refactor

Style

Commit Statistics

Commit Details

schungx commented Dec 26, 2024

schungx commented Dec 27, 2024 • edited Loading

mxsrm commented Dec 28, 2024 • edited Loading

schungx commented Dec 29, 2024

mxsrm commented Dec 29, 2024 • edited Loading

Master (v2.2.1)

This PR @ 89016c7

This PR @ 3c26246

mxsrm commented Dec 29, 2024 • edited Loading

No #[inline], lto = false

No #[inline], lto = true

mxsrm Dec 29, 2024

Choose a reason for hiding this comment

mxsrm Dec 29, 2024

Choose a reason for hiding this comment

MathNya Dec 31, 2024

Choose a reason for hiding this comment

MathNya commented Dec 31, 2024

c-git commented Dec 31, 2024

3 Digit HEX Value

schungx commented Dec 31, 2024 • edited Loading

mxsrm commented Dec 31, 2024

3 Digit HEX Value

mxsrm commented Dec 31, 2024

schungx commented Jan 1, 2025 • edited Loading

schungx commented Dec 27, 2024 •

edited

Loading

mxsrm commented Dec 28, 2024 •

edited

Loading

mxsrm commented Dec 29, 2024 •

edited

Loading

This PR @ `89016c7`

mxsrm commented Dec 29, 2024 •

edited

Loading

No `#[inline]`, lto = false

No `#[inline]`, lto = true

schungx commented Dec 31, 2024 •

edited

Loading

schungx commented Jan 1, 2025 •

edited

Loading