Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev: optimize memory by using 32-byte words #51

Open
enitrat opened this issue Aug 4, 2023 · 2 comments
Open

dev: optimize memory by using 32-byte words #51

enitrat opened this issue Aug 4, 2023 · 2 comments
Labels
enhancement New feature or request low-priority Low Priority Issue - Has been deprioritised

Comments

@enitrat
Copy link
Contributor

enitrat commented Aug 4, 2023

Description

Memory is one of the core elements of the EVM. Having it as optimized as possible would reduce the overall costs of all transactions ran through the EVM.

Here is my proposal: We can refactor the memory to be 32-bytes words based instead of the current 16-bytes words based implementation.

I started to implement the new version to run some benchmarks. Below are the related code snippets.

16-bytes words memory (current implementation)

    fn store(ref self: Memory, element: u256, offset: usize) {
        let x = testing::get_available_gas();
        gas::withdraw_gas().unwrap();
        let new_min_bytes_len = helpers::ceil_bytes_len_to_next_32_bytes_word(offset + 32);

        let new_bytes_len = if new_min_bytes_len > self.bytes_len {
            new_min_bytes_len
        } else {
            self.bytes_len
        };
        self.bytes_len = new_bytes_len;

        // Check alignment of offset to bytes16 chunks
        let (chunk_index, offset_in_chunk) = u32_safe_divmod(offset, u32_as_non_zero(16));

        if offset_in_chunk == 0 {
            // Offset is aligned. This is the simplest and most efficient case,
            // so we optimize for it.
            self.items.store_u256(element, chunk_index);
            return ();
        }

        // Offset is misaligned.
        // |   W0   |   W1   |   w2   |
        //     |  EL_H  |  EL_L  |
        // ^---^
        //   |-- mask = 256 ** offset_in_chunk

        let mask: u256 = helpers::pow256_rev(offset_in_chunk);
        let mask_c: u256 = utils::pow(256, 16).into() / mask;

        // Split the 2 input bytes16 chunks at offset_in_chunk.

        let (el_hh, el_hl) = u256_safe_div_rem(
            u256 { low: element.high, high: 0 }, u256_as_non_zero(mask_c)
        );

        let (el_lh, el_ll) = u256_safe_div_rem(
            u256 { low: element.low, high: 0 }, u256_as_non_zero(mask_c)
        );

        // Read the words at chunk_index, chunk_index + 2.
        let w0: u128 = self.items.get(chunk_index.into());
        let w2: u128 = self.items.get(chunk_index.into() + 2);

        // Compute the new words
        let w0_h: u256 = (w0.into() / mask);
        let w2_l: u256 = (w2.into() / mask);

        // We can convert them back to felt252 as we know they fit in one word.
        let new_w0: u128 = (w0_h.into() * mask + el_hh).try_into().unwrap();
        let new_w1: u128 = (el_hl.into() * mask + el_lh).try_into().unwrap();
        let new_w2: u128 = (el_ll.into() * mask + w2_l).try_into().unwrap();

        // Write the new words
        self.items.insert(chunk_index.into(), new_w0);
        self.items.insert(chunk_index.into() + 1, new_w1);
        self.items.insert(chunk_index.into() + 2, new_w2);
        (x - testing::get_available_gas()).print();
    }

32-bytes words memory (optimisation proposal)

    fn store(ref self: Memory, element: u256, offset: usize) {
        let x = testing::get_available_gas();
        gas::withdraw_gas().unwrap();
        let new_min_bytes_len = helpers::ceil_bytes_len_to_next_32_bytes_word(offset + 32);

        let new_bytes_len = if new_min_bytes_len > self.bytes_len {
            new_min_bytes_len
        } else {
            self.bytes_len
        };
        self.bytes_len = new_bytes_len;

        // Check alignment of offset to 32-bytes chunks
        let (chunk_index, offset_in_chunk) = u32_safe_divmod(offset, u32_as_non_zero(32));

        if offset_in_chunk == 0 {
            // Offset is aligned. This is the simplest and most efficient case,
            // so we optimize for it.
            self.items.insert(chunk_index.into(), NullableExtensionTrait::new(element));
            return ();
        }

        // Offset is misaligned.
        // |   W0     |   W1   |
        //     |     EL     |
        // ^---^
        //   |-- mask = 256 ** offset_in_chunk

        let mask: u256 = helpers::pow256_rev(offset_in_chunk);
        let mask_c: u256 = utils::pow(256, 32).into() / mask;

        // Split the input two chunks at offset_in_chunk.

        let (el_h, el_l) = u256_safe_div_rem(element, u256_as_non_zero(mask_c));

        // Read the words at chunk_index, chunk_index + 1.
        let w0: u256 = self.items.get(chunk_index.into()).deref_or_0();
        let w1: u256 = self.items.get(chunk_index.into() + 1).deref_or_0();

        // Compute the new words
        let w0: u256 = (w0.into() / mask);
        let w1: u256 = (w1.into() / mask);

        // We can convert them back to felt252 as we know they fit in one word.
        let new_w0: u256 = (w0.into() * mask + el_h);
        let new_w1: u256 = (el_l.into() * mask + w1);

        // Write the new words

        self.items.insert(chunk_index.into(), NullableExtensionTrait::new(new_w0));
        self.items.insert(chunk_index.into() + 1, NullableExtensionTrait::new(new_w1));
        (x - testing::get_available_gas()).print();
    }
}

Benchmarks

I ran tests on both implementations and computed the actual gas usage of both functions. Here is the output

❯ scarb test -f test_store_should_add_an_element_to_the_memory_with_offset
testing kakarot ...
running 2 tests
[DEBUG(memory_32_bytes_chunks)]	                               	(raw: 0x36cfe

[DEBUG(memory_16_bytes_chunks)]	                               	(raw: 0x38e78

This leads to a ~4% gas decrease for each store operation

Proposal

Replace the 16-bytes words implementation with the 32-bytes words one.

@enitrat enitrat added the enhancement New feature or request label Aug 4, 2023
@github-project-automation github-project-automation bot moved this to 🆕 Backlog in Kakarot on Starknet Aug 4, 2023
@Eikix Eikix added the low-priority Low Priority Issue - Has been deprioritised label Aug 29, 2023
@Eikix
Copy link
Member

Eikix commented Sep 8, 2023

Important update: we are gathering some bugs in the Kakarot v0 codebase, we need to make sure each issue and each PR in Kakarot-ssj is aware of the lists of known bugs. Look at this link everytime you take an issue and check your issue isn't targeted by a known bug.

@Eikix
Copy link
Member

Eikix commented Sep 8, 2023

Important update: we are gathering some bugs in the Kakarot v0 codebase, we need to make sure each issue and each PR in Kakarot-ssj is aware of the lists of known bugs. Look at this tracking issue everytime you take an issue and check your issue isn't targeted by a known bug. Will add this reminder in many places to make sure we keep track of known bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request low-priority Low Priority Issue - Has been deprioritised
Projects
No open projects
Status: 🆕 Backlog
Development

No branches or pull requests

2 participants