dev: optimize memory by using 32-byte words #51

enitrat · 2023-08-04T18:28:52Z

Description

Memory is one of the core elements of the EVM. Making it as efficient as possible would reduce the overall cost of every transaction run through the EVM.

Here is my proposal: refactor the memory to use 32-byte words instead of the current 16-byte words.

I started to implement the new version to run some benchmarks. Below are the related code snippets.

16-byte words memory (current implementation)

    fn store(ref self: Memory, element: u256, offset: usize) {
        let x = testing::get_available_gas();
        gas::withdraw_gas().unwrap();
        let new_min_bytes_len = helpers::ceil_bytes_len_to_next_32_bytes_word(offset + 32);

        let new_bytes_len = if new_min_bytes_len > self.bytes_len {
            new_min_bytes_len
        } else {
            self.bytes_len
        };
        self.bytes_len = new_bytes_len;

        // Check alignment of offset to bytes16 chunks
        let (chunk_index, offset_in_chunk) = u32_safe_divmod(offset, u32_as_non_zero(16));

        if offset_in_chunk == 0 {
            // Offset is aligned. This is the simplest and most efficient case,
            // so we optimize for it.
            self.items.store_u256(element, chunk_index);
            return ();
        }

        // Offset is misaligned.
        // |   W0   |   W1   |   W2   |
        //     |  EL_H  |  EL_L  |
        // ^---^
        //   |-- mask = 256 ** offset_in_chunk

        let mask: u256 = helpers::pow256_rev(offset_in_chunk);
        let mask_c: u256 = utils::pow(256, 16).into() / mask;

        // Split the 2 input bytes16 chunks at offset_in_chunk.

        let (el_hh, el_hl) = u256_safe_div_rem(
            u256 { low: element.high, high: 0 }, u256_as_non_zero(mask_c)
        );

        let (el_lh, el_ll) = u256_safe_div_rem(
            u256 { low: element.low, high: 0 }, u256_as_non_zero(mask_c)
        );

        // Read the words at chunk_index, chunk_index + 2.
        let w0: u128 = self.items.get(chunk_index.into());
        let w2: u128 = self.items.get(chunk_index.into() + 2);

        // Compute the new words.
        // w0 keeps its leading bytes (quotient), w2 keeps its trailing bytes (remainder).
        let w0_h: u256 = (w0.into() / mask);
        let w2_l: u256 = (w2.into() % mask);

        // Convert back to u128; each value is known to fit in one 16-byte word.
        let new_w0: u128 = (w0_h.into() * mask + el_hh).try_into().unwrap();
        let new_w1: u128 = (el_hl.into() * mask + el_lh).try_into().unwrap();
        let new_w2: u128 = (el_ll.into() * mask + w2_l).try_into().unwrap();

        // Write the new words
        self.items.insert(chunk_index.into(), new_w0);
        self.items.insert(chunk_index.into() + 1, new_w1);
        self.items.insert(chunk_index.into() + 2, new_w2);
        (x - testing::get_available_gas()).print();
    }
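The misaligned path above can be modelled with plain integers. This is a minimal Python sketch, not the Cairo implementation: it assumes `helpers::pow256_rev(k)` returns `256 ** (16 - k)`, that the aligned `store_u256` writes the high and low halves into two consecutive chunks, and that missing dictionary entries read as zero. The names `store_u256_16` and `to_bytes` are hypothetical. The sketch keeps the leading bytes of W0 with a quotient and the trailing bytes of W2 with a remainder.

```python
CHUNK = 16  # bytes per memory word in the current layout


def store_u256_16(items: dict, element: int, offset: int) -> None:
    """Write a 32-byte big-endian value at byte `offset` into 16-byte chunks."""
    chunk_index, k = divmod(offset, CHUNK)
    high, low = divmod(element, 1 << 128)
    if k == 0:
        # Aligned: the two halves map directly onto two chunks (assumed behavior).
        items[chunk_index] = high
        items[chunk_index + 1] = low
        return

    mask = 256 ** (CHUNK - k)       # assumed value of pow256_rev(k)
    mask_c = 256 ** CHUNK // mask   # equals 256 ** k

    # Split each half of the element at the chunk boundary.
    el_hh, el_hl = divmod(high, mask_c)
    el_lh, el_ll = divmod(low, mask_c)

    w0_h = items.get(chunk_index, 0) // mask     # leading k bytes of W0 survive
    w2_l = items.get(chunk_index + 2, 0) % mask  # trailing 16 - k bytes of W2 survive

    items[chunk_index] = w0_h * mask + el_hh
    items[chunk_index + 1] = el_hl * mask + el_lh
    items[chunk_index + 2] = el_ll * mask + w2_l


def to_bytes(items: dict, n_chunks: int) -> bytes:
    """Flatten the first `n_chunks` chunks into a byte string."""
    return b"".join(items.get(i, 0).to_bytes(CHUNK, "big") for i in range(n_chunks))


# Usage: overwrite bytes 5..36 of a memory region pre-filled with 0xAA.
filler = int.from_bytes(b"\xaa" * CHUNK, "big")
memory = {0: filler, 1: filler, 2: filler}
value = int.from_bytes(bytes(range(1, 33)), "big")
store_u256_16(memory, value, 5)
```

A misaligned write touches three chunks here: one read-modify-write on each side and one full overwrite in the middle.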

32-byte words memory (optimisation proposal)

    fn store(ref self: Memory, element: u256, offset: usize) {
        let x = testing::get_available_gas();
        gas::withdraw_gas().unwrap();
        let new_min_bytes_len = helpers::ceil_bytes_len_to_next_32_bytes_word(offset + 32);

        let new_bytes_len = if new_min_bytes_len > self.bytes_len {
            new_min_bytes_len
        } else {
            self.bytes_len
        };
        self.bytes_len = new_bytes_len;

        // Check alignment of offset to 32-bytes chunks
        let (chunk_index, offset_in_chunk) = u32_safe_divmod(offset, u32_as_non_zero(32));

        if offset_in_chunk == 0 {
            // Offset is aligned. This is the simplest and most efficient case,
            // so we optimize for it.
            self.items.insert(chunk_index.into(), NullableExtensionTrait::new(element));
            return ();
        }

        // Offset is misaligned.
        // |   W0     |   W1   |
        //     |     EL     |
        // ^---^
        //   |-- mask = 256 ** offset_in_chunk

        let mask: u256 = helpers::pow256_rev(offset_in_chunk);
        let mask_c: u256 = utils::pow(256, 32).into() / mask;

        // Split the input into two parts at offset_in_chunk.

        let (el_h, el_l) = u256_safe_div_rem(element, u256_as_non_zero(mask_c));

        // Read the words at chunk_index, chunk_index + 1.
        let w0: u256 = self.items.get(chunk_index.into()).deref_or_0();
        let w1: u256 = self.items.get(chunk_index.into() + 1).deref_or_0();

        // Compute the new words.
        // w0 keeps its leading bytes (quotient), w1 keeps its trailing bytes (remainder).
        let w0: u256 = (w0.into() / mask);
        let w1: u256 = (w1.into() % mask);

        // No narrowing conversion is needed: each value fits in one 32-byte word.
        let new_w0: u256 = (w0.into() * mask + el_h);
        let new_w1: u256 = (el_l.into() * mask + w1);

        // Write the new words

        self.items.insert(chunk_index.into(), NullableExtensionTrait::new(new_w0));
        self.items.insert(chunk_index.into() + 1, NullableExtensionTrait::new(new_w1));
        (x - testing::get_available_gas()).print();
    }
}
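The same model for the 32-byte layout shows why the misaligned path gets shorter: the element is split once and only two words are touched. Again this is a hedged Python sketch under assumptions, not the proposed Cairo code: it assumes a 32-byte analogue of `pow256_rev` that returns `256 ** (32 - k)`, and the names `store_u256_32` and `to_bytes` are hypothetical.

```python
WORD = 32  # bytes per memory word in the proposed layout


def store_u256_32(items: dict, element: int, offset: int) -> None:
    """Write a 32-byte big-endian value at byte `offset` into 32-byte words."""
    chunk_index, k = divmod(offset, WORD)
    if k == 0:
        # Aligned: the element is exactly one word.
        items[chunk_index] = element
        return

    mask = 256 ** (WORD - k)      # assumed 32-byte analogue of pow256_rev(k)
    mask_c = 256 ** WORD // mask  # equals 256 ** k

    # Split the element once, at the word boundary.
    el_h, el_l = divmod(element, mask_c)

    w0_h = items.get(chunk_index, 0) // mask     # leading k bytes of W0 survive
    w1_l = items.get(chunk_index + 1, 0) % mask  # trailing 32 - k bytes of W1 survive

    items[chunk_index] = w0_h * mask + el_h
    items[chunk_index + 1] = el_l * mask + w1_l


def to_bytes(items: dict, n_words: int) -> bytes:
    """Flatten the first `n_words` words into a byte string."""
    return b"".join(items.get(i, 0).to_bytes(WORD, "big") for i in range(n_words))


# Usage: overwrite bytes 5..36 of a memory region pre-filled with 0xAA.
filler = int.from_bytes(b"\xaa" * WORD, "big")
memory = {0: filler, 1: filler}
value = int.from_bytes(bytes(range(1, 33)), "big")
store_u256_32(memory, value, 5)
```

Python integers are unbounded, so `256 ** WORD` is representable here. In a fixed 256-bit integer type that value is 2^256 and does not fit, so the complementary mask would have to be obtained another way, for example from a lookup of `256 ** k` directly.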

Benchmarks

I ran tests on both implementations and measured the gas used by each function. Here is the output:

❯ scarb test -f test_store_should_add_an_element_to_the_memory_with_offset
testing kakarot ...
running 2 tests
[DEBUG(memory_32_bytes_chunks)]	                               	(raw: 0x36cfe

[DEBUG(memory_16_bytes_chunks)]	                               	(raw: 0x38e78

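This is a gas decrease of roughly 3.7% (about 4%) for the store-with-offset operation measured in this test: 0x38e78 = 233,080 for 16-byte chunks versus 0x36cfe = 224,510 for 32-byte chunks.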
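The percentage can be checked directly from the two raw values in the debug output. The snippet below is only a small arithmetic check; the variable names are illustrative.

```python
# Raw gas values printed by the two test runs above.
gas_32 = int("0x36cfe", 16)  # 32-byte chunks
gas_16 = int("0x38e78", 16)  # 16-byte chunks

saving = gas_16 - gas_32
pct = 100 * saving / gas_16

print(gas_16, gas_32, saving, f"{pct:.2f}%")
# prints: 233080 224510 8570 3.68%
```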

Proposal

Replace the 16-byte words implementation with the 32-byte words one.


Eikix · 2023-09-08T05:06:04Z

Important update: we are collecting bugs found in the Kakarot v0 codebase, and every issue and PR in Kakarot-ssj needs to take the list of known bugs into account. Check this link every time you pick up an issue to make sure it isn't affected by a known bug.

Eikix · 2023-09-08T05:12:39Z

Important update: we are collecting bugs found in the Kakarot v0 codebase, and every issue and PR in Kakarot-ssj needs to take the list of known bugs into account. Check this tracking issue every time you pick up an issue to make sure it isn't affected by a known bug. We will add this reminder in many places to make sure we keep track of known bugs.

enitrat added the enhancement (New feature or request) label · Aug 4, 2023

github-project-automation bot added this to Kakarot on Starknet · Aug 4, 2023

github-project-automation bot moved this to 🆕 Backlog in Kakarot on Starknet · Aug 4, 2023

Eikix added this to the Q4-2023 - Continuous Improvement - Codebase Improvement milestone · Aug 29, 2023

Eikix added the low-priority (Low Priority Issue - Has been deprioritised) label · Aug 29, 2023
