Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

notlesh · 2024-10-30T21:36:47Z

Convert the relocation table to `MaybeRelocatable`s in order to support ints

Description

This PR converts the relocation table from a HashMap<usize, Relocatable> to a HashMap<usize, MaybeRelocatable> in order to support integers.

The original add_relocation_rule fns have been left untouched so that they still only support Relocatables and a new fn add_relocation_rule_maybe_relocatable() has been added to support using a MaybeRelocatable. This should prevent this change from affecting most users.

See #1860 for more details on the motivation for this change.

Description of the pull request changes and motivation.

Checklist

Linked to Github Issue
Unit tests added
Integration tests added.
This change requires new documentation.
- Documentation has been added/updated.
- CHANGELOG has been updated.

pefontana · 2024-11-01T15:50:21Z

Thanks for the contribution @notlesh !
We will take a look at the changes

Oppen · 2024-11-04T17:25:26Z

vm/src/vm/vm_core.rs

+    /// Will return an error if any of the following conditions are not met:
+    ///   - Source address's segment must be negative (temporary).
+    ///   - Source address's offset must be zero.
+    ///   - There shouldn't already be relocation at the source segment.


Shouldn't or mustn't? If the former, in which scenario would it be allowed? If the latter, can this function check it and return an error?

"Mustn't" would be more appropriate here, although this verbiage was copied verbatim (it's already used a couple times in the codebase), so I hesitate to make it different.

This condition is checked in fn Memory::add_relocation_rule_maybe_relocatable(), which this fn calls:

cairo-vm/vm/src/vm/vm_memory/memory.rs

Line 338 in 159f67d

return Err(MemoryError::DuplicatedRelocation(src_ptr.segment_index));

Oppen · 2024-11-04T17:27:41Z

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

notlesh · 2024-11-04T17:51:52Z

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

Good catch, that's a significant performance decrease. I'm a bit surprised by it, actually, because it shouldn't be adding significant overhead. I'll try to run locally to understand it better.

I agree that such a performance impact is not really acceptable for such a corner case... 😅

We have tried fixing the calling code; we come up with a workaround but unfortunately it doesn't always work. The OS makes an assert after calling add_relocation_rule() which makes workarounds very tricky:

%{ memory.add_relocation_rule(src_ptr=ids.src_ptr, dest_ptr=ids.dest_ptr) %}
assert src_ptr = dest_ptr;

In this particular case, dest_ptr is an Int(0) which leaves us with only two options:

src_ptr must be Int(0) (which is what the above-mentioned workaround accomplishes)
- (this also breaks the way add_relocation_rule() is supposed to work)
src_ptr must have a relocation rule which effectively makes it an Int(0) (which this PR accomplishes)

Oppen · 2024-11-04T21:25:00Z

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

Good catch, that's a significant performance decrease. I'm a bit surprised by it, actually, because it shouldn't be adding significant overhead. I'll try to run locally to understand it better.

I agree that such a performance impact is not really acceptable for such a corner case... 😅

We have tried fixing the calling code; we come up with a workaround but unfortunately it doesn't always work. The OS makes an assert after calling add_relocation_rule() which makes workarounds very tricky:
%{ memory.add_relocation_rule(src_ptr=ids.src_ptr, dest_ptr=ids.dest_ptr) %}
assert src_ptr = dest_ptr;
In this particular case, dest_ptr is an Int(0) which leaves us with only two options:

src_ptr must be Int(0) (which is what the above-mentioned workaround accomplishes)

(this also breaks the way add_relocation_rule() is supposed to work)

src_ptr must have a relocation rule which effectively makes it an Int(0) (which this PR accomplishes)

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

notlesh · 2024-11-04T21:59:23Z

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

Yes, this crossed my mind. The cases I've seen the OS fail on come from cast(0, felt*) which should mean Int(0) is the only case we care about. If so, we could perhaps have a magical value for segment_index == usize::MAX or something like that which triggers the desired behavior.

I suspect this overhead would be similar to my original approach, however. I run into compile errors when running cargo bench, but I'll try to sort that out and measure the perf locally.

Speaking of perf, I noticed there is a lot of noise in the measurements here, maybe the delta measured in this PR is within this noise?

Oppen · 2024-11-05T22:59:32Z

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

Yes, this crossed my mind. The cases I've seen the OS fail on come from cast(0, felt*) which should mean Int(0) is the only case we care about. If so, we could perhaps have a magical value for segment_index == usize::MAX or something like that which triggers the desired behavior.

Yes, that or the offset. Both are invalid as Relocatable fields so using them as sentinels should be alright.
Note that your original approach significantly increases the size of values (MaybeRelocatable is 40 bytes), which tends to be bad for cache locality, and in turn has a more complex logic for addition. The special-case approach, on the other hand, only adds one branch AFAICT. I expect some hit, but I expect it to be barely noticeable.

I suspect this overhead would be similar to my original approach, however. I run into compile errors when running cargo bench, but I'll try to sort that out and measure the perf locally.

I'm not sure cargo bench will give you useful information. Our benchmarks are end to end done with Hyperfine.

Speaking of perf, I noticed there is a lot of noise in the measurements here, maybe the delta measured in this PR is within this noise?

Erm, not quite. Those measurements are really not looked at. As you noticed, criterion is too noisy for meaningful comparison of commits in a CI. That page is obsolete and we no longer refer to it for performance because of that.
What we use are the inline comments that the CI itself generated (@pefontana copied its output here because for external contributors we can't publish the comment due to lack of credentials), which runs in the same machine the base and the merged refs for proper comparison under Hyperfine.
That said, sometimes there is noise in the Hyperfine run, but most of the time is within 1-2%. When in doubt we can trigger the workflow again. If you suspect this is the case we might run it again and see if it was just a temporary artifact.

pefontana · 2024-11-06T18:05:47Z

Yes let me post the benchmarks again, it seems like a ~15% performance degradation

Benchmark Results for unmodified programs :rocket:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base big_factorial | 2.542 ± 0.077 | 2.462 | 2.686 | 1.00 |
| Head big_factorial | 2.845 ± 0.013 | 2.829 | 2.868 | 1.12 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base big_fibonacci | 2.439 ± 0.023 | 2.407 | 2.471 | 1.00 |
| Head big_fibonacci | 2.826 ± 0.052 | 2.802 | 2.971 | 1.16 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base blake2s_integration_benchmark | 9.317 ± 0.432 | 9.047 | 10.517 | 1.00 |
| Head blake2s_integration_benchmark | 10.417 ± 0.138 | 10.272 | 10.744 | 1.12 ± 0.05 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base compare_arrays_200000 | 2.602 ± 0.043 | 2.554 | 2.693 | 1.00 |
| Head compare_arrays_200000 | 2.973 ± 0.032 | 2.915 | 3.023 | 1.14 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base dict_integration_benchmark | 1.701 ± 0.024 | 1.675 | 1.753 | 1.00 |
| Head dict_integration_benchmark | 1.967 ± 0.047 | 1.930 | 2.085 | 1.16 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base field_arithmetic_get_square_benchmark | 1.441 ± 0.049 | 1.410 | 1.571 | 1.00 |
| Head field_arithmetic_get_square_benchmark | 1.620 ± 0.023 | 1.603 | 1.683 | 1.12 ± 0.04 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base integration_builtins | 9.329 ± 0.077 | 9.171 | 9.415 | 1.00 |
| Head integration_builtins | 10.505 ± 0.084 | 10.400 | 10.634 | 1.13 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base keccak_integration_benchmark | 9.619 ± 0.125 | 9.476 | 9.916 | 1.00 |
| Head keccak_integration_benchmark | 10.887 ± 0.183 | 10.744 | 11.278 | 1.13 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base linear_search | 2.642 ± 0.094 | 2.558 | 2.865 | 1.00 |
| Head linear_search | 2.973 ± 0.036 | 2.937 | 3.058 | 1.12 ± 0.04 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base math_cmp_and_pow_integration_benchmark | 1.757 ± 0.023 | 1.734 | 1.801 | 1.00 |
| Head math_cmp_and_pow_integration_benchmark | 2.022 ± 0.036 | 1.996 | 2.122 | 1.15 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base math_integration_benchmark | 1.717 ± 0.018 | 1.695 | 1.743 | 1.00 |
| Head math_integration_benchmark | 1.952 ± 0.009 | 1.940 | 1.968 | 1.14 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base memory_integration_benchmark | 1.440 ± 0.038 | 1.408 | 1.518 | 1.00 |
| Head memory_integration_benchmark | 1.651 ± 0.024 | 1.632 | 1.697 | 1.15 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base operations_with_data_structures_benchmarks | 1.803 ± 0.017 | 1.785 | 1.832 | 1.00 |
| Head operations_with_data_structures_benchmarks | 2.054 ± 0.031 | 2.025 | 2.115 | 1.14 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base pedersen | 597.0 ± 6.8 | 590.4 | 610.0 | 1.00 |
| Head pedersen | 653.2 ± 9.3 | 647.7 | 679.2 | 1.09 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base poseidon_integration_benchmark | 704.0 ± 5.6 | 695.4 | 712.2 | 1.00 |
| Head poseidon_integration_benchmark | 768.1 ± 8.3 | 757.5 | 784.7 | 1.09 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base secp_integration_benchmark | 2.073 ± 0.029 | 2.048 | 2.143 | 1.00 |
| Head secp_integration_benchmark | 2.247 ± 0.026 | 2.221 | 2.296 | 1.08 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base set_integration_benchmark | 713.2 ± 11.0 | 701.9 | 735.5 | 1.00 |
| Head set_integration_benchmark | 737.5 ± 2.4 | 734.4 | 741.1 | 1.03 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base uint256_integration_benchmark | 5.027 ± 0.057 | 4.972 | 5.110 | 1.00 |
| Head uint256_integration_benchmark | 5.819 ± 0.157 | 5.690 | 6.222 | 1.16 ± 0.03 |

pefontana · 2024-11-08T19:51:48Z

Hi @notlesh !
We discussed a litle wit the team, what do you think about this:
To use the OS you already use the feature flag extensive_hints no?
Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler.
Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:

#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,

Tell me what you think

notlesh · 2024-11-08T21:48:52Z

Hi @notlesh ! We discussed a litle wit the team, what do you think about this: To use the OS you already use the feature flag extensive_hints no? Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler. Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,
        
Tell me what you think

Hey, I think those are great ideas. It certainly solves the perf issue and it gives the VM a way to behave exactly like the pythonic impl.

That said, we are also discussing some small changes to the OS itself that would prevent relocation-to-integers, so this change on the VM may not be needed. I'll keep you posted!

Edit: yes, we are using the extensive_hints feature

pefontana · 2024-11-11T18:55:22Z

Hi @notlesh ! We discussed a litle wit the team, what do you think about this: To use the OS you already use the feature flag extensive_hints no? Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler. Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,
        
Tell me what you think
Hey, I think those are great ideas. It certainly solves the perf issue and it gives the VM a way to behave exactly like the pythonic impl.

That said, we are also discussing some small changes to the OS itself that would prevent relocation-to-integers, so this change on the VM may not be needed. I'll keep you posted!

Edit: yes, we are using the extensive_hints feature

Great @notlesh !
Hope the OS changes works!

notlesh · 2024-11-11T21:38:11Z

@pefontana cf36b4a is a first pass at putting this all behind extensive_hints. It could use some clean up, but feel free to run benchmarks against it to confirm it fixes the perf issue.

pefontana · 2024-11-19T17:17:06Z

@notlesh Can you solve merge conflicts, so I can approve the CI run?

JulianGCalderon · 2024-11-20T17:37:06Z

@notlesh the performance hit seems to be fixed, but some builds in the CI are failing. Could you fix them?

notlesh · 2024-11-21T18:32:32Z

@notlesh the performance hit seems to be fixed, but some builds in the CI are failing. Could you fix them?

I think I took care of them all now.

vm/src/vm/vm_memory/memory.rs

Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>

Convert relocation table to HashMap<usize, MaybeRelocatable>

d0c001d

notlesh requested review from igaray, Oppen, fmoletta, juanbono and pefontana as code owners October 30, 2024 21:36

notlesh added 3 commits October 31, 2024 08:59

Update CHANGELOG

3a40ec5

Add documentation

1def915

Add a couple basic tests

e87cd13

Oppen reviewed Nov 4, 2024

View reviewed changes

notlesh mentioned this pull request Nov 4, 2024

Preview upstream retdata fix keep-starknet-strange/snos#419

Open

First pass at hiding relocation table mods behind extensive_hints

cf36b4a

Merge branch 'main' into notlesh/allow-ints-in-relocation-table

6e5ea89

notlesh added 3 commits November 21, 2024 10:36

fmt

a8a8640

Run tests only for feature = "extensive_hints"

e124462

Merge branch 'main' into notlesh/allow-ints-in-relocation-table

c48bde1

Silence clippy warning about useless conversion

e6fe0be

gabrielbosio reviewed Nov 22, 2024

View reviewed changes

vm/src/vm/vm_memory/memory.rs Outdated Show resolved Hide resolved

Only use Into if "extensive_hints"

72d27c8

Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

notlesh commented Oct 30, 2024 •

edited

Loading

pefontana commented Nov 1, 2024

Oppen Nov 4, 2024

notlesh Nov 4, 2024

Oppen commented Nov 4, 2024

notlesh commented Nov 4, 2024 •

edited

Loading

Oppen commented Nov 4, 2024

notlesh commented Nov 4, 2024

Oppen commented Nov 5, 2024

pefontana commented Nov 6, 2024

pefontana commented Nov 8, 2024 •

edited

Loading

notlesh commented Nov 8, 2024 •

edited

Loading

pefontana commented Nov 11, 2024

notlesh commented Nov 11, 2024

pefontana commented Nov 19, 2024

JulianGCalderon commented Nov 20, 2024

notlesh commented Nov 21, 2024

Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

Are you sure you want to change the base?

Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

Conversation

notlesh commented Oct 30, 2024 • edited Loading

Convert the relocation table to MaybeRelocatables in order to support ints

Description

Checklist

pefontana commented Nov 1, 2024

Oppen Nov 4, 2024

Choose a reason for hiding this comment

notlesh Nov 4, 2024

Choose a reason for hiding this comment

Oppen commented Nov 4, 2024

notlesh commented Nov 4, 2024 • edited Loading

Oppen commented Nov 4, 2024

notlesh commented Nov 4, 2024

Oppen commented Nov 5, 2024

pefontana commented Nov 6, 2024

pefontana commented Nov 8, 2024 • edited Loading

notlesh commented Nov 8, 2024 • edited Loading

pefontana commented Nov 11, 2024

notlesh commented Nov 11, 2024

pefontana commented Nov 19, 2024

JulianGCalderon commented Nov 20, 2024

notlesh commented Nov 21, 2024

notlesh commented Oct 30, 2024 •

edited

Loading

Convert the relocation table to `MaybeRelocatable`s in order to support ints

notlesh commented Nov 4, 2024 •

edited

Loading

pefontana commented Nov 8, 2024 •

edited

Loading

notlesh commented Nov 8, 2024 •

edited

Loading