Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert relocation table to HashMap<usize, MaybeRelocatable> #1862

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

notlesh
Copy link

@notlesh notlesh commented Oct 30, 2024

Convert the relocation table to MaybeRelocatables in order to support ints

Description

Fixes #1860.

This PR converts the relocation table from a HashMap<usize, Relocatable> to a HashMap<usize, MaybeRelocatable> in order to support integers.

The original add_relocation_rule fns have been left untouched so that they still only support Relocatables and a new fn add_relocation_rule_maybe_relocatable() has been added to support using a MaybeRelocatable. This should prevent this change from affecting most users.

See #1860 for more details on the motivation for this change.

Description of the pull request changes and motivation.

Checklist

  • Linked to Github Issue
  • Unit tests added
  • Integration tests added.
  • This change requires new documentation.
    • Documentation has been added/updated.
    • CHANGELOG has been updated.

@pefontana
Copy link
Collaborator

Thanks for the contribution @notlesh !
We will take a look at the changes

/// Will return an error if any of the following conditions are not met:
/// - Source address's segment must be negative (temporary).
/// - Source address's offset must be zero.
/// - There shouldn't already be relocation at the source segment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't or mustn't? If the former, in which scenario would it be allowed? If the latter, can this function check it and return an error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Mustn't" would be more appropriate here, although this verbiage was copied verbatim (it's already used a couple times in the codebase), so I hesitate to make it different.

This condition is checked in fn Memory::add_relocation_rule_maybe_relocatable(), which this fn calls:

return Err(MemoryError::DuplicatedRelocation(src_ptr.segment_index));

@Oppen
Copy link
Contributor

Oppen commented Nov 4, 2024

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

@notlesh
Copy link
Author

notlesh commented Nov 4, 2024

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

Good catch, that's a significant performance decrease. I'm a bit surprised by it, actually, because it shouldn't be adding significant overhead. I'll try to run locally to understand it better.

I agree that such a performance impact is not really acceptable for such a corner case... 😅

We have tried fixing the calling code; we come up with a workaround but unfortunately it doesn't always work. The OS makes an assert after calling add_relocation_rule() which makes workarounds very tricky:

%{ memory.add_relocation_rule(src_ptr=ids.src_ptr, dest_ptr=ids.dest_ptr) %}
assert src_ptr = dest_ptr;

In this particular case, dest_ptr is an Int(0) which leaves us with only two options:

  • src_ptr must be Int(0) (which is what the above-mentioned workaround accomplishes)
    • (this also breaks the way add_relocation_rule() is supposed to work)
  • src_ptr must have a relocation rule which effectively makes it an Int(0) (which this PR accomplishes)

@Oppen
Copy link
Contributor

Oppen commented Nov 4, 2024

This seems to cause a significant slowdown see workflow. Is there any way to avoid the hit for more conforming programs? Is it possible to fix the calling code instead?

Good catch, that's a significant performance decrease. I'm a bit surprised by it, actually, because it shouldn't be adding significant overhead. I'll try to run locally to understand it better.

I agree that such a performance impact is not really acceptable for such a corner case... 😅

We have tried fixing the calling code; we come up with a workaround but unfortunately it doesn't always work. The OS makes an assert after calling add_relocation_rule() which makes workarounds very tricky:

%{ memory.add_relocation_rule(src_ptr=ids.src_ptr, dest_ptr=ids.dest_ptr) %}
assert src_ptr = dest_ptr;

In this particular case, dest_ptr is an Int(0) which leaves us with only two options:

  • src_ptr must be Int(0) (which is what the above-mentioned workaround accomplishes)

    • (this also breaks the way add_relocation_rule() is supposed to work)
  • src_ptr must have a relocation rule which effectively makes it an Int(0) (which this PR accomplishes)

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

@notlesh
Copy link
Author

notlesh commented Nov 4, 2024

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

Yes, this crossed my mind. The cases I've seen the OS fail on come from cast(0, felt*) which should mean Int(0) is the only case we care about. If so, we could perhaps have a magical value for segment_index == usize::MAX or something like that which triggers the desired behavior.

I suspect this overhead would be similar to my original approach, however. I run into compile errors when running cargo bench, but I'll try to sort that out and measure the perf locally.

Speaking of perf, I noticed there is a lot of noise in the measurements here, maybe the delta measured in this PR is within this noise?

@Oppen
Copy link
Contributor

Oppen commented Nov 5, 2024

IIUC, you only need to cover the Int(0) case? Maybe we can get away with just special casing that?

Yes, this crossed my mind. The cases I've seen the OS fail on come from cast(0, felt*) which should mean Int(0) is the only case we care about. If so, we could perhaps have a magical value for segment_index == usize::MAX or something like that which triggers the desired behavior.

Yes, that or the offset. Both are invalid as Relocatable fields so using them as sentinels should be alright.
Note that your original approach significantly increases the size of values (MaybeRelocatable is 40 bytes), which tends to be bad for cache locality, and in turn has a more complex logic for addition. The special-case approach, on the other hand, only adds one branch AFAICT. I expect some hit, but I expect it to be barely noticeable.

I suspect this overhead would be similar to my original approach, however. I run into compile errors when running cargo bench, but I'll try to sort that out and measure the perf locally.

I'm not sure cargo bench will give you useful information. Our benchmarks are end to end done with Hyperfine.

Speaking of perf, I noticed there is a lot of noise in the measurements here, maybe the delta measured in this PR is within this noise?

Erm, not quite. Those measurements are really not looked at. As you noticed, criterion is too noisy for meaningful comparison of commits in a CI. That page is obsolete and we no longer refer to it for performance because of that.
What we use are the inline comments that the CI itself generated (@pefontana copied its output here because for external contributors we can't publish the comment due to lack of credentials), which runs in the same machine the base and the merged refs for proper comparison under Hyperfine.
That said, sometimes there is noise in the Hyperfine run, but most of the time is within 1-2%. When in doubt we can trigger the workflow again. If you suspect this is the case we might run it again and see if it was just a temporary artifact.

@pefontana
Copy link
Collaborator

Yes let me post the benchmarks again, it seems like a ~15% performance degradation

Benchmark Results for unmodified programs :rocket:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base big_factorial | 2.542 ± 0.077 | 2.462 | 2.686 | 1.00 |
| Head big_factorial | 2.845 ± 0.013 | 2.829 | 2.868 | 1.12 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base big_fibonacci | 2.439 ± 0.023 | 2.407 | 2.471 | 1.00 |
| Head big_fibonacci | 2.826 ± 0.052 | 2.802 | 2.971 | 1.16 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base blake2s_integration_benchmark | 9.317 ± 0.432 | 9.047 | 10.517 | 1.00 |
| Head blake2s_integration_benchmark | 10.417 ± 0.138 | 10.272 | 10.744 | 1.12 ± 0.05 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base compare_arrays_200000 | 2.602 ± 0.043 | 2.554 | 2.693 | 1.00 |
| Head compare_arrays_200000 | 2.973 ± 0.032 | 2.915 | 3.023 | 1.14 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base dict_integration_benchmark | 1.701 ± 0.024 | 1.675 | 1.753 | 1.00 |
| Head dict_integration_benchmark | 1.967 ± 0.047 | 1.930 | 2.085 | 1.16 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base field_arithmetic_get_square_benchmark | 1.441 ± 0.049 | 1.410 | 1.571 | 1.00 |
| Head field_arithmetic_get_square_benchmark | 1.620 ± 0.023 | 1.603 | 1.683 | 1.12 ± 0.04 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base integration_builtins | 9.329 ± 0.077 | 9.171 | 9.415 | 1.00 |
| Head integration_builtins | 10.505 ± 0.084 | 10.400 | 10.634 | 1.13 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base keccak_integration_benchmark | 9.619 ± 0.125 | 9.476 | 9.916 | 1.00 |
| Head keccak_integration_benchmark | 10.887 ± 0.183 | 10.744 | 11.278 | 1.13 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base linear_search | 2.642 ± 0.094 | 2.558 | 2.865 | 1.00 |
| Head linear_search | 2.973 ± 0.036 | 2.937 | 3.058 | 1.12 ± 0.04 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base math_cmp_and_pow_integration_benchmark | 1.757 ± 0.023 | 1.734 | 1.801 | 1.00 |
| Head math_cmp_and_pow_integration_benchmark | 2.022 ± 0.036 | 1.996 | 2.122 | 1.15 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base math_integration_benchmark | 1.717 ± 0.018 | 1.695 | 1.743 | 1.00 |
| Head math_integration_benchmark | 1.952 ± 0.009 | 1.940 | 1.968 | 1.14 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base memory_integration_benchmark | 1.440 ± 0.038 | 1.408 | 1.518 | 1.00 |
| Head memory_integration_benchmark | 1.651 ± 0.024 | 1.632 | 1.697 | 1.15 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base operations_with_data_structures_benchmarks | 1.803 ± 0.017 | 1.785 | 1.832 | 1.00 |
| Head operations_with_data_structures_benchmarks | 2.054 ± 0.031 | 2.025 | 2.115 | 1.14 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base pedersen | 597.0 ± 6.8 | 590.4 | 610.0 | 1.00 |
| Head pedersen | 653.2 ± 9.3 | 647.7 | 679.2 | 1.09 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base poseidon_integration_benchmark | 704.0 ± 5.6 | 695.4 | 712.2 | 1.00 |
| Head poseidon_integration_benchmark | 768.1 ± 8.3 | 757.5 | 784.7 | 1.09 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base secp_integration_benchmark | 2.073 ± 0.029 | 2.048 | 2.143 | 1.00 |
| Head secp_integration_benchmark | 2.247 ± 0.026 | 2.221 | 2.296 | 1.08 ± 0.02 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| Base set_integration_benchmark | 713.2 ± 11.0 | 701.9 | 735.5 | 1.00 |
| Head set_integration_benchmark | 737.5 ± 2.4 | 734.4 | 741.1 | 1.03 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Base uint256_integration_benchmark | 5.027 ± 0.057 | 4.972 | 5.110 | 1.00 |
| Head uint256_integration_benchmark | 5.819 ± 0.157 | 5.690 | 6.222 | 1.16 ± 0.03 |

@pefontana
Copy link
Collaborator

pefontana commented Nov 8, 2024

Hi @notlesh !
We discussed a litle wit the team, what do you think about this:
To use the OS you already use the feature flag extensive_hints no?
Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler.
Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:

#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,
        

Tell me what you think

@notlesh
Copy link
Author

notlesh commented Nov 8, 2024

Hi @notlesh ! We discussed a litle wit the team, what do you think about this: To use the OS you already use the feature flag extensive_hints no? Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler. Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:

#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,
        

Tell me what you think

Hey, I think those are great ideas. It certainly solves the perf issue and it gives the VM a way to behave exactly like the pythonic impl.

That said, we are also discussing some small changes to the OS itself that would prevent relocation-to-integers, so this change on the VM may not be needed. I'll keep you posted!

Edit: yes, we are using the extensive_hints feature

@pefontana
Copy link
Collaborator

Hi @notlesh ! We discussed a litle wit the team, what do you think about this: To use the OS you already use the feature flag extensive_hints no? Can you move this changes into that feature flag? With this we wont get any performance degradation and also the code will look simpler. Also I think it will be nice if you could have the two implementations for the functions, to keep the things cleaner, like:

#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, Relocatable>,
#[cfg(feature = "extensive_hints")]
    fn relocate_address(
        addr: Relocatable,
        relocation_rules: &HashMap<usize, MaybeRelocatable>,
        

Tell me what you think

Hey, I think those are great ideas. It certainly solves the perf issue and it gives the VM a way to behave exactly like the pythonic impl.

That said, we are also discussing some small changes to the OS itself that would prevent relocation-to-integers, so this change on the VM may not be needed. I'll keep you posted!

Edit: yes, we are using the extensive_hints feature

Great @notlesh !
Hope the OS changes works!

@notlesh
Copy link
Author

notlesh commented Nov 11, 2024

@pefontana cf36b4a is a first pass at putting this all behind extensive_hints. It could use some clean up, but feel free to run benchmarks against it to confirm it fixes the perf issue.

@pefontana
Copy link
Collaborator

@notlesh Can you solve merge conflicts, so I can approve the CI run?

@JulianGCalderon
Copy link
Contributor

@notlesh the performance hit seems to be fixed, but some builds in the CI are failing. Could you fix them?

@notlesh
Copy link
Author

notlesh commented Nov 21, 2024

@notlesh the performance hit seems to be fixed, but some builds in the CI are failing. Could you fix them?

I think I took care of them all now.

Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory operations should sometimes accept an Integer
5 participants