Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement copy on write optimization for arrays in brillig #3118

Closed
wants to merge 12 commits into from

Conversation

jfecher
Copy link
Contributor

@jfecher jfecher commented Oct 11, 2023

Description

Problem*

Use of array_set in a loop can greatly slow down brillig code since under the hood on master, brillig will currently copy the entire array in order to return a new array with one element changed.

Resolves (couldn't find an existing issue for this, do we have one?)

Summary*

This PR adds the copy on write optimization to brillig in order to optimize this usecase. To implement this, arrays in brillig are now reference counted and there is a new Instruction::IncrementRc instruction in SSA to increment reference count (1). During an array_set in brillig, the reference count is checked. If it is 1, brillig will mutate the existing array instead of copying to a new one. If not, brillig will fall back to creating a new array, although crucially this new array will always have reference count 1, allowing each subsequent set to be a mutated one. This leads to exponential speedup in loops using array_set.

The reference count is increased when creating a new variable via let and when passing values to functions. The reference count is also (currently) never decreased - so the analysis is rather pessimistic. We could look into changing this in the future, but for now it is sufficient since the first array_set performs a copy and the rest will still be optimized.

The following example:

fn main() {
    let array = [0; 5000];
    assert(foo(array) != 100);
    assert(array[5] == 0);
}

unconstrained fn foo<N>(mut array: [Field; N]) -> Field {
    let copy = array;
    let mut sum = 0;

    for i in 0 .. array.len() {
        array[i] = i;
        sum += i;
    }

    assert(copy[5] == 0);
    sum
}

Runs in 58s on master (release) on my machine, and 0.8s with the optimization on this branch. Results may vary of course, and will be slower when compiling nargo on the debug profile.

Note that this test also includes array aliases so we can see the mutations are safe to perform without affecting other values.

Documentation

  • This PR requires documentation updates when merged.

    • I will submit a noir-lang/docs PR.
    • I will request for and support Dev Rel's help in documenting this PR.

Additional Context

Regarding (1), a possible alternate design is to encode & expose reference counts directly in SSA rather than keeping them only in brillig. For example, arrays could instead be represented as a tuple of (reference_count, array) where reference_count: &mut Field. From here, we could have each array_set either translate to an if under the hood:

  v1 = load reference_count
  v2 = eq v1, Field 1
  jmp_if v2, then: b2, else: b3
b2:
  // array has 1 rc
  array_set_mut array, index, value
  jmp b4(array)
b3:
  // array has > 1 rc
  v3 = array_set array, index, value
  jmp b4(v3)
b4(v4: [Field]):
  ... continue with v4 for the result of the array set ...

This would greatly bloat the ssa however, so I decided against it. An alternative would be modifying the array_set instruction to accept the reference count as an argument:

v1 = load reference_count
v2 = array_set array, index, value, v1

Then internally (in brillig, as it is ignored in ssa) we would check the reference count for each array set operation similarly to how is currently done in this PR.


Additionally, this is still a draft because I've left some TODO's / messy bits in the code to clean up. The core approach should be ready for review though or we can discuss whether the alternative approach above would be better.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@jfecher
Copy link
Contributor Author

jfecher commented Oct 11, 2023

Looks like there's an odd value in one of the brillig tests. Anyone have any insight on this? Very possible I added the wrong expected registers for each reference count in some of the brillig tests.

@TomAFrench
Copy link
Member

This is gonna need an update the ACIR serialisation logic in bb unfortunately.

@kevaundray
Copy link
Contributor

I think the approach here is reasonable -- Alvaro who has been thinking about this for a while is away for a week, so he won't be able to chime in at the moment.

The original issue was because of the noir-rsa program, could we get a comparison vs master with that program: https://github.com/SetProtocol/noir-rsa ?

@jfecher
Copy link
Contributor Author

jfecher commented Oct 12, 2023

When testing on the noir-rsa repo, I'm currently getting an out of bounds read for brillig memory on this PR for some reason. When handling the Load opcode it is trying to read memory at index 2262, but the length is also only 2262.

Copy link
Contributor

@sirasistant sirasistant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only skimmed through the code, I might have misunderstood something. Will make a more complete review later today or early tomorrow 😄

Comment on lines 580 to 583
RegisterOrMemory::HeapArray(HeapArray { pointer, .. }) => {
self.load_instruction(pointer, variable_pointer);
}
RegisterOrMemory::HeapVector(HeapVector { pointer, size }) => {
RegisterOrMemory::HeapVector(HeapVector { pointer, size, reference_count: _ }) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should read reference_count from stored array and vector variables

Comment on lines 623 to 629
RegisterOrMemory::HeapArray(HeapArray { pointer, size, reference_count: _ }) => {
self.store_instruction(variable_pointer, pointer);
let size_constant = self.make_constant(Value::from(size));
self.store_instruction(size_pointer, size_constant);
self.deallocate_register(size_constant);
}
RegisterOrMemory::HeapVector(HeapVector { pointer, size }) => {
RegisterOrMemory::HeapVector(HeapVector { pointer, size, reference_count: _ }) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

symmetric to the load, I think we need to store those

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also would need to update allocate_variable_instruction implementation to take into account that now variables can be up to 3 items (pointer, length and reference count)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand these load and store methods or how I'd extend them to add a reference count. It looks like they only take in a source_variable for the array pointer, and a address_register that we use to store the size? I'm not sure why we'd store the size where the address goes, nor am I sure how this would be extended to store a reference count as well.

acvm-repo/brillig/src/opcodes.rs Outdated Show resolved Hide resolved
@jfecher
Copy link
Contributor Author

jfecher commented Oct 17, 2023

I've found that this PR fails the following program at the assert:

global M = 5000;

fn main() {
    let array = [0; M];
    let res = foo(array);
    assert(res == M * 2, "foo(array) != M * 2");
}

unconstrained fn foo<N>(mut array: [Field; N]) -> Field {
    let copy = array;
    let mut sum = 0;

    // dep::std::println(array.len());

    for i in 0 .. array.len() {
        array[i] = 2;
        sum += array[i];
    }

    sum
}

Yet, if the println is uncommented, it will then pass.

Edit: Nevermind, this was fixed by fixing extract_registers

@jfecher
Copy link
Contributor Author

jfecher commented Oct 17, 2023

After some debugging, it looks like the rc == 1 checks in the brillig vm are all being seen as array_size == 1 or 5000 == 1 in this case, so they're never triggered. In the most recent example above.

@sirasistant
Copy link
Contributor

After more thinking I think the approach taken for brillig_gen won't work. If we add one more register to store the reference count, consider the following SSA:

brillig fn foo f1 {
  b0(v0: [Field; 5000]):
    v57 = allocate
    store v0 at v57
    inc_rc v0
    v58 = allocate
    store Field 0 at v58
    inc_rc v0
    jmp b1(Field 0)
  b1(v6: Field):
    v59 = lt v6, Field 5000
    jmpif v59 then: b2, else: b3
  b2():
    v61 = load v57
    v62 = array_set v61, index v6, value Field 2
    v63 = add v6, Field 1
    store v62 at v57
    v64 = load v58
    v65 = array_get v62, index v6
    v66 = add v64, v65
    store v66 at v58
    jmp b1(v63)
  b3():
    v60 = load v58
    return v60
}

v0 would get assigned two registers, let's say, R0 and R1. R0 would have the pointer to the items and R1 would have the reference count value. When storing v0 inside v57, we'd store in memory the pointer and the initial RC value (1). When processing inc_rc v0 we'd increase R1 but the reference counter stored in memory in the previous instruction wouldn't get incremented.
I see two approaches to be able to implement this:

  • make arrays and slices a reference structure. They would take a single register that would be a reference to (pointer, reference_count) or (pointer, size, reference_count) in the case of slices.
  • make reference count a reference itself, so it's shared across all places the array is stored

@jfecher
Copy link
Contributor Author

jfecher commented Oct 18, 2023

@sirasistant ah, good point. Instead of storing the rc in a register then, I'll try out the original idea I had which was to store it before the first element of the array.

@Savio-Sou
Copy link
Collaborator

Superseded by #3522.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants