Cache randomization calls #175

alwilson · 2023-02-16T00:48:28Z

Adds a cache of the objects generated by a do_randomize/randomize calls (VariableBoundVisitor, RandInfo, Randomizer, field_model_l, constraint_l, Boolector, and Boolector expressions) keyed on a string consisting of: randstate obj ID, PrettyPrintModel output with values included, constraint names with enable bit, and variable names with rand_mode.

Skips caching on calls using ModelGenerator or distribution constraints. ModelGenerator b/c every call is a new soft constraint, or at least that's what I saw in testing. Distribution constraints b/c it's expensive to generate the bounds, array constraints, and distribution constraints for every call just to decide not to cache.

Uses copy.deepcopy on field_model_l and constraint_l to keep constraint generation and expansion and Boolector objects independent. Everything is deepcopied except the val object inside FieldScalarModel, so that value lookups with the original rand_obj work, and the var Boolector object since that cannot be deepcopied or pickled. The do_randomize call also does some manipulation on FieldArrayModel to add a variable, latest_field_l, and a function, get_field_l, to help in returning the field_l. There are references to parents in variables which I think causes deepcopy to go much further than intended to the point of copying the entire parent rand_obj. Not sure if deepcopy is the way to go, but it was an easy way to fork the function and stop state from getting shared or overwritten.

Also adds a test for pre_randomize to exercise disabling/enabling constraints and make sure the caching doesn't mix them up.

Results in a modest speed up on test_perf.py (taking the 3rd run to avoid import overhead):
master - 17.0s
rand_cache - 8.0s

On test_dist.py, which should be skipping caching, although still needs the PrettyPrintModel call to do so. I've seen these vary by about 1s, so I think the overhead for the pretty print hopefully is small enough that it works for now. A faster way to explore constraints would hopefully make that magnitudes faster.
master - 15.7
rand_cache - 14.9s

mballance · 2023-02-16T01:08:42Z

@alwilson, thanks for the all the work you've done on this! This looks like a sizable body of work, and it's going to take me a little while to do a review and get my head around it.
While I'm doing so, could you collect the set of open issues that you're aware of (I saw a few 'TODO' notes in the diff)? It might also be interesting to the cache-enabled PyVSC out with the riscv-dv project to get another data point on improvements.
We will need to determine whether to have caching on by default. To this end, can you collect timing for running the test suite with cache eanbled and with cache disabled?

Thanks again for your work towards improving PyVSC's performance!

alwilson · 2023-02-16T17:48:40Z

@mballance The riscv-dv pygen tests are looking like a good exercise of these changes! The FieldArrayModel hack I had to copy references to field_l still seems to be broken and I have more changes that seem to resolve it. The tests I tried also explode in memory usage and get slower as memory grows. It looks like their testbench deepcopies rand_obj in places, which I think is multiplying the randomization caches which already deepcopy too much. So I think there's some deepcopy work that needs to be done to prevent cases like this from happening. Also only caching after the 2nd or so call would help on memory usage for one-off calls never seen again.

I can summarize those TODO / HACK comments in another comment.

alwilson · 2023-02-17T20:28:10Z

I tried out riscv-dv's pygen / pyflow but I've only gotten one of the tests to run successfully. I tried it on master as well and only got that first test mentioned in the README to work. I knocked down the number of instances and instructions to run a bit faster too, but the variance between successive runs is pretty wide. Hard to measure any improvement. I did add some debug statements and saw that only some randomization calls could be cached. I'll keep looking into it, but I bet it's the dist constraints and with constraints preventing reuse.
python3 run.py --test=riscv_arithmetic_basic_test --simulator=pyflow --steps=gen

Do you know what state riscv-dv tests are in? I'm wondering if I picked the wrong tests or just happened to test the tree at a bad time. The failures I saw were solver failure issues that I didn't look too far into.

After this latest commit I've cleaned up more of the TODO / HACK comments since I was so liberal with them. I've listed them below along with their file/function. Most of the concerns are around deepcopy and syncing the FieldScalarModel and FieldArrayModel made in the deepcopy with the original references so that retreiving model values work, but making it so that we can also roll back those changes on the next randomize call. I also have some concerns about abusing the PrettyPrintModel to get a hashable string for caching. Seems like there should be a faster way to do so, and one that doesn't depend on a human-readable string that might change in the future depending on debug needs. Some scattered notes on the actual randomize function since there seems to be unused features that would make caching more complicated, but are disabled for now? And possibly unused self.btor references.

field_scalar_model.py:
def __deepcopy__(self, memo):
    # TODO This is a workaround for the deepcopy in do_randomize somewhere getting
    #      access to Boolector objects, which in turn can't be deepcopied.
    result.var = None

randomizer.py:
class Randomizer(RandIF):
    def __init__(self, randstate, debug=0, lint=0, solve_fail_debug=0, 
        # TODO Reset btor cache after so many uses to circumvent Boolector instance
        #      slowing down as more expressions permanently become part of the model/formula.
        self.btor_cache_uses = 100
        
    def randomize(self, ri : RandInfo, bound_m : Dict[FieldModel,VariableBoundModel], cache_enabled: bool):
        # TODO What is going on with max_fields? It would probably
        #      break this caching setup.
        #max_fields = 20

                btor = Boolector()
                # TODO Is self.btor used anywhere?
                # self.btor = btor

                    # TODO: Is this part of a disabled feature to solve randsets together?
                    # rs_i += 1
                    if n_fields > max_fields or rs.order != -1:
                        break

                # TODO Is self.btor used anywhere?
                # self.btor = btor

                # TODO Does some of this need to be done while caching, too?
                if not cache_enabled:
                    reset_v = DynamicExprResetVisitor()
                    for f in rs.all_fields():
                        f.set_used_rand(False, 0)
                        f.dispose() # Get rid of the solver var, since we're done with it
                        f.accept(reset_v)
                    # for f in rs.nontarget_field_s:
                    #     f.dispose()
                    for c in rs.constraints():
                        c.accept(reset_v)
                    RandSetDisposeVisitor().dispose(rs)

do_randomize:
            # HACK Clear out field_l in FieldArrayModel from previous cache
            for fm in field_model_l:
                if hasattr(fm, 'field_l'):
                    for f in fm.field_l:
                        if hasattr(f, 'field_l') and hasattr(f, 'old_field_l'):
                            # Revert to original value
                            f.field_l = f.old_field_l
                        elif hasattr(f, 'field_l'):
                            # Save off old, original value
                            f.old_field_l = f.field_l

            # Create a unique string for this call based on object ids and mode bits
            # TODO Is there more than rand_mode and constraint_mode to cover here?
            # TODO Can we cache the base constraints so that with constraints have a prebuilt
            #      model and such to build off of?
            call_hash = Randomizer.get_pretty_call_hash(randstate, field_model_l_og, constraint_l_og)

            # Skip dist constraints b/c they cost building bounds and array/dist constraints first
            # HACK What's the best way to detect if there are dist constraints?
            if ' dist { ' in call_hash:
                cache_enabled = False

            # Make copy of field and constraint models, together to keep FieldScalarModels the same
            # TODO The deepcopy() in FieldScalarModel keeps the val reference, is that the best way?
            if cache_enabled:
                (field_model_l, constraint_l) = copy.deepcopy((field_model_l, constraint_l))

        # HACK Fill out field_l in FieldArrayModels so that array lookups work in model
        if cache_enabled:
           field_model_l = Randomizer.randomize_cache[call_hash].field_model_l
           for fm_new, fm_og in zip(field_model_l, field_model_l_og):
               if hasattr(fm_og, 'field_l'):
                   for f_new, f_og in zip(fm_new.field_l, fm_og.field_l):
                       if hasattr(f_og, 'field_l'):
                           f_og.field_l = f_new.field_l


def get_pretty_call_hash(randstate, field_model_l, constraint_l):
            # TODO This pretty print is an expensive call. Need a better way
            #      to construct a unique ID/hash that doesn't depend on
            #      object lifetimes. Can some of this be cached?
            call_hash += ModelPrettyPrinter.print(fm, print_values=True)

alwilson · 2023-02-17T20:52:54Z

Yeah, the runtimes for at least that one riscv-dv test I'm running is all over the place. Hard to compare even several runtimes.

riscv_arithmetic_basic_test

1000 instructions - 1 iteration - runtime (s)
master     avg: 24.6 - 18.6, 26.1, 23.1, 30.5
rand_cache avg: 21.1 -  9.7, 31.2, 13.0, 30.5

3000 instructions - 1 iteration - runtime (s)
master     avg: 61.8 - 97.5, 22.0, 111.4, 16.1
rand_cache avg: 56.2 - 95.9, 62.8,  21.0, 45.2

Locally this appears to make randomization calls 25-50% faster for the first couple hundred calls, but falls off as the Boolector instances grow. Noted the 'and' and 'or' non-overloadable operators and an issue with one of the examples calling randomization() using an empty with block.

Around 100 uses seemed to be the sweet spot for the constraints I was testing, but that probably depends on how many Assume() and Sat() calls each randomize call results in.

Performing deepcopy independently on fields and with constraints resulted in different FieldScalarModels which create separate and duplicate btor vars for the same intended var. This deepcopies them together and expands the FieldScalarModel deepcopy to keep val as a reference so that value lookups continue to work.

The pretty print of the fields and constraints to use as a key in the randomization cache is expensive. Since each call has unique objects for the fields and constraints we can use those instead. This also grabs the obj id of the with constraint (block?) and the expressions/constraints within. Saving the inspect.stack() frame in the randobj_interposer causes deepcopy/pickle issues in the Randomization call. I'm not entirely sure why yet, but saving the filename str and lineno int in the SourceInfo and saving that to reuse later instead seems safer and avoids the fairly large penalty I saw in py-spy.

Usually unconstrained variables get marked as having been randomized, but since caching reuses them we don't want to mark them. Or at least this fixes that, but leaving a TODO to check back.

Mulitple HACKs in this that need to be reconsidered, but this passes all tests except the covergroup ones and the nested / sub constraint test.

The segmented_randomization test was reusing constraints with nested vsc.attr objects, which are non-rand and change the constraints with each call if they are different. Added a bit to the pretty printer to grab that. The order of set_val in nested post_randomize calls was setting values from the parent randomization call after the post-rand randomization call. This worked before caching b/c they were the same Boolector var, but now they are distinct. Swapped the order in FieldCompositeModel since it seemed to be backward.

I think it's possibly and intended that pre_randomize could change the constraints for a randomize call, so it should be called everytime before generating the call_hash. This causes some issues with deepcopy since rand_obj was trying to prevent deepcopying self in the parent reference of other objects. Workaround for now is just to skip deepcopying the Boolector var object in FieldScalarModel. This might not be the correct thing to do, but passes unit tests for now.

Generating the dist constraints and then trying to grab the object ids for ConstraintOverrideModels inside constraints didn't seem very robust. Now it just looks for the ' dist { ' string the pretty printer uses to detect dist constraints and skip caching on them. Seems cleaner for now to just skip them.

Testing out risc-dv discovered a memory leak issue where caching every call and then being deepcopied would explode memory. Adding a min number of calls before caching limits that from happening. The real fix would be to look into deepcopy and make sure the cache is dropped (or even shared?) when deepcopied. The risc-dv pygen tests also discovered more issues with FieldArrayModel and syncing the top level references to FieldScalarModel to the cached / deepcopied instances. This still looks hacky, but is restricted to just the Randomizer class when caching is enabled. Lots of little whitespace and TODO/HACK message cleanup.

Sometimes the rand stability test would fail due to a call hash repeating triggering the cache_enabled path. This is probably due to the randstate obj ID being reused, so this adds the initial seed to that as well. Not sure if this is the proper fix yet.

alwilson force-pushed the rand_cache branch from 860a115 to aa54915 Compare February 22, 2023 17:18

alwilson added 13 commits March 7, 2023 18:42

Drop randomization cache key to mitigate Boolector model slowdown

5819bb9

Around 100 uses seemed to be the sweet spot for the constraints I was testing, but that probably depends on how many Assume() and Sat() calls each randomize call results in.

Fix re-randomizing unconstrained variables

ba1e85c

Usually unconstrained variables get marked as having been randomized, but since caching reuses them we don't want to mark them. Or at least this fixes that, but leaving a TODO to check back.

Fix FieldArrayModel accesses; Dist constraint caching; Misc cleanup

01800dc

Mulitple HACKs in this that need to be reconsidered, but this passes all tests except the covergroup ones and the nested / sub constraint test.

Disable randomize cache for GeneratorModel; Another PrettyPrint change

5d693a1

Add cache_enabled bit

8841f32

alwilson force-pushed the rand_cache branch from aa54915 to cc0f365 Compare March 8, 2023 01:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache randomization calls #175

Cache randomization calls #175

alwilson commented Feb 16, 2023

mballance commented Feb 16, 2023

alwilson commented Feb 16, 2023

alwilson commented Feb 17, 2023 •

edited

Loading

alwilson commented Feb 17, 2023

Cache randomization calls #175

Are you sure you want to change the base?

Cache randomization calls #175

Conversation

alwilson commented Feb 16, 2023

mballance commented Feb 16, 2023

alwilson commented Feb 16, 2023

alwilson commented Feb 17, 2023 • edited Loading

alwilson commented Feb 17, 2023

alwilson commented Feb 17, 2023 •

edited

Loading