Cache randomization calls #175

Open · wants to merge 13 commits into base: master
Conversation

alwilson
Contributor

Adds a cache of the objects generated by do_randomize/randomize calls (VariableBoundVisitor, RandInfo, Randomizer, field_model_l, constraint_l, Boolector, and Boolector expressions), keyed on a string consisting of: the randstate obj ID, the PrettyPrintModel output with values included, the constraint names with their enable bit, and the variable names with their rand_mode.
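For illustration, roughly how such a key could be assembled (a minimal sketch; attribute names like rand_mode and enabled are placeholders here rather than the exact PyVSC internals, and the import path is assumed; ModelPrettyPrinter.print is the call referenced below in get_pretty_call_hash):

# Import path assumed for illustration
from vsc.model.model_pretty_printer import ModelPrettyPrinter

def build_call_hash(randstate, field_model_l, constraint_l):
    # Identity of the random state, so different streams key differently
    parts = [str(id(randstate))]
    for fm in field_model_l:
        # Pretty print with values so distinct solved states key differently
        parts.append(ModelPrettyPrinter.print(fm, print_values=True))
        # Variable name plus rand_mode (placeholder attribute name)
        parts.append("%s:%d" % (fm.name, fm.rand_mode))
    for c in constraint_l:
        # Constraint name plus enable bit (placeholder attribute name)
        parts.append("%s:%d" % (c.name, c.enabled))
    return "".join(parts)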

Skips caching on calls using ModelGenerator or distribution constraints: ModelGenerator because every call is a new soft constraint, or at least that's what I saw in testing; distribution constraints because it's expensive to generate the bounds, array constraints, and distribution constraints for every call just to decide not to cache.

Uses copy.deepcopy on field_model_l and constraint_l to keep constraint generation/expansion and Boolector objects independent. Everything is deepcopied except the val object inside FieldScalarModel, so that value lookups through the original rand_obj still work, and the var Boolector object, since that cannot be deepcopied or pickled. The do_randomize call also does some manipulation on FieldArrayModel, adding a variable (latest_field_l) and a function (get_field_l) to help return the field_l. Some variables hold references to their parents, which I think causes deepcopy to go much further than intended, to the point of copying the entire parent rand_obj. Not sure deepcopy is the way to go, but it was an easy way to fork the function and stop state from getting shared or overwritten.
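The FieldScalarModel override described above could look roughly like this (a sketch of the approach, not the exact diff; the memo bookkeeping is the standard __deepcopy__ pattern):

import copy

class FieldScalarModel(object):
    # ... other fields and methods elided ...

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            if k == 'val':
                # Keep val as a shared reference so value lookups
                # through the original rand_obj still work
                result.val = self.val
            elif k == 'var':
                # The Boolector var cannot be deepcopied or pickled
                result.var = None
            else:
                setattr(result, k, copy.deepcopy(v, memo))
        return result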

Also adds a test for pre_randomize to exercise disabling/enabling constraints and make sure the caching doesn't mix them up.

Results in a modest speed up on test_perf.py (taking the 3rd run to avoid import overhead):
master - 17.0s
rand_cache - 8.0s

On test_dist.py, which should skip caching entirely (though it still needs the PrettyPrintModel call to decide to skip), the difference is smaller. I've seen these numbers vary by about 1s, so the pretty-print overhead is hopefully small enough to be acceptable for now; a faster way to inspect the constraints should make that check orders of magnitude faster.
master - 15.7s
rand_cache - 14.9s

@mballance
Member

@alwilson, thanks for all the work you've done on this! This looks like a sizable body of work, and it's going to take me a little while to do a review and get my head around it.
While I'm doing so, could you collect the set of open issues that you're aware of (I saw a few 'TODO' notes in the diff)? It might also be interesting to try the cache-enabled PyVSC out with the riscv-dv project to get another data point on improvements.
We will need to determine whether to have caching on by default. To this end, can you collect timing for running the test suite with cache enabled and with cache disabled?

Thanks again for your work towards improving PyVSC's performance!

@alwilson
Contributor Author

@mballance The riscv-dv pygen tests are looking like a good exercise of these changes! The FieldArrayModel hack I used to copy references to field_l still seems to be broken, and I have more changes that seem to resolve it. The tests I tried also explode in memory usage and get slower as memory grows. It looks like their testbench deepcopies rand_obj in places, which I think multiplies the randomization caches, which already deepcopy too much. So there's some deepcopy work that needs to be done to prevent cases like this from happening. Caching only after the 2nd or so call would also help memory usage for one-off calls that are never seen again (sketched below).
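A minimal sketch of that gating idea (hypothetical names, not code from this PR): count call hashes and only populate the cache once a hash has been seen a minimum number of times.

from collections import Counter

class RandomizeCache(object):
    """Only cache a randomize call once its call_hash has been seen
    min_calls times, so one-off calls never pay the deepcopy and
    memory cost of a cache entry."""
    def __init__(self, min_calls=2):
        self.min_calls = min_calls
        self.seen = Counter()
        self.entries = {}

    def lookup(self, call_hash):
        # Returns the cached entry, or None on a miss
        return self.entries.get(call_hash)

    def store(self, call_hash, entry):
        # Only keep the entry once this hash has repeated enough
        self.seen[call_hash] += 1
        if self.seen[call_hash] >= self.min_calls:
            self.entries[call_hash] = entry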

I can summarize those TODO / HACK comments in another comment.

@alwilson
Contributor Author

alwilson commented Feb 17, 2023

I tried out riscv-dv's pygen / pyflow, but I've only gotten one of the tests to run successfully. I tried it on master as well and only got that first test mentioned in the README to work. I knocked down the number of instances and instructions so it runs a bit faster too, but the variance between successive runs is pretty wide, which makes it hard to measure any improvement. I did add some debug statements and saw that only some randomization calls could be cached. I'll keep looking into it, but I bet it's the dist constraints and with constraints preventing reuse.
python3 run.py --test=riscv_arithmetic_basic_test --simulator=pyflow --steps=gen

Do you know what state the riscv-dv tests are in? I'm wondering if I picked the wrong tests or just happened to test the tree at a bad time. The failures I saw were solver failures that I didn't look too far into.

After this latest commit I've cleaned up more of the TODO / HACK comments, since I was so liberal with them. I've listed them below along with their file/function. Most of the concerns are around deepcopy: syncing the FieldScalarModel and FieldArrayModel copies made in the deepcopy with the original references, so that retrieving model values works, while also making it possible to roll back those changes on the next randomize call. I also have some concerns about abusing PrettyPrintModel to get a hashable string for caching; there should be a faster way, and one that doesn't depend on a human-readable string that might change in the future depending on debug needs. There are also some scattered notes on the randomize function itself, since there seem to be unused features that would make caching more complicated but are disabled for now, and possibly unused self.btor references.

field_scalar_model.py:
def __deepcopy__(self, memo):
    # TODO This is a workaround for the deepcopy in do_randomize somewhere getting
    #      access to Boolector objects, which in turn can't be deepcopied.
    result.var = None

randomizer.py:
class Randomizer(RandIF):
    def __init__(self, randstate, debug=0, lint=0, solve_fail_debug=0, 
        # TODO Reset btor cache after so many uses to circumvent Boolector instance
        #      slowing down as more expressions permanently become part of the model/formula.
        self.btor_cache_uses = 100
        
    def randomize(self, ri : RandInfo, bound_m : Dict[FieldModel,VariableBoundModel], cache_enabled: bool):
        # TODO What is going on with max_fields? It would probably
        #      break this caching setup.
        #max_fields = 20

                btor = Boolector()
                # TODO Is self.btor used anywhere?
                # self.btor = btor

                    # TODO: Is this part of a disabled feature to solve randsets together?
                    # rs_i += 1
                    if n_fields > max_fields or rs.order != -1:
                        break

                # TODO Is self.btor used anywhere?
                # self.btor = btor

                # TODO Does some of this need to be done while caching, too?
                if not cache_enabled:
                    reset_v = DynamicExprResetVisitor()
                    for f in rs.all_fields():
                        f.set_used_rand(False, 0)
                        f.dispose() # Get rid of the solver var, since we're done with it
                        f.accept(reset_v)
                    # for f in rs.nontarget_field_s:
                    #     f.dispose()
                    for c in rs.constraints():
                        c.accept(reset_v)
                    RandSetDisposeVisitor().dispose(rs)

do_randomize:
            # HACK Clear out field_l in FieldArrayModel from previous cache
            for fm in field_model_l:
                if hasattr(fm, 'field_l'):
                    for f in fm.field_l:
                        if hasattr(f, 'field_l') and hasattr(f, 'old_field_l'):
                            # Revert to original value
                            f.field_l = f.old_field_l
                        elif hasattr(f, 'field_l'):
                            # Save off old, original value
                            f.old_field_l = f.field_l

            # Create a unique string for this call based on object ids and mode bits
            # TODO Is there more than rand_mode and constraint_mode to cover here?
            # TODO Can we cache the base constraints so that with constraints have a prebuilt
            #      model and such to build off of?
            call_hash = Randomizer.get_pretty_call_hash(randstate, field_model_l_og, constraint_l_og)

            # Skip dist constraints b/c they cost building bounds and array/dist constraints first
            # HACK What's the best way to detect if there are dist constraints?
            if ' dist { ' in call_hash:
                cache_enabled = False

            # Make copy of field and constraint models, together to keep FieldScalarModels the same
            # TODO The deepcopy() in FieldScalarModel keeps the val reference, is that the best way?
            if cache_enabled:
                (field_model_l, constraint_l) = copy.deepcopy((field_model_l, constraint_l))

        # HACK Fill out field_l in FieldArrayModels so that array lookups work in model
        if cache_enabled:
            field_model_l = Randomizer.randomize_cache[call_hash].field_model_l
            for fm_new, fm_og in zip(field_model_l, field_model_l_og):
                if hasattr(fm_og, 'field_l'):
                    for f_new, f_og in zip(fm_new.field_l, fm_og.field_l):
                        if hasattr(f_og, 'field_l'):
                            f_og.field_l = f_new.field_l


def get_pretty_call_hash(randstate, field_model_l, constraint_l):
            # TODO This pretty print is an expensive call. Need a better way
            #      to construct a unique ID/hash that doesn't depend on
            #      object lifetimes. Can some of this be cached?
            call_hash += ModelPrettyPrinter.print(fm, print_values=True)

@alwilson
Contributor Author

Yeah, the runtimes for at least that one riscv-dv test I'm running are all over the place. It's hard to compare even across several runs.

riscv_arithmetic_basic_test

1000 instructions - 1 iteration - runtime (s)
master     avg: 24.6 - 18.6, 26.1, 23.1, 30.5
rand_cache avg: 21.1 -  9.7, 31.2, 13.0, 30.5

3000 instructions - 1 iteration - runtime (s)
master     avg: 61.8 - 97.5, 22.0, 111.4, 16.1
rand_cache avg: 56.2 - 95.9, 62.8,  21.0, 45.2

alwilson added 13 commits March 7, 2023 18:42
Locally this appears to make randomization calls
25-50% faster for the first couple hundred calls,
but falls off as the Boolector instances grow.

Noted the 'and' and 'or' non-overloadable operators
and an issue with one of the examples calling
randomization() using an empty with block.
Around 100 uses seemed to be the sweet spot for the
constraints I was testing, but that probably depends
on how many Assume() and Sat() calls each randomize
call results in.
Performing deepcopy independently on fields and with constraints
resulted in different FieldScalarModels which create separate and
duplicate btor vars for the same intended var. This deepcopies
them together and expands the FieldScalarModel deepcopy to keep
val as a reference so that value lookups continue to work.
The pretty print of the fields and constraints to use as a
key in the randomization cache is expensive. Since each call
has unique objects for the fields and constraints we can use
those instead. This also grabs the obj id of the with constraint
(block?) and the expressions/constraints within.

Saving the inspect.stack() frame in the randobj_interposer causes
deepcopy/pickle issues in the Randomization call. I'm not entirely
sure why yet, but saving the filename str and lineno int in the
SourceInfo and saving that to reuse later instead seems safer and
avoids the fairly large penalty I saw in py-spy.
Usually unconstrained variables get marked as
having been randomized, but since caching reuses
them we don't want to mark them. Or at least this
change fixes that; I'm leaving a TODO to check back.
Multiple HACKs in this that need to be reconsidered, but this passes
all tests except the covergroup ones and the nested / sub constraint
test.
The segmented_randomization test was reusing constraints with
nested vsc.attr objects, which are non-rand and change the
constraints with each call if they are different. Added a
bit to the pretty printer to grab that.

The order of set_val in nested post_randomize calls was
setting values from the parent randomization call after
the post-rand randomization call. This worked before caching
because they were the same Boolector var, but now they are
distinct. Swapped the order in FieldCompositeModel since
it seemed to be backward.
I think it's possible and intended that pre_randomize can
change the constraints for a randomize call, so it should
be called every time before generating the call_hash.

This causes some issues with deepcopy since rand_obj
was trying to prevent deepcopying self in the parent
reference of other objects. Workaround for now is
just to skip deepcopying the Boolector var object in
FieldScalarModel. This might not be the correct thing
to do, but passes unit tests for now.
Generating the dist constraints and then trying to grab
the object ids for ConstraintOverrideModels inside constraints
didn't seem very robust. Now it just looks for the ' dist { '
string the pretty printer uses to detect dist constraints
and skip caching on them. Seems cleaner for now to just skip them.
Testing out riscv-dv uncovered a memory leak issue where caching every
call and then being deepcopied would explode memory usage. Adding a min
number of calls before caching limits that from happening. The real
fix would be to look into deepcopy and make sure the cache is dropped
(or even shared?) when deepcopied.

The riscv-dv pygen tests also discovered more issues with FieldArrayModel
and syncing the top-level references to FieldScalarModel to the cached /
deepcopied instances. This still looks hacky, but is restricted to just
the Randomizer class when caching is enabled.

Lots of little whitespace and TODO/HACK message cleanup.
Sometimes the rand stability test would fail due to a call hash
repeating and triggering the cache_enabled path. This is probably due
to the randstate obj ID being reused, so this adds the initial seed
to that as well. Not sure if this is the proper fix yet.