callbacks: Improve performance of _store_outcomes by ~35x #232

Merged
EwoutH merged 9 commits into quaquel:master from EwoutH:performance-1 on Apr 9, 2023

Conversation

@EwoutH (Collaborator) commented Mar 19, 2023

On a performance profile of the optimized lake model I noticed that the _store_case function took up a large portion of the runtime (46.4% of 5000 runs, or 9535ms).

This commit optimizes the DefaultCallback class, including its _store_case function by:

  • Replacing Pandas DataFrames with NumPy structured arrays for faster data access and manipulation.
  • Using NumPy functions and data structures where possible.
  • Reducing unnecessary function calls.

As a result, the _store_case function now needs only 267 ms (or 2.3% of the runtime) on 5000 experiment runs, a speedup of ~35x. When experimenting with the lake_problem, a total speedup of about 60% is achieved (from 500 to 800 it/s on my setup). The faster and simpler the model being experimented with, the more noticeable this speedup will be.
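
For context (not part of the PR itself, and not necessarily the profiler used here): percentages like these typically come out of a function-level profiler. A minimal sketch with a placeholder workload, using the standard-library cProfile:

import cProfile
import pstats

def some_model_run():
    # placeholder workload standing in for a single model run
    return sum(i * i for i in range(1_000))

def run_experiments(n=5_000):
    # stands in for performing 5000 experiments through the workbench
    return [some_model_run() for _ in range(n)]

cProfile.run("run_experiments()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)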

@EwoutH EwoutH added this to the 2.4.0 milestone Mar 19, 2023
@EwoutH EwoutH requested a review from quaquel March 19, 2023 12:29
@EwoutH (Collaborator, Author) commented Mar 19, 2023

I needed to make a few small adjustments to make the test set pass, mainly in transforming the data.

I don't think this breaks any (important) behaviour, but please review carefully, especially the changes to the test code.

@EwoutH (Collaborator, Author) commented Mar 21, 2023

@quaquel did you have a chance to look at this PR?

@quaquel (Owner) commented Mar 21, 2023

Not yet. Focus has been on fixing pynetlogo.

@quaquel (Owner) commented Mar 21, 2023

I don't see anything obviously strange with this suggested modification.

Out of curiosity, why did you choose to use recarrays? Looking at my old code, I see some obvious places for performance increases, even when sticking to pandas.

@EwoutH (Collaborator, Author) commented Mar 21, 2023

Out of curiosity, why did you choose to use recarrays?

I tried some different options, and recarrays benchmarked the fastest. It was actually suggested by ChatGPT!

While Pandas DataFrames are far more flexible, they are in general also slower than any kind of NumPy array. I think a recarray is a nice middle ground in this case: it's very fast but also keeps the code human-readable and interpretable.

The biggest performance win comes from initializing the empty array up front and filling it one full row at a time. The .at lookups in Pandas are also expensive, especially given how many of them occur when iterating through the dictionary.
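
To make that concrete, here is a minimal standalone sketch of the pattern (not the workbench code itself; the field names and sizes are made up for illustration): preallocate one structured array with a fixed dtype, then assign each experiment as a single full row.

import numpy as np

nr_experiments = 5000

# one field per uncertainty/lever plus the bookkeeping columns;
# the field names here are illustrative only
dtype = [
    ("b", float),
    ("q", float),
    ("scenario", object),
    ("policy", object),
    ("model", object),
]

# preallocate the empty structured array once, up front
cases = np.empty(nr_experiments, dtype=dtype)

def store_case(index, scenario_values, scenario_name, policy_name, model_name):
    # assign one full row in a single operation, instead of per-column lookups
    cases[index] = tuple(scenario_values) + (scenario_name, policy_name, model_name)

store_case(0, (0.42, 2.0), "scen_0", "policy_0", "lake_model")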

@quaquel (Owner) commented Mar 21, 2023

I ran a quick benchmark this afternoon focusing on getting rid of the .at statements, but trying to set the full row in one go is actually slower than looping, as far as I can tell. See below for the code I tested. I want to run a few more tests and do some line profiling to fully understand what is going on here.

Way back, I used recarrays (because pandas did not exist when I started on the workbench) before shifting to pandas a few years ago for the 2.x branch of the workbench.

def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    
    index = experiment.experiment_id
    
    cases.loc[index] = exp

@quaquel (Owner) commented Mar 21, 2023

Ok, I did some research

def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    
    index = experiment.experiment_id
    
    cases.replace({index:exp})

is 50% faster than the current implementation of _store_case. It was based on this blog post.

Your recarray-based implementation of _store_case is still an order of magnitude faster, however. So it seems worthwhile to use a recarray as the internal object for fast assignment.
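
As a rough standalone illustration of that gap (dummy data and column names, not the workbench benchmark), comparing per-row .loc assignment on a DataFrame with per-row assignment into a preallocated structured array looks something like this:

import timeit

import numpy as np
import pandas as pd

n = 5_000
columns = ["a", "b", "scenario"]
row = (1.0, 2.0, "scen")

def fill_dataframe():
    # empty DataFrame updated one row at a time via .loc
    df = pd.DataFrame(index=range(n), columns=columns)
    for i in range(n):
        df.loc[i] = row
    return df

def fill_structured_array():
    # preallocated structured array filled one row at a time
    arr = np.empty(n, dtype=[("a", float), ("b", float), ("scenario", object)])
    for i in range(n):
        arr[i] = row
    return arr

# a single repetition of each is enough to see the order-of-magnitude difference
print("DataFrame .loc   :", timeit.timeit(fill_dataframe, number=1))
print("structured array :", timeit.timeit(fill_structured_array, number=1))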

@quaquel (Owner) commented Mar 21, 2023

I noticed one potential problem with your implementation:

        self.cases[index] = (
            tuple(scenario.values())
            + tuple(policy.values())
            + (scenario.name, policy.name, experiment.model_name)
        )

There is no guarantee whatsoever that scenario.values() will return the values in the same order as expected by the columns. The order of values can even differ from one experiment to the next. To ensure alignment, you must extract the scenario values by looping over the keys in the correct order. Alternatively, the underlying implementation of Scenario and Policy could shift to an OrderedDict, which also ensures a predictable order.
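
A tiny illustration of the issue, with plain dicts standing in for Scenario and Policy: two scenarios can carry the same keys in different insertion orders, so relying on .values() can silently swap columns, whereas extracting by an explicit key order cannot.

scenario_a = {"b": 0.42, "q": 2.0}
scenario_b = {"q": 2.5, "b": 0.35}  # same keys, different insertion order

print(tuple(scenario_a.values()))  # (0.42, 2.0)  -> matches columns (b, q)
print(tuple(scenario_b.values()))  # (2.5, 0.35)  -> b and q end up swapped

# extracting the values by a fixed key order keeps the columns aligned
columns = ["b", "q"]
print(tuple(scenario_b[c] for c in columns))  # (0.35, 2.5)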

@EwoutH (Collaborator, Author) commented Mar 21, 2023

Thanks for the careful review, and interesting to see the deep dive!

There is no guarantee whatsoever that scenario.values() will return the values in the same order as expected by the columns.

Good catch! Since Python 3.7, dictionaries are guaranteed to be insertion-ordered, which means that as long as we always insert the values in the same order, we should be fine (maybe we can even use tuples in that case). Are we currently doing that?

@quaquel (Owner) commented Mar 21, 2023

I did not know about the 3.7 change. That is good to know.

That is not a trivial question; I had to go through the code quite a bit to answer it. But yes, a fixed order is used for sampling, and this order is identical to the order of the uncertainties/levers that are passed to the callback.

However, you can create custom scenarios/policies, and in that case the order is not guaranteed.

Changing it to

        self.cases[index] = (
            tuple([scenario[u] for u in self.uncertainties])
            + tuple([policy[l] for l in self.levers])
            + (scenario.name, policy.name, experiment.model_name)
        )

to guarantee the order, with self.uncertainties and self.levers set in the init as

        self.uncertainties = [u.name for u in uncertainties]
        self.levers = [l.name for l in levers]

solves this and is even marginally faster.

…vers

Co-Authored-By: Jan Kwakkel <j.h.kwakkel@tudelft.nl>
@EwoutH (Collaborator, Author) commented Mar 22, 2023

Thanks, implemented the suggested changes!

PS: When reviewing a PR, the GitHub UI has a very neat feature where you can suggest code changes that can be accepted with a single click. There is a little button you have to spot; you can see it in step 3 of "Reviewing proposed changes in a pull request". Even multi-line code suggestions are possible!

@quaquel (Owner) commented Mar 23, 2023

I quickly tested your code on some of the examples. The experiments DataFrame is wrong: it has a multi-index over the columns, which should not be there, and it will break most of the analysis package. The content is also full of None values.

@EwoutH (Collaborator, Author) commented Mar 23, 2023

Thanks for reviewing. How well is this currently covered by the tests?

I will dive a bit deeper into it.

@EwoutH (Collaborator, Author) commented Mar 23, 2023

Okay, I dove in deeper and validated the behaviour you saw. I think the current limitation is that we are updating a DataFrame for each experiment using an expensive .at lookup. We can basically do (some combination of) the following things:

  • Update less often, ideally only once (see the sketch at the end of this comment)
  • Use a more performant data structure, like a NumPy array

Both, however, require more extensive changes to the surrounding code and tests, so I would propose we discuss this sometime in person.

In the meantime, I have cherry-picked the util.combine() speedups into a separate PR: #233. Those changes are far less extensive and can already provide a nice speedup to that function.

Also check out https://github.com/quaquel/epa1361/pull/4; I validated that PR well, and I think it might open up some new possibilities for the course since it's so much faster.
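
For the first option above, a minimal sketch of what "update less often, ideally only once" could look like (illustrative names, not the workbench API): collect a plain dict per experiment and construct the DataFrame a single time at the end.

import pandas as pd

# accumulate each experiment as a plain dict and build the DataFrame once,
# instead of updating a DataFrame with .at lookups on every experiment
rows = []

def store_case(scenario_data, scenario_name, policy_name, model_name):
    row = dict(scenario_data)
    row["scenario"] = scenario_name
    row["policy"] = policy_name
    row["model"] = model_name
    rows.append(row)

store_case({"b": 0.42, "q": 2.0}, "scen_0", "policy_0", "lake_model")
store_case({"b": 0.35, "q": 2.5}, "scen_1", "policy_0", "lake_model")

experiments = pd.DataFrame(rows)  # single construction at the end
print(experiments)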

@quaquel (Owner) commented Mar 24, 2023

It is clear that some tests are missing, so that is the logical next step.

I do like moving to a more performant setup with the recarray.

@EwoutH (Collaborator, Author) commented Mar 24, 2023

Are you back in The Netherlands already? Can we meet sometime next week?

@quaquel (Owner) commented Apr 9, 2023

I merged master back into this pull request and resolved the discrepancies. I also moved to pd.DataFrame.from_records. So let's see if the unit tests are passing now.
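
For reference, a minimal sketch of that conversion step, assuming the internal storage is a NumPy structured array like the one sketched earlier: pd.DataFrame.from_records turns the filled array into a regular DataFrame with flat columns in a single call.

import numpy as np
import pandas as pd

cases = np.empty(3, dtype=[("b", float), ("q", float), ("scenario", object)])
cases[0] = (0.42, 2.0, "scen_0")
cases[1] = (0.35, 2.5, "scen_1")
cases[2] = (0.50, 1.5, "scen_2")

# each structured-array field becomes one flat DataFrame column
experiments = pd.DataFrame.from_records(cases)
print(experiments)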

make dtypes an attribute
@coveralls: Coverage Status

Coverage: 81.152% (-0.07%) from 81.227% when pulling e2f55f6 on EwoutH:performance-1 into 081b41f on quaquel:master.

@quaquel (Owner) left a review comment

I did some tests after merging master. All unit tests, including the updated ones, pass. The sobol example, for instance, also works fine now. This is ready to be merged.

@EwoutH (Collaborator, Author) commented Apr 9, 2023

Awesome, thanks for this! Let me do a final confirmation of the improved performance and then I will merge.

@EwoutH (Collaborator, Author) commented Apr 9, 2023

New performance benchmarks. We might have lost a little bit here and there, but by far most of the performance improvement is still there. According to my profiles, it speeds up _store_outcomes by about 25x. When running a very simple model, the whole workbench is now over 2.5x as fast.

Old

  • Profile: Total time 11,172 ms (48.8%), own time 559 ms (2.4%)
  • Now runs about 15,000 iterations of the simple Python model per second.

New

  • Profile: Total time 424 ms (3.7%), own time 90 ms (0.8%).
  • Now runs about 40,000 iterations of the simple Python model per second.

@EwoutH EwoutH merged commit a608a2e into quaquel:master Apr 9, 2023