callbacks: Improve performance of _store_outcomes by ~35x #232
Conversation
On a performance profile of the optimized lake model I noticed that the `_store_case` function took up a large portion of the runtime (46.4% of 5000 runs, or 9535 ms). This commit optimizes the `DefaultCallback` class by:

- Replacing pandas DataFrames with NumPy structured arrays for faster data access and manipulation.
- Using NumPy functions and data structures where possible.
- Reducing unnecessary function calls.

As a result, the `_store_case` function needs only 267 ms (or 2.3% of the runtime) on 5000 experiment runs, a speedup of ~35x. When experimenting with the `lake_problem`, a total speedup of about 60% is achieved (500 to 800 it/s on my setup). The faster/simpler the model being experimented with, the more noticeable this speedup will be.
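The core idea of the change, filling a preallocated NumPy structured array one full row at a time instead of growing a pandas DataFrame, can be sketched roughly as follows. The field names and the `store_case` helper here are illustrative, not the actual workbench API:

```python
import numpy as np

# Illustrative sketch: preallocate a structured array for all experiments
# up front, then fill one full row per experiment. Field names and the
# store_case helper are hypothetical, not the workbench API.
n_experiments = 5000
dtype = [
    ("b", float), ("q", float),          # made-up uncertainty columns
    ("scenario", "U64"), ("policy", "U64"), ("model", "U64"),
]
cases = np.empty(n_experiments, dtype=dtype)

def store_case(index, uncertainty_values, names):
    # assigning a whole row at once is a single, cheap NumPy operation
    cases[index] = tuple(uncertainty_values) + tuple(names)

store_case(0, (0.42, 2.0), ("scen_0", "pol_0", "lake_model"))
```

The preallocation is what avoids the per-experiment overhead of growing or indexing into a DataFrame.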
for more information, see https://pre-commit.ci
Return a correctly indexed DataFrame and make some small adjustments to the test set. Please review carefully.
for more information, see https://pre-commit.ci
I needed to make a few small adjustments to make the test set pass, mainly in transforming the data. I don't think this breaks any (important) behaviour, but please review carefully, especially the changes to the test code.
@quaquel did you have a chance to look at this PR?
Not yet. Focus has been on fixing pynetlogo.
I don't see anything obviously strange with this suggested modification. Out of curiosity, why did you choose to use recarrays? Looking at my old code, I see some obvious places for performance increases, even when sticking to pandas.
I tried some different options, and those benchmarked the fastest. It was actually suggested by ChatGPT! While pandas DataFrames are far more flexible, they are in general slower than any kind of NumPy array. I think in this case a recarray is a nice middle ground: it's very fast but also keeps the code human-readable and interpretable. The biggest performance win is in initializing the empty array from the start and filling it a full row at a time.
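A minimal illustration of the recarray middle ground described above, independent of the workbench itself: storage stays a plain, fast NumPy buffer, while fields remain readable via attribute access, much like DataFrame columns.

```python
import numpy as np

# Sketch of the trade-off described above: a recarray is a plain NumPy
# buffer (fast), but fields are readable as attributes (interpretable).
arr = np.recarray(3, dtype=[("x", float), ("y", float)])
arr[0] = (1.0, 2.0)  # fill a full row at a time
arr[1] = (3.0, 4.0)
arr[2] = (5.0, 6.0)
print(arr.x)  # attribute-style access, much like a DataFrame column
```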
I ran a quick benchmark this afternoon focusing on getting rid of the `at` statements, but trying to set the full row in one go is actually slower than looping, as far as I can tell. See below the code I tested. I want to run a few more tests and do some line profiling to fully understand what is going on here. Way back when, I used recarrays (because pandas did not exist when I started on the workbench) before shifting to pandas a few years back for the 2.x branch of the workbench.

```python
def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    index = experiment.experiment_id
    cases.loc[index] = exp
```
Ok, I did some research.

```python
def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    index = experiment.experiment_id
    cases.replace({index: exp})
```

is 50% faster than the current implementation of `_store_case`. This was based on this blog post. Your recarray-based implementation of `_store_case` is still an order of magnitude faster. So it seems worthwhile to use a recarray as an internal object for fast assignment.
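The comparison being discussed can be reproduced with a hedged micro-benchmark sketch like the one below, using made-up column names and values. Absolute timings vary by machine, so only the relative difference between the two fill strategies is meaningful.

```python
import timeit
import numpy as np
import pandas as pd

# Hedged micro-benchmark sketch: per-row DataFrame .loc assignment
# versus filling a preallocated structured array. Columns and values
# are made up for illustration.
N = 1000
columns = ["b", "q", "mean"]

def fill_dataframe():
    # per-row assignment via .loc, similar in spirit to the old callback
    df = pd.DataFrame(index=range(N), columns=columns, dtype=float)
    for i in range(N):
        df.loc[i] = (0.1, 2.0, 0.02)
    return df

def fill_structured_array():
    # preallocated structured array filled one full row at a time
    arr = np.empty(N, dtype=[(c, float) for c in columns])
    for i in range(N):
        arr[i] = (0.1, 2.0, 0.02)
    return arr

t_df = timeit.timeit(fill_dataframe, number=1)
t_arr = timeit.timeit(fill_structured_array, number=1)
print(f"DataFrame: {t_df:.4f}s, structured array: {t_arr:.4f}s")
```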
I noticed one potential problem with your implementation:

```python
self.cases[index] = (
    tuple(scenario.values())
    + tuple(policy.values())
    + (scenario.name, policy.name, experiment.model_name)
)
```

There is no guarantee whatsoever that `scenario.values()` will return the values in the same order as expected by the columns. The order of values can even differ from one experiment to the next. To ensure alignment, you must extract the scenario values by looping over the keys in the correct order. Alternatively, the underlying implementation of Scenario and Policy could shift to an OrderedDict, which also ensures a predictable order.
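The ordering hazard can be shown in a few lines of plain Python; the keys and values below are made up for illustration:

```python
# Two scenarios holding the same data but built in a different insertion
# order yield .values() in different orders, so positional assignment
# into a structured array would silently misalign columns.
scenario_a = {"b": 0.42, "q": 2.0}
scenario_b = {"q": 2.0, "b": 0.42}  # same data, different insertion order

assert list(scenario_a.values()) != list(scenario_b.values())

# extracting by an explicit key order restores alignment
keys = ["b", "q"]
row_a = tuple(scenario_a[k] for k in keys)
row_b = tuple(scenario_b[k] for k in keys)
assert row_a == row_b
```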
Thanks for the careful review, and interesting to see the deep dive!
Good catch! Since Python 3.7, dictionaries are guaranteed to be insertion-ordered, which means that as long as we always insert them in this order, we should be fine (maybe we can even use tuples in that case). Are we currently doing that?
I did not know about the 3.7 change. That is good to know.

That is not a trivial question; I had to go through the code quite a bit to answer it. But yes, a fixed order is used for sampling, and this order is identical to the order of uncertainties/levers that are passed to the callback. However, you can create custom scenarios/policies, and in that case the order is not guaranteed. Going to

```python
self.cases[index] = (
    tuple([scenario[u] for u in self.uncertainties])
    + tuple([policy[l] for l in self.levers])
    + (scenario.name, policy.name, experiment.model_name)
)
```

to guarantee order, with `self.uncertainties` and `self.levers` set in the init:

```python
self.uncertainties = [u.name for u in uncertainties]
self.levers = [l.name for l in levers]
```

solves this and is even marginally faster.
…vers Co-Authored-By: Jan Kwakkel <j.h.kwakkel@tudelft.nl>
Thanks, implemented the suggested changes!

PS: When reviewing a PR, the GitHub UI has a very neat feature: you can actually suggest code changes that can be accepted with a single click. There is a little button you have to spot; you can see it in step 3 of Reviewing proposed changes in a pull request. Even multi-line code suggestions are possible!
I quickly tested your code on some of the examples. The experiment data frame is wrong: it has a multi-index over the columns, which should not be there. It will break most of the analysis package. The content is also full of `None`s.
Thanks for reviewing. How well is this currently covered by the tests? I will dive a bit deeper into it.
Okay, I dove in deeper, validated your behaviour, and I think the current limitation is that we are updating a DataFrame each experiment using an expensive
Both however require more extensive changes to the surrounding code and tests, so I would propose we discuss this sometime in person. In the meantime, I have cherrypicked the speedups in
Also check out https://github.com/quaquel/epa1361/pull/4; I validated that PR well and I think it might open some new possibilities for the course since it's so much faster.
It is clear that some tests are missing, so that is the logical next step. I do like moving to a more performant setup with the recarray.
Are you back in The Netherlands already? Can we meet sometime next week?
I merged master back into this pull request and resolved the discrepancies. I also moved to `pd.DataFrame.from_records`. So let's see if the unit tests are passing now.
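The `pd.DataFrame.from_records` step mentioned here can be sketched as follows, with made-up fields: the structured array collected during the run is converted to a DataFrame only once, at the end, with field names becoming column names automatically.

```python
import numpy as np
import pandas as pd

# Sketch (with made-up fields) of the final conversion step: collect
# results in a structured array, then build the DataFrame once at the
# end instead of updating it per experiment.
cases = np.empty(2, dtype=[("b", float), ("scenario", "U32")])
cases[0] = (0.42, "scen_0")
cases[1] = (0.35, "scen_1")

df = pd.DataFrame.from_records(cases)
```

This yields a flat column index (no multi-index), since `from_records` takes the column names directly from the structured dtype's field names.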
make dtypes an attribute
I did some tests after merging master. All unit tests, including the updated ones, pass. Also, e.g., the sobol example now works fine. This is ready to be merged.
Awesome, thanks for this! Let me do a final confirmation of the improved performance and then I will merge.
New performance benchmarks: we might have lost a little bit here and there, but by far most of the performance improvement is still there. According to my profiles, it speeds up

Old

New