callbacks: Improve performance of _store_outcomes by ~35x #232
Conversation
On a performance profile of the optimized lake model I noticed that the `_store_case` function took up a large portion of the runtime (46.4% of 5000 runs, or 9535 ms). This commit optimizes the `DefaultCallback` class by:

- Replacing pandas DataFrames with NumPy structured arrays for faster data access and manipulation.
- Using NumPy functions and data structures where possible.
- Reducing unnecessary function calls.

As a result, the `_store_case` function needs only 267 ms (or 2.3% of the runtime) on 5000 experiment runs, a speedup of ~35x. When experimenting with the `lake_problem`, a total speedup of about 60% is achieved (500 to 800 it/s on my setup). The faster/simpler the model being experimented with, the more noticeable this speedup will be.
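The core idea of the change, filling a preallocated NumPy structured array one full row at a time instead of growing a pandas DataFrame, can be sketched roughly as follows. The field names and the `store_case` helper here are illustrative, not the actual workbench API:

```python
import numpy as np

# Illustrative sketch: preallocate a structured array for all experiments
# up front, then fill one full row per experiment. Field names and the
# store_case helper are hypothetical, not the workbench API.
n_experiments = 5000
dtype = [
    ("b", float), ("q", float),          # made-up uncertainty columns
    ("scenario", "U64"), ("policy", "U64"), ("model", "U64"),
]
cases = np.empty(n_experiments, dtype=dtype)

def store_case(index, uncertainty_values, names):
    # assigning a whole row at once is a single, cheap NumPy operation
    cases[index] = tuple(uncertainty_values) + tuple(names)

store_case(0, (0.42, 2.0), ("scen_0", "pol_0", "lake_model"))
```

The preallocation is what avoids the per-experiment overhead of growing or indexing into a DataFrame.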
for more information, see https://pre-commit.ci
Return a correctly indexed DataFrame and make some small adjustments to the test set. Please review carefully.
for more information, see https://pre-commit.ci
I needed to make a few small adjustments to make the test set pass, mainly in transforming the data. I don't think this breaks any (important) behaviour, but please review carefully, especially the changes to the test code.
@quaquel did you have a chance to look at this PR?
Not yet. Focus has been on fixing pynetlogo.
I don't see anything obviously strange with this suggested modification. Out of curiosity, why did you choose to use recarrays? Looking at my old code, I see some obvious places for performance increases, even when sticking to pandas.
I tried some different options, and those benchmarked the fastest. It was actually suggested by ChatGPT! While pandas DataFrames are far more flexible, they are in general slower than any kind of NumPy array. I think in this case a recarray is a nice middle ground: it's very fast but also keeps the code human-readable and interpretable. The biggest performance win is in initializing the empty array from the start and filling it a full row at a time.
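A minimal illustration of the recarray middle ground described above, independent of the workbench itself: storage stays a plain, fast NumPy buffer, while fields remain readable via attribute access, much like DataFrame columns.

```python
import numpy as np

# Sketch of the trade-off described above: a recarray is a plain NumPy
# buffer (fast), but fields are readable as attributes (interpretable).
arr = np.recarray(3, dtype=[("x", float), ("y", float)])
arr[0] = (1.0, 2.0)  # fill a full row at a time
arr[1] = (3.0, 4.0)
arr[2] = (5.0, 6.0)
print(arr.x)  # attribute-style access, much like a DataFrame column
```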
I ran a quick benchmark this afternoon focusing on getting rid of the `at` statements, but trying to set the full row in one go is actually slower than looping, as far as I can tell. See below the code I tested. I want to run a few more tests and do some line profiling to fully understand what is going on here. Way back when, I used recarrays (because pandas did not exist when I started on the workbench) before shifting to pandas a few years back for the 2.x branch of the workbench.

```python
def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    index = experiment.experiment_id
    cases.loc[index] = exp
```
Ok, I did some research.

```python
def _store_case(experiment):
    exp = experiment.scenario.data | experiment.policy.data
    exp["scenario"] = experiment.scenario.name
    exp["policy"] = experiment.policy.name
    exp["model"] = experiment.model_name
    index = experiment.experiment_id
    cases.replace({index: exp})
```

is 50% faster than the current implementation of `_store_case`. This was based on this blog post. Your recarray-based implementation of `_store_case` is still an order of magnitude faster. So it seems worthwhile to use a recarray as an internal object for fast assignment.
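The comparison being discussed can be reproduced with a hedged micro-benchmark sketch like the one below, using made-up column names and values. Absolute timings vary by machine, so only the relative difference between the two fill strategies is meaningful.

```python
import timeit
import numpy as np
import pandas as pd

# Hedged micro-benchmark sketch: per-row DataFrame .loc assignment
# versus filling a preallocated structured array. Columns and values
# are made up for illustration.
N = 1000
columns = ["b", "q", "mean"]

def fill_dataframe():
    # per-row assignment via .loc, similar in spirit to the old callback
    df = pd.DataFrame(index=range(N), columns=columns, dtype=float)
    for i in range(N):
        df.loc[i] = (0.1, 2.0, 0.02)
    return df

def fill_structured_array():
    # preallocated structured array filled one full row at a time
    arr = np.empty(N, dtype=[(c, float) for c in columns])
    for i in range(N):
        arr[i] = (0.1, 2.0, 0.02)
    return arr

t_df = timeit.timeit(fill_dataframe, number=1)
t_arr = timeit.timeit(fill_structured_array, number=1)
print(f"DataFrame: {t_df:.4f}s, structured array: {t_arr:.4f}s")
```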
I noticed one potential problem with your implementation:

```python
self.cases[index] = (
    tuple(scenario.values())
    + tuple(policy.values())
    + (scenario.name, policy.name, experiment.model_name)
)
```

There is no guarantee whatsoever that `scenario.values()` will return the values in the same order as expected by the columns. The order of values can even differ from one experiment to the next. To ensure alignment, you must extract the scenario values by looping over the keys in the correct order. Alternatively, the underlying implementation of Scenario and Policy could shift to an OrderedDict, which also ensures a predictable order.
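The ordering hazard can be shown in a few lines of plain Python; the keys and values below are made up for illustration:

```python
# Two scenarios holding the same data but built in a different insertion
# order yield .values() in different orders, so positional assignment
# into a structured array would silently misalign columns.
scenario_a = {"b": 0.42, "q": 2.0}
scenario_b = {"q": 2.0, "b": 0.42}  # same data, different insertion order

assert list(scenario_a.values()) != list(scenario_b.values())

# extracting by an explicit key order restores alignment
keys = ["b", "q"]
row_a = tuple(scenario_a[k] for k in keys)
row_b = tuple(scenario_b[k] for k in keys)
assert row_a == row_b
```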
Thanks for the careful review, and interesting to see the deep dive!
Good catch! Since Python 3.7, dictionaries are guaranteed to be insertion-ordered, which means that as long as we always insert them in this order, we should be fine (maybe we can even use tuples in that case). Are we currently doing that?
I did not know about the 3.7 change. That is good to know.

That is not a trivial question; I had to go through the code quite a bit to answer it. But yes, a fixed order is used for sampling, and this order is identical to the order of uncertainties/levers that are passed to the callback. However, you can create custom scenarios/policies, and in that case the order is not guaranteed. Going to

```python
self.cases[index] = (
    tuple([scenario[u] for u in self.uncertainties])
    + tuple([policy[l] for l in self.levers])
    + (scenario.name, policy.name, experiment.model_name)
)
```

to guarantee order, with `self.uncertainties` and `self.levers` set in the init:

```python
self.uncertainties = [u.name for u in uncertainties]
self.levers = [l.name for l in levers]
```

solves this and is even marginally faster.
…vers Co-Authored-By: Jan Kwakkel <j.h.kwakkel@tudelft.nl>
Thanks, implemented the suggested changes!

PS: When reviewing a PR, the GitHub UI has a very neat feature: you can actually suggest code changes that can be accepted with a single click. There is a little button you have to spot; you can see it in step 3 of Reviewing proposed changes in a pull request. Even multi-line code suggestions are possible!
I quickly tested your code on some of the examples. The experiment data frame is wrong: it has a multi-index over the columns, which should not be there. It will break most of the analysis package. The content is also full of `None`s.
Thanks for reviewing. How well is this currently covered by the tests? I will dive a bit deeper into it.
Okay, I dove in deeper, validated your behaviour, and I think the current limitation is that we are updating a DataFrame each experiment using an expensive
Both however require more extensive changes to the surrounding code and tests, so I would propose we discuss this sometime in person. In the meantime, I have cherrypicked the speedups in
Also check out https://github.com/quaquel/epa1361/pull/4; I validated that PR well and I think it might open some new possibilities for the course since it's so much faster.
It is clear that some tests are missing, so that is the logical next step. I do like moving to a more performant setup with the recarray.
Are you back in The Netherlands already? Can we meet sometime next week?
I merged master back into this pull request and resolved the discrepancies. I also moved to `pd.DataFrame.from_records`. So let's see if the unit tests are passing now.
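The `pd.DataFrame.from_records` step mentioned here can be sketched as follows, with made-up fields: the structured array collected during the run is converted to a DataFrame only once, at the end, with field names becoming column names automatically.

```python
import numpy as np
import pandas as pd

# Sketch (with made-up fields) of the final conversion step: collect
# results in a structured array, then build the DataFrame once at the
# end instead of updating it per experiment.
cases = np.empty(2, dtype=[("b", float), ("scenario", "U32")])
cases[0] = (0.42, "scen_0")
cases[1] = (0.35, "scen_1")

df = pd.DataFrame.from_records(cases)
```

This yields a flat column index (no multi-index), since `from_records` takes the column names directly from the structured dtype's field names.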
make dtypes an attribute
I did some tests after merging master. All unit tests, including the updated ones, pass. Also, e.g., the sobol example now works fine. This is ready to be merged.
Awesome, thanks for this! Let me do a final confirmation of the improved performance and then I will merge.
New performance benchmarks: we might have lost a little bit here and there, but by far most of the performance improvement is still there. According to my profiles, it speeds up

Old

New