Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split AgentSet into map and do to seperate return types #2237

Merged
merged 5 commits into from
Aug 22, 2024

Conversation

quaquel
Copy link
Member

@quaquel quaquel commented Aug 21, 2024

Motivation

In #2220 @Corvince suggested to seperate AgentSet.do into 2 seperate methods based on their return type. AgentSet.do executes the callable/method/function and returns the original AgentSet. AgentSet.mapexecutes the callable/method/function and returns the results. At the moment both behaviors are in AgentSet.do and can be controlled with the return_results keyword argument.

Implementation details.

The code is straightforward. I wrap all kwargs to AgentSet.do into a dict. Check if return_results is in it. If it is, a DepractionWarning is issued. The rest of the code proceeds as normal.

Usage

# some form of staged activation
# because do returns the agentset we can chain calls
some_agentset.do("step1").do("step2").do("step3")

# map returns the result of the method so here a list of unique_ids is returned
# no further method chaining of AgentSet methods is possible. 
some_agentset.map(lambda x: x.unique_id)

@quaquel quaquel added the enhancement Release notes label label Aug 21, 2024
@quaquel quaquel requested review from EwoutH and Corvince August 21, 2024 17:44
@quaquel quaquel marked this pull request as ready for review August 21, 2024 17:45
Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
Schelling small 🔵 +0.2% [-0.2%, +0.5%] 🔵 -1.1% [-1.2%, -0.9%]
Schelling large 🔵 +0.3% [-0.2%, +0.8%] 🔵 -1.0% [-2.6%, +0.4%]
WolfSheep small 🔵 +0.3% [-0.8%, +1.4%] 🔵 -0.5% [-0.8%, -0.1%]
WolfSheep large 🔵 +0.2% [-0.0%, +0.5%] 🔵 -0.6% [-1.6%, +0.4%]
BoidFlockers small 🔵 -3.2% [-3.8%, -2.6%] 🔵 -3.1% [-3.6%, -2.6%]
BoidFlockers large 🟢 -4.0% [-4.7%, -3.3%] 🟢 -4.4% [-5.3%, -3.6%]

Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
Schelling small 🔵 -1.1% [-1.5%, -0.8%] 🔵 +1.1% [+0.8%, +1.3%]
Schelling large 🔵 +0.1% [-0.4%, +0.7%] 🔵 +5.6% [+2.9%, +8.3%]
WolfSheep small 🔵 -0.2% [-1.5%, +1.0%] 🔵 +0.4% [+0.1%, +0.7%]
WolfSheep large 🔵 +0.6% [+0.2%, +1.1%] 🔵 +2.0% [+0.8%, +3.2%]
BoidFlockers small 🔵 +1.6% [+0.8%, +2.3%] 🔵 +0.9% [+0.1%, +1.6%]
BoidFlockers large 🔵 +0.8% [+0.3%, +1.3%] 🔵 +0.4% [-0.1%, +0.9%]

@rht
Copy link
Contributor

rht commented Aug 21, 2024

While this PR simplifies the typing and type checking, what about consistency with pandas/Polars behavior? It seems that Polars' is simplest in that it only allows returning a DF. This PR's do and apply combine into pandas' apply behavior, because the latter has multiple return types.

@EwoutH
Copy link
Member

EwoutH commented Aug 21, 2024

In general I'm in favor of splitting methods when they have different outputs. I think in this case I also support it.

some_agentset.apply(lambda x: x.unique_id)

I almost feel like AgentSet should be the input to some function here. Like f(agentset) = ..., instead of agentset.apply(f) = ...


On a totally different note, I'm taking a few days of Mesa until Monday. Don't let my absence block anything.

@quaquel
Copy link
Member Author

quaquel commented Aug 22, 2024

While this PR simplifies the typing and type checking, what about consistency with pandas/Polars behavior? It seems that Polars' is simplest in that it only allows returning a DF. This PR's do and apply combine into pandas' apply behavior, because the latter has multiple return types.

MESA is not some DataFrame library but we can draw inspiration from their API. While working with dataframes, it makes sense that each operation returns a new dataframe. You are manipulating data. In MESA, sometimes we want to manipulate agents (so we use do in this PR), sometimes we want to get the return from this manipulation (so we use apply in this PR).

For a further motivation for this PR, please check this comment which gives a clear example of how confusing method chaining can become. So rather than controlling the return type with a keyword argument (as we currently do), it seems easier to split the two return types into two seperate methods. This makes it easier to see what is going on.

Note that I also use the exact same seperation between apply and do in the groupby PR. This makes the MESA API internally consistent.

@rht
Copy link
Contributor

rht commented Aug 22, 2024

MESA is not some DataFrame library but we can draw inspiration from their API.
...
For a further motivation for this PR, please check #2220 (comment) which gives a clear example of how confusing method chaining can become.

I find the AgentSet and DF to be analogous for the use case, i.e. the method chaining. But even in the DF libraries, there are disagreements. I forgot to emphasize the pandas API choice and the Polars API choice are actually very different from each other. It seems that this PR is taking the Polars approach. Understandably for Polars, because it is written in Rust and needs a consistent typing.

@rht
Copy link
Contributor

rht commented Aug 22, 2024

One last concern is that do and apply have similar connotation, given that do was based on DF apply to begin with. Would it be less confusing to instead name it as do and output? An LLM may have to disambiguate that do and apply are separate, in the context of Mesa only.

@quaquel
Copy link
Member Author

quaquel commented Aug 22, 2024

One last concern is that do and apply have similar connotation, given that do was based on DF apply to begin with. Would it be less confusing to instead name it as do and output? An LLM may have to disambiguate that do and apply are separate, in the context of Mesa only.

I am fine with giving apply a different name if we can come up with one. However, a method name should be a verb in my view because it does something. So, I am not in favor of changing it to output.

As an aside, I don't care about LLMs so for me that is not a good argument for changing the name. In my experience they produce crapy code. Students can use them in my MESA exam but does that rely on them exclusively all fail the exam. In my own usage, I only find them useful for boilerplate stuff. But even then, it is often quicker for me to just write it out with more traditional autocomplete.

@Corvince
Copy link
Contributor

I think the naming is much more confusing than I initially thought.

In Pandas "apply" executes a function along rows (default) or columns. We don't have that in AgentSet. Pandas "map" function executes elemen-wise, which might be much more appropriate. However, pandas does not have a "map" function for groupby objects.
Polars has a "map_groups" function that applies a function per group.

the built-in function "map" also applies a function to an iterator and results in a new iterator of the return values.

So I think "map" might actually be a much better name, with the exception that pandas groupby doesnt use that name. Would you still be okay with that name @quaquel (for consistency we then should rename apply to map for groupby)

@quaquel
Copy link
Member Author

quaquel commented Aug 22, 2024

Changing apply to map is perfectly fine with me. If we agree, I'll do it both here and in the groupby PR to keep the API consistent.

@EwoutH
Copy link
Member

EwoutH commented Aug 22, 2024

Oh I like map, it’s also a lot more distinct from do than apply was.

@quaquel
Copy link
Member Author

quaquel commented Aug 22, 2024

I have changed it (3 maintainers in favor seems enough).

@quaquel quaquel mentioned this pull request Aug 22, 2024
@quaquel quaquel changed the title Split AgentSet into apply and do to seperate return types Split AgentSet into map and do to seperate return types Aug 22, 2024
@quaquel quaquel merged commit 2898c01 into projectmesa:main Aug 22, 2024
10 of 12 checks passed
@EwoutH EwoutH added the backport-candidate PRs we might want to backport to an earlier branch label Aug 22, 2024
@EwoutH EwoutH added the deprecation When a new deprecation is introduced label Aug 30, 2024
@EwoutH EwoutH mentioned this pull request Aug 30, 2024
7 tasks
EwoutH pushed a commit to EwoutH/mesa that referenced this pull request Sep 24, 2024
…2237)

* seperate original do into map and do in AgentSet based on their difference in return type.
@quaquel quaquel deleted the agentset_apply_do branch November 4, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-candidate PRs we might want to backport to an earlier branch deprecation When a new deprecation is introduced enhancement Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants