add at_reg_id_event #642

Merged
merged 22 commits into from
Mar 1, 2023

Conversation

aleeciu
Collaborator

@aleeciu aleeciu commented Feb 5, 2023

This PR addresses #595

@emanuel-schmid emanuel-schmid marked this pull request as draft February 7, 2023 12:02
@peanutfun peanutfun marked this pull request as ready for review February 15, 2023 16:58
aleeciu and others added 8 commits February 27, 2023 16:21
Co-authored-by: Lukas Riedel <34276446+peanutfun@users.noreply.github.com>
climada/engine/impact.py (outdated)
Comment on lines 420 to 421
    elif not isinstance(agg_regions, pd.Series):
        agg_regions = pd.Series(agg_regions)
Member

Why make it a pandas series instead of a numpy array?

Collaborator Author

@aleeciu aleeciu Feb 28, 2023

for two reasons:

  • I find it cleaner as you know what impact belongs to what region simply by looking at the dataframe
  • I like the pd.Series.unique() method because it retains the order. Doing the same with numpy is more cumbersome, to my knowledge.
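
For reference, a minimal sketch of the ordering difference mentioned in the second point. The numeric region codes are made-up illustration values, not data from this PR. It also shows one way to get order-preserving unique values with numpy alone, via the `return_index` option of `np.unique`.

```python
import numpy as np
import pandas as pd

# Hypothetical region ids, one per exposure point (illustrative values only)
regions = [756, 276, 756, 40, 276]

# pandas: unique values in order of first appearance
unique_pd = pd.Series(regions).unique()

# numpy: unique values, sorted
unique_np = np.unique(regions)

# numpy, order-preserving: take the index of each first occurrence,
# sort those indices, and use them to pick the values
_, first_idx = np.unique(regions, return_index=True)
unique_np_ordered = np.asarray(regions)[np.sort(first_idx)]

print(unique_pd, unique_np, unique_np_ordered)
```

So the numpy route is possible, but it takes an extra step compared to `pd.Series.unique()`.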

Member

I would argue that this is a misuse of a pandas series.

"I find it cleaner as you know what impact belongs to what region simply by looking at the dataframe"
That is independent of whether one uses numpy arrays or a pandas series. But using a pandas series when it is not needed is imho actually less clean.

"I like the pd.Series.unique() method because it retains the order. Doing the same with numpy is more cumbersome, to my knowledge."
I would argue that this is suboptimal, as an important point is now hidden in the subtle internal workings of pandas.Series.unique() vs. np.unique().

@peanutfun : what do you think?

Collaborator Author

@aleeciu aleeciu Feb 28, 2023

  • on the first point you are right: having a pd.Series is not strictly needed to generate a pd.DataFrame

  • on the second: I am transforming agg_regions into pd.Series only when the user supplies it as an np.array or list. But the user can supply a pd.Series in the first place. So I don't think it's a misuse, I just wrote the code in a way that it works with a pd.Series.

So I think your doubt is rather: should we use pandas for things that can be done with numpy? Not sure I have a clear answer to that, as this would probably apply to any code that makes use of pandas.
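
To make the two options concrete, here is a small sketch of the input handling under discussion. Both helper names are hypothetical and not part of CLIMADA. The first converts to a `pd.Series` only when the user did not already pass one, the second is the numpy-only alternative.

```python
import numpy as np
import pandas as pd


def as_region_series(agg_regions):
    """Hypothetical helper: return a pd.Series, converting lists and arrays."""
    if not isinstance(agg_regions, pd.Series):
        agg_regions = pd.Series(agg_regions)
    return agg_regions


def as_region_array(agg_regions):
    """Hypothetical numpy-only alternative: accepts list, array, or Series."""
    return np.asarray(agg_regions)


from_list = as_region_series([756, 276, 756])
from_series = as_region_array(pd.Series([756, 276, 756]))
```

Either way the caller can pass a list, an array, or a series. The difference is only in which type the rest of the function works with.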

Member

The point is that pd.Series objects are made to be used in pandas dataframes, not as standalone objects. They are essentially numpy arrays with some extras. Thus, instead of making an unclear use of pd.Series, I suggest using numpy arrays and making the ordering requirement explicit (currently it is a hidden feature).

Member

I don't really see how the order is important because the order in the agg_regions parameter is determined by the "order" of the exposure points. Also, the order of columns in the final data frame does not appear relevant to me 😅

I would actually argue that the unordered result of pandas.Series.unique does not need further explanation, whereas I feel we should add a note to the docstring that the unique items will be ordered when using np.unique. So, I think the easiest solution is to just stick to the current implementation.

@chahank: Following our discussion on the "clean" indexing, I just realized that pandas.Series.unique returns a numpy array, so everything is fine from that perspective.

Collaborator Author

You are right, the order does not count. I am now using only numpy arrays.

Collaborator

"I would argue that this is a misuse of a pandas series."

Not necessarily. According to the docs a Series is a "one-dimensional ndarray with axis labels (including time series)". Sounds useful also outside of a DataFrame.
🤷

Member

@peanutfun peanutfun Feb 28, 2023

I very much agree with @emanuel-schmid here 😃

I'm fine with the current implementation. However, please make sure to only call np.unique once, and store the result for later use. It is currently executed twice and will be costly for large arrays. I can also take care of that when merging. Are we ready to go ahead?
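
A rough sketch of what "call np.unique once" could look like. This is not the code in this PR: the function name is hypothetical, and a dense array with shape (events, exposure points) stands in for the impact matrix to keep the example short.

```python
import numpy as np
import pandas as pd


def impact_per_region(imp_mat, agg_regions, event_id):
    """Hypothetical sketch: sum impacts per event over each region."""
    agg_regions = np.asarray(agg_regions)
    if agg_regions.shape[0] != imp_mat.shape[1]:
        raise ValueError("agg_regions needs one entry per exposure point")

    # Compute the unique regions once and reuse the result below
    unique_regions = np.unique(agg_regions)

    # One column per region: sum over the exposure points in that region
    per_region = np.column_stack(
        [imp_mat[:, agg_regions == reg].sum(axis=1) for reg in unique_regions]
    )
    return pd.DataFrame(per_region, columns=unique_regions, index=event_id)


# Two events, three exposure points (made-up numbers)
imp_mat = np.array([[1.0, 2.0, 3.0], [0.0, 5.0, 1.0]])
result = impact_per_region(imp_mat, [756, 276, 756], event_id=[1, 2])
print(result)
```

The stored `unique_regions` is used both for the aggregation and for the column labels, so the array of region ids is scanned for unique values only once.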

aleeciu and others added 2 commits February 28, 2023 11:53
Co-authored-by: Chahan M. Kropf <chahan.kropf@usys.ethz.ch>
@aleeciu
Collaborator Author

aleeciu commented Feb 28, 2023

I guess we are ready to merge

@emanuel-schmid
Collaborator

Yes, looks good. 😄 I did some cosmetics in the docstring. If you're happy with them, @aleeciu, let me know or just commit and merge.

Co-authored-by: Emanuel Schmid <51439563+emanuel-schmid@users.noreply.github.com>
@emanuel-schmid
Collaborator

Oh, and one more thing: we need to update CHANGELOG.md.

@peanutfun
Member

Will do and handle the merge! 👌

@peanutfun peanutfun merged commit 187a7d5 into develop Mar 1, 2023
@emanuel-schmid emanuel-schmid deleted the feature/impact_at_reg_id branch March 6, 2023 09:16
4 participants