Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix plot_singles and optimize memory #4566

Merged
merged 4 commits into from
Nov 15, 2023
Merged

Conversation

spxiwh
Copy link
Contributor

@spxiwh spxiwh commented Nov 14, 2023

pycbc_plot_singles_vs_params was broken by recent changes in matplotlib. The note about "behaviour of hexbin" here https://matplotlib.org/stable/api/prev_api_changes/api_changes_3.8.1.html specifically broke thinks. My understanding is that before we set mincnt = 0, but that actually meant "minimum count is 1 trigger in that bin". Now it means "minimum count is 0 triggers in that bin", and the code fails if you try to take the maximum of 0 triggers.

I put in a version-dependent switch on the mincnt parameter to fix this. This resolved the problem in a single test case, I'm running a bunch more in a full workflow now.

.... It also took ages to debug this, because the code reads way more triggers than it needs to. I've implemented the HFile mask thing in this code as well. This makes the code run significantly faster, and with much less memory usage. The filter_func still uses a lot of memory, but given that we are considering #4524 #4545 I leave that to be made nicer when that change goes in.

@spxiwh spxiwh added the v23_release_branch PRs applied to the v2.3.X release branch or to be cherry-picked if merging to master label Nov 14, 2023
@tdent
Copy link
Contributor

tdent commented Nov 14, 2023

Since mpl 3.8 seems to finally have implemented the 'don't try to reduce cells with 0 points in' default, we should be able to reduce the complexity considerably and only keep the mincnt setting for backward compatibility. See suggestion (will be typed in a few minutes).

Also I would rather use reduce_C_function=np.max as we should be preferring numpy functions for arrays.

Copy link
Contributor

@tdent tdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments/suggestions

@spxiwh
Copy link
Contributor Author

spxiwh commented Nov 14, 2023

Hopefully this addresses the feedback now?

@tdent
Copy link
Contributor

tdent commented Nov 14, 2023

Did you try using np.max instead of plain max ?

@spxiwh
Copy link
Contributor Author

spxiwh commented Nov 14, 2023

Missed that, sorry, will try.

data_mask = np.zeros(n_triggers_orig, dtype=bool)
data_mask[idx] = True

filts = [f for f in [opts.filter_string] if f is not None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the square bracket surrounding opts.filter_string intended here? This line does not make much sense to me (but I am quite tired right now…)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears the same as the existing code and join works with a (non-empty) list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(could be a tuple as well, no difference to the outcome)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code does work, but I think having moved the SNR mask, the following two lines:

filts = [f for f in [opts.filter_string] if f is not None]
filter_func = ' & '.join(filts) if filts else None

basically reduce to:

filter_func = opts.filter_string

I'll just remove these lines and send opts.filter_string directly to the relevant function call. Then I'll merge unless there are objections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I missed the point that this previously added the SNR cut to the filter option and now doesn't ..

@spxiwh
Copy link
Contributor Author

spxiwh commented Nov 14, 2023

Seems to work fine with np.max, updated to that.

@tdent
Copy link
Contributor

tdent commented Nov 15, 2023

It's still approved (assuming it continues to work fine)

@spxiwh spxiwh merged commit 9b8ecb0 into gwastro:master Nov 15, 2023
31 checks passed
@spxiwh spxiwh deleted the pr_fix_plot_singles branch November 15, 2023 11:45
spxiwh added a commit to spxiwh/pycbc that referenced this pull request Nov 15, 2023
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
spxiwh added a commit that referenced this pull request Nov 15, 2023
* Update singularity image if release (#4563)

* Fix plot_singles and optimize memory (#4566)

* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines

* Bump release number to v2.3.2
maxtrevor pushed a commit to maxtrevor/pycbc that referenced this pull request Dec 11, 2023
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
bhooshan-gadre pushed a commit to bhooshan-gadre/pycbc that referenced this pull request Dec 19, 2023
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
@spxiwh spxiwh removed the v23_release_branch PRs applied to the v2.3.X release branch or to be cherry-picked if merging to master label Jan 8, 2024
bhooshan-gadre pushed a commit to bhooshan-gadre/pycbc that referenced this pull request Mar 4, 2024
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
lpathak97 pushed a commit to lpathak97/pycbc that referenced this pull request Mar 13, 2024
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
acorreia61201 pushed a commit to acorreia61201/pycbc that referenced this pull request Apr 4, 2024
* Fix plot_singles and optimize memory

* Tom's comment

* Use np.max

* Remove redundant lines
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants