Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

O4b idq offline update #4603

Merged
merged 30 commits into from
Jan 26, 2024
Merged

Conversation

maxtrevor
Copy link
Contributor

@maxtrevor maxtrevor commented Jan 10, 2024

This is a rework of the entire offline dq workflow
The goals of this rework are to

  • Simplify the code and logical structure of the workflow
  • Allow for reweighting of triggers in a window around autogated times
  • Allow for the us of DQ flags with arbitrary start/end times (previously they had to start/end on an integer second)
  • Re-implement the dq ranking statistic in a way that simplifies future use in low-latency

@maxtrevor maxtrevor requested review from tdent and spxiwh January 10, 2024 22:13
@tdent
Copy link
Contributor

tdent commented Jan 11, 2024

Some comments/requests before really starting review ..

Concerning the content of changes I am dubious whether 'future use in low latency' belongs in this PR. Are you developing that in parallel? If so could the changes related to low latency idq be put in that development branch, not this one? It is preferable for a single PR to deal with no more than a single conceptual change or improvement, possibly in addition to restructuring, this one seems to contain one and a half improvements or more (add gating segments, restructuring, plus some fraction of low latency idq development).

The diff to stat.py is very large and much of it is whitespace before # pylint comments. This is unnecessarily hard to read and review. Can you remove all whitespace changes from stat.py please so that we can focus on the functional changes. Also, in some places it seems you have removed inline explanatory comments from stat.py code (eg the exp fit stat class), I think this is undesirable - can the comments be reinstated or edited appropriately please.

@tdent
Copy link
Contributor

tdent commented Jan 11, 2024

Ah, I guess I'd also like to know if these changes include the ability to use idq segments shorter than 1s as we have discussed a few times

@maxtrevor
Copy link
Contributor Author

Ah, I guess I'd also like to know if these changes include the ability to use idq segments shorter than 1s as we have discussed a few times

That feature is included in this change. I knew there was a feature I forgot to mention in the description.

@maxtrevor
Copy link
Contributor Author

Some comments/requests before really starting review ..

The diff to stat.py is very large and much of it is whitespace before # pylint comments. This is unnecessarily hard to read and review. Can you remove all whitespace changes from stat.py please so that we can focus on the functional changes. Also, in some places it seems you have removed inline explanatory comments from stat.py code (eg the exp fit stat class), I think this is undesirable - can the comments be reinstated or edited appropriately please.

These changes have been reverted

@tdent
Copy link
Contributor

tdent commented Jan 11, 2024

Ah, good, you can edit the description to also mention the sub-second aspect.

Looking at where the description mentions low latency in more detail, I guess it's OK to have a change to dq stat implementation in this PR if it is just reworking / rearranging of the previous calculation and does not add functionality that is specific to low latency (ie not used in offline). But it does look like you have put in a bit of that, eg allowing for different hdf file formats?
If it's not a huge disruption of this PR, could the changes in functionality be restricted to what is needed for offline?
(Maybe you already did that but I haven't re-read the diffs in detail.)

This hopefully may make review organization a bit easier, ie if all the low latency specific changes could be pushed together in a (different) PR

@tdent tdent requested a review from ahnitz January 11, 2024 17:23
@maxtrevor
Copy link
Contributor Author

Ah, good, you can edit the description to also mention the sub-second aspect.

Looking at where the description mentions low latency in more detail, I guess it's OK to have a change to dq stat implementation in this PR if it is just reworking / rearranging of the previous calculation and does not add functionality that is specific to low latency (ie not used in offline). But it does look like you have put in a bit of that, eg allowing for different hdf file formats? If it's not a huge disruption of this PR, could the changes in functionality be restricted to what is needed for offline? (Maybe you already did that but I haven't re-read the diffs in detail.)

I have edited the description.

The LL specific features have been removed, to be added in a future PR with some other LL machinery. The remaining changes to the dq stat are for compatibility with the changes in the workflow, but create a statistic that should be mathematically equivalent to the previous version.

Copy link
Contributor

@spxiwh spxiwh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've looked over the changes and I'm happy that there's nothing here that would affect any code other than the DQ code, that the changes seem stylistically good, and the checks etc. pass. Some minor comments are left.

However, there's a lot of DQ code being altered, and I hadn't followed what it did prior to this. What checks have you done to convince yourself that these changes haven't affected anything and that this code is now doing what you want?

bin/plotting/pycbc_plot_dq_flag_likelihood Outdated Show resolved Hide resolved
pycbc/events/stat.py Outdated Show resolved Hide resolved
pycbc/workflow/dq.py Outdated Show resolved Hide resolved
pycbc/workflow/dq.py Outdated Show resolved Hide resolved
pycbc/workflow/dq.py Outdated Show resolved Hide resolved
@maxtrevor
Copy link
Contributor Author

However, there's a lot of DQ code being altered, and I hadn't followed what it did prior to this. What checks have you done to convince yourself that these changes haven't affected anything and that this code is now doing what you want?

I have an offline run that has been testing the changes. All of the dq workflow jobs, and coinc and sngl jobs have compelted, but the followups failed out over the weekend. I restarted the workflow this morning so hopefully they will complete this time.

I'm happy to make the other changes requested later today

pycbc/workflow/dq.py Outdated Show resolved Hide resolved
pycbc/workflow/dq.py Outdated Show resolved Hide resolved
pycbc/workflow/dq.py Outdated Show resolved Hide resolved
Copy link
Contributor

@tdent tdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK from my point of view, not sure why tests would be failing

@maxtrevor
Copy link
Contributor Author

Seems OK from my point of view, not sure why tests would be failing

Looks like the test suite is failing to get some files from gwosc that it needs

@maxtrevor maxtrevor merged commit fbfd8b9 into gwastro:master Jan 26, 2024
33 checks passed
spxiwh pushed a commit to spxiwh/pycbc that referenced this pull request Feb 2, 2024
* Start offline dq workflow rework

* Modified stat

* Rewrote dq workflow

* Added note for future suggested changes to dq workflow

* Finished first draft of offline workflow

* Fixed a comment

* Removed unneeded argument to make dq workflow

* Made bin templates executable

* WOrking version of offline dq workflow

* Reverting non-functional changes in stat.py

* linting

* Reverted more non-functional changes

* Reverted low-latency related features

* Rearrange imports

* Addressed Ian's comments

* Simplified masking in dq workflow

* Modify dq workflow to avoid using numpy arrays

* Use vetoed times followup in example (gwastro#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Addressed more comments from Tom

* Addressed Gareth's comments on parser arguments

* Don't allow dq stat to uprank candidates

* Modify dq trigger rate plots to not use dq trigger rates below 1

* Address Tom's most recent comment

* Readded [Min 1] to dq plot's y-axis label

* Pass bank plotting tags to dq template bin plots

* Update bin/plotting/pycbc_plot_dq_flag_likelihood

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Update bin/workflows/pycbc_make_offline_search_workflow

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Use abs() correctly

* Break up 2 try/ecept cases

---------

Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>
@spxiwh spxiwh removed the v23_release_branch PRs applied to the v2.3.X release branch or to be cherry-picked if merging to master label Feb 2, 2024
spxiwh added a commit that referenced this pull request Feb 2, 2024
* Fix error querying Virgo frame type (#4523)

* Fix error querying Virgo frame type

* Typo

* Typo

* Implement Ian's suggestion

* Make it work

* Use an actual DeprecationWarning

* Scripts for dealing with preparation of offline gracedb upload files (#4534)

* Scripts for dealing with preparation of offline gracedb upload files

* Update bin/all_sky_search/pycbc_make_bayestar_skymap

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>

* reinstate SNR timeseries being saved into HDF

* TDC comments

* TDC comments 2

* Remove unneeded import

---------

Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
Co-authored-by: Tito Dal Canton <tito.dalcanton@ijclab.in2p3.fr>

* Use vetoed times followup in example (#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Rationalize some calls to waveform properties (#4540)

* rationalize some calls to waveform properties

* Only allow explictly listed names

* don't try to change function name

* g

g

* Add the ability to choose how to sort injections in minifollowups (#4602)

* Add the ability to choose how to sort injections in minifollowups

* Minor cleanup

* Minor refactor

* Tom's comments

* Update example search workflow

* Further thoughts and suggestions

* Tom's suggestion

* Fix bug and Tom's suggestion

* Standardise logging: bin/all_sky_search and pycbc/events (#4576)

* add level to default logging

* Decrease logging level for debug information in coinc_findtrigs

* decrease logging level for some information in sngls_findtrigs

* add named logger in pycbc/events/cuts.py

* add named logger in pycbc/events/significance.py

* REVERT ME: an example of usage

* CC fix

* Revert "REVERT ME: an example of usage"

This reverts commit eb647e0.

* Use init_logging throughout all_sky_search

* give pycbc/events modules named loggers

* missed the fact that the level was set to warn, not info, as default before

* CC

* missed an import

* start moving verbose argparse into common facility

* add common_options to all_sky_search executables

* set --verbose default value to zero

* pycbc not imported in some codes

* CC

* rename function to be more decriptive

* O4b idq offline update (#4603)

* Start offline dq workflow rework

* Modified stat

* Rewrote dq workflow

* Added note for future suggested changes to dq workflow

* Finished first draft of offline workflow

* Fixed a comment

* Removed unneeded argument to make dq workflow

* Made bin templates executable

* WOrking version of offline dq workflow

* Reverting non-functional changes in stat.py

* linting

* Reverted more non-functional changes

* Reverted low-latency related features

* Rearrange imports

* Addressed Ian's comments

* Simplified masking in dq workflow

* Modify dq workflow to avoid using numpy arrays

* Use vetoed times followup in example (#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Addressed more comments from Tom

* Addressed Gareth's comments on parser arguments

* Don't allow dq stat to uprank candidates

* Modify dq trigger rate plots to not use dq trigger rates below 1

* Address Tom's most recent comment

* Readded [Min 1] to dq plot's y-axis label

* Pass bank plotting tags to dq template bin plots

* Update bin/plotting/pycbc_plot_dq_flag_likelihood

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Update bin/workflows/pycbc_make_offline_search_workflow

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Use abs() correctly

* Break up 2 try/ecept cases

---------

Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Bumping version number

---------

Co-authored-by: Tito Dal Canton <tito.dalcanton@ijclab.in2p3.fr>
Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Tito Dal Canton <tito@dalcanton.it>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>
Co-authored-by: maxtrevor <65971534+maxtrevor@users.noreply.github.com>
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Feb 5, 2024
bhooshan-gadre pushed a commit to bhooshan-gadre/pycbc that referenced this pull request Mar 4, 2024
* Start offline dq workflow rework

* Modified stat

* Rewrote dq workflow

* Added note for future suggested changes to dq workflow

* Finished first draft of offline workflow

* Fixed a comment

* Removed unneeded argument to make dq workflow

* Made bin templates executable

* WOrking version of offline dq workflow

* Reverting non-functional changes in stat.py

* linting

* Reverted more non-functional changes

* Reverted low-latency related features

* Rearrange imports

* Addressed Ian's comments

* Simplified masking in dq workflow

* Modify dq workflow to avoid using numpy arrays

* Use vetoed times followup in example (gwastro#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Addressed more comments from Tom

* Addressed Gareth's comments on parser arguments

* Don't allow dq stat to uprank candidates

* Modify dq trigger rate plots to not use dq trigger rates below 1

* Address Tom's most recent comment

* Readded [Min 1] to dq plot's y-axis label

* Pass bank plotting tags to dq template bin plots

* Update bin/plotting/pycbc_plot_dq_flag_likelihood

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Update bin/workflows/pycbc_make_offline_search_workflow

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Use abs() correctly

* Break up 2 try/ecept cases

---------

Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Mar 11, 2024
lpathak97 pushed a commit to lpathak97/pycbc that referenced this pull request Mar 13, 2024
* Start offline dq workflow rework

* Modified stat

* Rewrote dq workflow

* Added note for future suggested changes to dq workflow

* Finished first draft of offline workflow

* Fixed a comment

* Removed unneeded argument to make dq workflow

* Made bin templates executable

* WOrking version of offline dq workflow

* Reverting non-functional changes in stat.py

* linting

* Reverted more non-functional changes

* Reverted low-latency related features

* Rearrange imports

* Addressed Ian's comments

* Simplified masking in dq workflow

* Modify dq workflow to avoid using numpy arrays

* Use vetoed times followup in example (gwastro#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Addressed more comments from Tom

* Addressed Gareth's comments on parser arguments

* Don't allow dq stat to uprank candidates

* Modify dq trigger rate plots to not use dq trigger rates below 1

* Address Tom's most recent comment

* Readded [Min 1] to dq plot's y-axis label

* Pass bank plotting tags to dq template bin plots

* Update bin/plotting/pycbc_plot_dq_flag_likelihood

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Update bin/workflows/pycbc_make_offline_search_workflow

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Use abs() correctly

* Break up 2 try/ecept cases

---------

Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>
acorreia61201 pushed a commit to acorreia61201/pycbc that referenced this pull request Apr 4, 2024
* Start offline dq workflow rework

* Modified stat

* Rewrote dq workflow

* Added note for future suggested changes to dq workflow

* Finished first draft of offline workflow

* Fixed a comment

* Removed unneeded argument to make dq workflow

* Made bin templates executable

* WOrking version of offline dq workflow

* Reverting non-functional changes in stat.py

* linting

* Reverted more non-functional changes

* Reverted low-latency related features

* Rearrange imports

* Addressed Ian's comments

* Simplified masking in dq workflow

* Modify dq workflow to avoid using numpy arrays

* Use vetoed times followup in example (gwastro#4597)

* Use vetoed times followup in example

* Fix statistic files double-definition

* Addressed more comments from Tom

* Addressed Gareth's comments on parser arguments

* Don't allow dq stat to uprank candidates

* Modify dq trigger rate plots to not use dq trigger rates below 1

* Address Tom's most recent comment

* Readded [Min 1] to dq plot's y-axis label

* Pass bank plotting tags to dq template bin plots

* Update bin/plotting/pycbc_plot_dq_flag_likelihood

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Update bin/workflows/pycbc_make_offline_search_workflow

Co-authored-by: Thomas Dent <thomas.dent@usc.es>

* Use abs() correctly

* Break up 2 try/ecept cases

---------

Co-authored-by: Gareth S Cabourn Davies <gareth.cabourndavies@ligo.org>
Co-authored-by: Thomas Dent <thomas.dent@usc.es>
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Apr 18, 2024
@maxtrevor maxtrevor deleted the o4b_idq_offline_update branch May 1, 2024 19:09
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request May 20, 2024
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Jun 28, 2024
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Nov 8, 2024
GarethCabournDavies added a commit to GarethCabournDavies/pycbc that referenced this pull request Jan 9, 2025
GarethCabournDavies added a commit that referenced this pull request Jan 22, 2025
* fixes to the snr-like statistics

* Move exp_fit statistics into a modular framework

* remove unused statistic

* use keyword:value rather than feature for alpha below

* Codeclimate complaints

* use new-style statistic in CI

* fix in case teh fit_by_templte is not stored in the fit_over file

* remove testing change

* fix usage of parse_statistic_feature_options in test

* Docstrings for various functions

* Add back in the changes from #4603

* Add description of the statistics to the documentation

* fix error if passing keywords which need to be floats, rework the alpha_below_thresh keyword

* Allow sngl_ranking keywords to actually be used

* CC

* try this

* maybe

* single-word titles

* Fix a bunch of line-too-long errors

* lines-too-long

* These tables are annoying me

* CC again

* Fix errors in the tables

* run black on pycbc/events/stat.py

* Start getting recent stat changes into module

* fixes post-rebase

* run black on pycbc/events/ranking in order to try and get codeclimate to be quiet

* Revert "run black on pycbc/events/ranking in order to try and get codeclimate to be quiet"

This reverts commit 4f082ea.

* minor fixes

* Bring up to date with recent changes

* Use new ranking statistics in CI

* Start working on getting modular statistic into Live usage

* loads of lint disabling not needed

* IWH comments

* Move ranking statistic discussion to its own part of the documentation (adding search folder)

* sort out links / search homepage

* Fix anchor

* clarify comment

* PhaseTD pregenerate hist will not work on some code paths

* Add search to toctree

* min_snr if no triggers are present

* 0D array for maximum again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants