Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-factor database extension and support cell_methods in ExperimentExplorer #274

Merged
merged 22 commits into from
Nov 9, 2021

Conversation

aidanheerdegen
Copy link
Collaborator

Re-factor database extension and support cell_methods in ExperimentExplorer.

The re-factor has been on the cards for a while, and seemed like the best time to dive in.

Added hybrid properties model and is_restart to to NCFile. model infers the realm of the model from the ncfile path. is_restart is a boolean to identify restart variables and also inferred from the ncfile path.

Added hybrid property is_coordinate to CFVariable. This returns a boolean value which is True if this is a coordinate, False otherwise. It is inferred from the value of units.

All hybrid properties are accessible as a python method from a returned object and as a column in a query.

This is moving logic from explorer into the classes themselves to simplify and speed up using the explorer GUI elements.

Added inferred properties to get_variables when argument inferred is True.

get_variables can now return all variables from all experiments if experiment is None. In this case it will also return the experiment name.

Updated tests, but far from comprehensive.

Closes #273

@aidanheerdegen aidanheerdegen marked this pull request as draft September 2, 2021 07:08
@aidanheerdegen aidanheerdegen marked this pull request as ready for review September 14, 2021 06:37
@aidanheerdegen
Copy link
Collaborator Author

The circleci check is just timing out, so can be ignored for the moment.

Still haven't actually added the cell_methods selector I realised, so will turn back into a draft ....

@aidanheerdegen aidanheerdegen marked this pull request as draft September 14, 2021 06:42
@aidanheerdegen
Copy link
Collaborator Author

This is a working version but still have a few things to work on:

  1. Need more tests, covering existing functionality and new cell methods addition
  2. Too slow to start Experiment Explorer (15s) and Database explorer (22s CPU, 60s Walltime)
  3. Sluggish when selecting/deselecting filtering coordinates and restart vars. Maybe triggering some event loop stuff?
  4. Selecting cell methods works, but the time range is for all variables with the same frequency. Currently have no way to query for a time range based on a var + frequency + cell_methods, so don't have any way to update the time range. So it can be (almost certainly is) misleading.

@aidanheerdegen aidanheerdegen marked this pull request as ready for review November 1, 2021 06:40
@aidanheerdegen
Copy link
Collaborator Author

New approach fixes the slow start-up times for the DatabaseExplorer and ExperimentExplorer

> %%time
> dbx = DatabaseExplorer(session=session)
CPU times: user 16.3 s, sys: 1.04 s, total: 17.3 s
Wall time: 17.3 s

> %%time
> ee = ExperimentExplorer(session=session, experiment='01deg_jra55v140_iaf')
CPU times: user 6.58 s, sys: 498 ms, total: 7.08 s
Wall time: 7.07 s

Just as a data point for later comparison the current COSIMA master DB is indexing 115 experiments containing 379590 individual files, 9.2M variables with 56M individual attributes.

No longer slow to filter restarts and coordinates.

the realm of the model from the ncfile path. is_restart is a boolean to
identify restart variables and also inferred from the ncfile path.

Added hybrid property is_coordinate to CFVariable. This returns a boolean
value which is True if this is a coordinate, which is inferred from the
value of the units.

All hybrid properties are accessible as a python method from a returned
object and as a column in a query.

This is moving logic from explorer into the classes themselves to simplify
and speed up using the explorer GUI elements.

Added inferred properties to get_variables when argument inferred is True.

get_variables can now return *all* variables from *all* experiments if
experiment is None. In this case it will also return the experiment name.

Updated tests, but far from comprehensive.
… which contain

specified variables. Added tests.

Modified get_variables: can limit returned variables to those which contain search
terms in variable name, long name and standard name. Added tests.
Added some more testing for filter selector.
Added get_cellmethods function to querying.

Added cell methods selector to ExperimentExplorer.
…ly into the returned

pandas Dataframe. Significantly faster and simpler to do it this way.

Added separate cellmethods observe and event handler. Now cascade events from
frequency > cellmethods > daterange.

Updated explore tests with new cascade approach method.

Tried to fix up formatting in Experiment Explorer for new cell methods label with limited success.

Updated tests to work with extra new (cell_methods) field being returned from get_variables.
@aidanheerdegen
Copy link
Collaborator Author

what's the python version that conda-analysis3 runs?

3.9

@aidanheerdegen
Copy link
Collaborator Author

21.04 uses 3.8. I don't think it is a big deal. It is some weird pytest related error. Doesn't happen outside of testing AFAICT.

@aidanheerdegen
Copy link
Collaborator Author

I think this is ready for review now @angus-g if you have time.

@angus-g
Copy link
Collaborator

angus-g commented Nov 8, 2021

I'll take a look now!

Copy link
Collaborator

@angus-g angus-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a few tweaks to queries/expressions, but otherwise it seems to make sense. This is pretty substantial! I haven't gone through all the tests too closely, so I'm assuming they're covering all the cases you need for the explorer.

@aidanheerdegen
Copy link
Collaborator Author

Thanks for checking and making those changes.

Regarding tests, if I'm honest no there aren't enough, but I ran out of motivation as creating them is a bit painful. I have added a test to make sure the hybrid model property is consistent between the two methods of invocation.

I'll take a look at the cell_methods stuff to make sure there is appropriate test coverage.

@aidanheerdegen
Copy link
Collaborator Author

I take that back, clearly a more motivated me added a fair bit of test coverage for cell_methods. I think I'm happy to merge if you are @angus-g.

@aidanheerdegen aidanheerdegen merged commit 7f4eff7 into master Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for cell_methods to explorer
3 participants