Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new utility functions #521

Merged
merged 21 commits into from
Jul 21, 2021
Merged

new utility functions #521

merged 21 commits into from
Jul 21, 2021

Conversation

prjemian
Copy link
Contributor

@prjemian prjemian commented Jul 8, 2021

Three new utility functions from APS 4-ID polartools will help users to find data from databroker runs. Fixes #502

Thanks @gfabbris & @strempfer!

   db_query()
   getRunData()
   getRunDataValue()
   listRunKeys()

@prjemian prjemian added this to the 1.5.1 milestone Jul 8, 2021
@prjemian prjemian self-assigned this Jul 8, 2021
@prjemian prjemian requested a review from gfabbris July 9, 2021 05:15
@prjemian
Copy link
Contributor Author

prjemian commented Jul 9, 2021

@gfabbris This is ready for review.

@strempfer
Copy link
Collaborator

I don't see how it is possible to get values from baseline with any of the three functions, when only a key fragment is given. Can key_fragment and strict also be implemented for getRunData and getRunDataValue?

@prjemian
Copy link
Contributor Author

There is a stream keyword that is used to pick the "baseline" stream. Default is "primary". Example is in the unit tests: https://github.com/BCDA-APS/apstools/blob/502-find_devices/tests/test_utils.py#L348-L376

@prjemian
Copy link
Contributor Author

The real situation is that the databroker v2 search is slow compared with the v1 interface. For now, rely on the v1 interface.

@prjemian prjemian requested a review from strempfer July 14, 2021 19:04
@prjemian prjemian requested a review from CrI3 July 14, 2021 19:20
@prjemian
Copy link
Contributor Author

Time some test functions to compare:

import databroker
catalog_name = "training"  # on my test workstation

%%timeit
cat = databroker.catalog[catalog_name]
for idx in range(-20, 0):
    run = cat.v1[idx]
    uid = run.start["uid"]
    # if  "baseline" in run.stream_names:
    #     bl = run.table(stream_name="baseline")
    #     print(idx, uid)


%%timeit
cat = databroker.catalog[catalog_name]
for idx in range(-20, 0):
    run = cat.v2[idx]
    uid = run.metadata["start"]["uid"]
    # if hasattr(run, "baseline"):
    #    baseline = run.baseline.read()
    #    print(idx, uid)

mean ± std. dev. of 7 runs, 1 loop each

time per loop v1 v2
commented 370 ms ± 1.21 ms 369 ms ± 926 µs
uncommented 12.8 s ± 33 ms 21.2 s ± 42.5 ms

In commented, the baseline data is not actually read from databroker. Roughly twice as long to use the v2 interface instead of the v1 interface.

In [9]: databroker.__version__
Out[9]: '1.2.2'

The database is on a LAN MongoDB server.

@prjemian
Copy link
Contributor Author

getStreamValues() is hoisted from 4ID Polar's lookup_position(). It returns a pandas DataFrame (instead of printing to console). If the object is not assigned to a variable, it prints with nice format.

Needs unit testing and could benefit from a lower-case match instead of explicit case match.

@prjemian
Copy link
Contributor Author

@gfabbris , @strempfer , @CrI3 ; This is ready for review.

Copy link
Collaborator

@strempfer strempfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.
In getStreamValues db could be moved after key_fragment like it is done for listRunKeys. Otherwise there is an error if no db is given: getStreamValues(-1,'lake') does not work, but getStreamValues(-1,db,'lake') does.

@prjemian
Copy link
Contributor Author

prjemian commented Jul 19, 2021

@strempfer Great catch. Turns out there are no unit tests yet for getStreamValues(). That's an oversight I will need to fix. As a result, you should get a proper error if db is not defined and no default database can be found. I suspect you actually have a database. It should be found by default.

@prjemian
Copy link
Contributor Author

@strempfer : Added unit tests. If you do not specify a db argument and you have multiple catalog configurations available, then a ValueError will be raised since apstools.utils.getDefaultCatalog() cannot decide which catalog configuration to select. There is now a unit test for this.

Copy link
Collaborator

@gfabbris gfabbris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one question that can be addressed in the future.

_db = db

if len(query) != 0:
_db = _db.v2.search(query)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way combine these searches? I wonder if that would be faster.

Copy link
Collaborator

@strempfer strempfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@prjemian prjemian merged commit 39fa16f into main Jul 21, 2021
@prjemian prjemian deleted the 502-find_devices branch July 21, 2021 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

find_devices() from polartools.utils
4 participants