Discussion: capturing processed data #75

danielballan · 2021-01-27T14:59:52Z

Spinning off from this comment:

For the most part, we intend processed data to go in a separate BlueskyRun, which may reference the BlueskyRun(s) holding the original data. There are several reasons to put it in a separate Run rather than in an additional stream of the original Run:

1. The data management for processed / derived / analyzed data may differ from that of raw data, for example in the rules about who can access it and how long it is retained in the system.
2. For any given raw data set, there may be multiple processed / derived / analyzed data sets, and expressing this one-to-many relationship inside streams would get awkward.
3. Many parts of the Bluesky infrastructure assume that once a BlueskyRun is complete (i.e., once the RunEngine has emitted the 'stop' document) it will not change. This assumption simplifies a lot of things; breaking it to add streams after the fact comes with a high complexity cost.

This is our working theory of how to capture analysis results in Databroker: https://blueskyproject.io/databroker/docs-rewrite-draft/how-to/store-analysis-results.html (Note: this link points to a preview of new Databroker documentation that is being evaluated by some users; it will be moved to https://blueskyproject.io/databroker/how-to/store-analysis-results.html)
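
As a rough illustration of "a separate Run that references the raw Run(s)", here is a minimal sketch using plain dictionaries shaped loosely like event-model start documents, not the real `event_model` API. The helpers `make_start` and `make_processed_run` and the metadata key `raw_uids` are hypothetical; the linked how-to guide describes the conventions actually being proposed.

```python
import time
import uuid


def make_start(**metadata):
    """Build a simplified 'start' document with a fresh uid."""
    return {"uid": str(uuid.uuid4()), "time": time.time(), **metadata}


def make_processed_run(raw_starts, **metadata):
    """Start a new Run for derived data that points back at its raw Run(s).

    The raw Runs are never modified: the link lives only in the new Run's
    start document, under the hypothetical key 'raw_uids'.
    """
    return make_start(raw_uids=[doc["uid"] for doc in raw_starts], **metadata)


raw = make_start(scan_id=42, plan_name="count")
# Two independent analyses of the same raw data: a one-to-many relationship
# that requires no new streams inside the (already complete) raw Run.
proc_a = make_processed_run([raw], analysis="background_subtraction")
proc_b = make_processed_run([raw], analysis="peak_fit", fit_model="gaussian")
```

Because the link points from derived to raw, adding a third analysis later touches nothing in the raw Run, which is what keeps points 1-3 intact.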

One could then imagine queries like "Show me all the processed results for Scan ID X," or "Given this processed result, find me the raw data."
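
Both of those queries can be sketched over a toy in-memory list of start documents. This is only an assumption-laden illustration: the keys `scan_id`, `raw_uids`, and `analysis` and the functions below are hypothetical, and a real Databroker catalog would express these as searches rather than list comprehensions.

```python
# A toy in-memory "catalog" of start documents; the schema is illustrative only.
catalog = [
    {"uid": "raw-1", "scan_id": 7},
    {"uid": "raw-2", "scan_id": 8},
    {"uid": "proc-1", "raw_uids": ["raw-1"], "analysis": "normalize"},
    {"uid": "proc-2", "raw_uids": ["raw-1"], "analysis": "peak_fit"},
    {"uid": "proc-3", "raw_uids": ["raw-2"], "analysis": "normalize"},
]


def processed_results_for_scan(catalog, scan_id):
    """'Show me all the processed results for Scan ID X.'"""
    # Scan IDs are not guaranteed to be unique, so collect every matching raw uid.
    raw_uids = {doc["uid"] for doc in catalog if doc.get("scan_id") == scan_id}
    return [doc for doc in catalog
            if raw_uids.intersection(doc.get("raw_uids", []))]


def raw_data_for(catalog, processed_doc):
    """'Given this processed result, find me the raw data.'"""
    wanted = set(processed_doc.get("raw_uids", []))
    return [doc for doc in catalog if doc["uid"] in wanted]


print([d["uid"] for d in processed_results_for_scan(catalog, 7)])
# prints ['proc-1', 'proc-2']
```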

However, some analysis that can be done cheaply in real time during data acquisition, and in a rote fashion that is highly unlikely to require re-processing with different parameters, might be done in the Ophyd/Bluesky layer as part of data acquisition and included in a stream in the original BlueskyRun. That particular case stays on the right side of points 1-3 above.
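
To make that exception concrete, here is a minimal sketch of the document ordering for one Run that carries an extra 'processed' stream computed during acquisition. It uses simplified dictionaries rather than the real RunEngine or `event_model` machinery; the signal names `det` and `monitor`, the stream name `processed`, and the `det / monitor` normalization are hypothetical stand-ins for any cheap, rote computation.

```python
import time
import uuid


def _uid():
    return str(uuid.uuid4())


def _descriptor(start, name, keys):
    # Simplified event descriptor: one scalar data key per name.
    return {"uid": _uid(), "run_start": start["uid"], "name": name,
            "data_keys": {k: {"dtype": "number", "shape": [], "source": "sketch"}
                          for k in keys}}


def _event(descriptor, seq_num, data):
    now = time.time()
    return {"uid": _uid(), "descriptor": descriptor["uid"], "seq_num": seq_num,
            "time": now, "data": data, "timestamps": {k: now for k in data}}


def run_with_processed_stream(readings):
    """Yield (name, doc) pairs for one Run with 'primary' and 'processed' streams."""
    start = {"uid": _uid(), "time": time.time()}
    yield "start", start
    primary = _descriptor(start, "primary", ["det", "monitor"])
    processed = _descriptor(start, "processed", ["normalized"])
    yield "descriptor", primary
    yield "descriptor", processed
    for seq_num, reading in enumerate(readings, start=1):
        yield "event", _event(primary, seq_num, reading)
        # Derived value computed in real time, right alongside the raw reading.
        norm = reading["det"] / reading["monitor"]
        yield "event", _event(processed, seq_num, {"normalized": norm})
    # Everything is emitted before 'stop', so the Run is unchanged once complete.
    yield "stop", {"uid": _uid(), "run_start": start["uid"], "time": time.time(),
                   "exit_status": "success",
                   "num_events": {"primary": len(readings),
                                  "processed": len(readings)}}


docs = list(run_with_processed_stream([{"det": 10.0, "monitor": 2.0},
                                       {"det": 9.0, "monitor": 3.0}]))
```

The key design point is that the derived stream is produced before the 'stop' document, so the immutability assumption in point 3 is never violated.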

gfabbris · 2021-01-27T16:10:41Z

This is really nice, and I expect it to be very useful for our spectroscopy data processing (likely using larch). I will give it a try before I start asking a bunch of trivial questions.

I suspect the threshold for calling data processing "cheap" can be somewhat fuzzy. But would you call the data processing described in #42 cheap? Maybe it wasn't very clear there, but the reason I asked is that if we can add the processed XANES/XMCD to a new stream, then we could just plot that stream. Having this would also be useful to users.

danielballan mentioned this issue on Jan 27, 2021 in #42, "custom visualization callback for dichroism scans" (Open).
