For the most part, we intend processed data to go in a separate BlueskyRun, which may reference the BlueskyRun(s) with the original data. There are several reasons to put this in a separate Run rather than an additional stream in the original Run.
1. The data management for processed / derived / analyzed data may be different from that of raw data: for example, rules about who can access it and how long it is retained in the system.
2. For any given raw data set, there may be multiple processed / derived / analyzed data sets, and expressing this "one to many" relationship inside streams will get awkward.
3. Many parts of the Bluesky infrastructure assume that once a BlueskyRun is complete (i.e., once the 'stop' document is emitted by the RunEngine), it will not change. This assumption simplifies a lot of things; breaking it to add streams after the fact comes with a high complexity cost.
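A minimal sketch of the separate-Run approach, assuming plain Python dicts shaped like event-model `start` and `stop` documents (only the `uid`, `time`, `run_start`, and `exit_status` fields come from the document model). The `raw_run_uids` metadata key and the `analysis` / `edge_step` parameters are hypothetical names for illustration, not an established Databroker convention. The point is that each derived run is a new, independent run pointing back at the raw run, which is never modified.

```python
import time
import uuid


def make_start(**metadata):
    """Return a minimal dict shaped like an event-model 'start' document."""
    return {"uid": str(uuid.uuid4()), "time": time.time(), **metadata}


def make_stop(start_doc, exit_status="success"):
    """Return a minimal dict shaped like an event-model 'stop' document."""
    return {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        "run_start": start_doc["uid"],
        "exit_status": exit_status,
    }


def make_derived_run(raw_start_docs, **metadata):
    """Start and stop a new run that references one or more raw runs.

    'raw_run_uids' is a hypothetical metadata key chosen for this sketch.
    The raw runs' documents are only read here, never modified.
    """
    raw_uids = [doc["uid"] for doc in raw_start_docs]
    start = make_start(raw_run_uids=raw_uids, **metadata)
    return start, make_stop(start)


# A completed raw run.
raw_start = make_start(scan_id=1)
raw_stop = make_stop(raw_start)

# One raw run, many derived runs, each with its own (hypothetical) parameters.
derived_a, _ = make_derived_run([raw_start], analysis="normalize", edge_step=1.0)
derived_b, _ = make_derived_run([raw_start], analysis="normalize", edge_step=0.9)
```

Because the derived runs are separate, each could carry its own access and retention metadata without touching the completed raw run.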
One could then imagine queries like "Show me all the processed results for Scan ID X," or "Given this processed result, find me the raw data."
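Both queries amount to following those back-references in one direction or the other. This sketch runs them over an in-memory list of toy start documents rather than a real Databroker catalog, reusing the same hypothetical `raw_run_uids` key; in practice they would be catalog searches, whose exact syntax is not shown here.

```python
# Toy start documents (hypothetical): two raw runs and two derived runs.
raw_1 = {"uid": "raw-1", "scan_id": 42}
raw_2 = {"uid": "raw-2", "scan_id": 43}
proc_1 = {"uid": "proc-1", "raw_run_uids": ["raw-1"], "analysis": "normalize"}
proc_2 = {"uid": "proc-2", "raw_run_uids": ["raw-1", "raw-2"], "analysis": "merge"}
catalog = [raw_1, raw_2, proc_1, proc_2]


def processed_results_for_scan(catalog, scan_id):
    """'Show me all the processed results for Scan ID X.'"""
    raw_uids = {doc["uid"] for doc in catalog if doc.get("scan_id") == scan_id}
    # A derived run matches if any raw run it references has this scan_id.
    return [doc for doc in catalog
            if raw_uids.intersection(doc.get("raw_run_uids", []))]


def raw_data_for(catalog, processed_doc):
    """'Given this processed result, find me the raw data.'"""
    wanted = set(processed_doc.get("raw_run_uids", []))
    return [doc for doc in catalog if doc["uid"] in wanted]


print([doc["uid"] for doc in processed_results_for_scan(catalog, 42)])  # ['proc-1', 'proc-2']
print([doc["uid"] for doc in raw_data_for(catalog, proc_2)])  # ['raw-1', 'raw-2']
```

Storing a list of raw uids, rather than a single one, also covers a processed result derived from several raw runs.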
However, some analysis that can be done cheaply in real time during data acquisition, and in a rote fashion that is highly unlikely to require re-processing with different parameters, might be done in the Ophyd/Bluesky layer as part of data acquisition and could be included in a stream in the original BlueskyRun. That particular case stays on the right side of points 1-3 above.
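For the cheap, rote case, the derived values can go in a second stream of the same run, emitted before its `stop` document. This is a simplified sketch of the resulting document sequence, assuming hypothetical `I` and `I0` readings, a hypothetical stream name `processed`, and a normalization `I / I0` as the example computation. In a real setup the RunEngine produces these documents; their fields are trimmed here to the few needed to show the structure.

```python
import uuid


def acquire(readings):
    """Yield (name, doc) pairs for one run with a raw and a derived stream.

    Assumptions: each reading has hypothetical keys 'I' and 'I0', and the
    derived value I / I0 goes in a stream named 'processed' (also
    hypothetical). Everything is emitted before 'stop', so the run is
    still unchanged once it completes.
    """
    start = {"uid": str(uuid.uuid4())}
    yield "start", start
    primary = {"uid": str(uuid.uuid4()), "run_start": start["uid"], "name": "primary"}
    processed = {"uid": str(uuid.uuid4()), "run_start": start["uid"], "name": "processed"}
    yield "descriptor", primary
    yield "descriptor", processed
    for seq_num, reading in enumerate(readings, start=1):
        yield "event", {"descriptor": primary["uid"], "seq_num": seq_num,
                        "data": dict(reading)}
        # Cheap, rote computation with no tunable parameters.
        ratio = reading["I"] / reading["I0"]
        yield "event", {"descriptor": processed["uid"], "seq_num": seq_num,
                        "data": {"I_over_I0": ratio}}
    yield "stop", {"uid": str(uuid.uuid4()), "run_start": start["uid"],
                   "exit_status": "success"}


docs = list(acquire([{"I": 2.0, "I0": 4.0}, {"I": 3.0, "I0": 1.0}]))
print([name for name, _ in docs])
# ['start', 'descriptor', 'descriptor', 'event', 'event', 'event', 'event', 'stop']
```

Anything that might need re-running with different parameters does not fit this pattern and belongs in a separate run instead.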
This is really nice, and I expect it to be very useful for our spectroscopy data processing (likely using larch). I will give it a try before I start asking a bunch of trivial stuff.
I suspect that the threshold for calling data processing "cheap" can be somewhat fuzzy. But would you call the data processing described in #42 cheap? Maybe it wasn't very clear there, but the reason I asked about it is that if we can add the processed XANES/XMCD to a new stream, then we could just plot that stream. Having this would also be useful to users.
Spinning off from this comment:
This is our working theory of how to capture analysis results in Databroker: https://blueskyproject.io/databroker/docs-rewrite-draft/how-to/store-analysis-results.html (Note: this link points to a preview of new Databroker documentation that is being evaluated by some users. It will be moved to https://blueskyproject.io/databroker/how-to/store-analysis-results.html.)