use bluesky, et al., for at least one user #46
Comments
We had a good 10 hours of work yesterday with bluesky for the XPCS instrument using the Lambda detector. Bluesky was used for most of the instrument preparation (such as alignment scans and the routine remote control of various equipment). I believe we were able to complete a few successful measurements for the user.

Part of the day was spent developing a user plan that would measure at a list of temperatures (using the Lakeshore controller). This plan would automate an overnight sequence of measurements. Our work was plagued with many interruptions of the RunEngine execution of our plan due to

Also, we experienced some failures to connect with one EPICS PV or another at the start of a bluesky session using ipython. The remedy has been to exit and restart. This usually worked. A few times, a different PV was the cause of a similar failure to connect.

Summary

We agree that the bluesky framework is not ready now for unattended operations at XPCS. There are too many interruptions (Python exceptions which interrupt the bluesky RunEngine) and it seems these interruptions are at a low level in the framework (somewhere in the handling of EpicsSignal objects). Resolving these interruptions requires the full attention of experts at diagnosing at this low level. Since the XPCS instrument is now in a routine operations period and the beam line has a working alternative software, they decided to switch back to using SPEC for the remainder of the user's beam time.

The exception traces are too deep, in most cases. This is an opportunity to improve the exception handling in the RunEngine. When a problem is due to a user's plan, the exception is raised from a deep level in the bluesky framework and passed directly to the console. We expect such exceptions should be caught by the RunEngine, then raised as a new

TODO list
|
Thanks for the thorough write up. It sounds like there are two categories of problem: (1) something is causing TimeoutErrors and interrupting scans and (2) the deep tracebacks present a usability issue. |
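The second category (deep tracebacks) can be illustrated with plain Python exception chaining. This is only a sketch of the general idea, not how the RunEngine behaves today: `PlanExecutionError`, `run_with_short_traceback`, and `failing_plan_runner` are hypothetical names, and the failure is simulated with a function that raises `TimeoutError`.

```python
class PlanExecutionError(RuntimeError):
    """Hypothetical user-facing error; not part of bluesky."""


def run_with_short_traceback(run, *args, **kwargs):
    """Call `run` and convert any low-level exception into one short error.

    The original exception stays attached as __cause__, so an expert can
    still inspect the full traceback when needed.
    """
    try:
        return run(*args, **kwargs)
    except Exception as exc:
        summary = f"plan failed: {type(exc).__name__}: {exc}"
        raise PlanExecutionError(summary) from exc


def failing_plan_runner():
    # Stand-in (assumption) for a RunEngine call that fails deep in signal handling.
    raise TimeoutError("could not connect to PV (simulated)")


if __name__ == "__main__":
    try:
        run_with_short_traceback(failing_plan_runner)
    except PlanExecutionError as err:
        print(err)
```

The user sees a one-line summary at the console, while the chained cause preserves the low-level detail for whoever diagnoses the problem.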
I've seen I don't remember the python, pyepics3, and EPICS lib versions. I can look them up on the next shutdown day. |
Thanks! This is valuable input, helping us to divide our problem space of where to look next. |
We noticed that problem at a few "heavy" (in terms of the number of PVs) beamlines. I think we need to come up with a common solution with longer allowed connection times. A temporary, in-place fix can be similar to what we use in our CI testing, i.e. monkey-patching the timeout of the EpicsSignal: https://github.com/NSLS-II/profile-collection-ci/blob/726ebc6a618caadfdfc764579471b78e802804cf/azure-linux.yml#L145-L148:

    import ophyd
    import functools
    ophyd.signal.EpicsSignalBase.wait_for_connection = functools.partialmethod(ophyd.signal.EpicsSignalBase.wait_for_connection, timeout=60)

In general, I noticed the timeout is set to different values in different places, from 1 to 10 seconds. I think we should homogenize it to something standard or, at least, make it configurable. From previous discussions (e.g., in Nikea Slack and caproto/caproto#512) I learned that the pyepics timeout value is 5 seconds, which seems to be a reasonable value. |
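The "configurable" idea above could look roughly like the following sketch. It uses the same `functools.partialmethod` pattern, but on a stand-in class so it runs without ophyd or EPICS. `FakeSignalBase`, `patch_connection_timeout`, and the `EPICS_CONNECT_TIMEOUT` environment variable are all assumptions made for illustration; none of them exist in ophyd.

```python
import functools
import os


class FakeSignalBase:
    """Stand-in (assumption) for ophyd.signal.EpicsSignalBase."""

    def wait_for_connection(self, timeout=1.0):
        # The real method blocks until the PV connects; this one only
        # reports which timeout it was called with.
        return timeout


def patch_connection_timeout(cls, default=60.0, env_var="EPICS_CONNECT_TIMEOUT"):
    """Give cls.wait_for_connection a new default timeout.

    The value comes from the (hypothetical) environment variable if it is
    set, otherwise from `default`. Callers can still pass timeout=...
    explicitly to override it.
    """
    timeout = float(os.environ.get(env_var, default))
    cls.wait_for_connection = functools.partialmethod(
        cls.wait_for_connection, timeout=timeout
    )
    return timeout


if __name__ == "__main__":
    patch_connection_timeout(FakeSignalBase)
    print(FakeSignalBase().wait_for_connection())
```

Reading the value from one place keeps the patch in a single startup file, so a beamline with many PVs can raise the limit without editing each device definition.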
Also, regarding the timeouts while setting the PVs, here can be a potential solution and corresponding debug logs: bluesky/ophyd#779 (heavy WIP!) |
We will use bluesky this week with the BES Pilot Project team. That will satisfy this issue. |
Add feature to Lambda to close the shutter once acquisition of frames is completed, even when processing is still needed by the CAM plugin. |
For the lambda, control the shutter during the |
Rigaku detector:
|
|
Lambda testing:
|
Rigaku testing:
|
Issue #146 might stop us from reaching this goal. We can continue to operate but we know we'll have to restart the EPICS IOC often. |
Created a new milestone for this issue: https://github.com/aps-8id-dys/ipython-8idiuser/milestone/12 |
Closing this issue now |
I can see closing this issue since 8-ID-I can operate with Bluesky for a week without switching back to SPEC. Continuous operation for one week, without unplanned exceptions due to the software, is the next step. |
run user operations for at least 1 week with an expert user without having to resort back to SPEC in the middle of the week