This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

lambda det IOC, autosave: too many open files #132

Closed
prjemian opened this issue Jan 29, 2020 · 11 comments
Labels
question Further information is requested

Comments

@prjemian
Collaborator

The Cam plugin reports no (0) images complete. Looking at the Lambda IOC's console, I found this trouble:

2020/01/29 11:43:56.532 Lambda:writeInt32 Setting TriggerMode 0
2020/01/29 11:43:56.534 Lambda:writeInt32 Setting TriggerMode 0
2020/01/29 11:43:56.535 Entering Lambda:writeFloat64
2020/01/29 11:43:56.537 Entering Lambda:writeFloat64
Setting Acquire Period
mkdir umask 777 chmod ret 0
 Created Dir /home/8-id-i/2020-1/bluesky/A1020
IMM Recurse path: statis = 0
Error- Could not open IMM File
2020/01/29 11:43:56.784 NDPluginFile::openFileBase Error opening file /home/8-id-i/2020-1/bluesky/A1020/A1020.imm, status=3
save_restore:write_save_file: Backup file (./autosave/auto_settings.savB) bad or not found.  Writing a new one. [200129-114356]
save_restore:write_it - unable to open file './autosave/auto_settings.savB' [200129-114356]
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
save_restore:write_save_file: Can't write new backup file. [200129-114356]
../save_restore.c(1704): [0x18]=write_it:Too many open files

Will an IOC reboot fix this?

Originally posted by @prjemian in #127 (comment)

@prjemian
Collaborator Author

Yes, the problem was gone after an IOC reboot. But I still wonder why autosave reported too many open files.

prjemian added the question (Further information is requested) label Jan 29, 2020
prjemian self-assigned this Jan 29, 2020
@prjemian
Collaborator Author

@JPHammonds, @timmmooney, @keenanlang, @Engbretson, @MarkRivers - Any ideas why this area detector IOC for the Lambda 750 ended up with autosave reporting too many open files? It is running on RHEL workstation bronze. We restarted the IOC and were then able to continue scans. Our scans had stopped when we could not write a 1 to the IMMout plugin's Capture PV in prep for image acquisition.

Thread starts here: #127 (comment)

@prjemian
Collaborator Author

note: ran df -HT on workstation bronze but no file systems were full.

@Engbretson

Engbretson commented Jan 29, 2020 via email

@JPHammonds

JPHammonds commented Jan 29, 2020 via email

@MarkRivers

I don't think we know that autosave had too many open files. There are actually 2 error messages:

Error- Could not open IMM File
../save_restore.c(1704): [0x18]=write_it:Too many open files

Those errors just indicate that there are too many files open, but we don't know what software opened them, e.g. the detector driver, autosave, or something else. I think it is more likely that the driver is the culprit than autosave.
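One way to find out which files a running process is holding, before restarting it, is to read its descriptor table from `/proc`. The following is a minimal sketch, assuming a Linux host (such as the RHEL workstation mentioned above); the function names `open_files`, `soft_limit`, and `summarize` are made up for illustration and are not part of EPICS, areaDetector, or autosave.

```python
import os
from collections import Counter


def open_files(pid="self"):
    """Return the targets of every open file descriptor of a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
    return targets


def soft_limit(pid="self"):
    """Read the soft 'Max open files' limit of a process from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: Max open files <soft> <hard> files
                return line.split()[3]
    return None


def summarize(pid="self", top=5):
    """Summarize descriptor usage: total count, soft limit, most frequent targets."""
    targets = open_files(pid)
    return {
        "open": len(targets),
        "soft_limit": soft_limit(pid),
        "most_common": Counter(targets).most_common(top),
    }


if __name__ == "__main__":
    # Replace "self" with the IOC's process ID (needs permission to read its /proc entry).
    print(summarize("self"))
```

If one path or one kind of target (for example sockets, or the same data file) dominates `most_common`, that points at the component that is not closing its files.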

@prjemian
Collaborator Author

Don’t suppose you know if this Detector was known to be working and stable before and this is new behavior (i.e., ‘bit rot’) or if this was recently recompiled against epics base 7? Or if the computer in question only started to do this after the latest RHEL7 Linux patches were deployed this shutdown?

The detector was being used in a long scanning procedure involving multiple image acquisitions, typical of regular operations with this specific detector and IOC. The workstation had been up since before the experiment started, and the IOC had been working.

@prjemian
Collaborator Author

There are actually 2 error messages:

Good catch. I agree that autosave was not the likely root cause, but rather a symptom.
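The "symptom, not cause" reading can be illustrated with a small experiment. On Linux, errno 24 (0x18, which appears to be the `[0x18]` in the save_restore message) is `EMFILE`, a per-process limit on open descriptors. The sketch below is not the IOC's code; it is a hypothetical demonstration that temporarily lowers the limit of the current Python process, leaks descriptors in one place, and then shows that an unrelated open in the same process fails with the same error. It assumes a Unix-like system where the `resource` module is available.

```python
import errno
import os
import resource


def demonstrate_emfile(soft_limit=64):
    """Leak descriptors until open fails, then try one more unrelated open.

    Returns the errno of the first failure and of the unrelated attempt.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    # Lower the soft limit for this process only; it is restored below.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    leaked = []
    try:
        # Stand-in for a component that opens files and never closes them.
        try:
            while True:
                leaked.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as exc:
            first_errno = exc.errno
        # Stand-in for a different component in the same process.
        try:
            fd = os.open(os.devnull, os.O_RDONLY)
            os.close(fd)
            second_errno = 0
        except OSError as exc:
            second_errno = exc.errno
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return first_errno, second_errno


if __name__ == "__main__":
    first, second = demonstrate_emfile()
    print(errno.errorcode.get(first), errno.errorcode.get(second))
```

Because the limit is per process, whichever part of the process opens a file next is the one that reports the error, which also explains why restarting the IOC (a fresh process with an empty descriptor table) cleared it.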

@sureshnaps
Collaborator

sureshnaps commented Jan 29, 2020 via email

@prjemian
Collaborator Author

prjemian commented Jan 29, 2020 via email

@prjemian
Collaborator Author

I now believe this is related to #127 (comment). With the changes in e0d4731 and e6a18a4 (and restarting the IOC), data acquisition continues.

Summary: the IMMout file path must start with /data/ for the XPCS image data, while the data management workflow is written to a path that starts with /home/8-id-i/. These are the same shared filesystem but mounted in different places on the different hosts.
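A prefix translation of this kind can be sketched as a small helper. This is only an illustration: the function name `remap_prefix` is made up, and the assumption that everything under /home/8-id-i/ appears with the same relative layout under /data/ is not confirmed in this thread. The actual fix is in the commits referenced above.

```python
from pathlib import PurePosixPath


def remap_prefix(path, src_root, dst_root):
    """Rewrite a path seen under src_root so that it is rooted at dst_root.

    Raises ValueError if the path is not located under src_root.
    """
    try:
        relative = PurePosixPath(path).relative_to(src_root)
    except ValueError:
        raise ValueError(f"{path!r} is not under {src_root!r}") from None
    return str(PurePosixPath(dst_root) / relative)


if __name__ == "__main__":
    # Hypothetical mapping: both roots are assumed to expose the same directory tree.
    workflow_path = "/home/8-id-i/2020-1/bluesky/A1020"
    print(remap_prefix(workflow_path, "/home/8-id-i", "/data"))
```

Keeping the two roots as explicit arguments, rather than hard-coding them, makes it obvious at the call site which host's view of the filesystem a given path belongs to.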

5 participants