Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example 2 fails to run #205

Closed
scott314159 opened this issue Mar 28, 2024 · 8 comments
Closed

Example 2 fails to run #205

scott314159 opened this issue Mar 28, 2024 · 8 comments

Comments

@scott314159
Copy link

I am just starting out with seisflows so I decided to try to run the examples. examples 1 and 3 run to completion but example 2 does not. I put the error message below. It fails at the end of iteration 1

  • A few details:
  • seisflows version: 3.0.0
  • installed using the documented git command: git clone --branch devel https://github.com/adjtomo/seisflows.git
  • pytest:
  •    1) seems to fail:  ====== 10 failed, 13 passed, 1 warning, 9 errors in 47.87s ====
    
  •    2)  based on issue #167 I am not sure if this is because the regression testing needs an update or if I have missing libraries etc.?  If there is a list of required installs I can't find it?
    
  •   3) platform linux -- Python 3.11.8, pytest-8.1.1, pluggy-1.4.0
    
  • I am running the examples by seisflows examples run 2

Any advice would be appreciated.

FINALIZING ITERATION

2024-03-28 11:10:49 [INFO] | writing model m_new to /home/scott/Seismic/FWI/Work/example2/output/MODEL_01
2024-03-28 11:10:50 [DEBU] | organizing log files in /home/scott/Seismic/FWI/Work/example2/logs
2024-03-28 11:10:53 [INFO] | running finalization for module 'system.Workstation'
2024-03-28 11:10:54 [INFO] | running finalization for module 'preprocess.Pyaflowa'
Traceback (most recent call last):
File "/homes/scott/miniconda3/envs/seisflows_3.11.8/bin/seisflows", line 8, in
sys.exit(main())
^^^^^^
File "/home/scott/seisflows/seisflows/seisflows.py", line 1428, in main
sf()
File "/home/scott/seisflows/seisflows/seisflows.py", line 450, in call
getattr(self, self._args.command)(**vars(self._args))
File "/home/scott/seisflows/seisflows/seisflows.py", line 802, in submit
system.submit(workdir=self._args.workdir,
File "/home/scott/seisflows/seisflows/system/workstation.py", line 211, in submit
workflow.run()
File "/home/scott/seisflows/seisflows/workflow/inversion.py", line 198, in run
super().run() # Runs task list
^^^^^^^^^^^^^
File "/home/scott/seisflows/seisflows/workflow/forward.py", line 280, in run
func()
File "/home/scott/seisflows/seisflows/workflow/inversion.py", line 677, in finalize_iteration
module.finalize()
File "/home/scott/seisflows/seisflows/preprocess/pyaflowa.py", line 684, in finalize
unix.rm(self.path._logs)
File "/home/scott/seisflows/seisflows/tools/unix.py", line 205, in rm
shutil.rmtree(name)
File "/homes/scott/miniconda3/envs/seisflows_3.11.8/lib/python3.11/shutil.py", line 763, in rmtree
onerror(os.rmdir, path, sys.exc_info())
File "/homes/scott/miniconda3/envs/seisflows_3.11.8/lib/python3.11/shutil.py", line 761, in rmtree
os.rmdir(path, dir_fd=dir_fd)
OSError: [Errno 39] Directory not empty: '/home/scott/Seismic/FWI/Work/example2/scratch/preprocess/logs'

@bch0w
Copy link
Member

bch0w commented Mar 28, 2024

Hi @scott314159, thanks for raising the issue, sorry you had troubles with the examples. The devel branch is in rapid active development and not super stable right now, tests are definitely not passing so that is expected.

Examples for the devel branch were fixed in #195 and I just ran the Example 2 on the latest devel branch succesfully. Can you confirm you are on the latest version (should be 3.0.2) and try the example agian?

@scott314159
Copy link
Author

@bch0w Thanks for the speedy reply. I missed the update (I was running 3.0.0) so I installed 3.0.2 and reran example 2. Unfortunately I still have an issue. When I reran the example I did get an error where the tasktime was too short (this happened in both versions) so I updated the parameter file to change tasktime from 1 ==> 5 and reran (seisflows submit).

I am running CentOS 7.9, python 3.11.8

2024-03-28 14:19:18 [INFO] | setting trial model as starting model (m_try -> m_new)
2024-03-28 14:19:18 [INFO] | misfit of accepted trial model is f=1.246E-01
2024-03-28 14:19:18 [INFO] | resetting line search step count to 0
2024-03-28 14:19:18 [DEBU] | checkpointing workflow to seisflows state file
2024-03-28 14:19:18 [INFO] |
////////////////////////////////////////////////////////////////////////////////
FINALIZE ITERATION
////////////////////////////////////////////////////////////////////////////////
2024-03-28 14:19:18 [INFO] | running finalization for module 'system.Workstation'
2024-03-28 14:19:19 [INFO] | running finalization for module 'solver.Specfem2D'
2024-03-28 14:19:19 [INFO] | running finalization for module 'preprocess.Pyaflowa'
2024-03-28 14:19:27 [DEBU] | checkpointing workflow to seisflows state file
Traceback (most recent call last):
File "/homes/scott/miniconda3/envs/seisflows/bin/seisflows", line 8, in
sys.exit(main())
^^^^^^
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 1430, in main
sf()
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 451, in call
getattr(self, self._args.command)(**vars(self._args))
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/seisflows.py", line 803, in submit
system.submit(workdir=self._args.workdir,
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/system/workstation.py", line 211, in submit
workflow.run()
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/inversion.py", line 197, in run
super().run() # Runs task list
^^^^^^^^^^^^^
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/forward.py", line 284, in run
func()
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/inversion.py", line 625, in finalize_iteration
super().finalize_iteration()
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/workflow/forward.py", line 636, in finalize_iteration
module.finalize()
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/preprocess/pyaflowa.py", line 691, in finalize
unix.rm(self.path._logs)
File "/home/scott/Seismic/FWI/seisflows/seisflows/seisflows/tools/unix.py", line 205, in rm
shutil.rmtree(name)
File "/homes/scott/miniconda3/envs/seisflows/lib/python3.11/shutil.py", line 763, in rmtree
onerror(os.rmdir, path, sys.exc_info())
File "/homes/scott/miniconda3/envs/seisflows/lib/python3.11/shutil.py", line 761, in rmtree
os.rmdir(path, dir_fd=dir_fd)
OSError: [Errno 39] Directory not empty: '/home/scott/Seismic/FWI/Work/example2/scratch/preprocess/logs'

@bch0w
Copy link
Member

bch0w commented Mar 28, 2024

Hey @scott314159, thanks for updating. Super interesting, looks like this might be coming from unexpected behavior of the built-in Python shutil.rmtree function which SeisFlows is wrapping with its unix utilities.

On my machine unix.rm is able to remove non-empty directories but it seems like that is not the case for your situation. I'm using Python 3.11.3 to test but also tried on 3.11.8.

Can you run the following commands in Python and see what happens? It'll just make a non-empty directory and then delete it.

from seisflows.tools import unix
unix.mkdir("test/test")
unix.rm("test")

Could you also let me know if the parameter export_log_files is set in your parameter file, and if it's True or False?

@scott314159
Copy link
Author

@bch0w

  1. I tried the rm test and it seems to work fine. I confirmed that the directory is created and removed as I ran the python line by line.
    Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

from seisflows.tools import unix
unix.mkdir("test/test")
unix.rm("test")
unix.mkdir("test/test/test")
unix.rm("test")

  1. export_log_files is set to True

@bch0w
Copy link
Member

bch0w commented Mar 29, 2024

Hey @scott314159, thanks for that. I did a bit more google searching and it seems like this is a Python+Linux issue, and a known bug with theshutil.rmtree command being used on .nfs filesystems, see e.g.,:

https://stackoverflow.com/questions/11228079/python-remove-directory-error-file-exists

My guess is there are some .nfs* files still in that preprocess/logs directory that prevent it from being deleted, leading to this error. Not sure if you have been running in clean working directories, but that might help. You could also try running example 1, just to see if the different preprocessing module in Example 2 is causing issues.

@scott314159
Copy link
Author

@bch0w thanks for sticking with me on this issue.

  1. example 1 runs to completion. Also, example 3 runs to completion. I ran both are on the same file system where example 2 was having trouble.
  2. Thanks for the nfs idea. I did not think of that one. I figured that If nfs file lock was causing the issue then if I run on a local disk I should be OK. So, I ran example 2 on /scratch and it did complete. So, I guess somewhere in the prreprocessing there is a file that isn't being closed?

You have been a big help. I'll be glad to run additional tests if it will help you.

Below are a few more details that may or may not be important:
How I ran example 2:
I ran example 2 a number of times and it always seems to fail at the same place. During the reruns I did a seisflows clean and an rm -rf. The run seems to fail at the same place each time which, I guess, supports the open file hypothesis.

I did look in output/preprocess/logs directory on the failed run and it is empty. I guess it is possible that the nfs lock was released after the workflow failed.

@bch0w
Copy link
Member

bch0w commented Mar 31, 2024

No problem, @scott314159, happy to help! Glad that running on local helped, perhaps there is a bit of a race condition happening here where the file lock cannot be removed before the delete command is run.

Makes sense that it always happens on that finalization command as I think that is the only place we try to delete those log files. I think one way to get around this would be to add a try-except on the deletion with a few seconds in between each try to see if that gives the system enough time to release the lock.

If it's alright with you, I'll open a new issue referencing this one, and have you close this one? Please don't hesitate to raise new issues if you encounter more problems, though!

@bch0w
Copy link
Member

bch0w commented Jun 5, 2024

Closing this issue and continuing discussion in #206

@bch0w bch0w closed this as completed Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants