Pipeline crash at Cannot find MS files #59

Open
AlexKurek opened this issue Feb 16, 2025 · 12 comments

@AlexKurek
Contributor

I tried three different fields from public projects using the latest LBAdevel. I stage the data via WWW, download html.txt, and put it in the LiLF/ folder where the repo is cloned. For each field, after a while of processing and downloading, I get:

2025-02-15 19:37:31 - INFO - Tar mss/id789458_-_3c196/3c196_SB317.MS...
2025-02-15 19:37:32 - INFO - L789460_SB242_uv->mss/id789458_-_3c196/3c196_SB318.MS: Average in freq (factor of 1) and time (factor of 2)...
2025-02-15 19:37:42 - INFO - Tar mss/id789458_-_3c196/3c196_SB318.MS...
2025-02-15 19:37:43 - INFO - L789460_SB243_uv->mss/id789458_-_3c196/3c196_SB319.MS: Average in freq (factor of 1) and time (factor of 2)...
2025-02-15 19:37:53 - INFO - Tar mss/id789458_-_3c196/3c196_SB319.MS...
2025-02-15 19:37:53 - INFO - << done << renameavg
2025-02-15 19:37:53 - INFO - Done. Total time: 1h 04m
mv: cannot stat '*MS': No such file or directory
INFO: Note: NumExpr detected 28 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2025-02-15 19:37:56 - INFO - Logging initialised in /storage/akurek/LiLF_workdir/id789458_-_3c196 (file: pipeline-cal_2025-02-15_19:37:56.logger)
2025-02-15 19:37:56 - WARNING - Hostname lof8 unknown.
2025-02-15 19:37:56 - INFO - Scheduler initialised for cluster Unknown: lof8 (maxProcs: 28, max_cpucores: 28).
2025-02-15 19:37:56 - INFO - Found config file: ../lilf.config
2025-02-15 19:37:56 - INFO - Parset: {'parset_dir': '/home/akurek/LiLF/LiLF/../parsets/LOFAR_cal', 'data_dir': 'data-bkp/', 'skymodel': '', 'imaging': 'False', 'fillmissingedges': 'True', 'less_aggressive_flag': 'False', 'develop': 'False'}
2025-02-15 19:37:56 - INFO - >> start >> cleaning
2025-02-15 19:37:56 - INFO - Cleaning...
2025-02-15 19:37:56 - INFO - << done << cleaning
2025-02-15 19:37:56 - ERROR - Cannot find MS files.
Traceback (most recent call last):
File "/home/akurek/LiLF/pipelines/LOFAR_cal.py", line 73, in <module>
MSs = lib_ms.AllMSs(glob.glob(data_dir + '/*MS'), s, check_flags=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/akurek/LiLF/LiLF/lib_ms.py", line 34, in __init__
raise('Cannot find MS files.')
TypeError: exceptions must derive from BaseException
2025-02-15 19:37:57 - ERROR - Something went wrong in the last pipeline call.

The contents of the workdir at this point look like this:

download download-cal_id789458 id789458_-_3c196 id789458_-_c2020f3 lilf.config tree.log

lilf.config:

[PiLL]
working_dir = /storage/akurek/LiLF_workdir
download_file = html.txt

Tree of the workdir: tree.zip
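The `TypeError` at the end of the traceback above hides the real message: `lib_ms.py` calls `raise('Cannot find MS files.')`, and in Python 3 only instances of `BaseException` subclasses can be raised, so raising a plain string fails with `TypeError: exceptions must derive from BaseException`. A minimal sketch of how such a check could raise a proper exception instead; `find_ms_files` is a hypothetical helper, not LiLF's actual code, and it simply mirrors the `glob.glob(data_dir + '/*MS')` pattern from `LOFAR_cal.py`.

```python
import glob
import os


def find_ms_files(data_dir: str) -> list[str]:
    """Return the sorted Measurement Sets in data_dir, or raise a clear error.

    Hypothetical helper mirroring glob.glob(data_dir + '/*MS') in LOFAR_cal.py.
    """
    ms_files = sorted(glob.glob(os.path.join(data_dir, '*MS')))
    if not ms_files:
        # Raise an exception *instance*: raise('...') with a bare string is
        # itself a TypeError in Python 3 and masks the intended message.
        raise FileNotFoundError(
            f'Cannot find MS files in {os.path.abspath(data_dir)}'
        )
    return ms_files
```

Including the absolute path in the message also shows immediately which directory was searched.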

@AlexKurek
Contributor Author

AlexKurek commented Feb 16, 2025

I added the following debug statements:

print(os.getcwd())
print("glob.glob(data_dir)", glob.glob(data_dir))
print("data_dir", data_dir)

MSs = lib_ms.AllMSs(glob.glob(data_dir + '/*MS'), s, check_flags=False)

and got:

/storage/akurek/LiLF_workdir/id789458_-_3c196
glob.glob(data_dir) ['data-bkp/']
data_dir data-bkp/

/storage/akurek/LiLF_workdir/id789458_-_3c196/data-bkp is empty.
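Because `data_dir` is the relative path `data-bkp/`, `glob` resolves it against the current working directory (here `/storage/akurek/LiLF_workdir/id789458_-_3c196`), so the pipeline ends up looking in an empty folder. A small diagnostic sketch that reports where a relative directory actually points and how many `*MS` entries it holds; `describe_data_dir` is a hypothetical name, not part of LiLF.

```python
import glob
import os


def describe_data_dir(data_dir: str, pattern: str = '*MS') -> dict:
    """Report where a (possibly relative) data_dir resolves and what it contains."""
    # Relative paths are resolved against os.getcwd(), exactly as glob does.
    abs_dir = os.path.abspath(data_dir)
    exists = os.path.isdir(abs_dir)
    matches = sorted(glob.glob(os.path.join(abs_dir, pattern))) if exists else []
    return {
        'cwd': os.getcwd(),
        'resolved': abs_dir,
        'exists': exists,
        'n_matches': len(matches),
        'matches': matches,
    }
```

Calling `describe_data_dir('data-bkp/')` from the pipeline's working directory would show an existing directory with `n_matches` equal to 0, matching the output above.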

@revoltek
Owner

Did you run LOFAR_preprocess by hand in the directory containing html.txt?

@AlexKurek
Contributor Author

AlexKurek commented Feb 17, 2025

No, I did:

cd LiLF/

where I have both lilf.config and html.txt, and then:

python /home/akurek/LiLF/pipelines/PiLL.py

and this crashes.

Now I ran LOFAR_preprocess by hand in the directory containing html.txt. This step finished successfully:

319.MS: Average in freq (factor of 1) and time (factor of 2)...
2025-02-17 11:25:55 - INFO - Tar mss/id789458_-_c2020f3/c2020f3_SB319.MS...
2025-02-17 11:25:55 - INFO - << done << renameavg
2025-02-17 11:25:55 - INFO - Done. Total time: 1h 41m

@revoltek
Owner

OK, PiLL has not yet been updated to the latest refactoring in LBAdevel. It is best to follow the (now updated) documentation and call each pipeline in turn.
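Until PiLL catches up with the refactoring, the pipelines have to be launched one after another. A minimal sketch of a driver that does this and stops at the first failure, assuming each pipeline is started as `python <script>` from the working directory; `run_in_sequence` is a hypothetical helper, and the actual set and order of pipelines should be taken from the LiLF documentation.

```python
import subprocess
from typing import Optional


def run_in_sequence(commands: list[list[str]], cwd: Optional[str] = None) -> list[str]:
    """Run each command in order and return the ones that completed.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit status, so later pipelines never start after a failure.
    """
    done = []
    for cmd in commands:
        print('>>', ' '.join(cmd), flush=True)
        subprocess.run(cmd, cwd=cwd, check=True)
        done.append(' '.join(cmd))
    return done
```

For example, `run_in_sequence([[sys.executable, '/home/akurek/LiLF/pipelines/LOFAR_preprocess.py'], [sys.executable, '/home/akurek/LiLF/pipelines/LOFAR_cal.py']])` reproduces the preprocess-then-cal order seen in the first log; later pipelines would be appended in the documented order.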

@AlexKurek
Contributor Author

From the documentation: "On you working directory create a Download directory and put here the html.txt files obtained from a data staging request on Long Term Archive (LTA)."

How should I put two html.txt files in the Download/ directory? Should I rename one of them, or put them in subfolders?

@revoltek
Owner

You can either work in two different directories and run two preprocess pipelines, or concatenate the two html.txt files (which are just lists of URLs) into one.
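Since each html.txt is just a list of URLs, merging them is plain text concatenation. A minimal sketch, assuming one URL per line; `merge_url_lists` and the input file names are hypothetical. Dropping duplicate URLs is a design choice made here so that a file staged in both requests is not downloaded twice.

```python
from pathlib import Path


def merge_url_lists(inputs: list[str], output: str) -> int:
    """Concatenate html.txt-style URL lists into one file.

    Blank lines are dropped and duplicate URLs are kept only once,
    preserving first-seen order. Returns the number of URLs written.
    """
    seen = set()
    urls = []
    for name in inputs:
        for line in Path(name).read_text().splitlines():
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    Path(output).write_text('\n'.join(urls) + '\n')
    return len(urls)
```

For example, `merge_url_lists(['html_field1.txt', 'html_field2.txt'], 'Download/html.txt')` writes a single list for one preprocess run.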

@AlexKurek
Contributor Author

While running:
python /home/akurek/LiLF/pipelines/LOFAR_ddparallel.py
I got:

2025-02-19 08:37:21 - INFO - Running WSClean...
2025-02-19 08:40:57 - INFO - >> start >> solve_fr
2025-02-19 08:40:57 - INFO - Converting to circular (DATA -> CIRC_PHASEDIFF_DATA)...
2025-02-19 08:40:57 - ERROR - DP3 run problem on:
logs_pipeline-ddparallel_2025-02-19_08:37:08/TC00_lin2circ.log
Traceback (most recent call last):
  File "/home/akurek/LiLF/pipelines/LOFAR_ddparallel.py", line 400, in <module>
    MSs.run(f'DP3 {parset_dir}/DP3-lin2circ.parset msin=$pathMS msin.datacolumn=DATA msout.datacolumn=CIRC_PHASEDIFF_DATA', log='$nameMS_lin2circ.log', commandType="DP3")
  File "/home/akurek/LiLF/LiLF/lib_ms.py", line 173, in run
    self.scheduler.run(check = True, maxProcs = maxProcs)
  File "/home/akurek/LiLF/LiLF/lib_util.py", line 808, in run
    self.check_run(log, commandType)
  File "/home/akurek/LiLF/LiLF/lib_util.py", line 881, in check_run
    raise RuntimeError(commandType+' run problem on:\n'+out+'\n'+errlines)
RuntimeError: DP3 run problem on:
logs_pipeline-ddparallel_2025-02-19_08:37:08/TC00_lin2circ.log
std exception detected: ModuleNotFoundError: No module named 'dp3'

At:
  /home/akurek/LiLF/LiLF/polconv.py(29): <module>
  <frozen importlib._bootstrap>(488): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(995): exec_module
  <frozen importlib._bootstrap>(950): _load_unlocked
  <frozen importlib._bootstrap>(1334): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1360): _find_and_load

@revoltek
Owner

revoltek commented Feb 19, 2025

This is a missing path that we need to add automatically to the Singularity container.

There's a line:
ENV PYTHONPATH $PYTHONPATH:/usr/local/lib/python3.12/site-packages/
so unless you reset your PYTHONPATH somewhere, it should be there.
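To see whether that ENV line is actually in effect, one can check from inside the container whether the directory is on `PYTHONPATH` and `sys.path` and whether `dp3` is importable. A minimal sketch; `check_dp3_import` is a hypothetical helper, and since DP3 presumably loads `polconv.py` through its own embedded Python, running this with the container's Python (for example through `singularity exec`) is indicative rather than conclusive.

```python
import importlib.util
import os
import sys

# Directory added by the container's ENV PYTHONPATH line quoted above.
EXPECTED_DIR = '/usr/local/lib/python3.12/site-packages/'


def check_dp3_import(expected_dir: str = EXPECTED_DIR) -> dict:
    """Report whether dp3 is importable and whether expected_dir is on the paths."""
    target = os.path.normpath(expected_dir)  # ignore trailing slashes
    env_entries = {os.path.normpath(p)
                   for p in os.environ.get('PYTHONPATH', '').split(os.pathsep) if p}
    path_entries = {os.path.normpath(p) for p in sys.path if p}
    return {
        'in_PYTHONPATH': target in env_entries,
        'in_sys_path': target in path_entries,
        'dp3_found': importlib.util.find_spec('dp3') is not None,
    }
```

If `in_PYTHONPATH` is False, something in the shell setup (for example a `.bashrc` export) is overwriting the variable set by the container.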

@AlexKurek
Contributor Author

AlexKurek commented Feb 27, 2025

I got:

  File "/home/akurek/LiLF/pipelines/LOFAR_ddserial.py", line 980, in <module>
    lib_util.run_wsclean(s, 'wscleanLR-c'+str(cmaj)+'.log', MSs.getStrWsclean(), name=imagenameL, data_column='CORRECTED_DATA',
  File "/home/akurek/LiLF/LiLF/lib_util.py", line 464, in run_wsclean
    s.run(check=True)
  File "/home/akurek/LiLF/LiLF/lib_util.py", line 808, in run
    self.check_run(log, commandType)
  File "/home/akurek/LiLF/LiLF/lib_util.py", line 881, in check_run
    raise RuntimeError(commandType+' run problem on:\n'+out+'\n'+errlines)
RuntimeError: wsclean run problem on:
logs_pipeline-ddserial_2025-02-24_20:06:12/wscleanLR-c1.log
+ + + + + + + + + + + + + + + + + + +
+ An exception occured:
+ >>> Could not parse value '1593.5' for parameter -size to an integer
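wsclean only accepts integers for `-size`, so a computed size such as `1593.5` has to be converted before the command line is built. A hypothetical sketch of such a conversion; the helper name, the square-image assumption, and rounding up to an even number are choices made here, not the fix that went into LBAdevel.

```python
import math


def wsclean_size_args(size: float) -> list[str]:
    """Build wsclean '-size <width> <height>' arguments for a square image.

    The size is rounded up to an integer (wsclean rejects floats) and then
    bumped to the next even number, a conservative choice made here.
    """
    n = math.ceil(size)
    n += n % 2
    return ['-size', str(n), str(n)]
```

With this, `wsclean_size_args(1593.5)` gives `['-size', '1594', '1594']`.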

EDIT:
So I updated to the current LBAdevel, tried again, and got:

2025-02-27 07:04:26 - WARNING - >> skip << output-vstokes
2025-02-27 07:04:26 - INFO - >> start >> output-lres
2025-02-27 07:04:26 - INFO - Cleaning (low res)...
Traceback (most recent call last):
  File "/home/akurek/LiLF/pipelines/LOFAR_ddserial.py", line 986, in <module>
    **beam_kwargs)
      ^^^^^^^^^^^
NameError: name 'beam_kwargs' is not defined

@revoltek
Owner

Yes, that code is still under development. We are currently editing the parallel pipeline (LOFAR_ddparallel); the serial one (LOFAR_ddserial) will come next.

@AlexKurek
Contributor Author

I tried to run the fourth step with python LOFAR_self.py, but got:

2025-03-02 16:31:07 - INFO - TC04 (2020-07-23 15:30:04.494): Hour angle: 1.4 hrs - Elev: 73.23 (Sun distance: 39)
2025-03-02 16:31:07 - INFO - TC05 (2020-07-23 16:30:03.490): Hour angle: 2.4 hrs - Elev: 64.49 (Sun distance: 39)
--2025-03-02 16:31:07-- https://lcs165.lofar.eu/cgi-bin/gsmv1.cgi?coord=159.589667,43.876250&radius=7.881383&unit=deg
Resolving lcs165.lofar.eu (lcs165.lofar.eu)... 195.169.3.120
Connecting to lcs165.lofar.eu (lcs165.lofar.eu)|195.169.3.120|:443... failed: Connection timed out.
Retrying.

--2025-03-02 16:33:23-- (try: 2) https://lcs165.lofar.eu/cgi-bin/gsmv1.cgi?coord=159.589667,43.876250&radius=7.881383&unit=deg
Connecting to lcs165.lofar.eu (lcs165.lofar.eu)|195.169.3.120|:443...

This server does not seem to be responding. I guess I will just wait until LOFAR_ddserial.py is done.
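Since LOFAR_self.py fetches its starting model from this GSM server, a quick reachability check before launching it can save a long wait on wget retries. A minimal sketch; `gsm_query_url`, `is_reachable`, and the timeout are assumptions, while the URL and query format are copied from the log above. An HTTP error status is counted as "reachable", because the server did answer.

```python
import urllib.error
import urllib.parse
import urllib.request

GSM_URL = 'https://lcs165.lofar.eu/cgi-bin/gsmv1.cgi'


def gsm_query_url(ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a GSM query URL in the format seen in the LOFAR_self.py log."""
    params = {
        'coord': f'{ra_deg:.6f},{dec_deg:.6f}',
        'radius': f'{radius_deg:.6f}',
        'unit': 'deg',
    }
    # safe=',' keeps the comma literal, as in the logged URL.
    return GSM_URL + '?' + urllib.parse.urlencode(params, safe=',')


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, albeit with an error status
    except OSError:  # URLError, timeouts, refused connections
        return False
```

For example, `is_reachable(gsm_query_url(159.589667, 43.876250, 7.881383))` would have returned False during this outage instead of blocking for several minutes.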

@revoltek
Owner

revoltek commented Mar 3, 2025

Yes, the server is down, and I am not sure when ASTRON will bring it back up. We are moving the new code to use LoTSS as a starting model, so that part will become irrelevant, but the old pipeline still requires it. It is better to open a ticket on ASTRON's Jira for that.
