
Terminal/system halt when running "python BLRunner.py --config config-files/config.yaml" in the BEELINE conda environment #66

Closed
shachafl opened this issue May 10, 2022 · 10 comments

@shachafl

Following the quick-setup instructions on the main page, my terminal halts with the error message below. (I have omitted the earlier ExpressionData0.csv output, since that run appears to finish properly.)

I am using Anaconda with Python 3.9 on Ubuntu 16.04.

Any advice is welcome.


...
docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/VBEM/data/ grnbeeline/grnvbem:base /bin/sh -c "time -v -o data/outputs/example/GSD/GRNVBEM/time1.txt ./GRNVBEM data/inputs/example/GSD/GRNVBEM/ExpressionData1.csv data/outputs/example/GSD/GRNVBEM/outFile1.txt "

= Running AR1MA1-VBEM method for GRN inference =

( use [Ctrl]+[C] to abort the execution )

Choosing dataset...

file =

'data/inputs/example/GSD/GRNVBEM/ExpressionData1.csv'

Elapsed time is 0.274912 seconds.

  • DMRT1
    VBEM converges after 3606 iteractions
  • FGF9
    VBEM converges after 2489 iteractions
  • RSPO1
    VBEM converges after 4578 iteractions
  • DHH
    VBEM converges after 2350 iteractions
  • CTNNB1
    VBEM converges after 791 iteractions
  • PGD2
    VBEM converges after 2600 iteractions
  • WT1mKTS
    VBEM converges after 3899 iteractions
  • SRY
    VBEM converges after 2794 iteractions
  • DKK1
    VBEM converges after 3902 iteractions
  • WNT4
    VBEM converges after 3179 iteractions
  • CBX2
    VBEM converges after 4800 iteractions
  • AMH
    VBEM converges after 2632 iteractions
  • NR0B1
    VBEM converges after 3663 iteractions
  • NR5A1
    VBEM converges after 4059 iteractions
  • WT1pKTS
    VBEM converges after 4572 iteractions
  • FOXL2
    VBEM converges after 562 iteractions
  • UGR
    VBEM converges after 1851 iteractions
  • SOX9
    VBEM converges after 2128 iteractions
  • GATA4
    VBEM converges after 3510 iteractions
    Elapsed time is 30.612702 seconds.

docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
Task exception was never retrieved
future: <Task finished coro=<connect.<locals>._() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/comm/core.py:288> exception=CommClosedError()>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 297, in _
    handshake = await asyncio.wait_for(comm.read(), 1)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 435, in wait_for
    await waiter
concurrent.futures._base.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 304, in _
    raise CommClosedError() from e
distributed.comm.core.CommClosedError
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fb5cd650bd0>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/9/1' after 10 s: connect() didn't finish in time

@shachafl
Author

shachafl commented Jun 2, 2022

I have also tried building the containers from scratch with:
. initialize.sh
but this also resulted in a terminal halt and the errors below:

docker run --rm -v /home/lshacha1/Downloads/BEELINE/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/GSD/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/GSD/GENIE3/ExpressionData.csv --outFile=data/outputs/example/GSD/GENIE3/outFile.txt "
distributed.comm.inproc - WARNING - Closing dangling queue in
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f3f9baaf750>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: Timed out trying to connect to 'inproc://172.17.0.2/10/1' after 10 s: connect() didn't finish in time

@shachafl
Author

shachafl commented Jul 20, 2022 via email

@shachafl
Author

The problem with python BLEvaluator.py --config config-files/config.yaml --auc
was that my PyYAML version was 6.0, which requires the extra "Loader" argument to yaml.load(). To fix it, I changed the call in BLEval/__init__.py to:
config_map = yaml.load(config_file_handle, Loader=yaml.CLoader)
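A version-tolerant variant of that line might look like the sketch below. It assumes a safe loader is sufficient for plain config files (an assumption; the fix above uses the full yaml.CLoader), and it falls back to the pure-Python loader when PyYAML was installed without LibYAML, in which case the C loaders such as yaml.CLoader do not exist. Because yaml.load() has accepted an explicit Loader argument for many releases before 6.0, passing it should work with both 5.4 and 6.0. The helper name load_config is hypothetical.

```python
import io

import yaml

# Prefer the LibYAML-backed safe loader when PyYAML was built with LibYAML;
# otherwise the C loaders are missing, so fall back to the pure-Python SafeLoader.
_CONFIG_LOADER = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def load_config(stream):
    """Parse a YAML config stream (hypothetical helper name).

    Passing Loader explicitly is accepted by PyYAML 5.x and required by 6.0.
    """
    return yaml.load(stream, Loader=_CONFIG_LOADER)


if __name__ == "__main__":
    # Hypothetical two-line config, for illustration only.
    print(load_config(io.StringIO("a: 1\nb: [x, y]\n")))
```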

@tmmurali
Copy link
Member

Thanks for this report and the fix. @ktakers, can we update BLEvaluator.py with this change without breaking compatibility with earlier versions of PyYAML?

@shachafl
Author

You could also raise the backward-compatibility issue with the PyYAML team; they could resolve it by rolling back the change or adding a default.

With PyYAML==5.4 (instead of 6.0), as pinned in requirements.txt and the BEELINE conda environments, the command:
python BLEvaluator.py --config config-files/config.yaml --auc
works fine.

However, I am keeping the issue open for now, because running GENIE3 along with the other algorithms still halts my terminal and returns errors.

@smartpig-666

I encountered the same problem. When I enabled the GENIE3 algorithm, I got the same error:

docker run --rm -v /home/huxin/Beeline:/data/ --expose=41269 grnbeeline/arboreto:base /bin/sh -c "time -v -o data/outputs/example/Simulation/GENIE3/time.txt python runArboreto.py --algo=GENIE3 --inFile=data/inputs/example/Simulation/GENIE3/ExpressionData.csv --outFile=data/outputs/example/Simulation/GENIE3/outFile.txt "
distributed.comm.inproc - WARNING - Closing dangling queue in <InProc  local=inproc://192.188.0.2/9/1 remote=inproc://192.188.0.2/9/8>
Traceback (most recent call last):
  File "runArboreto.py", line 43, in <module>
    main(sys.argv)
  File "runArboreto.py", line 32, in main
    network = genie3(inDF.to_numpy(), client_or_address = client, gene_names = inDF.columns)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 73, in genie3
    limit=limit, seed=seed, verbose=verbose)
  File "/opt/conda/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True) \
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2919, in compute
    result = self.gather(futures)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 1993, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 834, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
concurrent.futures._base.CancelledError
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fbacfd61890>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py:320> exception=OSError("Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time")>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 322, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 401, in _close
    await self._correct_state()
  File "/opt/conda/lib/python3.7/site-packages/distributed/deploy/spec.py", line 328, in _correct_state_internal
    await self.scheduler_comm.retire_workers(workers=list(to_close))
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 810, in send_recv_from_rpc
    comm = await self.live_comm()
  File "/opt/conda/lib/python3.7/site-packages/distributed/core.py", line 772, in live_comm
    **self.connection_args,
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 334, in connect
    _raise(error)
  File "/opt/conda/lib/python3.7/site-packages/distributed/comm/core.py", line 275, in _raise
    raise IOError(msg)
OSError: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: Timed out trying to connect to 'inproc://192.188.0.2/9/1' after 10 s: connect() didn't finish in time

I also tried modifying the Docker configuration file, but that didn't help.

@tmmurali
Member

tmmurali commented Nov 2, 2022

Thank you for this report. @ktakers, can you take a look at this issue?

@ktakers
Collaborator

ktakers commented Nov 2, 2022

I apologize for the late response.

The tornado timeout appears to be the same issue reported in #48 and #42. According to the Arboreto issue aertslab/arboreto#10, GENIE3 can run successfully despite those timeout errors.

Unfortunately, I wasn't able to reproduce that error. Could you please check the directory outputs/example/GSD/GENIE3 for a rankedEdges.csv or an outFile.txt? Either file would indicate that GENIE3 did actually complete successfully.
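A quick way to run that check from the BEELINE directory is sketched below. The default directory mirrors the path mentioned above; genie3_outputs is a hypothetical helper name, and treating an empty file as "not produced" is an assumption rather than anything BEELINE itself does.

```python
from pathlib import Path


def genie3_outputs(base_dir="outputs/example/GSD/GENIE3"):
    """Report which GENIE3 output files exist and are non-empty (hypothetical helper)."""
    base = Path(base_dir)
    status = {}
    for name in ("outFile.txt", "rankedEdges.csv"):
        path = base / name
        # Assumption: an empty file is treated the same as a missing one.
        status[name] = path.is_file() and path.stat().st_size > 0
    return status


if __name__ == "__main__":
    status = genie3_outputs()
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'missing or empty'}")
    if not any(status.values()):
        print("No GENIE3 output found; the run most likely did not complete.")
```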

@smartpig-666

Thank you for the reply. Unfortunately, even after waiting a long time for GENIE3 to finish, it never generates a rankedEdges.csv file. I have now run GENIE3 separately and generated the rankedEdges.csv file myself, and BEELINE can score it normally.
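For anyone taking the same route, converting a separately generated GENIE3 edge list into a rankedEdges.csv could look like the sketch below. It assumes the input is a tab-separated file with a header row and TF, target and importance columns (the column names of the DataFrame returned by Arboreto's genie3), and that BEELINE expects a tab-separated file with Gene1, Gene2 and EdgeWeight columns. Both layouts are assumptions to check against your own files and BEELINE version; to_ranked_edges is a hypothetical helper name.

```python
import csv


def to_ranked_edges(in_path, out_path):
    """Convert a TF/target/importance table to Gene1/Gene2/EdgeWeight (hypothetical helper).

    Assumes both files are tab-separated with a header row.
    """
    with open(in_path, newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    # List the highest-scoring edges first.
    rows.sort(key=lambda row: float(row["importance"]), reverse=True)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["Gene1", "Gene2", "EdgeWeight"])
        for row in rows:
            # Keep the importance string as-is so no precision is lost.
            writer.writerow([row["TF"], row["target"], row["importance"]])
    return len(rows)


if __name__ == "__main__":
    n = to_ranked_edges("outFile.txt", "rankedEdges.csv")
    print(f"wrote {n} edges")
```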

@ktakers
Collaborator

ktakers commented Apr 6, 2023

Thank you for reporting and working around the issue.
