[BUG] Error running example code #1903

Closed
Sanzo00 opened this issue Jul 29, 2022 · 12 comments · Fixed by #1910
Labels: component:vineyard, question (Further information is requested)

@Sanzo00

Sanzo00 commented Jul 29, 2022

Describe the bug

[error] Check failed: IOError: Receive message failed: Connection reset by peer in "client->Connect(vineyard_socket)", in function void gs::EnsureClient(std::shared_ptr<vineyard::Client>&, const string&), file /work/analytical_engine/core/launcher.cc, line 123
terminate called after throwing an instance of 'std::runtime_error'
  what():  Check failed: IOError: Receive message failed: Connection reset by peer in "client->Connect(vineyard_socket)", in function void gs::EnsureClient(std::shared_ptr<vineyard::Client>&, const string&), file /work/analytical_engine/core/launcher.cc, line 123
*** Aborted at 1659058982 (unix time) try "date -d @1659058982" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x3f00001ede5) received by PID 126437 (TID 0x7f813dbc1040) from PID 126437; stack trace: ***
    @     0x7f813fd223c0 (unknown)
    @     0x7f813ece818b gsignal
    @     0x7f813ecc7859 abort
    @     0x7f813f09c911 (unknown)
    @     0x7f813f0a838c (unknown)
    @     0x7f813f0a83f7 std::terminate()
    @     0x7f813f0a86a9 __cxa_throw
    @     0x7f8149263000 (unknown)
    @           0x48ebc0 gs::GrapeInstance::Init()
    @           0x46f353 gs::GrapeEngine::Start()
    @           0x45da42 main
    @     0x7f813ecc90b3 __libc_start_main
    @           0x45e345 (unknown)
    @                0x0 (unknown)
Traceback (most recent call last):
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/__main__.py", line 3, in <module>
    launch_graphscope()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/coordinator.py", line 1779, in launch_graphscope
    coordinator_service_servicer = CoordinatorServiceServicer(
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/coordinator.py", line 175, in __init__
    if not self._launcher.start():
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/launcher.py", line 174, in start
    self._create_services()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/launcher.py", line 610, in _create_services
    self._start_analytical_engine()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/gscoordinator/launcher.py", line 592, in _start_analytical_engine
    time.sleep(1)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/rpc.py", line 72, in waiting_service_ready
    self._stub.HeartBeat(request)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1659059576.256340172","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1659059576.256338821","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gcn.py", line 11, in <module>
    graph = load_cora()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/dataset/cora.py", line 80, in load_cora
    sess = get_default_session()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/session.py", line 1477, in get_default_session
    return _default_session_stack.get_default()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/session.py", line 1498, in get_default
    sess = session(cluster_type="hosts", num_workers=1)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/utils.py", line 357, in wrapper
    return_value = func(*args, **kwargs)
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/session.py", line 724, in __init__
    self._connect()
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/session.py", line 1065, in _connect
    self._grpc_client.waiting_service_ready(
  File "/home/sanzo/software/miniconda/4.12/envs/sanzo/lib/python3.8/site-packages/graphscope/client/rpc.py", line 82, in waiting_service_ready
    raise ConnectionError(f"Connect coordinator timeout, {msg}")
ConnectionError: Connect coordinator timeout, code: UNAVAILABLE, details: failed to connect to all addresses

To Reproduce

import graphscope
from graphscope.dataset import load_ogbn_mag

g = load_ogbn_mag()

Environment (please complete the following information):

  • GraphScope version: 0.15.0
  • OS: Linux
  • Version: 20.04
  • Kubernetes Version : no
@welcome

welcome bot commented Jul 29, 2022

Thanks for opening your first issue here! Please be sure to follow the issue template; a maintainer will get back to you shortly.
Please feel free to contact us on DingTalk, WeChat (account: graphscope), or Slack. We are happy to answer your questions promptly.

@sighingnow
Collaborator

Hi @Sanzo00,

Thanks for reporting. Could you please paste your pip3 list output here?

@Sanzo00
Author

Sanzo00 commented Jul 29, 2022

Hi @sighingnow, here is my pip3 list output:

Package                Version
---------------------- ---------
aenum                  3.1.11
aiobotocore            2.3.4
aiohttp                3.8.1
aioitertools           0.10.0
aiosignal              1.2.0
aliyun-python-sdk-core 2.13.36
aliyun-python-sdk-kms  2.15.0
argcomplete            2.0.0
async-timeout          4.0.2
asynctest              0.13.0
attrs                  22.1.0
botocore               1.24.21
cachetools             5.2.0
certifi                2022.6.15
cffi                   1.15.1
charset-normalizer     2.1.0
cmake                  3.22.5
crcmod                 1.7
cryptography           37.0.4
cycler                 0.11.0
Cython                 3.0a6
etcd-distro            3.5.1
fonttools              4.34.4
frozenlist             1.3.0
fsspec                 2022.7.1
future                 0.18.2
google-auth            2.9.1
graphscope             0.15.0
graphscope-client      0.15.0
gremlinpython          3.6.1
grpcio                 1.48.0
grpcio-tools           1.48.0
gs-apps                0.15.0
gs-coordinator         0.15.0
gs-engine              0.15.0
gs-include             0.15.0
hdfs3                  0.3.1
idna                   3.3
importlib-metadata     4.12.0
isodate                0.6.1
jmespath               0.10.0
kiwisolver             1.4.4
kubernetes             24.2.0
matplotlib             3.5.2
msgpack                1.0.4
multidict              6.0.2
nest-asyncio           1.5.5
networkx               2.6
numpy                  1.21.6
oauthlib               3.2.0
orjson                 3.7.8
oss2                   2.16.0
packaging              21.3
pandas                 1.3.5
pickle5                0.0.12
Pillow                 9.2.0
pip                    22.1.2
protobuf               3.18.1
psutil                 5.9.1
pyarrow                6.0.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycparser              2.21
pycryptodome           3.15.0
pyparsing              3.0.9
pysimdjson             5.0.1
python-dateutil        2.8.2
pytz                   2022.1
PyYAML                 6.0
requests               2.28.1
requests-oauthlib      1.3.1
rsa                    4.9
s3fs                   2022.7.1
scipy                  1.7.3
setuptools             61.2.0
shared-memory38        0.1.2
six                    1.16.0
sortedcontainers       2.4.0
tqdm                   4.64.0
treelib                1.6.1
typing_extensions      4.3.0
urllib3                1.26.11
vineyard               0.6.2
vineyard-io            0.6.2
websocket-client       1.3.3
wheel                  0.37.1
wrapt                  1.14.1
yarl                   1.7.2
zipp                   3.8.1
zstd                   1.5.2.5

@sighingnow
Collaborator

Cannot reproduce.

Could you please try python3 -m vineyard --socket=/tmp/vineyard.sock to see if vineyardd could be launched as expected?

Thanks!

@Sanzo00
Author

Sanzo00 commented Jul 29, 2022

I tried as you said and this is the output:

(graphscope) ➜  ~ python3 -m vineyard --socket=/tmp/vineyard.sock
I20220729 12:32:34.738972 131355 vineyardd.cc:91] Hello vineyard v0.6.2!
I20220729 12:32:34.739400 131355 meta_service.h:94] meta service is starting ...
I20220729 12:32:36.463382 131355 etcd_launcher.cc:93] Starting the etcd server
I20220729 12:32:36.463488 131355 etcd_launcher.cc:101] Found etcd at: /home/sanzo/software/miniconda/4.12/envs/graphscope/bin/etcd
I20220729 12:32:36.466235 131355 etcd_launcher.cc:204] Etcd launched: pid = 131433, listen on 2379
{"level":"info","ts":1659069156.5181363,"caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LOG_LEVEL","variable-value":"error"}
[error] Check failed: Etcd error: Etcd has been launched but failed to connect to it in "root_vs->Serve(StoreType::kDefault)", in function vineyard::Status vineyard::VineyardRunner::Serve(), file /work/v6d/src/server/server/vineyard_runner.cc, line 63

Unhandled exception:
  std::exception:what(): Check failed: Etcd error: Etcd has been launched but failed to connect to it in "root_vs->Serve(StoreType::kDefault)", in function vineyard::Status vineyard::VineyardRunner::Serve(), file /work/v6d/src/server/server/vineyard_runner.cc, line 63
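
The log above reports etcd listening on port 2379 while vineyardd fails to connect to it. As a rough, independent check of whether anything accepts TCP connections on that port, a minimal sketch follows. The helper name `can_connect` and the one-second timeout are assumptions made for illustration; this is not how vineyard itself probes etcd.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 2379 is the etcd client port reported in the log above.
    print(can_connect("127.0.0.1", 2379))
```

A `False` result right after etcd reports it has launched would suggest that etcd exited early or never started serving.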

@sighingnow
Collaborator

This looks quite strange. Could you please paste the output of etcd and of python3 -m etcd_distro.etcd?

Thanks!

@sighingnow sighingnow added question Further information is requested component:vineyard labels Jul 29, 2022
@sighingnow sighingnow self-assigned this Jul 29, 2022
@Sanzo00
Author

Sanzo00 commented Jul 29, 2022

etcd --version:

(graphscope) ➜  ~ etcd --version
etcd Version: 3.5.1
Git SHA: e8732fb5f
Go Version: go1.16.3
Go OS/Arch: linux/amd64

python3 -m etcd_distro.etcd:

(graphscope) ➜  ~ python3 -m etcd_distro.etcd
{"level":"info","ts":"2022-07-29T14:32:42.698+0800","caller":"etcdmain/etcd.go:72","msg":"Running: ","args":["/home/sanzo/software/miniconda/4.12/envs/graphscope/lib/python3.7/site-packages/etcd_distro/etcdbin/etcd"]}
{"level":"warn","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:104","msg":"'data-dir' was empty; using default","data-dir":"default.etcd"}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:115","msg":"server has been already initialized","data-dir":"default.etcd","dir-type":"member"}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:367","msg":"closing etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
{"level":"info","ts":"2022-07-29T14:32:42.699+0800","caller":"embed/etcd.go:369","msg":"closed etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
{"level":"fatal","ts":"2022-07-29T14:32:42.699+0800","caller":"etcdmain/etcd.go:203","msg":"discovery failed","error":"listen tcp 127.0.0.1:2380: bind: address already in use","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/etcdmain/etcd.go:203\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}

@sighingnow
Collaborator

It seems that port 2380 on your machine is already in use, but both vineyard and GraphScope failed to detect that.
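
To see which local addresses already have a listener on a given port, one option on Linux is to read /proc/net/tcp. The sketch below is only an illustration: the function name `listeners_on_port` is hypothetical, it covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6), and it decodes addresses assuming a little-endian host such as the linux/amd64 machine shown in the etcd output.

```python
import socket
import struct

LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def listeners_on_port(proc_net_tcp: str, port: int) -> list:
    """Return the IPv4 addresses that have a listening TCP socket on `port`.

    `proc_net_tcp` is the text content of /proc/net/tcp.
    """
    addresses = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != LISTEN:
            continue
        hex_ip, hex_port = fields[1].split(":")
        if int(hex_port, 16) == port:
            # The address is a 32-bit hex value; little-endian hosts store it byte-reversed.
            packed = struct.pack("<I", int(hex_ip, 16))
            addresses.append(socket.inet_ntoa(packed))
    return addresses
```

On a Linux host, `listeners_on_port(open("/proc/net/tcp").read(), 2380)` would list the addresses bound to port 2380. A result of `0.0.0.0` means the listener is bound to all IPv4 interfaces.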

@sighingnow
Collaborator

Hi @Sanzo00,

Could you send me a message via WeChat or DingTalk? I need more information about your environment, as I cannot tell what is happening at the moment.

You could find me on wechat or dingding via 13240327026.

Thanks.

@sighingnow
Collaborator

I think there might be a program listening on port 2380 on another network interface, which would cause our detection procedure to miss it.
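
For illustration, here is a minimal sketch of a bind-based port check. It is not vineyard's or GraphScope's actual detection code, and the helper name `port_in_use` is hypothetical. The idea is that a check against a single address can miss a listener bound elsewhere, whereas on Linux a bind to the wildcard address 0.0.0.0 should fail with EADDRINUSE if any IPv4 address already has a listener on that port.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. permission denied) are not evidence either way
        return False


if __name__ == "__main__":
    # 2380 is the peer port etcd failed to bind in the output above.
    for host in ("127.0.0.1", "0.0.0.0"):
        print(host, port_in_use(2380, host))
```

The socket is closed as soon as the check returns, so another process can still take the port before etcd starts; the result is a hint, not a guarantee.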

@Sanzo00
Author

Sanzo00 commented Jul 29, 2022

Yes, another program was occupying this port. After I killed that program, the example ran normally.

@sighingnow
Collaborator

Glad to hear that it finally works.

It is quite strange that we could not detect that the port was in use.
