This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Parity process killed when warp-syncing #8825

Closed
Tbaut opened this issue Jun 6, 2018 · 7 comments · Fixed by #9088
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. P0-dropeverything 🌋 Everyone should address the issue now.

@Tbaut
Contributor

Tbaut commented Jun 6, 2018

Before filing a new issue, please provide the following information.

I'm running:

  • Which Parity version?: 1.11.3 and 1.11.2
  • Which operating system?: Linux
  • How installed?: binaries
  • Are you fully synchronized?: no
  • Which network are you connected to?: ethereum mainnet
  • Did you try to restart the node?: yes

When syncing with --warp-barrier, the process gets "randomly killed". If I restart, the snapshot restoration kicks in (with the chunks previously downloaded), completes, and the snapshot download continues until the process gets killed again at some point.

Console without snapshot restoration:

thib@Thib ~/.local/share/io.parity.ethereum $ parity --no-ancient-blocks --warp-barrier 5730000
2018-06-06 15:34:52  Starting Parity/v1.11.3-beta-a66e36b-20180605/x86_64-linux-gnu/rustc1.26.1
2018-06-06 15:34:52  Keys path /home/thib/.local/share/io.parity.ethereum/keys/Foundation
2018-06-06 15:34:52  DB path /home/thib/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d
2018-06-06 15:34:52  Path to dapps /home/thib/.local/share/io.parity.ethereum/dapps
2018-06-06 15:34:52  State DB configuration: fast
2018-06-06 15:34:52  Operating mode: active
2018-06-06 15:34:52  Configured for Foundation using Ethash engine
2018-06-06 15:34:54  Sending warning alert CloseNotify
2018-06-06 15:34:54  Updated conversion rate to Ξ1 = US$609.44 (7813574 wei/gas)
2018-06-06 15:34:58  Public node URL: enode://8c820b97320e978b841fe62fed9a6ac2baca0ff59abe40460d2fba7e062ebe3ec4ad48c815a9656d9b26ba3f593640bb4ac4347fcbf0362637770d724b96d7c2@192.168.1.56:30303
2018-06-06 15:35:03  Syncing snapshot 0/1505        #0    5/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 15:35:13  Syncing snapshot 9/1505        #0    8/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
....
2018-06-06 15:52:28  Syncing snapshot 717/1505        #0   26/50 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
Killed

Console with snapshot restoration:

$ parity --no-ancient-blocks --warp-barrier 5730000
2018-06-06 13:47:13  Starting Parity/v1.11.3-beta-a66e36b-20180605/x86_64-linux-gnu/rustc1.26.1
2018-06-06 13:47:13  Keys path /home/thib/.local/share/io.parity.ethereum/keys/Foundation
2018-06-06 13:47:13  DB path /home/thib/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d
2018-06-06 13:47:13  Path to dapps /home/thib/.local/share/io.parity.ethereum/dapps
2018-06-06 13:47:13  State DB configuration: fast
2018-06-06 13:47:13  Operating mode: active
2018-06-06 13:47:14  Configured for Foundation using Ethash engine
2018-06-06 13:47:14  Removed existing file '/home/thib/.local/share/io.parity.ethereum/jsonrpc.ipc'.
2018-06-06 13:47:15  Sending warning alert CloseNotify
2018-06-06 13:47:15  Updated conversion rate to Ξ1 = US$606.08 (7856891 wei/gas)
2018-06-06 13:47:19  Public node URL: enode://8c820b97320e978b841fe62fed9a6ac2baca0ff59abe40460d2fba7e062ebe3ec4ad48c815a9656d9b26ba3f593640bb4ac4347fcbf0362637770d724b96d7c2@192.168.1.56:30303
2018-06-06 13:47:24  Snapshot initializing (2 chunks restored)        #1    7/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 13:47:34  Snapshot initializing (12 chunks restored)        #1    9/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 13:47:44  Snapshot initializing (21 chunks restored)        #1   10/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs

...
2018-06-06 14:29:19  Snapshot initializing (1428 chunks restored)        #1   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 14:29:29  Snapshot initializing (1435 chunks restored)        #1   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 14:29:39  Syncing snapshot 1441/1505        #1   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 14:30:05  Syncing snapshot 1444/1505        #1   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-06 14:30:13  Syncing snapshot 1446/1505        #1   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
Killed

Sync trace: https://gist.github.com/Tbaut/0146c51850c0aef81b3dc20f39a658f7#file-v1-11-2-warp-sync-killed-trace-sync

dmesg:

$ dmesg | grep oom
[ 2773.041749] gmain invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
[ 2773.041774]  oom_kill_process+0x219/0x420
[ 2773.041890] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2773.325879] oom_reaper: reaped process 4426 (parity), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 6123.404422] dockerd invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=-500
[ 6123.404448]  oom_kill_process+0x219/0x420
[ 6123.404562] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 6123.685207] oom_reaper: reaped process 7131 (parity), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[10039.757902] JS Watchdog invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
[10039.757929]  oom_kill_process+0x219/0x420
[10039.758050] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[10039.972778] oom_reaper: reaped process 8604 (parity), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[13541.420127] kworker/u9:0 invoked oom-killer: gfp_mask=0x14002c0(GFP_KERNEL|__GFP_NOWARN), nodemask=(null),  order=0, oom_score_adj=0
[13541.420164]  oom_kill_process+0x219/0x420
[13541.420276] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[13541.656908] oom_reaper: reaped process 11103 (parity), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[14968.703282] parity invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
[14968.703307]  oom_kill_process+0x219/0x420
[14968.703426] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[14968.919307] oom_reaper: reaped process 21314 (parity), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
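To make repeated kills easier to compare, the dmesg output above can be condensed into one record per kill. The following is a minimal sketch, not part of Parity: the function name `summarize_oom` is made up for illustration, and the regular expressions assume the kernel message wording shown above, which may differ on other kernel versions.

```python
import re

# These patterns follow the message shapes in the dmesg output above.
# Other kernel versions may word these lines differently (assumption).
INVOKED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<name>.+?) invoked oom-killer"
)
REAPED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+oom_reaper: reaped process "
    r"(?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def summarize_oom(lines):
    """Pair each 'invoked oom-killer' line with the next 'reaped process' line.

    Returns a list of dicts with the kernel timestamp of the reap, the
    victim's name and pid, and the name of the task that triggered the kill.
    """
    events = []
    trigger = None  # name of the task from the most recent 'invoked' line
    for line in lines:
        invoked = INVOKED.search(line)
        if invoked:
            trigger = invoked.group("name")
            continue
        reaped = REAPED.search(line)
        if reaped:
            events.append({
                "time": float(reaped.group("ts")),
                "victim": reaped.group("name"),
                "pid": int(reaped.group("pid")),
                "invoked_by": trigger,
            })
            trigger = None
    return events
```

Passing it the lines of a saved `dmesg` dump gives one entry per kill, which shows at a glance that the victim is `parity` each time even though the task that triggers the OOM killer varies.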

Memory usage (I have 8GB):
[image: memory usage]

@Tbaut Tbaut added F2-bug 🐞 The client fails to follow expected behavior. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. M4-core ⛓ Core client code / Rust. labels Jun 6, 2018
@Tbaut Tbaut added this to the 1.12 milestone Jun 6, 2018
@Tbaut
Contributor Author

Tbaut commented Jun 6, 2018

Could be linked to #8618 ?

@Tbaut
Contributor Author

Tbaut commented Jun 11, 2018

Some more info for @tomusdrw. I first ran it without logging up to the point where it gets killed:

$ parity --no-ancient-blocks --warp-barrier 5750000
...
2018-06-11 17:12:42  Syncing snapshot 414/1514        #0   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-11 17:12:52  Syncing snapshot 415/1514        #0   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-11 17:13:02  Syncing snapshot 417/1514        #0   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-06-11 17:13:10  Syncing snapshot 417/1514        #0   24/25 peers   8 KiB chain 3 MiB db 0 bytes queue 10 KiB sync  RPC:  0 conn,  0 req/s,   0 µs
Killed

and then ran it with logs until it gets killed again:

thib@Thib ~ $ parity --no-ancient-blocks -l sync=trace,snapshot=trace,warp=trace --warp-barrier 5750000 --log-file ~/Playground/log-warp-saync-trace.log

Logs (truncated between 17:14:33 and 17:21:47 to fit into a gist): https://gist.github.com/Tbaut/95e1260daf29fdc54f2bc520d05c442f

My memory usage looks like:
[image: 24655 log]
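For anyone trying to reproduce this, memory growth can also be recorded as text next to the sync log instead of a screenshot. The sketch below is a hypothetical helper, not a Parity feature: `parse_rss_kib` and `sample_rss` are illustrative names, and it assumes a Linux system where `/proc/<pid>/status` exposes a `VmRSS` line.

```python
import re
import time
from pathlib import Path

# VmRSS is the resident set size reported by the Linux kernel, in kB.
VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_rss_kib(status_text):
    """Return the VmRSS value from the text of /proc/<pid>/status, or None."""
    match = VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def sample_rss(pid, interval_s=10.0, samples=6):
    """Print a timestamped RSS reading for `pid` every `interval_s` seconds.

    Linux only. Stops early if the process disappears, for example because
    the OOM killer reaped it.
    """
    status = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        try:
            rss = parse_rss_kib(status.read_text())
        except (FileNotFoundError, ProcessLookupError):
            print(f"{time.strftime('%H:%M:%S')}  process {pid} is gone")
            return
        print(f"{time.strftime('%H:%M:%S')}  rss={rss} KiB")
        time.sleep(interval_s)
```

Calling `sample_rss(pid)` with the pid of the running `parity` process, using the same 10-second cadence as the sync status lines, makes it possible to line up memory readings with the "Syncing snapshot" progress.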

@Tbaut
Contributor Author

Tbaut commented Jun 22, 2018

Still happening on master from this morning.

@5chdn
Copy link
Contributor

5chdn commented Jun 23, 2018

Can we merge this with #8618 ?

@5chdn 5chdn closed this as completed Jun 23, 2018
@5chdn 5chdn added Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate. and removed F2-bug 🐞 The client fails to follow expected behavior. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Jun 23, 2018
@5chdn 5chdn reopened this Jun 23, 2018
@5chdn
Contributor

5chdn commented Jun 23, 2018

Oh, the other one is not during warp sync. So we have two distinct issues here :-(

@5chdn 5chdn added F2-bug 🐞 The client fails to follow expected behavior. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. and removed Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate. labels Jun 23, 2018
@Tbaut
Contributor Author

Tbaut commented Jul 2, 2018

Still happens with the latest master Parity/v1.12.0-unstable-a1a002f-20180702

@5chdn 5chdn added P0-dropeverything 🌋 Everyone should address the issue now. and removed P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Jul 2, 2018
@ordian
Collaborator

ordian commented Jul 2, 2018

I have a memory profile of 11GB peak:
[images: memory_peak, memory_peak_profile]
