Linux OOM-killer kills Parity due to excessive I/O #1395
did you run with any |
Nope. It all went well actually in the beginning. Had it running for 2 days after syncing on its own on newly installed linux machine to function as dedicated client to eventually serve all my miners on the LAN. Then today started to connect my ethminer clients to it. Had few issues with geth compatible flags it seems at beginning, but after I changed it to the parity flags, it started and ran smoothly. Went out for 2 hours and came back to this issue. |
Looking at the log it panics right after start. Was there the same panic message when it happened for the first time? Could it be that it ran out of disk space before? |
As I said, no panic/issues the first time. Ethminer Clients just could not connect when I used some geth compatible flags at first on the parity start command line, but after I changed to parity flags, the miners connected and it ran smoothly , or at least until I left. |
Also, it is a newly formatted 500 GB hard drive. I also have 2 GB RAM on the machine; could that be too little? |
What's causing the panic is that there is a node in the storage trie which references another node that should be in the DB, but for some reason isn't. This is a symptom of some other bug. A temporary fix would be to delete the chain's databases (stored in |
500GB / 2GB should be plenty in terms of specs; if possible, please use the |
@gavofyork Use beta branch without deleting chain database ? Do you know if the beta branch is free of this bug ? @rphmeier If this is a common issue, should I not just wait for a fix before using Parity again? I'm just worried I do a resync and then have the bug appear again .... |
Your key-value database is missing a key-value pair that other items in the database refer to. Unfortunately, it doesn't matter which version of the software you run it with, because that lookup will fail regardless. The bug here is not that the lookup is failing, it is that something earlier caused that database entry not to be written out. In order to get a coherent database, all transactions need to be re-run to regenerate the state database. You can do that by re-syncing; as far as I see it there is no other option. Maybe there is another way that I'm not considering. That said, there have been numerous improvements since the 1.1 release. Although I don't know for sure if the scenario which led to the database error here has been solved, the issue is certainly present in the 1.1 release. Trying with the 1.2 beta definitely can't hurt. |
@AlbieC Could you give me the parity and miner flags you used, so that i can test this? i see what are your miner flags and ethminer version? ethminer --farm-recheck 500 -G -F 192.168.127.2 :8545 |
@mista66 After installation (great easy 1 line installation BTW !) , and running just ( parity -j ) after, I left it running for 2 days to sync, not knowing how fast it would sync. After 2 days I got back to the machine and stopped Parity. I was then going to try to connect the miners to it, so I think I started the test with a command line something like ( parity --geth --rpc --rpcaddr "192.168.127.2" -jw ) . I then tried to connect the miners by typing (or running a bat file with) the following: (ethminer -G -F 192.168.127.2 :8545 ) This did not work, as I was getting rpc json connection errors if I remember correctly. I then changed the parity start command line to pure parity flags like the following: ( parity -jw --jsonrpc-interface 192.168.127.2 --jsonrpc-port 8545 --author 0x3e089b6dF17ad25019488fA3665252Bd70E6292F --identity Eilandia ) This worked and the miners all connected and started to mine correctly. I think I had to delete and recreate 1 DAG file on only one of the miner PCs. I was feeling chuffed and left it all running while I went out, and when I came back 2 hours later all the clients were disconnected and Parity had exited. I could then not run Parity again, as I was getting the panic messages every time, a few seconds after it started. |
@rphmeier |
@AlbieC ok with no account keys on the mining machine/node ! what if you create an account first? |
This is a database which holds all storage values of all accounts on the blockchain, as well as account balances, nonces, and code. It is populated naturally through the process of syncing, executing transactions, and importing blocks. Under the hood, we use rocksdb to implement this. For some reason, one of the key-value pairs that is expected to be in this database is not. It could be a bug in parity or rocksdb. The symptom you are observing is the failed query, but the cause would be more interesting to reproduce. Please try a fresh sync with the beta branch. At best, the problem resolves itself. At worst, we get another data point towards solving the issue. A full sync usually takes around 3-4 hours on my laptop, so it will almost definitely be faster than 2 days on your machine. |
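To illustrate the failure mode described above, here is a minimal sketch of a hash-keyed node store in which one node refers to another that was never written out. It is a toy model, not Parity's trie or its rocksdb layout: the names `put_node` and `find_dangling` are hypothetical, and SHA-256 stands in for whatever hash the real database uses.

```python
import hashlib


def node_key(payload: bytes) -> str:
    # Assumption: SHA-256 is only a stand-in hash for this illustration.
    return hashlib.sha256(payload).hexdigest()


def put_node(db: dict, value: str, children=()) -> str:
    """Store a node under the hash of its contents and return that key."""
    payload = value.encode() + b"|" + ",".join(children).encode()
    key = node_key(payload)
    db[key] = {"value": value, "children": list(children)}
    return key


def find_dangling(db: dict, root: str) -> list:
    """Walk from the root and collect every referenced key that is absent."""
    missing = []
    seen = set()
    stack = [root]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        node = db.get(key)
        if node is None:
            # A parent points at this key, but the entry was never stored.
            missing.append(key)
            continue
        stack.extend(node["children"])
    return missing
```

In this model, deleting a leaf entry after building the tree makes `find_dangling` report that leaf's key. No version of the reading code can succeed on such a lookup, which is why the only repair suggested in this thread is regenerating the state by re-syncing.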
@rphmeier Ok, I'll start it tonight, but I live in SA with not too fast broadband (5 Mbps) and not too many nodes nearby I think. Anyway, let's see how it goes. Is the beta branch also so easy to install, or must I compile something from source first? Not too clued up with that !:) |
You will have to build from source, but it doesn't take an absurd amount of time. To build on the Once it is done building, it will output a |
@rphmeier Ok thanks, got the development version compiled and it's busy syncing. 23H00 here now so I'm going to sleep and see if it's finished tomorrow morning. :) 👍 |
Just an update. Last night the Parity dev version had not finished syncing, and when I woke up this morning I saw the "killed" message in the console, followed by a fresh command prompt. So I tried again ... albie@linux3 ~/parity/target/release $ parity note: Run with RUST_BACKTRACE=1 for a backtrace. albie@linux3 ~/parity/target/release $ but I get the same response every time now. I then went back to the V1.2 one-line installer to try the latest version. It installed very fast and is now unresponsive after the "parity -j " command, but I can see it is frantically busy doing something, so I'll wait until it's finished to see whether V1.2 picked up where the development version left off with syncing the database. |
prior to the parity was apparently killed by the system during a database commit - this likely means the database is corrupted and, you guessed it, a resync is required. 2GB shouldn't be too little memory - i've synced on a lot less - but one of the reasons a process can be killed is running out of memory. it may be worth adding a swap file. also check the amount of disk space free to ensure that there really is 500GB free. |
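For the disk space and swap checks suggested above, a minimal sketch follows, assuming a Linux machine where `/proc/meminfo` is readable. The function names `parse_meminfo` and `resource_report` are hypothetical helpers for this example, not part of Parity.

```python
import shutil


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are
    plain counts, so the unit suffix is simply ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def resource_report(path: str = "/", meminfo_path: str = "/proc/meminfo") -> dict:
    """Report free disk space for `path` plus total RAM and swap."""
    usage = shutil.disk_usage(path)
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    return {
        "disk_free_gb": usage.free / 1e9,
        "mem_total_mb": mem.get("MemTotal", 0) / 1024,
        "swap_total_mb": mem.get("SwapTotal", 0) / 1024,
        "swap_free_mb": mem.get("SwapFree", 0) / 1024,
    }
```

Calling `resource_report()` with the path of the drive that holds the chain database shows whether the expected space is really free. A `swap_total_mb` of 0 would mean no swap is configured at all.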
@gavofyork nothing unusual above the "killed" message except the normal lines of previously imported blocks. I expected another database corruption ... losing my steam now with parity. PS. I also got an error with the Parity V1.2 resync earlier. Have given up. Going to try installing Parity on another linux machine. It's busy downloading, but it's going very slowly for some reason. Maybe too many people downloading Parity now ? :) |
it's difficult to tell regarding hardware, but a simple it would be useful to know if a decently-sized swap partition fixes the problem. |
@gavofyork Is there any way to know where the database got corrupted? I ask because one could then probably export up to that point, reimport that part, and resync from there ? |
yes - you should be able to export more or less freely. try also, you can try |
Thanks for the help and suggestions @gavofyork . |
do you know how far the syncing got before being killed? |
around the 1.4 mil blocks |
according to one of the answers on http://stackoverflow.com/questions/726690/who-killed-my-process-and-why
i've never used linux mint, but i guess it is possible that it is configured to kill something that uses as many resources while syncing as parity does. if possible i'd recommend ubuntu; it is well tested and no problems have been reported. |
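One way to confirm that the kernel's OOM killer was responsible is to search the kernel log (for example the output of `dmesg`) for its kill messages. The sketch below assumes the common message shapes "Out of memory: Kill process PID (name)" and "Killed process PID (name)". Exact wording varies between kernel versions, and `find_oom_kills` is a hypothetical helper name.

```python
import re

# Matches both the "Out of memory: Kill process ..." line and the
# follow-up "Killed process ..." line that kernels typically print.
OOM_PATTERN = re.compile(
    r"(?:Out of memory: )?Kill(?:ed)? process (\d+) \(([^)]+)\)"
)


def find_oom_kills(log_lines) -> list:
    """Return unique (pid, process name) pairs killed by the OOM killer."""
    kills = []
    seen = set()
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if not match:
            continue
        entry = (int(match.group(1)), match.group(2))
        if entry not in seen:
            # The kernel usually logs two lines per kill; keep one entry.
            seen.add(entry)
            kills.append(entry)
    return kills
```

Feeding it the lines of a saved kernel log returns the processes that were killed. If `parity` appears in the result, the bare "killed" message in the console came from the kernel and not from Parity itself.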
So far as I know Mint is very much based on/the same as Ubuntu. Normally I use tutorials for Ubuntu for Mint. Never had a problem with compatibility . |
A recent commit that tweaked the database configuration substantially increased the I/O load for the database. Such a large amount of I/O could perhaps have caused the kernel to kill parity; we'll re-tweak this now. |
That will explain it.
|
I've also since read that people say increasing swap size doesn't help with the OOM problem.
|
Also good info here on OOM issues, http://lwn.net/Articles/317814/ |
Just another update. Finished syncing on V1.2 installed on another linux machine. Now I'll just first copy/backup the databse and start testing the mining again. Holding thumbs. Just one other thing. I am currently running 2 live nodes, 1 with Geth 1.4.8 (windows 10) and now the 2nd with Parity V1.2 on different machines. I've noticed the Parity client seems to be a bit behind the Geth client downloading new blocks from network, and sometimes even "misses" a block (or don't show it in the console) ? Anyone else seen this ? |
The console log gets updated every 5 seconds and not with every new block. |
ok, thanks, that's probably where the difference is between Parity and Geth then. |
ok, hold thumbs, here goes testing my miners on my newly installed Parity V1.2 with synced (and backed up ) database ...:) |
I noticed very fast resyncing with setting --bootnodes to a live geth node I am running on my LAN :) |
Only affects startup. You can have it connected all the time with |
Thanks @arkpar . So if I set --bootnodes pointing to a local geth node it should stay connected to that node as long as the geth node stays up/online, but the parity node will also connect to other nodes on network and not stop working/communicating/syncing if my local geth node fails or drops connection ? Did I get it right ? |
Right, except that after reaching the peer limit, connection with the local geth node might get replaced by another peer. --reserved-peers guarantees that this won't happen. |
just another update. Was out for the whole day and just came back to another instance of "killed" on the Parity node. Had 1 miner mining to it for 24 hours now, and others on and off. No issues except the last killed message when I got back home. However, it seems the database was not corrupted this time, as I just started Parity up again and it resynced and the miner resumed mining. It seems to be resources/memory related, as I had another console window open when I left this morning, busy compiling genoil's ethminer on the same machine, and that process also ran into errors and stopped. |
Are you running with an HDD or SSD? |
HDD, but it may be older hardware related too. Don't know. Although, everything else seems to work fine. Busy mining with it using Genoil's ethminer now after new clean Linux install. Haven't tried installing Parity on it again. |
probably using |
Hi guys
Pretty new to this, but have been mining successfully on a Geth node with multiple PCs (Windows and Linux) on the LAN using ethminer. Thought I'd switch to the Parity client to give it a go and support "client diversity" and all that you know ... Anyway, got it all working, but suddenly the issue below has come up, which I don't know how to resolve. Parity stops working and exits.
Any ideas ?
Many thanks.