Syncing from scratch randomly slows down #352
Comments
It's only using 25% because you have 4 cores. Node runs as a single-threaded process, so it can only use one core. Multithreading would be great, but it isn't easy. You could start tackling it if you wanted, though :) |
I'm not familiar with Node, as I'm not a big fan of JS at all; that's true no matter how many cores I have. |
Sorry if there is some confusion. Node.js can be used for multithreading. It's just that Lisk's node code is not written that way currently. I'm sure it will be rewritten at some point, but for now that is why you see it at 25% on a 4-core system. |
I have checked, and it seems some versions of Node.js support reasonable multithreading and some do not. Let's wait for @karmacoma to weigh in. |
There are plans to clusterize the process at some point. |
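As a rough illustration of what clustering the process could look like, here is a minimal sketch using Node's built-in `cluster` module. The helper names `workerCount` and `startCluster` are hypothetical and not taken from the Lisk codebase, and this thread does not say how Lisk would actually divide its work between processes.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: leave `reserved` cores free (for example for
// PostgreSQL), but always start at least one worker.
function workerCount(cpuCount, reserved = 1) {
  return Math.max(1, cpuCount - reserved);
}

// Hypothetical entry point: the primary process forks the workers,
// and each worker runs the supplied function.
function startCluster(runWorker) {
  // `isPrimary` exists on newer Node versions, `isMaster` on older ones.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code) => {
      console.log(`worker ${worker.process.pid} exited with code ${code}`);
    });
  } else {
    runWorker();
  }
}

// Usage (not executed here):
// startCluster(() => console.log(`worker ${process.pid} started`));
```

Each forked worker is a separate process with its own event loop, so shared state such as the unconfirmed transaction pool would have to live in one process or be exchanged through messages.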
There are also plans to rewrite time- and performance-critical functionality in a low-level language. This will probably be done later, in the second part of the Ascent phase. |
@karek314 Regarding your possible solutions:
I don't see a possible solution here.
I assume you mean allow for other CPU cores to be utilized. At the persistence layer, PostgreSQL is already utilizing multiple cores, which is where much of the heavy lifting is being conducted. As already mentioned by @Isabello, we plan to clusterize the node.js application itself into several distinct processes. There is also ongoing work: #302 that will improve the efficiency by which "work" is actually delegated to the persistence layer.
@4miners recently introduced a change from At this point, imo the bottleneck is not the language or level at which it is written. The inefficiencies are largely related to the way db connections / queries are being conducted. Once again #302 should address this, especially in the area of block / transaction processing. We are also looking at ways we can improve the
Yes, we are already in agreement on this. We can further discuss your proposal in #347.
@Isabello has reopened the issue on |
Good to know about the db inefficiencies caused by how queries are made, but I think the language will be the next bottleneck sooner or later.
Yes and no. I've been discussing it with her, but couldn't get her to agree with me. Lisk is a decentralised project. We should encourage every user to build their database from data collected from peers found in the network, instead of letting them go with a snapshot. A snapshot is a great way to sync a node very quickly, much like syncing Ethereum in non-archive mode, which is fast but less secure. Let me list some possible attacks when every user is encouraged to sync from a snapshot. I say encouraged because there is no question about it: the snapshot is the default option.
In summary, I believe that in every decentralised project based on a distributed ledger or blockchain, syncing the blockchain from the genesis block should be the primary option, with fast sync (snapshots, in the case of Lisk) as opt-in, since it's obviously less secure. Moreover, forging delegates should be clearly encouraged to sync from the genesis block, as they are the ones writing data to the blockchain. I know snapshots were mandatory in the early stages of Lisk, when a node couldn't possibly sync from the beginning, but now? It works stably enough. |
That issue is still valid; the slowdown during sync can be seen on 0.9. Will investigate. |
After some investigation I found that the sync slowdown is probably caused by transactions the node receives during sync. Each received transaction needs to be processed before and after processing a block (undo/redo to the unconfirmed balance). Over time they pile up, and block processing becomes slower and slower. Solution: don't allow the node to receive transactions while it is in the syncing state. |
This may also be a side effect of moving the rejection of received blocks inside the sequence. Previously we always rejected blocks during sync. Now we add the received block to a sequence and do the check when that sequence comes up. |
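The proposal earlier in the thread, to refuse incoming transactions while the node is syncing, could be sketched roughly like this. All names here (`SyncAwareReceiver`, `setSyncing`, `receiveTransaction`) are hypothetical and not taken from the Lisk codebase; the sketch only shows the guard, not real transaction processing.

```javascript
// Hypothetical receiver that drops transactions while the node is syncing,
// so nothing accumulates that must be undone/redone around every block.
class SyncAwareReceiver {
  constructor() {
    this.syncing = false;
    this.unconfirmed = []; // stand-in for the unconfirmed transaction pool
    this.rejected = 0;     // how many transactions were refused during sync
  }

  setSyncing(flag) {
    this.syncing = Boolean(flag);
  }

  // Returns true if the transaction was accepted, false if it was refused.
  receiveTransaction(tx) {
    if (this.syncing) {
      this.rejected += 1;
      return false;
    }
    this.unconfirmed.push(tx);
    return true;
  }
}

const receiver = new SyncAwareReceiver();
receiver.setSyncing(true);
receiver.receiveTransaction({ id: 'tx-1' }); // refused while syncing
receiver.setSyncing(false);
receiver.receiveTransaction({ id: 'tx-2' }); // accepted after sync
```

With the guard in place, the pool stays empty for the whole sync, so the per-block undo/redo cost described above does not grow over time.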
It may be important to add that this issue has been occurring since very early versions of Lisk (the first testnet release), up to and including the latest release. |
Superseded by #2384 |
This is an issue I already reported in early versions of Lisk. Syncing from scratch works better than before, but it's still far from perfect.
It's understandable that syncing could be slow at the beginning, due to the many transactions made and the CPU time needed to verify them. After a while, syncing speeds up to a reasonable rate. Then suddenly, without any additional log output, the process becomes very slow again. (On the disk usage chart I have marked the areas with good speed in green and the slow ones in red.) Restarting the Lisk process fixes the issue temporarily, but even so, with the current version of Lisk it took 24h to sync from the genesis block to 3209245310885481431. I've described in #351 why it only got to this block. #351 is a different issue from this one; with this one there were no additional errors or log output, as I mentioned before.
CPU usage seems to be the same whether it's syncing at a reasonable speed or very slowly. By slow I mean abnormally slow: sometimes getting a new block takes longer than the network block interval, which technically makes it impossible to finish syncing.
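To make the last point concrete, here is a small back-of-the-envelope sketch. The function name `catchUpSeconds` and the numbers in the usage lines (including the 10-second interval) are assumptions chosen for illustration, not measurements from this report. While the node processes one block every `s` seconds, the network adds one block every `T` seconds, so the gap closes at a rate of `1/s - 1/T` blocks per second.

```javascript
// Hypothetical estimate of how long a node needs to catch up with the network.
// blocksBehind:           how many blocks the node still has to process
// secondsPerSyncedBlock:  average time the node needs per block while syncing
// networkIntervalSeconds: time between new blocks on the network
function catchUpSeconds(blocksBehind, secondsPerSyncedBlock, networkIntervalSeconds) {
  // If processing a block takes at least as long as the network needs to
  // produce one, the gap never shrinks and the node never catches up.
  if (secondsPerSyncedBlock >= networkIntervalSeconds) {
    return Infinity;
  }
  // time = blocksBehind / (1/s - 1/T) = blocksBehind * s * T / (T - s)
  return (
    (blocksBehind * secondsPerSyncedBlock * networkIntervalSeconds) /
    (networkIntervalSeconds - secondsPerSyncedBlock)
  );
}

const halfSpeed = catchUpSeconds(1000, 5, 10);   // gap closes slowly
const tooSlow = catchUpSeconds(1000, 12, 10);    // gap grows instead
```

Under these assumed numbers, a node that needs 5 seconds per block against a 10-second interval needs twice as long as the raw processing time to catch up, and one that needs 12 seconds per block falls further behind.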
Additionally, CPU usage hovers at roughly 25%, and the load average shows the same thing. This suggests that syncing could be up to 4x faster than it is now with the same implementation of the cryptographic functions and the same block and transaction verification logic. Three quarters of the CPU power is left idle.
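The arithmetic behind the 25% figure and the "up to 4x" estimate can be sketched as follows. The function names are hypothetical, and the speedup estimate uses Amdahl's law, which is a general rule of thumb and not something measured on Lisk: the 4x figure is only reached if all of the work can be spread across the cores.

```javascript
// Share of total CPU that one fully busy thread can use on a machine
// with `cores` cores, in percent.
function singleThreadCpuPercent(cores) {
  return 100 / cores;
}

// Amdahl's law: upper bound on speedup when a fraction `parallelFraction`
// (0..1) of the work can be spread over `cores` cores.
function amdahlSpeedup(parallelFraction, cores) {
  return 1 / ((1 - parallelFraction) + parallelFraction / cores);
}

const share = singleThreadCpuPercent(4); // one busy core out of four
const best = amdahlSpeedup(1, 4);        // everything parallelizable
const partial = amdahlSpeedup(0.5, 4);   // only half of the work parallelizable
```

If only part of block and transaction verification can run in parallel, the realistic gain is well below 4x, which is why the figure above is best read as an upper bound.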
Possible solutions:
Improve the current logic to fix the random slowdowns during syncing
Improve the current logic to use the additional 75% of CPU power that sits idle
Implement a cryptographic library that is better than the commonly used official JS one, possibly written in C/C++ or another fast low-level language. Possibly rewrite the transaction/block verification code as a C/C++ library for JS. This is a very necessary step towards reasonable scalability.
Syncing speed could possibly also be improved by moving communication between nodes to WebSockets, as proposed in Network connectivity channels #347. This could be a big step forward, as it would positively affect block propagation times across the network.
Another problem is that starting Lisk so that it syncs without the snapshot from lisk.io is tricky and confusing enough that most users will give up and go with the centralised snapshot. I reported this in LiskArchive/lisk-build#57, but it has been ignored; moreover, the option in installLisk.sh to sync from scratch does not work. I managed to do it in a roundabout way by creating a fake file.db.gz and running
bash lisk.sh rebuild -f file.db.gz
I think this should be as easy as deciding on the install location or choosing between the main network and the test network. Then more people would be encouraged to sync from other people, locate issues with syncing, etc. It's generally a good approach in a decentralised project.
Screenshot 1
Screenshot 2
Additional information about the hardware on which I ran the tests