Orderer fails to start after some time being down. failed to open WAL: fileutil: file already locked #2931
Comments
What is going on here? Is that node the only one in that channel?
Yes, it was a dev environment with a single orderer node. I managed to reproduce the same issue on a network with 3 orderers.
@yacovm pointed
Merged in 2.3, 2.2, 1.4 and main.
I think there's another issue caused by this problem: if two consecutive blocks create the same channel and a new orderer is being added to this channel, the orderer gets confused, treats the second block as the genesis block, and later crashes due to a hash pointer mismatch. Possibly related code: fabric/orderer/common/cluster/replication.go, lines 547 to 573 in c3f6ef9.
A possible fix would be to ignore the second channel creation block during the loop.
I think the right approach would be to treat channel creations with "config in flight" semantics: stop the pipeline until the channel creation block is committed. This would ensure only one channel creation with the same name.
Yep, that would be a more proper solution. However, we still need to address the issue in the code mentioned, to accommodate networks running in production that might already suffer from this: two consecutive blocks in the system channel that create the same application channel.
So the function you mention already ignores the second channel creation, since it overrides the map entry, does it not? And come to Discord; it's easier to collaborate there on this topic.
@guoger do you want to open a new issue for your use case? We can then make a trivial PR that skips that double occurrence and uses the first one.
I understood the logic the same way: the existing code ignores it, as it overrides the map entry keyed by channel name. But regarding the next comment about the need for a trivial PR to check and skip the double occurrence:
I assume this additional check is required because the application channel genesis block is based on the first block, and when the application channel gets pulled, the hash might mismatch. Could you correct me if I misunderstood or am missing any points here?
You understood correctly; we need to skip the second, third, and so on.
Sorry for the late reply. Shall I just open a PR and link it to this one?
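To make the "use the first occurrence and skip the rest" idea concrete, here is a minimal Go sketch. It is not the code from replication.go: the `creationBlock` type, the `firstCreations` function, and the block numbers are hypothetical stand-ins chosen for illustration. It only shows the difference between overriding a map entry (last occurrence wins) and keeping the entry that is already there (first occurrence wins).

```go
package main

import "fmt"

// creationBlock is a hypothetical stand-in for a system channel block
// that creates an application channel. It is not a Fabric type.
type creationBlock struct {
	Number  uint64 // position of the block in the system channel
	Channel string // name of the application channel it creates
}

// firstCreations scans blocks in order and keeps, for each channel name,
// only the first block that created it. Later creation blocks for the
// same channel are skipped instead of overriding the map entry.
func firstCreations(blocks []creationBlock) map[string]creationBlock {
	first := make(map[string]creationBlock)
	for _, b := range blocks {
		if _, seen := first[b.Channel]; seen {
			continue // second, third, ... occurrence: ignore
		}
		first[b.Channel] = b
	}
	return first
}

func main() {
	// Two consecutive blocks create "channel6" (made-up block numbers).
	blocks := []creationBlock{
		{Number: 5, Channel: "channel6"},
		{Number: 6, Channel: "channel6"},
		{Number: 7, Channel: "channel7"},
	}
	first := firstCreations(blocks)
	fmt.Println(first["channel6"].Number) // 5
	fmt.Println(first["channel7"].Number) // 7
}
```

With a plain `first[b.Channel] = b` and no `seen` check, the entry for channel6 would end up pointing at block 6 instead of block 5.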
fix hyperledger#2931 Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
fix #2931 Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
It happened on my network when an orderer crashed because no space was left on the device. I'm running nodes in k8s, so I just resized the related PersistentVolumeClaim and restarted the node. It fails to start with the error "fileutil: file already locked". It happens only for a specific channel; the other ones are OK and replicate just fine. Logs: https://pastebin.com/pb1XSGn9
For some reason the orderer tries to HandleChain twice for the same channel, and it fails on the second attempt.
Maybe at "Restarting raft node channel=channel6" a race condition happens: the previous node hadn't closed the storage yet and the new one already tried to use it.