State pruning in the cumulus embedded relay chain is not working #2850
CC @skunert
Hmm, the parameters look correct to me; the relay chain should prune. Otherwise, your node is running fine and is at the tip of the chain? You could run with
Yes, we will, and we will report back. One thing we can already say is that several of our nodes are affected by this, and each of them stopped pruning states at a different height of the relay chain history. Memory usage seems to be OK (below 4 GB).
Does the log line
This is indeed suspicious!
Alright, I was able to figure out the issue, and it is an issue on our side. See PR: moondance-labs/tanssi#382. The way I discovered it was by comparing synchronizing a
Now I see that state pruning is working. However, after synchronization I still see a bunch of log lines like
And I don't know the reason for this, but I see that you have a similar issue: #19. The parameters we are using are fast-sync
I can put the raw specs here if you want to try it out. In any case, this looks like a more common problem than the one we were observing, so thank you for the input!
What is the idea behind these changes? To me it looks like this just extracts the logic from here into the tanssi node without changing anything. I have doubts that this fixes the underlying cause.
Whenever we send out a finality or import notification in the node, we pin the underlying block and state that the notification references. The pin means that this block is not to be pruned. When the notification is dropped, the pin is released and the block can be pruned again. Since you are seeing this log line, the pinning cache is full, which means that some notification was not dropped.
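For illustration, here is a minimal, self-contained sketch of that lifecycle; the types and names below are hypothetical, and the real pinning logic inside the Substrate client database is more involved:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// All types here are hypothetical; they only model the pin/unpin lifecycle.
type Hash = [u8; 32];

/// Reference count of active pins per block hash.
#[derive(Default)]
struct PinCache {
    pins: Mutex<HashMap<Hash, u32>>,
}

impl PinCache {
    /// The pruning logic would skip any block that is still pinned.
    fn is_pinned(&self, hash: &Hash) -> bool {
        self.pins.lock().unwrap().contains_key(hash)
    }
}

/// Pin a block; the returned handle keeps it from being pruned.
fn pin(cache: &Arc<PinCache>, hash: Hash) -> PinHandle {
    *cache.pins.lock().unwrap().entry(hash).or_insert(0) += 1;
    PinHandle { cache: Arc::clone(cache), hash }
}

/// Handle held by a notification; dropping it releases the pin.
struct PinHandle {
    cache: Arc<PinCache>,
    hash: Hash,
}

impl Drop for PinHandle {
    fn drop(&mut self) {
        let mut pins = self.cache.pins.lock().unwrap();
        if let Some(count) = pins.get_mut(&self.hash) {
            *count -= 1;
            if *count == 0 {
                pins.remove(&self.hash);
            }
        }
    }
}

/// An import or finality notification keeps its pin alive for as long as it exists.
struct ImportNotification {
    hash: Hash,
    _pin: PinHandle,
}

fn main() {
    let cache = Arc::new(PinCache::default());
    let hash = [1u8; 32];

    let notification = ImportNotification { hash, _pin: pin(&cache, hash) };
    assert!(cache.is_pinned(&notification.hash)); // not pruned while the notification is alive

    // A subscriber that never drops its notifications keeps blocks pinned and fills the cache.
    drop(notification);
    assert!(!cache.is_pinned(&hash)); // pin released, the block can be pruned again
}
```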
Actually, yes, it was a false positive on my side; I see it working without these changes too. I will close that PR. We will keep investigating to see if we can understand what is going on, but at least I see state pruning working now when I sync from scratch.
Okay. I will also continue investigating.
Hey @skunert, we have been able to investigate a bit more, and here is what we found. We are using the 1.3.0 polkadot version (we are using
I have managed to reproduce the behavior in which state pruning is not working by doing the following:
After that the log lines that we see are:
Some of those logs are added by us, but two things stand out to me here:
We can confirm that after this, state pruning stops working and never prunes anymore. Does this shed some light on the issue?
@girazoki Can you elaborate on the steps to arrive at this? Is this correct?:
Are these steps correct?
Yes, I tried it with both 50 and 2000, so maybe the logs reflect the trial where I tried with 50. But in both cases I observed this pattern.
Can you try to upgrade to 1.4.0, or at least apply the changes from this PR? #2228 Your node is reading
Yes, we can try! Do we need to resync from scratch, or can we apply it on one of the nodes whose database is not being pruned?
Please try with a fresh node to verify if this fixes the issue.
@skunert, we have been running the collators with that change for one week, and state pruning appears to be working in all of them now. So I guess this issue can be considered resolved. Thank you so much!
We are running RPC nodes for tanssi with the embedded relay chain from cumulus, with the following parameters:
As you can see, it should do both state pruning and block pruning, keeping only 2000 blocks from the relay chain. Block pruning is working well, but state pruning is not. We have tested this by asking for the state of specific storage keys on blocks that are more than 1M blocks old, and we are still able to fetch them. Initially I thought it could be related to #1358, but we upgraded our nodes to the 1.3.0 release and still observe the same behavior.
Is there anything we are missing? Any parameter we are setting wrong?
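For reference, a probe along the following lines illustrates the kind of check (not necessarily the exact commands we ran): the RPC port, block number, and dependencies are illustrative assumptions, while `chain_getBlockHash` and `state_getStorage` are the standard Substrate JSON-RPC methods. If the state of a sufficiently old block is still returned, pruning has not discarded it:

```rust
// Assumed dependencies:
// jsonrpsee = { version = "0.22", features = ["http-client"] }
// tokio = { version = "1", features = ["full"] }
// anyhow = "1"
use jsonrpsee::core::client::ClientT;
use jsonrpsee::http_client::HttpClientBuilder;
use jsonrpsee::rpc_params;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Hypothetical endpoint: point this at the embedded relay chain's RPC port.
    let client = HttpClientBuilder::default().build("http://127.0.0.1:9945")?;

    // Pick a block far behind the configured pruning window (illustrative number).
    let old_hash: Option<String> = client
        .request("chain_getBlockHash", rpc_params![1_000_000u32])
        .await?;
    let old_hash = old_hash.expect("block should exist on a synced node");

    // ":code" (hex 0x3a636f6465) is a storage key present in every block's state.
    let probe = client
        .request::<Option<String>, _>("state_getStorage", rpc_params!["0x3a636f6465", old_hash])
        .await;

    match probe {
        Ok(Some(_)) => println!("old state is still queryable -> it was not pruned"),
        Ok(None) | Err(_) => println!("old state is unavailable -> pruning discarded it"),
    }
    Ok(())
}
```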