Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fix consensus error between wasm and native #1595

Merged
merged 4 commits into from
Jan 30, 2019

Conversation

cmichi
Copy link
Contributor

@cmichi cmichi commented Jan 28, 2019

The bug from #1553 occurred because of two things playing into each other:

  • The bucket size of 8192 bytes is quite large, which caused the available heap space to be exhausted too soon (even for allocating just 1 byte a whole bucket of 8192 bytes needs to allocated).
    This is an effect of the buddy allocator algorithm: There is less heap space available to be assigned, because a bucket does not necessarily have to be filled and thus the remaining bucket space is wasted.

  • The other thing was that there was a bug when trying to allocate in a full heap. A double allocation to the first pointer happened then. I adapted the tests so that they would have catched this.

Closes #1553

@cmichi cmichi force-pushed the cmichi-fix-consensus-error-between-wasm-and-native branch 5 times, most recently from 57ec82f to 6a9291e Compare January 28, 2019 09:22
@cmichi cmichi requested a review from pepyakin January 28, 2019 10:18
@cmichi cmichi added the A0-please_review Pull request needs code review. label Jan 28, 2019
Copy link
Contributor

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to me!

It would be awesome if we could add some more diagnostic tools for catching this or other issues.

As we discussed today privately, we may want to improve the allocator with time, thus new bugs will be introduced anyway. Since the environment we deal with is not the easiest to debug one on one hand and on the other hand I'd consider the allocator to be very sensitive and rather not trivial it would be really good if we added some additional diagnostic tools, like fuzzing/quickchecking and things that will allow us to diagnose issues more easily.

I'd suggest you to check out the wee_alloc repo as it has some nice diagnostic tooling.

@pepyakin
Copy link
Contributor

(To clarify: I don't expect this to be done in this PR)

@pepyakin pepyakin added A6-seemsok and removed A0-please_review Pull request needs code review. labels Jan 28, 2019
@pepyakin pepyakin requested a review from bkchr January 29, 2019 10:29
@gavofyork
Copy link
Member

Even 1K is likely too large a hit to take unless there's concrete evidence that any allocations much below that are few. I think we need to avoid the buddy allocator until there is a slab allocator available to ensure that the lowest leaves in the tree don't get exhausted through numerous allocations of small pieces of data.

@gavofyork
Copy link
Member

Just to be clear, until we have a compelling combination of slab+buddy allocator, I think we will need to keep the bump allocator.

A bucket size of 8192 bytes is quite large and it turned
out that this can exhaust the available heap space too
too quickly.

This is because even for allocating 1 byte a bucket of
8192 bytes is allocated/wasted.
The test didn't use an offset when setting up the heap.
Hence the first successfully allocated pointer was
always `0`.

This is unfortunate since `0` is also the return value
when there is an error.

This lead to us not noticing that the test was failing,
because it did not distinguish between success and error.
@cmichi cmichi force-pushed the cmichi-fix-consensus-error-between-wasm-and-native branch from da5b4bb to 9c3cc02 Compare January 29, 2019 15:31
@cmichi
Copy link
Contributor Author

cmichi commented Jan 29, 2019

Alright, I have added a commit to this PR which reverts the current allocator to the linear allocator.

In order to continue working on a better allocator we just need to revert df44a2ea3fb85d538e79d8566d40ea3303810d73 to get back to the buddy allocator setup.

Do we want to open #300 up again or create a new follow-up ticket?

@cmichi cmichi added A0-please_review Pull request needs code review. and removed A5-grumble labels Jan 29, 2019
@cmichi cmichi force-pushed the cmichi-fix-consensus-error-between-wasm-and-native branch from 9c3cc02 to 5d127ce Compare January 29, 2019 16:45
@cmichi cmichi force-pushed the cmichi-fix-consensus-error-between-wasm-and-native branch from 5d127ce to df44a2e Compare January 29, 2019 17:03
@gavofyork
Copy link
Member

Done: #1615

@gavofyork
Copy link
Member

We may also consider trying out a free-list allocator. https://en.wikipedia.org/wiki/Free_list

@gavofyork gavofyork added A8-looksgood and removed A0-please_review Pull request needs code review. labels Jan 29, 2019
@gavofyork gavofyork closed this Jan 29, 2019
@gavofyork gavofyork reopened this Jan 29, 2019
@rphmeier
Copy link
Contributor

FWIW I did an implementation of that ages ago (before custom allocators were a thing in rust): https://github.com/rphmeier/allocators/blob/master/src/freelist.rs

maybe it can be ported over.

@bkchr
Copy link
Member

bkchr commented Jan 29, 2019

Do we maybe want to make the allocator exchangable?

@gavofyork
Copy link
Member

was thinking that, but i think it's not needed for now, and could even be dangerous in terms of consensus.

@gavofyork gavofyork merged commit 7ec612f into master Jan 30, 2019
@gavofyork gavofyork deleted the cmichi-fix-consensus-error-between-wasm-and-native branch January 30, 2019 13:37
rphmeier pushed a commit that referenced this pull request Jan 30, 2019
* Decrease bucket size

A bucket size of 8192 bytes is quite large and it turned
out that this can exhaust the available heap space too
too quickly.

This is because even for allocating 1 byte a bucket of
8192 bytes is allocated/wasted.

* Return 0 if requested size too large

* Improve test

The test didn't use an offset when setting up the heap.
Hence the first successfully allocated pointer was
always `0`.

This is unfortunate since `0` is also the return value
when there is an error.

This lead to us not noticing that the test was failing,
because it did not distinguish between success and error.

* Revert to linear allocator
MTDK1 pushed a commit to bdevux/substrate that referenced this pull request Apr 12, 2019
* Decrease bucket size

A bucket size of 8192 bytes is quite large and it turned
out that this can exhaust the available heap space too
too quickly.

This is because even for allocating 1 byte a bucket of
8192 bytes is allocated/wasted.

* Return 0 if requested size too large

* Improve test

The test didn't use an offset when setting up the heap.
Hence the first successfully allocated pointer was
always `0`.

This is unfortunate since `0` is also the return value
when there is an error.

This lead to us not noticing that the test was failing,
because it did not distinguish between success and error.

* Revert to linear allocator
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants