This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

"Error allocating TSD" when running Polkadot #6591

Closed
mrcnski opened this issue Jan 20, 2023 · 20 comments
Labels
I1-panic The node panics and exits without proper error handling. I3-bug Fails to follow expected behavior. S0-design PR/Issue is in the design stage T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.

Comments

@mrcnski
Contributor

mrcnski commented Jan 20, 2023

A user is running into this error when running a release binary which they built using a recent nightly. They also report:

I noticed in your PR that you switched out the parity-util-mem crate for tikv-jemallocator directly, but jemalloc doesn't work on Mac, so I used to rely on the parity-util-mem features to disable jemalloc and use a different allocator.

However, AFAIK jemalloc should work on Mac (at least it does for me, both pre- and post-M1).

@ordian do you have any ideas what might be the issue? And would it make sense to support a different allocator through a feature flag, which parity-util-mem used to support?

@mrcnski
Contributor Author

mrcnski commented Jan 20, 2023

For reference, here is the old allocator selection code in parity-util-mem:

https://github.com/paritytech/parity-common/blob/806ca48a95c06014e0fde74a1382dc128da62206/parity-util-mem/src/lib.rs#L25-L26

@bkchr
Member

bkchr commented Jan 20, 2023

Why doesn't jemalloc work on Mac? Maybe they should provide more information, otherwise we cannot help.

@tonyalaribe
Contributor

I'm facing this issue. I just get the error

<jemalloc>: Error allocating TSD when I compile and run polkadot.

My computer is a MacBook Pro with the M1 Pro chip (32 GB RAM, etc.).

I've tried quite a few things so far: reinstalling make and cmake, installing Rosetta 2 for Intel chip emulation, updating to the latest Rust nightly, updating to macOS Ventura, reinstalling the Xcode developer tools, deleting and recloning polkadot, etc.

This has always been the case for me, but my workaround in the past was to compile with a different allocator (mimalloc) via the parity-util-mem features.

@vieira-giulia mentioned also facing a similar error in the past, but it's not clear how she got rid of hers.

@tonyalaribe
Contributor

This might be unrelated, but on the tikv-jemallocator crate page, there's a table that shows that jemalloc doesn't pass their macOS tests either:

[Image: screenshot of the platform table from the tikv-jemallocator crate page]

@ordian
Member

ordian commented Jan 20, 2023

@mrcnski I don't know the reason. If it works for you, it probably has to do with the C toolchain used to build jemalloc.

Having a crate for setting a global allocator might make sense; I've thought about it as well. We could extract memory stats there. That way we would not have to worry about

/// Global allocator. Changing it to another allocator will require changing

That crate could have jemalloc and mimalloc features (both disabled by default). The features should be mutually exclusive (cfg(all(..), compile_error)).
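
A minimal sketch of the mutual-exclusion idea, assuming Cargo features named `jemalloc` and `mimalloc` as proposed above. The `selected_allocator` helper is hypothetical and only reports which branch was compiled in; a real crate would presumably declare a `#[global_allocator]` static in each branch instead.

```rust
// Refuse to build when both allocator features are enabled at once.
#[cfg(all(feature = "jemalloc", feature = "mimalloc"))]
compile_error!("features `jemalloc` and `mimalloc` are mutually exclusive");

/// Hypothetical helper: name of the allocator selected at compile time.
pub fn selected_allocator() -> &'static str {
    if cfg!(feature = "jemalloc") {
        "jemalloc"
    } else if cfg!(feature = "mimalloc") {
        "mimalloc"
    } else {
        // Neither feature enabled: Rust's default system allocator is used.
        "system"
    }
}
```

Assuming both features are declared in Cargo.toml, enabling them together would stop the build with the message above rather than silently picking one allocator.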

@mrcnski
Contributor Author

mrcnski commented Jan 20, 2023

@ordian Would setting the allocator in a crate not introduce the risk of soundness issues, similar to those we encountered with parity-util-mem?

(My understanding of the soundness issues with parity-util-mem: it was unsound particularly if two different versions were used, which was possible because we had dependencies also relying on it... So every time it got updated, we had to coordinate releases across different crates, and be very careful not to miss any.)

@ordian
Member

ordian commented Jan 20, 2023

Yes, the soundness issue was due to having two versions with different features. It was mitigated by https://github.com/paritytech/parity-common/blob/806ca48a95c06014e0fde74a1382dc128da62206/parity-util-mem/Cargo.toml#L12, which can be done here as well.

@ordian
Member

ordian commented Jan 20, 2023

Disallowing duplicates of parity-util-mem caused trouble for downstream dependents, but that will not be the case here, because the global allocator crate will only be a dependency of "polkadot-as-a-binary", not "polkadot-as-a-library".

EDIT: Hmm, actually, if we extract mem stats, it will be used as a dependency for the overseer, which is not great at all. Maybe it's best to not extract it into a crate after all.

@mrcnski
Contributor Author

mrcnski commented Jan 20, 2023

Yeah, I don't see a need for this crate 🤔, but I do think we could/should add a feature flag to Polkadot that enables a different allocator.

@bkchr
Member

bkchr commented Jan 20, 2023

EDIT: Hmm, actually, if we extract mem stats, it will be used as a dependency for the overseer, which is not great at all.

TBH, I don't get why we need to use jemalloc for this. As we are mainly supporting Linux, we should only make mem stats work on Linux and report nothing on macOS.

Then we can make jemalloc optional again. We could also make the jemalloc memory collection optional.
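
One way to read this suggestion: gather memory stats only on Linux and return nothing elsewhere. The sketch below is an illustration under stated assumptions, not Polkadot's code. It swaps jemalloc's statistics for the `VmRSS` line of `/proc/self/status`, and `parse_vm_rss` and `resident_memory_bytes` are hypothetical names.

```rust
/// Parses the `VmRSS:` line of a `/proc/<pid>/status` dump into bytes.
pub fn parse_vm_rss(status: &str) -> Option<u64> {
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    // The line looks like `VmRSS:     1234 kB`; the unit is 1024 bytes.
    let kib: u64 = line.split_whitespace().nth(1)?.parse().ok()?;
    Some(kib * 1024)
}

/// Resident memory of the current process, available on Linux only.
#[cfg(target_os = "linux")]
pub fn resident_memory_bytes() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss(&status)
}

/// On every other platform (macOS included) no stats are reported.
#[cfg(not(target_os = "linux"))]
pub fn resident_memory_bytes() -> Option<u64> {
    None
}
```

Callers handle the `None` case once, so the stats path never needs an allocator-specific dependency on platforms where it is not supported.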

@vieira-giulia

Yep, I had this. It's a problem with make on Macs with the M1 processor.

These are a few things I did at the time that may be worth checking.

  • Double-check your Rust version and whether it's on nightly darwin 64
  • Reinstall make. I remember that doing it with brew was not enough, so try to reset your make from scratch and delete every older version you have. The one your Mac already uses is the problematic one. This is the main thing: cmake and make are weird on M1. I think this should have been solved in newer macOS upgrades, but just install it again anyway. Brew may not be enough, as it keeps the old versions and you need to replace them
  • Check cmake and other C dependencies
  • Reset your terminal and maybe your computer
  • Upgrade your OS if you haven't yet
  • Run cargo clean, or even clone the repo again, to be sure no cached artifacts in target are being reused from previous builds :)

All of these are fairly obvious but well worth doing anyway. I remember deleting and resetting the jemalloc dirs, but that was just brute-forcing the whole thing.

@tonyalaribe
Contributor

As a local (temporary) workaround, I tried commenting out where we set the ALLOC static variable in main.rs, but that didn't do much. How do I manually disable the custom allocator for my local debugging?

This is what I tried:
[Image: screenshot of the attempted change]

@mrcnski
Contributor Author

mrcnski commented Jan 24, 2023

@tonyalaribe that seems to work for me when I try it. What error do you get now, still the same one? Perhaps run cargo clean and try again. Also, have you tried all the suggestions from @vieira-giulia?

@tonyalaribe
Contributor

@mrcnski it's strange. I get the exact same error even after cargo clean and cargo build. Why would it still show a jemalloc error?

[Image: screenshot]

@tonyalaribe
Contributor

Also, I tried all of @vieira-giulia's suggestions, but the issue persists.

@tonyalaribe
Contributor

Thanks @bkchr, I commented out the code in the function you shared, and it runs now.

@mrcnski mrcnski added I3-bug Fails to follow expected behavior. S0-design PR/Issue is in the design stage T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. I1-panic The node panics and exits without proper error handling. labels Jan 25, 2023
@tonyalaribe
Contributor

Hi @bkchr I want to take a stab at this, as it's blocking my workflow (quite difficult to run zombienet tests when the default build doesn't run).

About the implementation, and about your comment

TBH, I don't get why we need to use jemalloc for this. As we are mainly supporting Linux, we should only make mem stats work on Linux and report nothing on macOS.
Then we can make jemalloc optional again. We could also make the jemalloc memory collection optional.

Should I make a PR that hides the jemalloc and memory collection behind a flag, which is turned on by default on Linux? Or what would you suggest as the expected behavior?
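
A sketch of what such a gate could look like, under stated assumptions. The `jemalloc-allocator` feature name is the one used later in this thread; `CountingAlloc` is a hypothetical stand-in that wraps the standard `System` allocator so the example builds without extra crates, where the real code would install jemalloc. For simplicity the sketch hard-wires the allocator on for Linux instead of making it a default that can be switched off.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through `CountingAlloc`.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a custom allocator: forwards to `System` and tracks usage.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Installed on Linux, or anywhere the feature is requested explicitly.
#[cfg(any(target_os = "linux", feature = "jemalloc-allocator"))]
#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc;

/// Memory stats exist only when the custom allocator is installed.
pub fn allocated_bytes() -> Option<usize> {
    if cfg!(any(target_os = "linux", feature = "jemalloc-allocator")) {
        Some(ALLOCATED.load(Ordering::Relaxed))
    } else {
        None
    }
}
```

With this shape, a macOS build without the feature falls back to the default system allocator and simply reports no memory stats.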

@bkchr
Member

bkchr commented Feb 2, 2023

Should I make a PR that hides the jemalloc and memory collection behind a flag, which is turned on by default on Linux? Or what would you suggest as the expected behavior?

Generally sounds reasonable.

@tonyalaribe
Contributor

The linked PR to introduce the jemalloc-allocator feature flag was merged, so I'll close this issue for now.


5 participants