Subgraph built on Mac M1 chip fails: 'registering new sigaltstack failed' #2325

Closed
azf20 opened this issue Mar 31, 2021 · 31 comments · Fixed by #2347

Comments

@azf20
Contributor

azf20 commented Mar 31, 2021

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
Indexing a subgraph built on a Mac M1 Chip fails after deployment with the following error:

graph-node_1  | Mar 30 17:44:30.544 INFO Syncing 1 blocks from Ethereum., code: BlockIngestionStatus, blocks_needed: 1, blocks_behind: 1, latest_block_head: 2, current_block_head: 1, provider: localhost-rpc-0, component: BlockIngestor
graph-node_1  | Mar 30 17:44:40.634 INFO Syncing 1 blocks from Ethereum., code: BlockIngestionStatus, blocks_needed: 1, blocks_behind: 1, latest_block_head: 3, current_block_head: 2, provider: localhost-rpc-0, component: BlockIngestor
graph-node_1  | Mar 30 17:44:41.150 INFO 1 trigger found in this block for this subgraph, block_hash: 0x591ae02eb53ff160e33abe0cc1fcbf0813c380e897523f3ab1aed294cff56159, block_number: 3, subgraph_id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, component: SubgraphInstanceManager
graph-node_1  | thread 'mapping-QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ-635e0c5d-65f3-4468-abf8-5bcfdfff157d' panicked at 'assertion failed: `(left == right)`
graph-node_1  |   left: `-1`,
graph-node_1  |  right: `0`: registering new sigaltstack failed', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmtime-runtime-0.21.0/src/traphandlers.rs:774:9
graph-node_1  | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
graph-node_1  | Mar 30 17:44:41.312 ERRO Subgraph instance failed to run: Failed to process trigger in block #3 (591ae02eb53ff160e33abe0cc1fcbf0813c380e897523f3ab1aed294cff56159), transaction 89db588442c8299ef04d471ce3dfd0ef21d58e3e63bfefc7b7eda120f962fc81: Mapping terminated before handling trigger: oneshot canceled, code: SubgraphSyncingFailure, id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, subgraph_id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, component: SubgraphInstanceManager

This was also reported in Discord:
Reports error
Mentions M1
On both local node and hosted service

The same subgraph built and deployed on an Intel chip Mac works.

If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem.
Build and deploy a subgraph from a Mac with a new M1 chip.

git clone https://github.com/austintgriffith/scaffold-eth.git m1-bug-demo
cd m1-bug-demo
# install submodules
yarn install
# run a local hardhat chain
yarn chain

In a second tab run a local graph node

yarn graph-run-node

Deploy a simple local contract to the chain and to the subgraph

# deploy contracts
yarn deploy
# register the subgraph
yarn graph-create-local
yarn graph-ship-local

Error visible in the graph node logs

Versions:

"@graphprotocol/graph-cli": "0.18.0",
"@graphprotocol/graph-ts": "0.18.0"

What is the expected behavior?
Subgraph starts indexing as normal.

@azf20
Contributor Author

azf20 commented Mar 31, 2021

Pulled down the latest images and saw the same error; see the RUST_BACKTRACE: 1 output:

graph-node_1  | Mar 31 19:09:56.321 INFO 1 trigger found in this block for this subgraph, block_hash: 0x8a0ce12e32ac7d3b7855aafb975309cd4c347fb46cfb91d911cbb37518e395d4, block_number: 3, subgraph_id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, component: SubgraphInstanceManager
graph-node_1  | thread 'mapping-QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ-031c6ba4-d9bf-4b1a-95bf-65ce11dbef3b' panicked at 'assertion failed: `(left == right)`
graph-node_1  |   left: `-1`,
graph-node_1  |  right: `0`: registering new sigaltstack failed', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmtime-runtime-0.25.0/src/traphandlers.rs:947:9
graph-node_1  | stack backtrace:
graph-node_1  |    0: rust_begin_unwind
graph-node_1  |    1: core::panicking::panic_fmt
graph-node_1  |    2: wasmtime_runtime::traphandlers::setup_unix_sigaltstack
graph-node_1  |    3: wasmtime_runtime::traphandlers::catch_traps
graph-node_1  |    4: wasmtime::instance::Instantiator::start_raw
graph-node_1  |    5: wasmtime::instance::Instance::new
graph-node_1  |    6: wasmtime::linker::Linker::instantiate
graph-node_1  |    7: graph_runtime_wasm::module::WasmInstance::from_valid_module_with_ctx
graph-node_1  |    8: <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll
graph-node_1  |    9: futures::task_impl::std::set
graph-node_1  |   10: futures::task_impl::Spawn<T>::poll_future_notify
graph-node_1  |   11: futures::future::Future::wait
graph-node_1  |   12: tokio::runtime::handle::Handle::enter
graph-node_1  | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
graph-node_1  | Mar 31 19:09:56.429 ERRO Subgraph instance failed to run: Failed to process trigger in block #3 (8a0ce12e32ac7d3b7855aafb975309cd4c347fb46cfb91d911cbb37518e395d4), transaction 979754139ebe955a9105ab1a113f2a4f411fbc1656af9ef0597594885f1eaa9d: Mapping terminated before handling trigger: oneshot canceled, code: SubgraphSyncingFailure, id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, subgraph_id: QmRvEgBm8fknmKVcBC9SJECFg1hJZFSsE5SzJ3VSuji8hQ, component: SubgraphInstanceManager
REPOSITORY                 TAG       IMAGE ID       CREATED         SIZE
graphprotocol/graph-node   latest    79b77fb9260a   2 hours ago     163MB
postgres                   latest    0950a56d0184   12 hours ago    300MB
ipfs/go-ipfs               v0.4.23   7ae05c5b3dd6   14 months ago   51.6MB

@leoyvens
Collaborator

leoyvens commented Apr 1, 2021

The issue is actually with running graph-node on a Mac M1; hopefully wasmtime will fix this in the next release.

@cfallin

cfallin commented Apr 2, 2021

Greetings -- just dropping a note here after following a link from the Wasmtime issue -- FYI we're planning to cut our next release of Wasmtime on Monday and it should include a new PR that makes M1 work (though we still don't have it on CI so I can't say for sure; if not, we have a contributor who is actively working on M1 issues). Sorry and thanks for the patience!

@dylankilkenny

dylankilkenny commented Apr 3, 2021

I've been running on an M1 Mac without problems for the past month and only hit this issue today. Does anyone know a Docker image version that works?

@evaporei
Contributor

evaporei commented Apr 5, 2021

@dylankilkenny since I am also trying to run graph-node on an M1, I will be investigating this as well.

I already found some things that might be breaking at compile time; when we make progress on this I will post the solution here 🙂

@evaporei
Contributor

evaporei commented Apr 5, 2021

@dylankilkenny I've been able to run graph-node locally on a 13" MacBook Pro M1 by:

  1. Pointing to wasmtime's main branch, which at the moment is at this commit: bytecodealliance/wasmtime@4d036a4

To do this I changed runtime/wasm/Cargo.toml from:

wasmtime = "0.25.0"

to

wasmtime = { git = "https://github.com/bytecodealliance/wasmtime.git", branch = "main" }

  2. I tried building the project but still had problems with the wat crate, so I updated it with:

cargo update -p wat

After these changes I think you should be good to go 😊

@dylankilkenny

@otaviopace thanks for the help, it's back up and running now 🙂

@schmidsi
Member

I just ran into this bug again on an M1. graph-node was running in docker-compose. First we tried v0.22.0 (older than this issue).

panicked at 'assertion failed: `(left == right)`

After reading through this issue I thought it should be fixed, so we tried v0.23.1 (or latest at the time of writing). Interestingly, this did not fix the problem. But after some research, I found the docker-compose file on scaffold-eth, which uses version 2c23cce.

Although the panicked error did not appear anymore and the subgraph actually started to index some entities, the graph-node container crashed:

graph-node_1  | Jun 24 10:13:36.473 INFO Done processing Ethereum trigger, data_source: cryptopunks, handler: handleAssign, total_ms: 129, trigger_type: Log, address: 0xb47e…3bbb, signature: Assign(indexed address,uint256), block_hash: 0x33d2d7911ef4a6b291503d95a7d3f93692752b7ff6873223dafe8e3f08a6925e, block_number: 3918258, sgd: 1, subgraph_id: Qmf9kah4dfYmaEBrRiqXEuHXMGRUCKNHcytYSasCygAF9K, component: SubgraphInstanceManager
postgres_1    | 2021-06-24 10:13:37.818 UTC [45] LOG:  could not receive data from client: Connection reset by peer
postgres_1    | 2021-06-24 10:13:37.827 UTC [45] LOG:  unexpected EOF on client connection with an open transaction
graph-node_1  | /usr/local/bin/start: line 52:    85 Killed                  graph-node --node-id "${nodeid//-/}" --postgres-url "$postgres_url" --ethereum-rpc $ethereum --ipfs "$ipfs"
desktop_graph-node_1 exited with code 137

So I see two problems here:

  1. It seems that we reintroduced the "panicked" bug at some point between version 2c23cce and latest.
  2. What's the problem with postgres in this setup?
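
A side note on the `exited with code 137` line above: by the usual shell convention, an exit status above 128 means the process was terminated by a signal, with the signal number being the status minus 128. So 137 corresponds to signal 9 (SIGKILL), which matches the `Killed` line from the start script. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the reason the process was killed is not established by these logs.

```shell
# Hypothetical helper: map a process exit status to the signal that ended it.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
signal_from_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    echo $((status - 128))
  else
    # 0 here means "not terminated by a signal".
    echo 0
  fi
}

signal_from_status 137   # prints 9 (SIGKILL)
```

If that reading is right, the Postgres messages (`Connection reset by peer`, `unexpected EOF`) could be a consequence of graph-node being killed with a transaction open rather than a separate Postgres fault, but that is an inference, not something confirmed in this thread.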

@dylankilkenny @otaviopace @azf20 Can you please test if you can successfully deploy a subgraph to your local graph-node running with docker-compose and pointing to v0.23.1 or latest?

@schmidsi schmidsi reopened this Jun 24, 2021
@azf20
Contributor Author

azf20 commented Jun 24, 2021

I replicated this with 0.23.1. I know @otaviopace and @leoyvens looked into this previously?

@evaporei
Contributor

evaporei commented Jun 24, 2021

I'll look into it 🙂

Last time it was a problem with an update in the wasmtime crate, I'll see if changing that fixes something.

@leoyvens
Collaborator

It's a known bug that running graph-node in docker on an M1 will crash, the solution for now is to run it outside docker, until wasmtime fixes it. @schmidsi not sure what's going on there, but running an arbitrary commit isn't something we support.

@evaporei
Contributor

Oh, it's that Docker-only bug. I thought it was breaking again when using the raw binary.

@evaporei
Contributor

I think for this we're going to have to look into our whole Docker image build process and tweak things until we find what's breaking on the M1 (maybe we're adding something that is incompatible with wasmtime?).

@leoyvens
Collaborator

@otaviopace yes, we could try to fix it on our side by seeing if it reproduces outside docker when running this build process https://github.com/graphprotocol/graph-node/blob/master/docker/Dockerfile#L16-L29.

@cfallin

cfallin commented Jun 24, 2021

Hello again from the Wasmtime side -- we're definitely interested in whatever this breakage is. I'm curious (unfortunately I don't have M1 hardware myself to try, sorry!): does running a vanilla wasmtime binary in a plain Docker container fail? That might indicate whether there's something fundamental about Docker's containerization that we need to consider, or whether it's something to do with the embedding, other settings, etc. Please do feel free to create an issue on our side if you find it can be reproduced in this way!

(Docker on Mac/M1 would mean that this is Linux/aarch64, and we do test that but not on M1 chips specifically, so I'm curious if there's something weird about the combination of the two...)

@evaporei
Contributor

We actually run wasmtime as a library instead of a binary, I'll see if I can create a simple example only with Docker + wasmtime (without our project). If that fails I'll create an issue on your repo with the info 🙂

Thanks for showing interest in our problem 😊

@schmidsi
Member

It's a known bug that running graph-node in docker on an M1 will crash, the solution for now is to run it outside docker, until wasmtime fixes it. @schmidsi not sure what's going on there, but running an arbitrary commit isn't something we support.

I just wanted to share the observation that it was a different error with this specific commit. Maybe this helps to narrow it down.

@kennym

kennym commented Jun 30, 2021

Getting the same error on M1 - any fix on the horizon?

@evaporei
Contributor

evaporei commented Jun 30, 2021

Hey @kennym, I still didn't have much time to tackle this issue yet.

For now you either have to build and run graph-node locally with Rust installed on your machine, or use this Docker tag: https://hub.docker.com/layers/graphprotocol/graph-node/2c23cce/images/sha256-9182194b742c2c1a70658f7f01aa49a30a8ecd2353f7ae1e7cfd43be73f7c262?context=explore

@kennym

kennym commented Jun 30, 2021

@otaviopace thanks - that worked!

@kennym

kennym commented Jul 10, 2021

Is there a fix for this in the latest releases?

@mdtanrikulu

Hey @kennym, I still didn't have much time to tackle this issue yet.

..., or use this docker tag for now: https://hub.docker.com/layers/graphprotocol/graph-node/2c23cce/images/sha256-9182194b742c2c1a70658f7f01aa49a30a8ecd2353f7ae1e7cfd43be73f7c262?context=explore

Just FYI, this seems to work for the issue in the title reported by @azf20, but then it runs into the issue mentioned by @schmidsi: #2325 (comment)

@evaporei
Contributor

Sorry for the long delay on this guys. I've been able to reproduce the error in an isolated example, and I've created an issue in the wasmtime repo 🙂

bytecodealliance/wasmtime#3203

@cfallin if you can take a look, that would be great 😊

@evaporei
Contributor

Well, after a lot of help from the wasmtime team ❤️, it turned out that the problem is in QEMU, which Docker for Mac uses to translate machine code from one architecture to another.

The current version of Docker for Mac (3.6.0) uses version 5.0.1 of QEMU, which has a bug in the sigaltstack call that wasmtime makes. It's only fixed by this commit in version 5.2. Until Docker for Mac bumps this, the error will keep happening 😞 .

I'll keep an eye on Docker release notes and post here when this is fixed 🙂
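
Since the failure comes from emulation, one quick check is whether the image's architecture differs from the host's. The sketch below is only illustrative: the helper names are invented here, and the commented-out wiring assumes the `graphprotocol/graph-node` image is already pulled and that `docker image inspect` reports its `Architecture` field.

```shell
# Hypothetical helper: normalize common architecture spellings to one name.
normalize_arch() {
  case "$1" in
    arm64|aarch64) echo arm64 ;;
    amd64|x86_64)  echo amd64 ;;
    *)             echo "$1" ;;
  esac
}

# Hypothetical helper: print "emulated" when the image architecture differs
# from the host architecture, and "native" when they match.
image_mode() {
  host="$(normalize_arch "$1")"
  image="$(normalize_arch "$2")"
  if [ "$host" = "$image" ]; then
    echo native
  else
    echo emulated
  fi
}

# Example wiring (needs Docker, so it is left commented out):
# image_mode "$(uname -m)" \
#   "$(docker image inspect --format '{{.Architecture}}' graphprotocol/graph-node)"

image_mode arm64 amd64   # prints "emulated"
```

An "emulated" result on an M1 would mean the container goes through QEMU and can hit the sigaltstack bug described above.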

@azf20
Contributor Author

azf20 commented Aug 19, 2021

Thanks @otaviopace! Any idea if it's possible to get this on the Docker for Mac team's radar?

@evaporei
Contributor

@azf20 I just created an issue there to see if they have any plans on updating QEMU's version on Docker for Mac 😊 docker/for-mac#5919

@leoyvens
Collaborator

It looks like Docker's position is that running x86-64 images on aarch64 is not guaranteed to work. We should fix this on our side by publishing multi-architecture tags.

@azf20
Contributor Author

azf20 commented Aug 26, 2021

Nice. @leoyvens, is that a matter of adding some additional metadata?

@leoyvens
Collaborator

I'm not sure what exactly needs to change, but it should only require a change to the Docker build process.
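
One possible shape for such a change, as a sketch only: Docker's `buildx` plugin can build a single tag for several platforms. The image name below is a placeholder, the Dockerfile path is taken from the link earlier in this thread, and whether graph-node's Dockerfile builds cleanly on arm64 with `.` as the build context is an assumption that has not been checked here. The function prints the command by default so it can be reviewed before anything is pushed.

```shell
# Hypothetical helper: build and push one tag for both amd64 and arm64.
# With DRY_RUN=1 (the default) the command is only printed, not executed.
multiarch_build() {
  image="$1"   # placeholder such as example/graph-node:test
  set -- docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --file docker/Dockerfile \
    --tag "$image" \
    --push .
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

multiarch_build example/graph-node:test
```

Running it with `DRY_RUN=0` would execute the build, which needs a buildx builder that supports both platforms.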

@roughpandaz

This is what worked for me: Mac M1 13" 2020

Thanks to: #2325 (comment)

  1. Run docker-compose without graph-node
  2. Download and build your own graph-node (the initial build is slow, but subsequent builds are very fast)

docker-compose.yml

version: "3"
services:
  ipfs:
    image: ipfs/go-ipfs:v0.4.23
    ports:
      - "5001:5001"
    volumes:
      - ipfs-data:/data/ipfs
  postgres:
    image: postgres
    ports:
      - "5432:5432"
    command: ["postgres", "-cshared_preload_libraries=pg_stat_statements"]
    environment:
      POSTGRES_USER: graph-node
      POSTGRES_PASSWORD: let-me-in
      POSTGRES_DB: graph-node
    volumes:
      - postgres-data:/var/lib/postgresql/data
  ganache:
    image: trufflesuite/ganache-cli
    ports:
      - "8545:8545"
    command: --deterministic --db /data/ganache
    volumes:
      - ganache-data:/data/ganache

volumes:
  ipfs-data:
  postgres-data:
  ganache-data:

graph-node command (must be run inside the graph-node repo)

cargo run -p graph-node --release -- \
           --postgres-url postgresql://graph-node:let-me-in@localhost:5432/graph-node \
           --ethereum-rpc mainnet:http://127.0.0.1:8545 \
           --ipfs 127.0.0.1:5001

Note: downgrading to https://hub.docker.com/layers/graphprotocol/graph-node/2c23cce/images/sha256-9182194b742c2c1a70658f7f01aa49a30a8ecd2353f7ae1e7cfd43be73f7c262?context=explore didn't work for me because I was using subgraph version "0.0.5", which requires additional changes to node-modules that broke other things: https://githubmemory.com/repo/austintgriffith/scaffold-eth/issues/456

@evaporei
Contributor

This is working as of Docker for Mac 4.1.0, which updated QEMU to 6.1.0 🎉
https://docs.docker.com/desktop/mac/release-notes/#upgrades-1
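
For anyone unsure whether their install is new enough, a small comparison sketch follows. The helper name is invented for illustration, and the installed version is passed in by hand (for example as read from Docker Desktop's About dialog) rather than detected automatically.

```shell
# Hypothetical helper: succeed (exit 0) when dotted version $1 >= version $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, ".")
    m = split(want, w, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      a = (i <= n) ? h[i] + 0 : 0
      b = (i <= m) ? w[i] + 0 : 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# 3.6.0 is the version reported as affected earlier in this thread.
if version_at_least "3.6.0" "4.1.0"; then
  echo "new enough"
else
  echo "upgrade needed"
fi
```

With the values above the check fails and `upgrade needed` is printed; passing `4.1.0` or later as the first argument takes the other branch.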
