Add risc-v support for eth-docker #1873

Open
lazyprogrammerio opened this issue Jul 15, 2024 · 40 comments
Labels
enhancement New feature or request

Comments

@lazyprogrammerio

RISC-V support in the upstream software has come a long way, so a dockerized approach to Ethereum staking on RISC-V boards such as SiFive, Milk-V, and LicheePi is now a real possibility.

@lazyprogrammerio
Author

Current state of building and running ETH execution clients on RISC-V:

  • geth - WORKS

@lazyprogrammerio
Author

lazyprogrammerio commented Jul 16, 2024

Current state of building and running ETH consensus clients on RISC-V:

@lazyprogrammerio
Author

For Prysm -> Bazel issue created: bazelbuild/bazel#23018
For Lodestar -> riscv-forks/electron-riscv-releases#1

@lazyprogrammerio
Author

Lodestar: After deciding not to install the binaries for the Electron dependency, to see how far I could get, I hit the big blocker: classic-level (a LevelDB wrapper), which has no RISC-V support in its assembly code.

I created a PR to see whether it fixes the issue: Level/classic-level#94.

One note here: Lodestar depends on classic-level, which depends on LevelDB. The LevelDB version used in classic-level seems to be 7 years old (if I am not wrong): https://github.com/google/leveldb/commits/v1.20, see https://github.com/Level/classic-level/commits/main/deps/leveldb.

@haurog
Contributor

haurog commented Jul 16, 2024

Nimbus - DOES NOT WORK: gcc: error: ‘-march=native’: ISA string must begin with rv32 or rv64. This is probably easy to fix, and we can get a step further by adding an rv64 target.

Grandine - DOES NOT WORK: error: failed to run custom build command for ring v0.16.20. This seems to fail due to missing arch info in that crate.
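
The Nimbus error above shows the compiler rejecting `-march=native` on RISC-V and asking for an ISA string that starts with rv32 or rv64. As a minimal sketch of the idea (not the actual Nimbus fix), a build script could choose the flag from the machine type. The helper name `march_flag` and the `rv64gc` baseline are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a GCC -march value from the machine type.
# "rv64gc" is an assumed baseline for 64-bit RISC-V Linux boards.
march_flag() {
  local machine="${1:-$(uname -m)}"
  case "${machine}" in
    riscv64) echo "-march=rv64gc" ;;
    *)       echo "-march=native" ;;
  esac
}

march_flag "riscv64"
march_flag "x86_64"
```

Passing the machine type as an argument keeps the function easy to try on a non-RISC-V host; without an argument it falls back to `uname -m`.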

@lazyprogrammerio
Author

lazyprogrammerio commented Jul 16, 2024

Erigon - DOES NOT WORK: github.com/prysmaticlabs/gohashtree@v0.0.3-alpha.0.20230502123415-aafd8b3ca202/hash.go:77:5: undefined: supportedCPU. From the looks of it, the supported amd64 and arm64 architectures have this code written in assembly, which is going to be quite a challenge to port to RISC-V: https://github.com/prysmaticlabs/gohashtree/blob/main/hash_arm64.s

@lazyprogrammerio
Author

lazyprogrammerio commented Jul 17, 2024

With @haurog's fix, here is a Dockerfile to build Nimbus beacon/validator:

FROM alpine:edge

RUN apk update
RUN apk add nim
RUN nim --version

RUN apk update && apk add --no-cache make gcc musl-dev linux-headers git bash git-lfs

WORKDIR /usr/src

RUN bash -c "git clone --recurse-submodules -j8 https://github.com/lazyprogrammerio/nimbus-eth2 nimbus-eth2"

RUN bash -c "cd nimbus-eth2 && make USE_SYSTEM_NIM=1 -j$(nproc) update"

RUN bash -c "cd nimbus-eth2 && make USE_SYSTEM_NIM=1 -j$(nproc) nimbus_beacon_node nimbus_validator_client"

Output:

Build completed successfully: build/nimbus_validator_client
Build completed successfully: build/nimbus_beacon_node

@lazyprogrammerio
Author

Dockerfile to build Nimbus for eth-docker:

# Build Nimbus in a stock alpine container
FROM nimbus/devel:stage1 AS builder
# nimbus/devel:stage1 is the image built from the Dockerfile in the previous comment

# Included here to avoid build-time complaints
ARG DOCKER_TAG
ARG DOCKER_VC_TAG
ARG DOCKER_REPO
ARG DOCKER_VC_REPO

ARG BUILD_TARGET
ARG SRC_REPO

# Pull all binaries into a second-stage alpine deploy container
FROM alpine:edge AS consensus

ARG USER=user
ARG UID=10002

RUN apk update && apk add  ca-certificates bash tzdata git curl
RUN apk update && apk add --no-cache make gcc musl-dev linux-headers git bash git-lfs
# See https://stackoverflow.com/a/55757473/12429735
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/usr/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    "${USER}"

RUN mkdir -p /var/lib/nimbus/ee-secret && chown -R ${USER}:${USER} /var/lib/nimbus && chmod 700 /var/lib/nimbus && chmod 777 /var/lib/nimbus/ee-secret

# Cannot assume buildkit, hence no chmod
COPY --from=builder --chown=${USER}:${USER} /usr/src/nimbus-eth2/build/nimbus_beacon_node /usr/local/bin/
COPY --chown=${USER}:${USER} ./docker-entrypoint.sh /usr/local/bin/
COPY --chown=${USER}:${USER} ./validator-exit.sh /usr/local/bin/
# Belt and suspenders
RUN chmod -R 755 /usr/local/bin/*

USER ${USER}

ENTRYPOINT ["nimbus_beacon_node"]

FROM alpine:edge AS validator

ARG USER=user
ARG UID=10000

RUN apk update && apk add --no-cache make gcc musl-dev linux-headers git bash git-lfs
# See https://stackoverflow.com/a/55757473/12429735
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    "${USER}"

# Create data mount point with permissions
RUN mkdir -p /var/lib/nimbus && chown -R ${USER}:${USER} /var/lib/nimbus && chmod -R 700 /var/lib/nimbus

# Cannot assume buildkit, hence no chmod
COPY --from=builder --chown=${USER}:${USER} /usr/src/nimbus-eth2/build/nimbus_validator_client /usr/local/bin/
COPY --chown=${USER}:${USER} ./docker-entrypoint-vc.sh /usr/local/bin/
# Belt and suspenders
RUN chmod -R 755 /usr/local/bin/*

USER ${USER}

ENTRYPOINT ["nimbus_validator_client"]

@yorickdowne
Contributor

I've adjusted Dockerfile.source for Nimbus to build on alpine:3. Test that on your RISC-V machine, if you would.

Can you also get me the output of uname -a on that machine, please? That should allow me to adjust ./ethd config so it offers a Nimbus/Geth combo on RISC-V.

@yorickdowne
Contributor

Went for it by looking for riscv. Try an ./ethd config and see how it behaves for you, please.

@yorickdowne
Contributor

You'd need to adjust NIM_SRC_REPO and/or NIM_SRC_BUILD_TARGET manually in .env until that build fix makes it into a release
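
The manual `.env` change could be scripted roughly as below. This is only a sketch: `set_env_var` is a hypothetical helper, not part of `ethd`, and the demo runs against a scratch file with a placeholder starting value rather than the real `.env`. The fork URL is the one used in the Nimbus build Dockerfile earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file. An existing line for
# KEY is replaced; otherwise a new line is appended.
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "${file}"; then
    # "|" as the sed delimiter, so URLs containing "/" need no escaping.
    sed -i "s|^${key}=.*|${key}=${value}|" "${file}"
  else
    echo "${key}=${value}" >> "${file}"
  fi
}

# Demo on a scratch file rather than the real .env.
env_file="$(mktemp)"
echo "NIM_SRC_REPO=https://example.invalid/placeholder" > "${env_file}"
set_env_var "${env_file}" NIM_SRC_REPO "https://github.com/lazyprogrammerio/nimbus-eth2"
cat "${env_file}"
```

`NIM_SRC_BUILD_TARGET` can be set the same way once you know which branch or tag to build. Note that `sed -i` without a suffix argument assumes GNU or BusyBox sed, and values containing `|` or `&` would need escaping.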

@haurog
Contributor

haurog commented Jul 19, 2024

Here is the output from uname -a
Linux 5.10.113+ #1 SMP PREEMPT Thu Apr 25 13:17:48 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux

@yorickdowne
Contributor

Thanks. I could adjust the grep to riscv64 as other riscv architectures won’t work with existing docker images. But, no rush. Let’s see whether it works at all, first
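
A grep-based check of the kind discussed here might look like the following sketch. The function name `is_riscv64` is hypothetical and this is not the actual `ethd` code; the sample string is the `uname -a` output posted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given uname output names riscv64.
# Defaults to the local machine when no argument is passed.
is_riscv64() {
  local uname_out="${1:-$(uname -a)}"
  echo "${uname_out}" | grep -q "riscv64"
}

sample="Linux 5.10.113+ #1 SMP PREEMPT Thu Apr 25 13:17:48 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux"
if is_riscv64 "${sample}"; then
  echo "riscv64 detected"
fi
```

Because `uname -a` also prints the hostname, a host that happens to be named after the architecture would match too; checking `uname -m` alone would be stricter.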

@yorickdowne
Contributor

yorickdowne commented Jul 19, 2024

Checking available products. Milk-V Jupiter seems a likely candidate, with NVMe and 16 GiB RAM. Ditto the SiFive HiFive Unmatched Rev. B.

Still hard mode compared to Odroid H4 (Ultra), but doable.

LicheePi has no NVMe from what I can see, so it is not a good choice.

@haurog
Contributor

haurog commented Jul 19, 2024

That is pretty much our conclusion as well. The boards we have access to at the moment are mainly for checking whether we can get anything running at all. There may be a chance to split consensus and execution across two boards, but it is still going to be difficult. Unfortunately, the Jupiter is not widely available at the moment; only some preorder units have been sent out. The development of new CPUs and boards seems to be very fast, and in a year or so we might be in a very different position. Ideally, we would have clients ready to be used by then.

@haurog
Contributor

haurog commented Jul 22, 2024

Did a pull request for nimbus: status-im/nimbus-eth2#6439

If it is accepted and merged to the stable branch, we will be able to build directly from their repo. The build script automatically checks whether it is being built on a RISC-V board, so we do not have to change any build parameters.

@yorickdowne
Contributor

Nice! Does ethd config detect riscv and offer only nimbus and Geth, in your testing?

@haurog
Contributor

haurog commented Jul 22, 2024

I think @lazyprogrammerio does all the configurations in the .env file directly. As far as I know, nothing has yet been changed in ethd config.

@yorickdowne
Contributor

ethd config already detects riscv and acts accordingly. It offers Nimbus and Geth and sets Dockerfile.source. It doesn’t change the source repo, as that may not be necessary once your PR has been accepted.

That code hasn’t been tested, as I don’t have a RISC-V board.

@haurog
Contributor

haurog commented Jul 22, 2024

Ah, I see. I totally forgot that you had implemented that already. Thanks for reminding me. @lazyprogrammerio, have you tested it? Otherwise I will test it tomorrow.

@haurog
Contributor

haurog commented Jul 22, 2024

@yorickdowne, I just tested it. The config works. Dockerfile.source is set, but you accidentally set it for Nethermind (NM) instead of Nimbus (NIM). After the config finishes, it fails with the following error:

Total reclaimed space: 11.34kB
[+] Pulling 14/15
 ✔ execution Skipped 0.0s 
 ✔ validator Skipped 0.0s 
 ✔ consensus Skipped 0.0s 
 ✔ grafana Skipped 0.0s 
 ✔ mev-boost Skipped 0.0s 
 ✔ validator-keys Skipped 0.0s 
 ✔ prometheus Skipped 0.0s 
 ✘ blackbox-exporter Error 1.0s 
 ✘ node-exporter Error 1.0s 
 ✘ promtail Error 1.0s 
 ✘ validator-exit Error 1.0s 
 ⠋ cadvisor Pulling 1.0s 
 ✘ loki Error 1.0s 
 ✘ json-exporter Error 1.0s 
 ✘ ethereum-metrics-exporter Error 1.0s 
no matching manifest for linux/riscv64 in the manifest list entries

./ethd terminated with exit code 18 on line 20
This happened during ./ethd config 

I guess most of the needed images have no riscv64 entry in their Docker manifest lists.
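
One way to check an image ahead of time is to look for a riscv64 platform entry in its manifest list. The sketch below assumes the JSON shape printed by `docker manifest inspect`; the helper name `has_riscv64` is hypothetical, and the demo feeds it a hand-written sample instead of calling Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read manifest-list JSON on stdin and succeed if any
# platform entry has architecture riscv64.
has_riscv64() {
  grep -Eq '"architecture"[[:space:]]*:[[:space:]]*"riscv64"'
}

# Hand-written sample in the assumed manifest-list shape.
sample='{"manifests":[{"platform":{"architecture":"amd64","os":"linux"}},{"platform":{"architecture":"riscv64","os":"linux"}}]}'
if echo "${sample}" | has_riscv64; then
  echo "riscv64 image available"
else
  echo "no riscv64 image"
fi
```

Against a real registry this would be used as `docker manifest inspect <image> | has_riscv64`, which needs Docker and network access. A plain grep is a rough check; a JSON-aware tool would be more robust.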

@yorickdowne
Contributor

Got it thanks, I’ll fix that!

Yes indeed. Arm64 is rare, riscv64 is not a thing. The clients will need to be source compiled locally until / unless some teams start publishing riscv64 images

yorickdowne added the enhancement (New feature or request) label Jul 22, 2024

@haurog
Contributor

haurog commented Jul 23, 2024

I tested some more execution client builds on RISC-V. I followed the docs from each project to build the client locally.

BESU: builds, but at runtime it is missing a library, 'ckzg4844jni', which might be https://github.com/ethereum/c-kzg-4844. Needs further investigation.

Nethermind: no dotnet available on the device. Dotnet has been built for RISC-V, but it might need to be installed manually.

Reth: builds, but fails starting:

2024-07-23T10:31:20.493075Z  INFO Opening database path="/home/haurog3389/.local/share/reth/mainnet/db"
2024-07-23T10:31:20.599685Z ERROR shutting down due to error
Error: failed to open the database: unknown error code (12)

Might be due to disk space limitations. Needs to be tested again with an actual SSD.

To conclude all the build tests:
Geth and Nimbus are working. These two are perfect for the initial tests, as they are still the most resource-efficient clients. Besu and Reth might become usable with some modifications. Nethermind needs further investigation into how to run dotnet on RISC-V. Erigon is most probably a no-go due to assembly language dependencies. Lighthouse, Teku and Lodestar need additional investigation and maybe some fixes to get them running. Prysm might be the most difficult one, as the build tools do not support RISC-V.

@haurog
Contributor

haurog commented Jul 24, 2024

While trying to get to the bottom of the Lighthouse build failure, I found someone from Flashbots who is also trying to build clients on RISC-V: RustCrypto/utils#1087

I will try to contact them.

@haurog
Contributor

haurog commented Jul 24, 2024

Did a pull request for the failing library in the Lighthouse build. Let's hope this will fix the build: sigp/ethereum_hashing#8

@haurog
Contributor

haurog commented Jul 27, 2024

Lighthouse builds locally with a lot of patching and upgrading dependencies. Will have to see what the best course of action is to get these changes into lighthouse and its dependencies.

@garyschulte

garyschulte commented Aug 21, 2024

BESU: builds, but when running is missing a library 'ckzg4844jni' might be: https://github.com/ethereum/c-kzg-4844. Needs further investigation.

Got a board coming from aliexpress.

I think to get besu working will just be a matter of locally building https://github.com/Consensys/jc-kzg-4844/ on a risc-v system, and using that in the besu build. That lib currently only publishes packages for x86_64 and arm64, but I suspect it should build fine on an armbian risc-v system.

That would use java-native for things like secp256k1, but that should get the ball rolling (albeit slowly until we get besu-native support for risc-v).

@lazyprogrammerio
Author

Linking this document here, if anyone needs it in the future. It compiles all the knowledge gathered in the last few weeks related to execution/consensus usage / support / hacks, board kernels and gotchas/quirks, OS support and more:
https://github.com/lazyprogrammerio/eth-docker-docs/blob/main/website/docs/Usage/OtherArches.md

@haurog
Contributor

haurog commented Aug 22, 2024

I wrote an issue and a first pull request for Lighthouse to make it compatible with RISC-V: sigp/lighthouse#6297
It will take a few more steps to get cpufeatures and libp2p ready for RISC-V. This is just the start.

@diglos

diglos commented Aug 23, 2024

That would use java-native for things like secp256k1, but that should get the ball rolling (albeit slowly until we get besu-native support for risc-v).

Hi Gary, this would apply to Teku as well, right? (Getting "Teku failed to start: BLS native library unavailable for this platform" there)

@haurog
Contributor

haurog commented Aug 23, 2024

(Getting "Teku failed to start: BLS native library unavailable for this platform" there)

Not sure if this is the same issue @lazyprogrammerio mentioned when they tried it: #1873 (comment)

@yorickdowne
Contributor

Linking this document here, if anyone needs it in the future. It compiles all the knowledge gathered in the last few weeks related to execution/consensus usage / support / hacks, board kernels and gotchas/quirks, OS support and more: https://github.com/lazyprogrammerio/eth-docker-docs/blob/main/website/docs/Usage/OtherArches.md

Do you want to offer that as a PR to the docs?

@diglos

diglos commented Aug 25, 2024

Current state of building and running ETH consensus clients on RISC-V:

RocksDB compiles on the Banana Pi F3, but Teku needs some additional compilation config to rebuild it with native RocksDB support:

Teku failed to start: java.lang.UnsatisfiedLinkError: /tmp/librocksdbjni8413112201487393251.so: /tmp/librocksdbjni8413112201487393251.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64 .so on a RISCV64 platform)

Asked in Teku Discord but no response so far.

Compiling Prysm directly with go build results in BLST issues:

env GO111MODULE=on go build  -o ./build/bin/beacon-chain ./cmd/beacon-chain
# github.com/prysmaticlabs/gohashtree
../go/pkg/mod/github.com/prysmaticlabs/gohashtree@v0.0.4-beta.0.20240624100937-73632381301b/hash.go:46:5: undefined: supportedCPU
../go/pkg/mod/github.com/prysmaticlabs/gohashtree@v0.0.4-beta.0.20240624100937-73632381301b/hash.go:56:5: undefined: supportedCPU
../go/pkg/mod/github.com/prysmaticlabs/gohashtree@v0.0.4-beta.0.20240624100937-73632381301b/hash.go:86:5: undefined: supportedCPU
# github.com/prysmaticlabs/prysm/v5/crypto/bls
crypto/bls/bls.go:19:14: undefined: blst.SecretKeyFromBytes
crypto/bls/bls.go:24:14: undefined: blst.PublicKeyFromBytes
crypto/bls/bls.go:31:14: undefined: blst.SignatureFromBytesNoValidation
crypto/bls/bls.go:36:14: undefined: blst.SignatureFromBytes
crypto/bls/bls.go:41:14: undefined: blst.MultipleSignaturesFromBytes
crypto/bls/bls.go:46:14: undefined: blst.AggregatePublicKeys
crypto/bls/bls.go:51:14: undefined: blst.AggregateMultiplePubkeys
crypto/bls/bls.go:56:14: undefined: blst.AggregateSignatures
crypto/bls/bls.go:61:14: undefined: blst.AggregateCompressedSignatures
crypto/bls/bls.go:66:14: undefined: blst.VerifySignature
crypto/bls/bls.go:66:14: too many errors

@haurog
Contributor

haurog commented Aug 30, 2024

The new nimbus release (24.8.0) can now be built on RISC-V out of the box. They included my PR. We can now build nimbus directly from stable releases.

@haurog
Contributor

haurog commented Sep 5, 2024

I did a PR in the libp2p library to make it compatible with risc-v: libp2p/rust-libp2p#5590

@come-maiz

Thanks everybody, this is lovely work. People start to notice and contribute, this is a very powerful idea.

I've been learning and playing around:

a16z/helios#370
ethereum/trin#1444
flashbots/mev-boost#681

Also, I want to use this jenkins project for nightly tests to make sure we don't regress:
https://dash.cloud-v.co/view/all/job/flashbots/

@yorickdowne
Contributor

yorickdowne commented Nov 15, 2024

Update on hardware:

  • LicheePi 3A, 16GB, NVMe, USD 160
  • Banana PI F3, 16GB, NVMe, USD 130
  • Milk-V Jupiter, 16GB, NVMe, USD 115

Milk-V is likely the best choice if you can get it, as the CPU is metal-enclosed and easy to cool.

These do not sync mainnet yet; the CPU is too slow. Hardware coming out in 2025 should bring it to parity or near-parity with ARM64 boards like the RK3588 ones.

@lazyprogrammerio
Author

lazyprogrammerio commented Nov 18, 2024

Thanks everybody, this is lovely work. People start to notice and contribute, this is a very powerful idea.

I've been learning and playing around:

a16z/helios#370 ethereum/trin#1444 flashbots/mev-boost#681

Also, I want to use this jenkins project for nightly tests to make sure we don't regress: https://dash.cloud-v.co/view/all/job/flashbots/

@come-maiz Thanks for the work, I see that some of the PRs are already merged.

There was a DevCon presentation on the topic of RISC-V too: https://app.devcon.org/schedule/J3SWYT
The feedback at DevCon was very positive, and there is ongoing work for more consensus/execution clients to be supported.

Currently, only geth offers RISC-V images: https://registry.hub.docker.com/r/ethereum/client-go
Lighthouse works with a few small library upgrades applied out of tree; we are in the process of getting them merged and hopefully will have official images.

Thanks a lot!

@come-maiz

Please share the video of the presentation once it's out :)

@lazyprogrammerio
Author

Please share the video of the presentation once it's out :)

Hello, the presentation video is here at https://app.devcon.org/schedule/J3SWYT. Thanks!
