This repository has been archived by the owner on Apr 24, 2022. It is now read-only.

build requirement fails on linux #2282

Open
drmatthewclark opened this issue May 3, 2021 · 9 comments · May be fixed by #2284

Comments

@drmatthewclark

error downloading:

    --- LOG END ---
     error: downloading 'https://dl.bintray.com/boostorg/release/1.66.0/source/boost_1_66_0.7z' failed
     status_code: 22
     status_string: "HTTP response code said error"
     log:
@minecraft2048

minecraft2048 commented May 4, 2021

It's caused by this: boostorg/boost#502

@jimmystewpot

There's a PR being assembled that fixes this. In the meantime, you can get a working source tree here: https://github.com/jimmystewpot/ethminer

@zt-chen

zt-chen commented May 7, 2021

I created a smaller pull request (#2288) that fixes this by manually specifying the new Boost URL. As a temporary fix, you can download the Boost package into .hunter manually:

mkdir -p ~/.hunter/_Base/Download/Boost/1.66.0/075d0b4/
cd /tmp
wget https://boostorg.jfrog.io/artifactory/main/release/1.66.0/source/boost_1_66_0.7z 
mv -f boost_1_66_0.7z ~/.hunter/_Base/Download/Boost/1.66.0/075d0b4/boost_1_66_0.7z
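
For anyone scripting around this, the difference between the failing URL in the log and the working one above is only the host and path prefix. A minimal sketch of that swap follows; the function name `rewrite_boost_url` is made up for illustration, and it assumes other Boost releases sit under the same new prefix, which this thread only confirms for 1.66.0.

```shell
#!/bin/sh
# Hypothetical helper: map the retired Bintray prefix to the JFrog prefix
# used in the workaround above. URLs that do not start with the old prefix
# are printed unchanged.
rewrite_boost_url() {
    old='https://dl.bintray.com/boostorg/release/'
    new='https://boostorg.jfrog.io/artifactory/main/release/'
    case "$1" in
        "$old"*)
            rest=${1#"$old"}            # strip the old prefix
            printf '%s\n' "$new$rest"   # prepend the new one
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# The URL from the failing log at the top of this issue:
rewrite_boost_url 'https://dl.bintray.com/boostorg/release/1.66.0/source/boost_1_66_0.7z'
```

The printed URL can then be passed to wget as in the commands above.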

@hlfritz

hlfritz commented May 20, 2021

@jimmystewpot Any chance this will help with the binary not finding the A100s, as described in my response at the original project? (Sorry if this is way off topic.)

@jimmystewpot

@hlfritz What's the issue number?

I have access to A100s, so I can test.

@hlfritz

hlfritz commented May 22, 2021

@jimmystewpot It is mentioned in issues 2309 and 2307.

CUDA Error : system not yet initialized
Error: No usable mining devices found

even though nvidia-smi sees all the GPUs.

If I can provide more details, let me know. Setup: Ubuntu 18.04, CUDA 11.2, 8 A100s. I get the same error with the stable branch and after recompiling per 2309.

@jimmystewpot

Odd, on my test A100 with 20.04 it just works.

Can you confirm the following? https://www.supermicro.com/support/faqs/faq.cfm?faq=31029

If that is done, then try writing an strace to a file and attaching it somehow. It could be a missing file or a permission problem.
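
For readers unsure how to capture such a trace, here is a minimal sketch, assuming strace is installed. The wrapper name `trace_to_file` and the output file name are made up for illustration, and the ethminer arguments in the usage comment are placeholders for whatever command line reproduces the error.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under strace and write the trace to a file.
#   -f  also follows child processes and threads
#   -o  writes the trace to the named file instead of stderr
trace_to_file() {
    if [ "$#" -lt 2 ]; then
        echo "usage: trace_to_file OUTPUT_FILE COMMAND [ARGS...]" >&2
        return 2
    fi
    out=$1
    shift
    strace -f -o "$out" "$@"
}

# Example (assumed invocation; substitute the arguments you normally use):
#   trace_to_file ethminer.strace ./ethminer -U
```

The resulting file can be large; searching it for ENOENT or EACCES is a quick way to spot the missing files or permission problems mentioned above.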

@hlfritz

hlfritz commented May 23, 2021

Hmmm. I did not have DCGM installed. It is now installed, enabled, and running. deviceQuery still fails as described in that link (which is for CUDA 10; it seems things have changed, as there no longer seems to be an nvidia-fabricmanager?). Neither nv-hostengine nor the nvidia-fabricmanager service seems to exist after installing DCGM.

I also still get the same CUDA Error: system not initialized. I am not sure how to write out an strace file, but I am willing to do so if you can point me in the right direction. Thx!

EDIT: I dug around on the system a bit. It seems that the DCGM service is really just 'nv-hostengine -n'. It is running, but the article says to terminate it. Even though DCGM is installed, there is no nvidia-fabricmanager on the system.

@hlfritz

hlfritz commented May 23, 2021

@jimmystewpot James, thank you very much for the hint. I figured out how to install fabric manager and everything works now! MUCH appreciated. While those specific instructions do not work for newer versions, NVIDIA has good instructions for getting this done with the onboard distribution package managers. Seems this may be an HGX and DGX A100 issue (although DGX comes with fabricmanager already installed for you).

Follow this to install and enable the CUDA repos:

https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts

After installing the CUDA drivers, follow sections 2.6 and 2.7 in this guide:

https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf

Helmut
