single/all_gather_test is always stuck #120
As a double check, make sure you run Then, the
@sjeaugey Thanks very much for your reply.
We're experiencing exactly the same problem (Ubuntu and video drivers are the same). The difference is that we have 4 GTX 1080 Ti cards, like this:
06:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 120f
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 18
Region 0: Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 2f80000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 2f90000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at b000 [size=128]
[virtual] Expansion ROM at f1000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 34326183936ns
Max no snoop latency: 34326183936ns
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] #19
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384
But if we use any two (or all four) cards together, the test gets stuck just as described above. I'm out of ideas.
I've heard of similar issues being related to IOMMU / VT-d. Can you try disabling this in your BIOS?
Unfortunately, for us both VT-d and VT-x had already been disabled. :(
@nluehr Thanks for your suggestion. I have already disabled CPU virtualization.
Oh, you are using an AMD CPU. Then I suggest you review this post and see if it fixes your problem. @gzmarchenko, are you using an AMD CPU as well?
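For what it's worth, here is a minimal sketch of how the IOMMU state can be checked and changed from Linux rather than the BIOS. The exact parameter (amd_iommu=off vs. iommu=soft) depends on the platform, and editing GRUB this way assumes a standard Ubuntu setup; treat it as an illustration, not the exact fix from the post above.
# Check whether an IOMMU is active (AMD-Vi on AMD, DMAR/VT-d on Intel)
dmesg | grep -i -e iommu -e amd-vi -e dmar
# See which parameters the kernel was booted with
cat /proc/cmdline
# Assumed approach: disable the IOMMU via a kernel boot parameter, e.g. add
#   amd_iommu=off   (or iommu=soft)
# to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
sudo update-grub && sudo reboot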
@sjeaugey
@sjeaugey Thanks very much! Everything works well now. I had thought that the CPU virtualization option in the BIOS was the IOMMU setting, but it actually is not. I couldn't find an IOMMU option in the BIOS settings when the PC boots.
I'm glad to hear it helped @starsblinking, but it didn't work for us.
@gzmarchenko sorry to see that it didn't work for you. Still, most likely this is due to GPU Direct P2P not working between the GPUs, which is usually caused by CPU/BIOS settings. You may want to reproduce the problem with the CUDA P2P tests and, if it doesn't work, report that problem to CUDA (through developer.nvidia.com). Also, I would advise using NCCL2 (https://developer.nvidia.com/nccl, tests at https://github.com/nvidia/nccl-tests), and until your P2P problem is resolved you can set NCCL_P2P_DISABLE=1. Performance will be lower though.
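To make that concrete, here is a rough sketch of both suggestions. The sample installer script and paths assume a default CUDA 8 install with /usr/local/cuda-8.0/bin on PATH, and the perf binary name comes from nccl-tests, so adjust to your setup.
# Copy the CUDA 8 samples to your home directory and build the P2P bandwidth test
cuda-install-samples-8.0.sh ~/
cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/p2pBandwidthLatencyTest
make
./p2pBandwidthLatencyTest
# If P2P turns out to be broken, work around it in NCCL (slower, but no hang)
NCCL_P2P_DISABLE=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2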
@sjeaugey, we removed 2 of the 4 GPUs and moved the remaining ones into x16 ports (with x1 passive adapters though), and now it works. It's a pity that plugging in another one makes it non-working again. Is there anything else we should think about before changing the motherboard?
Hi everyone. I'm hitting a problem similar to issue #19, but something is different.
I installed CUDA 8, cuDNN 6 and OpenMPI 3.0.0. My Ubuntu Server 16.04 machine has 2 GTX 1080 Ti GPUs.
Running
lspci -tv | grep NVIDIA
lists the NVIDIA devices. Running
lspci -vvv
shows the full device details, and
sudo lspci -vvv | grep ACSCtl
returns nothing. So it seems I'm not hitting the ACS trouble from issue #19.
Even if I use
sudo setpci -s 2a:00.0 f2a.w=00001
sudo setpci -s 29:00.0 f2a.w=0000
things don't change.
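For reference, the ACS-disabling recipe I've usually seen in NCCL troubleshooting clears the ACS Control register by name on every PCI bridge, rather than poking a raw offset on the GPU itself. This is only a sketch and assumes a pciutils version that knows the ECAP_ACS register name.
# Any line printed here means some bridge still has ACS source validation enabled
sudo lspci -vvv | grep -i acsctl
# Clear the ACS Control word (offset 0x6 of the ACS capability) on all PCI bridges (class 0604)
for BDF in $(lspci -d "::0604" | awk '{print $1}'); do
    sudo setpci -v -s "${BDF}" ECAP_ACS+0x6.w=0000 2>/dev/null
done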
If I use only one GPU, single/all_gather_test does not get stuck:
$ ./build/test/single/all_reduce_test 10000000 1 0
$ ./build/test/single/all_reduce_test 10000000 1 1
If I use both GPUs, the test gets stuck:
$ ./build/test/single/all_reduce_test 10000000
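(Side note: it may help to rerun the hanging case with NCCL's debug output enabled; as far as I know the NCCL_DEBUG variable is honored by NCCL 1.x as well, and the extra log lines show how the rings are set up before the hang.)
$ NCCL_DEBUG=INFO ./build/test/single/all_reduce_test 10000000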
When stuck, both GPUs reach their maximum utilization. At the same time, CPU usage reaches 200% in the
top
command. I feel exhausted and would really appreciate a solution. Thank you very much.
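A quick way to check whether the driver thinks the two GPUs can reach each other at all, before digging further into BIOS settings, is the topology report plus the simpleP2P sample; the sample path below assumes a default CUDA 8 samples install in the home directory.
# Show the PCIe topology and the link type between each GPU pair (PIX, PHB, SOC, ...)
nvidia-smi topo -m
# simpleP2P reports whether peer access is supported and verifies a transfer
cd ~/NVIDIA_CUDA-8.0_Samples/0_Simple/simpleP2P
make
./simpleP2P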