Asus Z10PE-D16 WS C612 - ReBAR requires pci=realloc in linux and does not really work in Win10 #216

Open
1 task done
LegionnaireENR34 opened this issue Jun 30, 2024 · 4 comments

LegionnaireENR34 commented Jun 30, 2024

System

  • Motherboard: Asus Z10PE-D16 WS - this is a dual-socket LGA 2011-3 board on the C612 PCH
  • Two Xeon E5-2600 v3 (Haswell-EP) CPUs installed
  • 128GB RAM
  • BIOS Version: 4101
  • GPU: Intel Arc A770 16GB VRAM attached to NUMA node 2 with x16 link
  • CSM is turned off in BIOS
  • 4G decoding is enabled in BIOS
  • UEFIPatch is applied
  • DSDT Patch not applied as I assumed it is not needed
  • I have read Common issues (and fixes)

Description

Hey, I'd like to ask for advice and feedback. I run a Z10PE-D16 WS - it is a dual-socket board with 16 memory slots and the C612 chipset. The BIOS used for modification is v.4101, the latest available at asus.com.
The intention was to make ReBAR work with an Arc A770.

Current state:

  1. I installed the UEFI .ffs module as per the instructions
  2. UEFIPatch applied. The patch that took effect was the whitelist one for X99/C612 motherboards, as reported by its hex string
  3. Above 4G decoding enabled
  4. CSM disabled

First step
The first step of enabling Resizable BAR was performed in Linux Mint with kernel v.6.5

  • If the rebarstate variable is set to 0, dmesg shows a 256M BAR size and the i915 kernel driver loads successfully
  • If rebarstate is set to any value from 14 to 32, lspci reports the BAR size as 16G, but the i915 kernel driver does not load. dmesg shows address conflicts and states that there is not enough space to allocate a BAR of the required size.
  • If I add pci=realloc to the kernel boot parameters, then any rebarstate value from 14 to 32 works: the assigned BAR size is 16G and the i915 kernel driver loads successfully, but I see a bunch of address space reallocations in dmesg.
    So ReBAR appears to work, provided that pci=realloc is set at Linux boot. However, I have the impression that reallocation should not be required.
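
One way to check the BAR size the kernel actually assigned, independent of whether the driver loads, is to read the device's `resource` file in sysfs. The sketch below assumes the usual sysfs layout (one `start end flags` line per resource) and the address `0000:85:00.0` for the A770 (the thread only gives 85:00, so the domain and function are assumptions). The helper name `bar_sizes` is made up for illustration.

```python
from pathlib import Path


def bar_sizes(resource_text):
    """Map resource index -> size in bytes, skipping unused entries."""
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "start end flags" line
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unassigned or unimplemented resource
        sizes[index] = end - start + 1
    return sizes


if __name__ == "__main__":
    # Assumed device path; adjust to the address lspci shows for the GPU.
    path = Path("/sys/bus/pci/devices/0000:85:00.0/resource")
    if path.exists():
        for index, size in bar_sizes(path.read_text()).items():
            print(f"resource {index}: {size / 2**20:.0f} MiB")
```

With a 16G BAR assigned, one of the resources should come out as 16384 MiB.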

Second step
To confirm that ReBAR was really working, I then installed a clean copy of Windows 10, with rebarstate set to 32.

  • Installation went smoothly and the OS booted fine.
  • Then I installed Intel's GPU driver. Immediately after installation, before any reboot, Device Manager reported insufficient resources for the device to start. After a reboot, Win10 hangs during boot.
  • With rebarstate set to 8, Win10 boots fine, as it should with the default 256MB BAR
  • With rebarstate set to 9, the result is the same
  • With rebarstate set to 10, the motherboard passes POST 1 time out of 3 and the diagnostic codes are pretty messed up
  • With rebarstate set to 11, the board goes into a reboot loop and will not POST until the GPU is unplugged
  • With rebarstate set to 12, the board POSTs, but Win10 BSODs
  • With rebarstate set to 13 and above, Win10 just hangs during boot: no BSOD, no recovery, just a lockup requiring a hard reset
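
The values above are easier to compare when read as BAR size caps. The sketch below assumes the convention that a rebarstate of n caps the BAR at 2^n MB, with 0 meaning disabled and 32 meaning unlimited. That reading matches the observations in this post (8 gives the 256MB default, 14 and above give 16G on a 16GB card), but it should be checked against the ReBarUEFI documentation. The function name is hypothetical.

```python
from typing import Optional


def rebarstate_cap_mib(state: int) -> Optional[int]:
    """Assumed BAR size cap in MiB: 0 = disabled, None = no cap."""
    if not 0 <= state <= 32:
        raise ValueError("rebarstate is expected to be in the range 0..32")
    if state == 0:
        return 0  # ReBAR disabled (assumption)
    if state == 32:
        return None  # unlimited (assumption)
    return 2 ** state


if __name__ == "__main__":
    for state in (8, 9, 10, 11, 12, 13, 14, 32):
        cap = rebarstate_cap_mib(state)
        label = "unlimited" if cap is None else f"{cap} MiB"
        print(f"rebarstate {state:2d}: {label}")
```

Under that assumption, the trouble on Windows starts at 1 GiB (rebarstate 10), well below the card's full 16 GiB.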

As a side note, installing Win10 with all associated drivers and playing with rebarstate messed up the POST codes. Before Win10, the POST codes on the motherboard's LED display matched the ones shown on my monitor. After Win10 and the drivers, only the onboard display shows correct codes; the monitor just shows the A9 "Start of setup" code until the end of POST.

So is there something wrong with either resource allocation, or with the way .ffs module behaves on dual-CPU/c612 platforms? I guess this result cannot be considered a complete failure, and yet I would not regard it as a success either. Have I done anything wrong? Can you please share any thoughts or suggestions as to what should be done next? I can provide dmesg or lspci output if needed.

@LegionnaireENR34 LegionnaireENR34 changed the title Asus Z10PE-D16 WS C612 - ReBAR requires pci=realloc in linux and not really working in Win10 Asus Z10PE-D16 WS C612 - ReBAR requires pci=realloc in linux and does not really work in Win10 Jun 30, 2024
@xCuri0
Owner

xCuri0 commented Jul 1, 2024

Try changing BIOS settings related to MMIOH, I know X99/C612 have a few options relating to it under IntelRCSetup.

these boards should work fine with rebar when configured properly, they're made to be used with tesla/a gpus that require large bar after all

@LegionnaireENR34
Author

Thanks for the advice, will try that. I found two options under IntelRCSetup -> Common RefCode Configuration. They are hidden by default, so I will need to expose them in AMIBCP in order to play with them:

  • MMIOH Base - seems to default to 56T, options: 56T, 40T, 24T, 16T, 12T, 4T, 2T, 1T
  • MMIO High size - seems to default to 256G, options: 128G, 256G, 512G, 1024G
    Both default values seem pretty high already. Should I reduce them? Or set MMIOH size to 512G or above and leave the 56T Base value alone?
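
For a sense of scale, the arithmetic behind these options can be sketched as follows. This only computes the address window implied by a base and a size and counts how many naturally aligned 16G BARs would fit in it. It assumes the option labels mean TiB and GiB, and it says nothing about how the firmware actually divides the window between the two sockets or among devices. The function names are made up for illustration.

```python
TIB = 2**40
GIB = 2**30


def mmioh_window(base_tib, size_gib):
    """Return the inclusive (start, end) addresses of the window."""
    start = base_tib * TIB
    end = start + size_gib * GIB - 1
    return start, end


def aligned_slots(start, end, bar_bytes):
    """Count naturally aligned BARs of bar_bytes that fit in [start, end]."""
    first = -(-start // bar_bytes) * bar_bytes  # round start up to alignment
    return max(0, (end + 1 - first) // bar_bytes)


if __name__ == "__main__":
    for size_gib in (128, 256, 512, 1024):
        start, end = mmioh_window(56, size_gib)
        slots = aligned_slots(start, end, 16 * GIB)
        print(f"56T + {size_gib}G: {start:#x}-{end:#x}, {slots} x 16G slots")
```

On paper, even the smallest 128G window has room for a single 16G BAR, so the raw window size alone does not obviously explain the failure.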

There are other options in the BIOS that are PCIe-port specific, about a single 64-bit BAR versus two split 32-bit BARs. I tried to play with them briefly, but as far as I can tell they don't affect my problem.

Another question that interests me: does it matter which CPU the GPU is connected to? Do they both share common address space? Do QPI settings and limitations play any role in ReBAR at all?

@LegionnaireENR34
Author

LegionnaireENR34 commented Jul 14, 2024

Hey @xCuri0, so I've done some additional testing.
I have exposed MMIO-related options that seemed to be relevant in BIOS, there were 3 of them:

  • MMCFG base - set to 2G by default
  • MMIOH base - set to 56T by default
  • MMIOH size - there are 4 sizes: 128G, 256G, 512G, 1024G.

I tried all 4 of the "MMIOH size" values, but none worked without pci=realloc.
Here's the dmesg report of the result. Device 85:00 is the Intel Arc A770 16G:
Post-REBAR-PCI=realloc-dmesg
If I'm reading this correctly, it appears to be trying to allocate memory starting from address 0x0 and working towards higher addresses, not the other way around. In doing so, it displaces some other devices' memory ranges.
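
To make a log like this easier to read, the BAR assignment messages can be pulled out of dmesg and summarized per device. The sketch below is modeled on the message wording used by kernels around 6.5 (`BAR n: assigned [mem ...]`, `BAR n: no space for [mem size ...]`, `BAR n: failed to assign [mem size ...]`). Other kernel versions word these lines differently, so the patterns are an assumption and may need adjusting. The function name `bar_events` is hypothetical.

```python
import re

BDF = r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
ASSIGNED = re.compile(
    BDF + r": BAR (?P<bar>\d+): assigned "
    r"\[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)
FAILED = re.compile(
    BDF + r": BAR (?P<bar>\d+): (?:no space for|failed to assign) "
    r"\[mem size 0x(?P<size>[0-9a-f]+)"
)


def bar_events(lines):
    """Return (device, bar, status, size_in_bytes) for each matching line."""
    events = []
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            size = int(match["end"], 16) - int(match["start"], 16) + 1
            events.append((match["dev"], int(match["bar"]), "assigned", size))
            continue
        match = FAILED.search(line)
        if match:
            size = int(match["size"], 16)
            events.append((match["dev"], int(match["bar"]), "failed", size))
    return events
```

For example, after saving the log with `dmesg > dmesg.txt`, calling `bar_events(open("dmesg.txt"))` and filtering for `0000:85:00.0` would show which BARs failed first and where they ended up after reallocation.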

Any ideas as to what I should try next? Any other options to look for in the BIOS?
Why is it trying to allocate memory from low addresses? Is there a way to change this behavior? Is it defined by the device driver, the OS or UEFI itself?
Could this issue be related to the system having two CPUs, and to some limitation in how memory is accessed via QPI?
Should I try a different brand of GPU?

@LegionnaireENR34
Author

LegionnaireENR34 commented Aug 11, 2024

@xCuri0
Some new developments here. To make sure my issues were not hardware-specific, I managed to obtain an AMD RX6900 for testing.
I can report that it was at least a partial success.

  • On Windows 10 with the RX6900 and this UEFI module, ReBAR works with any rebarstate value above 12. Here are the screenshots with rebarstate at 32:

[Screenshot: Memory segments rx6900_MMIOH_unlimitedBAR]
[Screenshot: Z10PE-D16 WS]

  • On Linux Mint with kernel v.6.5 and the same rebarstate, the amdgpu driver fails to start without the pci=realloc kernel parameter - it gets stuck repeating the "amdgpu trn=2 ack should not assert! wait again!" message in dmesg. With pci=realloc it appears to load successfully.

So as far as ReBarUEFI on the Z10PE-D16 WS goes, we can declare partial success, but not full success.
Whether the new info justifies any updates to this UEFI module is not for me to decide. I want to thank you for your work on it.
As of now, I think we can close this thread.
