Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB FW Memory Carveout Marked as Useable #111

Open
pnewman-cti opened this issue Nov 29, 2024 · 4 comments
Open

USB FW Memory Carveout Marked as Useable #111

pnewman-cti opened this issue Nov 29, 2024 · 4 comments

Comments

@pnewman-cti
Copy link
Contributor

As of commit f6b220e6550521bd2663062d1ad42ecd3a41bfe7 the XUSB Firmware memory carveout (CARVEOUT_XUSB) has been added to the "Usable Memory Regions" which allows UEFI to allocate memory from that region.

When this happens it triggers a "Carveout Uncorrectable Error", below is an example from Orin NX 16GB:

ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in SNOC, base=0xe011000:
ERROR:          Status = 0xec00030d
ERROR:   SERR = Illegal address (software fault): 0xd
ERROR:          IERR = Carveout Uncorrectable Error: 0x3
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          MISC0 = 0x2c00
ERROR:          MISC1 = 0x3c1842000000000
ERROR:          MISC2 = 0x0
ERROR:          MISC3 = 0x20
ERROR:          ADDR = 0x8000000472f02820
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   **************************************
ERROR:   RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR:          Status = 0xe8000904
ERROR:   SERR = Assertion failure: 0x4
ERROR:          IERR = FillWrite Error: 0x9
ERROR:          Overflow (there may be more errors) - Uncorrectable
ERROR:          ADDR = 0x8000000472f02c00
ERROR:   **************************************
ERROR:   sdei_dispatch_event returned -1
ERROR:   Powering off core

The same error happens on Orin NX 8GB but the ADDR value is 0x8000000272f02820 instead of 0x8000000472f02820 because the XUSB carveout is in a different location.

This error is fairly repeatable on the Orin NX platform when attempting to PXE boot. Enabling prints in DramCarveoutLib.c shows 0x472F00000 is a "Usable Carveout":

DRAM Region [Usable Carveout]: 0000000471E00000, 0000000000200000
DRAM Region [Usable Carveout]: 0000000472000000, 0000000000400000
DRAM Region [Usable Carveout]: 0000000472F00000, 0000000000100000
DRAM Region [Usable Carveout]: 0000000476000000, 0000000002000000

Making the following change fixes this issue:

Index: Silicon/NVIDIA/Library/PlatformResourceLib/T234ResourceConfig.c
===================================================================
diff --git a/Silicon/NVIDIA/Library/PlatformResourceLib/T234ResourceConfig.c b/Silicon/NVIDIA/Library/PlatformResourceLib/T234ResourceConfig.c
--- a/Silicon/NVIDIA/Library/PlatformResourceLib/T234ResourceConfig.c	(revision 87009)
+++ b/Silicon/NVIDIA/Library/PlatformResourceLib/T234ResourceConfig.c	(working copy)
@@ -263,7 +263,6 @@
       case CARVEOUT_OS:
       case CARVEOUT_GR:
       case CARVEOUT_PROFILING:
-      case CARVEOUT_XUSB:
         // Leave in memory map but marked as used
         if (  (  (Index == CARVEOUT_CCPLEX_INTERWORLD_SHMEM)
               && FixedPcdGetBool (PcdExposeCcplexInterworldShmem)

However, I am not sure if this has other side effects.

I have attached a full serial debug log of attempting to PXE boot with an Orin NX 16GB:
pxe_boot_error_serial.log

@jgarver
Copy link
Contributor

jgarver commented Dec 2, 2024

Thanks for the report, @pnewman-cti! We're looking into it.

@pnewman-cti
Copy link
Contributor Author

I dug into this a little more. Here are the output of the UEFI shell memmap command before and after my above fix:

memmap_before_fix.txt
memmap_after_fix.txt

Before removing the XUSB carveout from the "usable regions" it shows as "Reserved" when running memmap.
Reserved 0000000472F00000-0000000472FFFFFF 0000000000000100 000000000000100F

This makes sense because T234AddBootloaderCarveouts() "Resource Descriptor Hob" with the "EfiReservedMemoryType". This should mean it cannot be used for memory allocation, however it clearly is being used according to ADDR field of the RAS error.

I think the real bug is in the InstallDramWithCarveouts() function in Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c.

It adds all of the "UsableCarveoutRegions" to the "LargestRegions", then it calls BuildResourceDescriptorHob() on all the "LargestRegions".

This creates a second EFI_RESOURCE_SYSTEM_MEMORY (NOT Reserved) "Resource Descriptor Hob" for all the "UsableCarveoutRegions".

 // Add the UsableCarveout regions in the reserved space
  for (UsableCarveoutIndex = 0; UsableCarveoutIndex < UsableCarveoutRegionsCount; UsableCarveoutIndex++) {
    DEBUG ((DEBUG_ERROR, "DRAM Region [Usable Carveout]: %016lx, %016lx\r\n", UsableCarveoutRegions[UsableCarveoutIndex].MemoryBaseAddress, UsableCarveoutRegions[UsableCarveoutIndex].MemoryLength));
    MemoryRegionInsert (LargestRegions, &InstalledRegions, &UsableCarveoutRegions[UsableCarveoutIndex], MAX_USABLE_REGIONS, CompareRegionSizeHighToLow);
  }

  ResourceAttributes = (
                        EFI_RESOURCE_ATTRIBUTE_PRESENT |
                        EFI_RESOURCE_ATTRIBUTE_INITIALIZED |
                        EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE |
                        EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE |
                        EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE |
                        EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE |
                        EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE |
                        EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE |
                        EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE |
                        EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE
                        );
    
  if (PcdGet64 (PcdExpectedPeiMemoryUsage) == 0) {
    ResourceAttributes |= EFI_RESOURCE_ATTRIBUTE_TESTED;
  }

  // Now that we have the final list, install it in the HOB
  for (ListIndex = 0; ListIndex < InstalledRegions; ListIndex++) {
    BuildResourceDescriptorHob (
      EFI_RESOURCE_SYSTEM_MEMORY,
      ResourceAttributes,
      LargestRegions[ListIndex].MemoryBaseAddress,     
      LargestRegions[ListIndex].MemoryLength
      );
  }

Adding some prints to MemoryRegionInsert() I confirmed the "UsableCarveoutRegions" are being added to the "LargestRegions".

DRAM Region [Usable Carveout]: 0000000472F00000, 0000000000100000
MemoryRegionInsert: Called with RegionsCount = 5, NewRegion->Base = 0x472F00000, NewRegion->Size = 0x100000, RegionCountMax = 992
MemoryRegionInsert: Added entry at index 5, NewRegion->Base = 0x472F00000

The following patch solves the issue (previous patch reverted). I think this is the correct way to solve the issue but it would be good for someone else to double check,

Index: Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c
===================================================================
diff --git a/Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c b/Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c
--- a/Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c	(revision 87009)
+++ b/Silicon/NVIDIA/Library/DramCarveoutLib/DramCarveoutLib.c	(working copy)
@@ -463,7 +463,6 @@
   // Add the UsableCarveout regions in the reserved space
   for (UsableCarveoutIndex = 0; UsableCarveoutIndex < UsableCarveoutRegionsCount; UsableCarveoutIndex++) {
     DEBUG ((DEBUG_VERBOSE, "DRAM Region [Usable Carveout]: %016lx, %016lx\r\n", UsableCarveoutRegions[UsableCarveoutIndex].MemoryBaseAddress, UsableCarveoutRegions[UsableCarveoutIndex].MemoryLength));
-    MemoryRegionInsert (LargestRegions, &InstalledRegions, &UsableCarveoutRegions[UsableCarveoutIndex], MAX_USABLE_REGIONS, CompareRegionSizeHighToLow);
   }
 
   ResourceAttributes = (

@jgarver
Copy link
Contributor

jgarver commented Dec 10, 2024

Thank you so much for your detailed analysis, @pnewman-cti! We think it's on target.

We have been working this issue internally, too. I don't think we'll have a fix in time for the next release (uefi-202412.0), but will include it in the next point release (uefi-202412.1).

@pnewman-cti
Copy link
Contributor Author

Great! Thank you :). Let me know if you need anything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants