Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLD] LLD can report "unable to move location counter backward" error too early #66836

Closed
PiJoules opened this issue Sep 19, 2023 · 3 comments · Fixed by #66854
Closed

[LLD] LLD can report "unable to move location counter backward" error too early #66836

PiJoules opened this issue Sep 19, 2023 · 3 comments · Fixed by #66854
Labels

Comments

@PiJoules
Copy link
Contributor

I have the following linker script:

MEMORY
{
  ...
  RAM_DATA               (RW)  : ORIGIN = 0x20008000, LENGTH = 0x004F8000
  ...
}

ram_data_end      = ORIGIN(RAM_DATA) + LENGTH(RAM_DATA);

SECTIONS {
  ...  // other output sections like text, rodata, etc
  __etext = .;
  __DATA_ROM = .;
  __end_of_code_in_ram_data =
      ABSOLUTE(__etext - ORIGIN(RAM_CODE) + ORIGIN(RAM_DATA));
  .data __end_of_code_in_ram_data : AT(__DATA_ROM) ALIGN(4) {
    __data_start__ = .;
    *(.data);
    *(.data*);
    ...  // stuff placed here
    __data_end__ = .;        /* define a global symbol at data end */
  } > RAM_DATA

  .shared_ram.unused_space (NOLOAD) : ALIGN(4) {
    . = ABSOLUTE(ram_data_end);
  } >RAM_DATA

  ... // other output sections
}

which would often report this error: unable to move location counter backward for: .shared_ram.unused_space. This is because on the first call to script->assignAddresses(); in finalizeAddressDependentContent, all of the input data sections placed into .data exceeds the remaining space for RAM_DATA, so by the time we reach . = ABSOLUTE(ram_data_end);, DOT is well past the end of RAM_DATA.

However, on subsequent calls to script->assignAddresses() in the fixed-point loop in finalizeAddressDependentContent, the start of .data is changed and a smaller value than the first iteration because something was changed in previous sections. This could be from linker relaxations, sections being removed, allocation sizes changing, etc. Because the starting address of .data is smaller but all the input data sections remain the same, it's possible for subsequent runs of script->assignAddresses() to not throw the same error because it happens to fit.

The issue here is that lld will still throw this error from the first run even though all the sections can fit as expected in the final memory region after the fixed point loop has finished. I think the right approach here is that LLD should only return this error if DOT ends up moving backwards for the final call to this function, similar to how we delay checking of memory region sizes until the very end of linking.

TODO: Come up with a public-facing minimal reproducer. I haven't made one yet because it's too hard.

@PiJoules PiJoules added the lld label Sep 19, 2023
@PiJoules
Copy link
Contributor Author

@MaskRay

@MaskRay
Copy link
Member

MaskRay commented Sep 20, 2023

The problem in the linker script is:

  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;

  } > RAM_CODE

The size of .ARM can shrink after finalizeSynthetic(part.armExidx.get()); in finalizeAddressDependentContent. In the initial assignAddresses (before finalizeSynthetic(part.armExidx.get());) run, addresses are over-estimated so dot can be larger than . = ABSOLUTE(ram_data_end); expects.

I was about to send a similar patch, but I want to construct a minimal test first. Therefore, I suggested

  .shared_ram.unused_space (NOLOAD) : ALIGN(4) {
    . += ram_data_end > . ? ram_data_end - . : 0;
  } >RAM_DATA

as a temporary workaround.

@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2023

@llvm/issue-subscribers-lld-elf

I have the following linker script:
MEMORY
{
  ...
  RAM_DATA               (RW)  : ORIGIN = 0x20008000, LENGTH = 0x004F8000
  ...
}

ram_data_end      = ORIGIN(RAM_DATA) + LENGTH(RAM_DATA);

SECTIONS {
  ...  // other output sections like text, rodata, etc
  __etext = .;
  __DATA_ROM = .;
  __end_of_code_in_ram_data =
      ABSOLUTE(__etext - ORIGIN(RAM_CODE) + ORIGIN(RAM_DATA));
  .data __end_of_code_in_ram_data : AT(__DATA_ROM) ALIGN(4) {
    __data_start__ = .;
    *(.data);
    *(.data*);
    ...  // stuff placed here
    __data_end__ = .;        /* define a global symbol at data end */
  } > RAM_DATA

  .shared_ram.unused_space (NOLOAD) : ALIGN(4) {
    . = ABSOLUTE(ram_data_end);
  } >RAM_DATA

  ... // other output sections
}

which would often report this error: unable to move location counter backward for: .shared_ram.unused_space. This is because on the first call to script->assignAddresses(); in finalizeAddressDependentContent, all of the input data sections placed into .data exceeds the remaining space for RAM_DATA, so by the time we reach . = ABSOLUTE(ram_data_end);, DOT is well past the end of RAM_DATA.

However, on subsequent calls to script->assignAddresses() in the fixed-point loop in finalizeAddressDependentContent, the start of .data is changed and a smaller value than the first iteration because something was changed in previous sections. This could be from linker relaxations, sections being removed, allocation sizes changing, etc. Because the starting address of .data is smaller but all the input data sections remain the same, it's possible for subsequent runs of script->assignAddresses() to not throw the same error because it happens to fit.

The issue here is that lld will still throw this error from the first run even though all the sections can fit as expected in the final memory region after the fixed point loop has finished. I think the right approach here is that LLD should only return this error if DOT ends up moving backwards for the final call to this function, similar to how we delay checking of memory region sizes until the very end of linking.

TODO: Come up with a public-facing minimal reproducer. I haven't made one yet because it's too hard.

MaskRay added a commit to MaskRay/llvm-project that referenced this issue Sep 20, 2023
The size of .ARM.exidx may shrink across `assignAddress` calls. It is possible
that the initial iteration has a larger location counter, causing `__code_size =
__code_end - .; osec : { . += __code_size; }` to report an error, while the error would
have been suppressed for subsequent `assignAddress` iterations.

Other sections like .relr.dyn may change sizes across `assignAddress` calls as
well. However, their initial size is zero, so it is difficiult to trigger a
similar error.

Similar to https://reviews.llvm.org/D152170, postpone the error reporting.
Fix llvm#66836. While here, add more information to the error message.
MaskRay added a commit that referenced this issue Sep 20, 2023
The size of .ARM.exidx may shrink across `assignAddress` calls. It is
possible
that the initial iteration has a larger location counter, causing
`__code_size =
__code_end - .; osec : { . += __code_size; }` to report an error, while
the error would
have been suppressed for subsequent `assignAddress` iterations.

Other sections like .relr.dyn may change sizes across `assignAddress`
calls as
well. However, their initial size is zero, so it is difficiult to
trigger a
similar error.

Similar to https://reviews.llvm.org/D152170, postpone the error
reporting.
Fix #66836. While here, add more information to the error message.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment