Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix macOS ld: multiple errors: symbol count from symbol table and dynamic symbol table differ #16178

Closed
wants to merge 5 commits into from

Conversation

ibuclaw
Copy link
Member

@ibuclaw ibuclaw commented Feb 11, 2024

Encode macosx_version_min or build_version into the object file, originally authored by @jacob-carlborg in #10476.

This has been simplified to omit parsing the SDK version. Based on what I see GCC is doing (-platform_version macos $version_min 0.0), this information is not required in order for things to work.


Either this will fix the new ld errors, or we'll have to start adding -L-ld_classic to the linker command.

https://forum.dlang.org/thread/jwmpdecwyazcrxphttoy@forum.dlang.org

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#16178"

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 11, 2024

ld: load command #3 extends beyond the end of the load commands file '../generated/osx/release/64/lexer.o' for architecture x86_64

@ibuclaw ibuclaw force-pushed the xcode_linker_error branch 3 times, most recently from 4d61b66 to 8dd990b Compare February 11, 2024 22:14
@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 11, 2024

It looks like encoding this information into the binary isn't working, but passing it via linker flags works just fine. Someone (not me) will have to repurpose the OSX version handling for dmd.link then.

@ibuclaw ibuclaw added the Phantom Zone Has value/information for future work, but closed for now label Feb 11, 2024
@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 11, 2024

If no one picks this up, I'm going to start downgrading all macOS pipelines to ignore failures, as this is blocking other work.

@RazvanN7
Copy link
Contributor

RazvanN7 commented Feb 12, 2024

I think all that link.d needs is this code:

argv.push("-L-platform_version");
argv.push("-Lmacos");
argv.push("-L0.0");

to be added here https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L610.

I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 12, 2024

I think all that link.d needs is this code:

argv.push("-L-platform_version");
argv.push("-Lmacos");
argv.push("-L0.0");

to be added here https://github.com/dlang/dmd/blob/master/compiler/src/dmd/link.d#L610.

As far as I can tell, the syntax of the option is:

-platform_version <platform> <min_version> <sdk_version>

https://opensource.apple.com/source/ld64/ld64-530/doc/man/man1/ld.1.auto.html
(Formatted version)

-platform_version platform min_version sdk_version
            This is set to indicate the platform, oldest supported version of that platform that
            output is to be used on, and the SDK that the output was built against.  platform is a
            numeric value as defined in <mach-o/loader.h>, or it may be one of the following
            strings:
            o macos
            o ios
            o tvos
            o watchos
            o bridgeos
            o mac-catalyst
            o ios-simulator
            o tvos-simulator
            o watchos-simulator
            o driverkit
            Specifying a newer min or SDK version enables the linker to assume features of that OS
            or SDK in the output file. The format of min_version and sdk_version is a version number
            such as 10.13 or 10.14

The sdk_version doesn't seem to be important, but the min_version would be rejected if invalid.


I just picked up the linker flags you used in ci/run.sh. I don't know what to do about -L${MACOSX_DEPLOYMENT_TARGET+10.9}.

getenv and sscanf (the first commit in this PR is a simplified version of #10476 - c01f8cc)

I wouldn't hard-code -Lplatform-version without first checking the version of ld64 (seems to be >= 512 - apple-opensource/ld64@03dfd55#diff-eaf444f2d9662234ef8a047f73d6b25b7dd2dc39c3e426c6214b4702fec8287bR251) , as it's a new option, and older ld64 linkers won't understand it:

  • -macosx_version_min 10.9 for all versions up to 10.14 inclusive.
  • -platform_version macos 10.9 0.0 for all versions from 10.15 onwards
    (DMD releases are still built on a macOS 10.13 box).

@kinke
Copy link
Contributor

kinke commented Feb 12, 2024

Hmm, so building the compiler itself with the host compiler is no problem at all, but compiling build.d is/was? WTF?

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 12, 2024

Hmm, so building the compiler itself with the host compiler is no problem at all, but compiling build.d is/was? WTF?

I think CI_DFLAGS is used to build the host compiler. As for all the testsuite, my only guess is they aren't complex enough.

@kinke
Copy link
Contributor

kinke commented Feb 12, 2024

CI_DFLAGS is used to build the fresh compiler, but only set for FreeBSD. DMD itself is surely much much more complex than build.d, so when using the same host compiler to build both, I'd obviously expect at least the same issues for the actual compiler as for build.d.

@kinke
Copy link
Contributor

kinke commented Feb 12, 2024

Plus we have

MACOSX_DEPLOYMENT_TARGET: '11'

which should AFAIK prevent using the latest Xcode features/regressions.

@kinke
Copy link
Contributor

kinke commented Feb 14, 2024

I think CI_DFLAGS is used to build the host compiler.

Oh of course it is; I just somehow totally missed that you extend them here in this PR. facepalm [The added tab-indentation doesn't really help.]

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2024

Plus we have

MACOSX_DEPLOYMENT_TARGET: '11'

which should AFAIK prevent using the latest Xcode features/regressions.

Right, I can see a warning about both conflicting in the CI logs (-platform_version wins)

I could set this to 11 to see if it still fails.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2024

Plus we have

MACOSX_DEPLOYMENT_TARGET: '11'

which should AFAIK prevent using the latest Xcode features/regressions.

Right, I can see a warning about both conflicting in the CI logs (-platform_version wins)

I could set this to 11 to see if it still fails.

10.9 good, 11.0 fails

10.12 fails

10.11 fails

10.10 fails

10.9 second attempt fails.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2024

Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2024

-ld_classic gets us further, then fails later in the expected way.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 15, 2024

Removing macOS as a required pipeline.

@kinke
Copy link
Contributor

kinke commented Feb 16, 2024

10.9 good, 11.0 fails
[…]
10.9 second attempt fails.
[…]
Alright, looks like maybe -platform_version 10.9 passing CI the first time was a fluke.

They must have changed the runner base image; the CI job headers should include the image version somewhere.

The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (actions/runner-images#9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...

@kinke
Copy link
Contributor

kinke commented Feb 16, 2024

So in that latest image bump, they changed the default Xcode version from v14 to v15.

Running sudo xcode-select -switch /Applications/Xcode_14.1.app (early) on macos-13 jobs might allow us to use the currently oldest installed Xcode version, hopefully without these 'nice' new Apple ld64 problems.

Edit: #16194

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 16, 2024

They must have changed the runner base image; the CI job headers should include the image version somewhere.

The macOS-13 image history is tracked in https://github.com/actions/runner-images/commits/main/images/macos/macos-13-Readme.md; the last change was merged on February 14th (actions/runner-images#9292), but the image version is 20240204.1 (February 4th). So I guess it was deployed in the middle of your attempts here...

It was first seen to fail 2 weeks ago.

https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831

It's probably been a very slow rollout.

@kinke
Copy link
Contributor

kinke commented Feb 16, 2024

It was first seen to fail 2 weeks ago.
https://github.com/dlang/dmd/actions/runs/7792736255/job/21251287831

Oh wow right, using that image already back then (February 6th):

 Runner Image
  Image: macos-13
  Version: 20240204.1

So indeed, looks like the rollout is extremely slow, and starts way before the image readme.md is updated...

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 16, 2024

Github search to the rescue.

https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/mach_o/Image.cpp#L346-L347

    if ( hasIndSymTab && (symCount != indSymCount))
        return Error("symbol count from symbol table and dynamic symbol table differ");

Simplifying the above for mere mortals (such as myself).

 dysymtab_cmd.nlocalsym  = local_symbuf.length;
 dysymtab_cmd.iextdefsym = dysymtab_cmd.nlocalsym;
 dysymtab_cmd.nextdefsym = public_symbuf.length;
 dysymtab_cmd.iundefsym = dysymtab_cmd.iextdefsym + dysymtab_cmd.nextdefsym; 
 dysymtab_cmd.nundefsym  = extern_symbuf.length + comdef_symbuf.length;
 symtab_cmd.nsyms =  dysymtab_cmd.nlocalsym + 
                     dysymtab_cmd.nextdefsym + 
                     dysymtab_cmd.nundefsym; 

-> const prop

 dysymtab_cmd.iundefsym = local_symbuf.length + public_symbuf.length; 
 dysymtab_cmd.nundefsym  = extern_symbuf.length + comdef_symbuf.length;
 symtab_cmd.nsyms =  local_symbuf.length + public_symbuf.length + dysymtab_cmd.nundefsym; 

So linker should see:

symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;

Its a bit convoluted, but they add up to the same number. Which then means this might just be corruption of the data structure as the linker reads in the object.

@jacob-carlborg
Copy link
Contributor

I don't know entirely what's up with Apple's static linker, but I do know they wrote a new one recently, hence the -ld_classic flag. The error message that @ibuclaw found is from the dynamic linker (dyld). This error message is not available in any released source code of the static linker (ld64), as far as I can tell. Perhaps the static linker is now performing the same check that the dynamic linker already did, but I haven't not seen this error at load time.

Anyway, as @ibuclaw posted above, the number of symbols in the symbol table and dynamic symbol table need to be the same. A quick investigation shows this:

  1. Compile (without linking) the following code with LDC and DMD:
extern (C) int printf(const char*, ...);

void main()
{
    printf("foo\n");
}
  1. ldc2 -c -ofmain-ldc.o main.d
  2. dmd -c -ofmain-dmd.o main.d
  3. Print the load commands of each object file
  4. otool -l main-ldc.o
  5. otool -l main-dmd.o

This gives the following output:

DMD:

LC_SYMTAB:
nsyms: 10

LC_DYSYMTAB:
iundefsym: 6
nundefsym: 3

LDC:

LC_SYMTAB:
nsyms: 12

LC_DYSYMTAB:
iundefsym: 10
nundefsym: 2

LC_SYMTAB is the symbol table and LC_DYSYMTAB is the dynamic/indirect symbol table.

As mentioned above, the number of symbols are calculated as:

symCount = symtab_cmd.nsyms;
indSymCount = dysymtab_cmd.iundefsym + dysymtab_cmd.nundefsym;

According to otool the number of symbols in both symbols tables are the same for LDC, but for DMD it's not the same. LC_DYSYMTAB has one symbol less than LC_SYMTAB.

Also, trying to print the indirect symbol table give these results:

$ otool -I main-ldc.o
main-ldc.o:
$ otool -I main-dmd.o
main-dmd.o:
indirect symbol table offset is past end of file

So DMD is doing something wrong.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 18, 2024

  1. Compile (without linking) the following code with LDC and DMD:
extern (C) int printf(const char*, ...);

void main()
{
    printf("foo\n");
}
  1. ldc2 -c -ofmain-ldc.o main.d
  2. dmd -c -ofmain-dmd.o main.d
  3. Print the load commands of each object file
  4. otool -l main-ldc.o
  5. otool -l main-dmd.o

This gives the following output:

DMD:

LC_SYMTAB:
nsyms: 10

Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.

@ibuclaw
Copy link
Member Author

ibuclaw commented Feb 18, 2024

This is ridiculous

if (extdef)
{
nlist_64 sym = void;
sym.n_strx = extdef;
sym.n_value = 0;
sym.n_type = N_EXT | N_UNDF;
sym.n_desc = 0;
sym.n_sect = 0;
if (I64)
fobjbuf.write(&sym, sym.sizeof);
else
{
nlist sym32 = void;
sym32.n_strx = sym.n_strx;
sym32.n_value = cast(uint)sym.n_value;
sym32.n_type = sym.n_type;
sym32.n_desc = sym.n_desc;
sym32.n_sect = sym.n_sect;
fobjbuf.write(&sym32, sym32.sizeof);
}
symtab_cmd.nsyms++;
}

ibuclaw and others added 5 commits February 19, 2024 00:09
…amic symbol table differ

Encode macosx_version_min or build_version into the object file,
originally authored by @jacob-carlborg in dlang#10476.

This has been simplified to omit parsing the SDK version. Based on what
I see GCC is doing (`-platform_version macos $version_min 0.0`), this
information is not required in order for things to work.

Co-authored-by: Jacob Carlborg <doob@me.com>
… and dynamic symbol table differ"

This reverts commit c01f8cc.
@jacob-carlborg
Copy link
Contributor

Knowing those ten symbol names might help. But as far as I can tell, writing a wrong number to object is improbable.

$ nm -a main-dmd.o
0000000000000050 s EH_frame0
0000000000000068 S _D main.eh
0000000000000038 S __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
                 U __Dmain
                 U __d_run_main
                 U _main
0000000000000018 T _main
0000000000000090 S _main._d_cmain!().main.eh
                 U _printf
$ nm -a main-ldc.o
0000000000000070 S __D4main11__moduleRefZ
0000000000000060 D __D4main12__ModuleInfoZ
0000000000000000 T __Dmain
                 U __d_run_main
0000000000000020 T _main
                 U _printf
0000000000000054 s l_.str
0000000000000000 t ltmp0
0000000000000054 s ltmp1
0000000000000060 d ltmp2
0000000000000070 s ltmp3
0000000000000078 s ltmp4

The above shows all symbols. Unfortunately I haven't figured out which symbol table the nm command prints.

@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Feb 19, 2024

BTW, DMD can now cross-compile. LLVM contains the necessary tools to inspect the object files (llvm-otool and llvm-nm). So a Mac is not strictly needed. When otool prints the correct number of symbols, then a real Mac can be used to test to actually link. Here's an example:

$ docker run -it --rm --platform linux/amd64 ubuntu:22.04
root@99b0566a8b25:/# apt update && apt install -y curl xz-utils file
root@99b0566a8b25:/# curl -L -O --retry 3 https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.6/clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# curl -L -O --retry 3 https://downloads.dlang.org/releases/2.x/2.107.0/dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# tar xf clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04.tar.xz
root@99b0566a8b25:~# tar xf dmd.2.107.0.linux.tar.xz
root@99b0566a8b25:~# ./dmd2/linux/bin64/dmd -target=x86_64-darwin main.d -c
root@99b0566a8b25:~# file main.o
main.o: Mach-O 64-bit x86_64 object, flags:<|SUBSECTIONS_VIA_SYMBOLS>
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-otool -l main.o
root@99b0566a8b25:~# ./clang+llvm-17.0.6-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-nm -a main.o

If you need access to a real Mac, you can also borrow a GitHub Action runner using this action: https://github.com/marketplace/actions/debugging-with-tmate. I've used it a lot to debug pipelines and my own action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Fix Phantom Zone Has value/information for future work, but closed for now
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants