Stack smashing when building some packages #7456
Amazing. I can find no recent trace of "stack smashing" in relation to GHC, either on the web or in the source, except for https://gitlab.haskell.org/ghc/ghc/-/issues/16046#note_360930. Does it happen with 8.10 and 9.2, too?
@Mikolaj, I ran the example repository with the following
I also tested the
I don't know if this ticket belongs in the Cabal repo; shouldn't it be reproducible in an appropriate environment with just e.g. That said, one hunch is that there's no particular regression in GHC, but that the default gcc flags now add more guards against certain patterns that GHC generates. I note that text-show also has cbits, which may be related (especially in combination with TH). MMark doesn't seem to have any directly, but it has a fair number of transitive dependencies, so it depends on where it fails, I suppose.
A great idea to try and extract

Regarding GHC 9.2, I'm not sure whether you are using https://ghc.gitlab.haskell.org/head.hackage
I added the reference to
@gbaz, I attempted to isolate the issue with
I got the following output:
I take this outcome to mean that this is not a
Cabal may be passing wildly incorrect parameters to GHC, but even then, GHC should fail gracefully rather than throw a fit. Are the six lines enough to repro your example, or do they depend on some context? If they are enough, I may try to repro again, even though my ancient Ubuntu has gcc 5.4.0-6ubuntu1~16.04.12.
@Mikolaj I believe that the six lines above are sufficient to reproduce the error. I can only be certain that it is reproducible on my machine.
I'm getting
so a few installed packages and their hashes are probably necessary context. That context can be created by building the packages and copying their hashes, but I haven't made the effort yet; with my old gcc it would probably go through fine again anyway. I guess the best chance is with the same version of gcc, or at least a newer one.
@Mikolaj I tried with I guess it makes sense that the hashes are different. I did copy the whole monolithic command from the logs generated by
I had to add step 3
to have any base-compat-batteries built at all, but after it appeared, it got a different hash:
so I guess I'd need to copy over such hashes from my
I updated the steps above to include the invocation of
A GHC guru friend suggests: "you should be able to infer which executable aborted from the core dump?"
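A core dump can only be inspected if one is written in the first place, and that depends on the process's core-file size limit (RLIMIT_CORE), whose soft value is often 0 by default. As a minimal sketch, assuming a Linux/POSIX system, the C function below raises the soft limit to the hard limit, which is the same resource the shell's `ulimit -c` controls. `enable_core_dumps` is a hypothetical helper name, not part of any tool discussed in this thread.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft core-file size limit to
   its hard limit, roughly what `ulimit -c` adjusts in a shell. The new
   limit is inherited by child processes started afterwards.
   Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    /* The soft limit may be raised up to, but never beyond, the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Because the limit is inherited, a build launched afterwards from the same process could leave a core file behind when it aborts. Where that file ends up is decided by the kernel's /proc/sys/kernel/core_pattern setting; on Ubuntu it is typically piped to apport rather than written to the working directory.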
@jneira surely you jest!
@Mikolaj Does your guru friend mean I should attempt to disentangle the "stack smashing" error from the Haskell build stack, and determine if it is thrown by
@recursion-ninja: yes, I understand that's what he'd do at that point. I guess it would also help to obtain the exact arguments of the invocation of the implicated tool that crashed.
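The -6 exit code in the bug report is itself a clue about how the tool died. On POSIX systems, Haskell's System.Process.waitForProcess reports a child killed by signal N as ExitFailure (-N), and glibc terminates a process via abort(), i.e. SIGABRT (signal 6 on Linux), after printing its stack-smashing message, so -6 is consistent with an abort in whichever child process was launched. The C sketch below, assuming Linux and glibc, reproduces that chain with a child that simply calls abort(). `run_aborting_child` and `process_style_exit_code` are hypothetical names for illustration; the code does not identify which executable crashed, it only shows how the signal surfaces in the parent's wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child that aborts the way glibc does after
   detecting stack smashing, and return the child's raw wait status. */
static int run_aborting_child(void) {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        /* Child: suppress the core file for this demo, then abort. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        abort(); /* raises SIGABRT and never returns */
    }
    int status = 0;
    pid_t done = waitpid(pid, &status, 0);
    assert(done == pid);
    return status;
}

/* Hypothetical helper: translate a wait status using the convention of
   System.Process.waitForProcess on POSIX. A child killed by signal N is
   reported as -N; otherwise the normal exit code is returned. */
static int process_style_exit_code(int status) {
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return WEXITSTATUS(status);
}
```

Running each suspect invocation (ghc, gcc, the linker) on its own and inspecting its wait status this way is one possible route to narrowing down which executable actually aborted.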
Looks like it's coming from either
Invocation (note the
Output
Perhaps there is some issue with the
Oh, great. So perhaps it's a gold linker bug? I know there are quite a few. Does it still happen when you switch from ld.gold to ld (or ld.bfd; there are various ways to do that)? Which version of the gold linker are you using?
I added
I'm not sure that it actually switched linkers. Any advice on passing this to
From GHC gurus: "lld is the llvm linker, I think you want ld?"
Unfortunately, I get the same results with
I suspect that your system is somehow configured to be quite paranoid and doesn't like something GHC does.
@phadej It is far more pervasive than just
I used
I can try to do this tomorrow if you still think it is worthwhile.
It totally is, because we need GHC devs to be able to repro, and so far nobody but you can. However, the test you just outlined seems easier to do than the one involving copying hashes, so I will attempt to repro today. If I fail (that is, can't repro), the container may be the only way to either let GHC devs repro or let you troubleshoot your setup (e.g., to find which version of which binutil on your system is buggy, if that's the case, or which version makes GHC misbehave due to an incompatibility that needs to be fixed in GHC).
Thanks for the additional info. So you have common hardware, and I cannot reproduce with my Docker image. It uses hvr-ppa bindists, but I'm quite sure that's not the reason. If it fails in your Docker container, that would be very interesting!
GHCi probably fails for the same reason Template Haskell does, i.e. something related to dynamic linking.
I've just repeated @phadej's transcript both in Docker and on my own OS and got the same results (I can't repro the original issue).
@phadej, I have not used Docker before. Can you tell me how to clone your image so I can try to replicate the results?
The first line,
@Mikolaj, @phadej, I was able to get Docker working! When I replicated the steps that reproduce the stack smashing error on my machine, the Docker container did not exhibit the stack smashing behavior. I was able to load
Any ideas on how to rehabilitate my machine?
That may still be a GHC bug, e.g., one that manifests only with new enough C toolchain libraries. But it may just as well be a corrupted file in your filesystem. Do you have another partition, or can you put another hard drive into your machine? If so, you can try to install Ubuntu 20.04 there, or even run it from a live Ubuntu DVD and try to repro from that. This would roughly bisect the problem space. If it works on the fresh system, the issue may be written off as a fluke: you can manually reinstall all apt packages (there are commands for that) and it should vanish; if it doesn't, your configuration files may need to be wiped out. Another possibility is to create a new user and try to repro as that user. Basically, find ways to exclude portions of your current hard drive and try to repro.
I mean, it may be a GHC bug if, e.g., one of your C toolset libraries is newer than the one on the Docker image; if you started from exactly the same OS version and never upgraded, this can't be the cause.
@Mikolaj, my machine:
Docker container:
This is a very old GCC feature for detecting corrupted stacks; see [1]. The mechanism is quite simple: you place a canary value on the stack, and before returning from the function (as you're busy cleaning up the stack) you check that the canary is still there and has the correct value. See https://godbolt.org/z/rW74EzeKE. This error means something along the way has corrupted the stack. Distros such as Ubuntu and Red Hat have started enabling more and more GCC security features by default; Ubuntu has enabled it with
I strongly suspect GHC's dynamic linker has a bug here and has incorrectly overwritten the stack. This is commonly caused by a calling-convention issue. Just set
Then run the program and let it generate a core dump. Find out which program is at fault by running
which will say something like
Then open the core dump:
which will tell you where it went wrong. But I suspect you have hit a GHC bug.

[1] https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
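The canary check described in this comment can be modeled in plain C. The sketch below is a simulation only, not what GCC emits: it keeps a hypothetical fixed canary right after a 16-byte buffer inside a byte array and checks it after an unchecked copy. Real -fstack-protector code gets a random canary from the C library and, on a mismatch, calls __stack_chk_fail, which in glibc prints the "stack smashing detected" message and aborts. `toy_protected_call`, `BUF_SIZE`, `FRAME_SIZE` and `CANARY` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a protected stack frame: a 16-byte local buffer followed
   by a canary word. Only the check is modeled; the real frame layout is
   chosen by the compiler. */
#define BUF_SIZE 16
#define FRAME_SIZE (BUF_SIZE + sizeof(uint64_t))

/* Hypothetical fixed value; real canaries are randomized at startup. */
static const uint64_t CANARY = 0x5eed1e55c0ffee00ULL;

/* Models a function that copies `len` bytes into its local buffer without
   a bounds check. Returns 1 if the canary survived, 0 if it was clobbered
   (the point where protected code would call __stack_chk_fail). */
static int toy_protected_call(const unsigned char *input, size_t len) {
    unsigned char frame[FRAME_SIZE];
    uint64_t seen;

    memset(frame, 0, sizeof frame);
    /* "Prologue": store the canary just past the local buffer. */
    memcpy(frame + BUF_SIZE, &CANARY, sizeof CANARY);

    /* Keep the model itself memory-safe: never write past `frame`. */
    assert(len <= FRAME_SIZE);
    /* "Body": unchecked copy that may run past the 16-byte buffer. */
    memcpy(frame, input, len);

    /* "Epilogue": verify the canary before returning. */
    memcpy(&seen, frame + BUF_SIZE, sizeof seen);
    return seen == CANARY;
}
```

Copying exactly `BUF_SIZE` bytes leaves the canary intact, while a single extra byte changes it, which is the kind of overrun the real check turns into an abort.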
@Mistuke I have tried to follow your instructions, but I believe that I have missed something near the end.
I suspect something isn't quite lining up at the end. I'm not very proficient with
I assume in the actual command you ran you don't have that stray
This is fine.
Unfortunately, it looks like the symbols have been stripped. You'd need a debug build of GHC to see more.
This seems to confirm what I suspected: it's being triggered by something in libc detecting the corruption.
and repeating it all, and use
to report where all the threads are. Since you're on an x86 platform, one could use the LBR (Last Branch Record) to figure out where the original call came from, but that's a lot to explain, so it would be easier to just submit a small repro. Unfortunately, without a debug build of GHC there's not much more information you can get than the above.
@Mistuke, thanks for the additional guidance. I was able to recover some more information after installing
Any insights?
Well, it looks like it was doing some kind of I/O action. Thread 2 is the RTS scheduler thread, which is fine; thread 3 is libc's read thread, also fine; thread 4 is GHC's I/O manager thread polling for I/O completion, also fine. I don't know what thread 5 is. Thread 1 is the fishy one, but there's no extra information about it. You have enough information to submit a GHC bug report; it's much easier to debug with a debug build. The GHC developers should be able to get the same error with valgrind if the crash doesn't happen when they test.
Describe the bug
When building some packages (`text-show`, `mmark`), these packages fail to build with a `-6` error code. Some introspection of the logs generated by `cabal` reports "stack smashing", likely originating from either `ghc` or `gcc`.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
That `text-show` builds successfully.

System information

Additional context
Moved from #7311