Release amd64 binary: Illegal hardware instruction #8947
Comments
Color me surprised to learn that Gentoo of all distros is relying on prebuilt upstream binaries for their packaging. |
I expected a comment like this :D |
I'd like to be sure that this is coming from pandoc and not lualatex (which will be called given the command you've used). Can you reproduce this using a simpler command (not producing a PDF)? Also, could you try with this command, but with |
Possibly relevant: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1306 |
I don't know much about this, but it could be that ghc determines dynamically whether the processor it's running on supports AVX, and then uses these instructions if it does. (I'm guessing our build machine does.) I'm not (yet) seeing any way to tell it not to do this. I haven't seen this reported before: is that because only fairly old machines don't support AVX at this point? |
Actually there is a flag for avx (from ghc 9.6 manual):
My understanding is that ghc uses the native code generator by default. |
Here are the results:
EDIT:
|
OK, that's helpful. Does it matter what is in |
I just tried it with only "test" in |
@mpickering as a ghc dev I was hoping you might have insight into this? |
As far as I'm aware, ghc's native backend can't emit this instruction. This means it was likely the result of misguided optimization either in a library, ghc's RTS, or through the LLVM backend. For any more insight we would need to know which ghc version/libraries were used to build this release. A likely culprit seems the |
@AndreasPK thanks for commenting here. I don't have the exact list for that build, but I triggered a new release build and made it emit a cabal freeze. These should be roughly the same versions of packages, as the last release was just last week. text is version 2.0.2. Another place to look is the whole new crypton ecosystem, I suppose, since that is new in the last pandoc release; if the problem lies there, it would explain why I haven't gotten other reports like this. (On the other hand, it could just be that people are using pandoc on relatively recent hardware.) ghc version: ghc 9.6.2, from Docker image glcr.b-data.ch/ghc/ghc-musl:9.6.2
|
Seems you depend on text >= 2.0, which comes with the new simd code. One "easy" way to check if it's text should be to disable simd for text in a build using the |
OK, I think I've built a version using the release build script with a constraint that forces text to use |
I gave it a try, but unfortunately I still get the same error. I also tried --version and noticed that
Thank you for your patience and your efforts so far, I appreciate it! |
OK, that is helpful information. It suggests that the culprit is not |
Actually I think this is a good clue, that
versionInfo :: IO ()
versionInfo = do
  progname <- getProgName
  defaultDatadir <- defaultUserDataDir
  scriptingEngine <- getEngine
  putStr $ unlines
    [ progname ++ " " ++ showVersion pandocVersion ++ versionSuffix
    , flagSettings
    , "Scripting engine: " ++ T.unpack (engineName scriptingEngine)
    , "User data directory: " ++ defaultDatadir
    , copyrightMessage
    ]
  exitSuccess
That suggests that the error occurs in the "Scripting engine" part (so, To test this hypothesis I'll try making a build without lua support, which you can try. |
OK, the following build disables both the @freijon It will be interesting to see if the problem can be reproduced with this binary. |
Thanks!
|
Does your .md have YAML metadata? I ask because the yaml library embeds a C library. |
My test-.md indeed had some special things like bullet list and headings. I did another test with only one word inside. Still get a SIGILL |
Some notes: We switched to ghc-musl 9.6.2 on June 26 (3.1.5 was built with this). I'm pinging @benz0li who maintains the ghc-musl images and might know something else that could be relevant to this issue. We switched to the crypton ecosystem for the 3.1.4 build (but this doesn't affect 3.1.3). |
I'll note that both this and the related Windows issue point to ghc 9.4 as a possible culprit:
I guess there is an easy way to test this hypothesis. I can do a linux build using ghc 9.2, but otherwise the same as the last release. |
Update: actually, it looks like ghc-musl-9.4.4 was used for release pandoc 3.1.2, and we switched to 9.4.5 for 3.1.3. |
ℹ️ glcr.b-data.ch/ghc/ghc-musl uses the LLVM backend. |
Yes. Advanced Vector Extensions (AVX) were introduced 12 years ago. |
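For anyone who wants to check whether a Linux x86 machine (or VM) advertises AVX, here is a minimal sketch that parses the `flags` line of `/proc/cpuinfo`. The function names `parse_cpu_flags` and `missing_features` are made up for illustration, and the sketch assumes the usual x86 layout of that file.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where one line looks like
    "flags : fpu vme ... avx avx2 ...". Returns an empty set otherwise.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx", "avx2")):
    """List the required flags that the CPU does not advertise."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    print("missing:", missing_features(text))
```

Note that this only reports what the (possibly virtualized) CPU claims to support. As the later comments in this thread show, the advertised flags and the actual behavior can disagree under some QEMU setups.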
ghc 9.4.5 bumps text to 2.0.2 in core libraries. |
@freijon Does pandoc 3.1.6 work as expected on your old machine? @AndreasPK Thank you for further insights from your side on |
Could anyone who can reproduce this try to run pandoc under gdb to get a backtrace? Alternatively, if someone can give me step-by-step instructions that allow me to reproduce this, I might be able to do so myself, depending on the requirements. |
I downloaded the release in question and I can see the instruction in it (although my machine does support it). However it seems the release is naturally stripped of all symbols so that wasn't as informative as I had hoped. |
I built pandoc myself and just grepped for the instruction in the assembly. The function in question has been there for "forever" and doesn't explicitly use simd. Rather, it seems auto-vectorization triggers:
So it comes down to whatever flags the version of bytestring pandoc is linked against has been built with. |
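The "grep the assembly" step can be sketched roughly as follows. This assumes the disassembly is available as plain text (for example from `objdump -d`) and uses references to `ymm` registers as a proxy for AVX code, since those registers are only reachable through AVX-encoded instructions. The helper name `lines_using_ymm` and the sample listing are illustrative, not taken from the pandoc binary.

```python
import re

# ymm0..ymm31 in either AT&T ("%ymm0") or Intel ("ymm0") syntax.
YMM_PATTERN = re.compile(r"\bymm\d+\b")


def lines_using_ymm(disassembly):
    """Return the disassembly lines that mention a ymm register."""
    return [line for line in disassembly.splitlines()
            if YMM_PATTERN.search(line)]


if __name__ == "__main__":
    # Illustrative two-line listing; real input would be much longer.
    sample = (
        "  401000:\tc5 fd 6f 07\tvmovdqa (%rdi),%ymm0\n"
        "  401004:\t48 89 e5\tmov %rsp,%rbp\n"
    )
    for hit in lines_using_ymm(sample):
        print(hit)
```

Finding such lines only shows that AVX code is present in the binary. It says nothing about whether that code is guarded by a runtime CPU feature check.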
https://gitlab.haskell.org/ghc/ghc/-/issues/23718
Edit: At the very least there are avx instructions in the binary which, on my machine, get executed. However, I also have an avx cpu and there seem to be runtime checks. So that's not necessarily wrong. |
Indeed, I can confirm that the 3.1.6 release works perfectly on the machine in question! Thank you guys! |
Update: https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516256 I suspect this might be an issue with the runtime cpu feature support not working as expected, which might be our fault or the simulator's. Can someone confirm they have seen this happen outside of QEMU? Additionally, can someone who has qemu set up check if has_avx2 from https://gitlab.haskell.org/ghc/packages/bytestring/-/blob/81d041433341fea92605eb1440151d0ab4c9c85b/cbits/x86/is-valid-utf8.c returns true under qemu? |
It ( |
@freijon I am trying to reproduce this issue in order to get an answer to https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516638. What exact system are you emulating with QEMU?
|
@freijon I cannot reproduce with a QEMU VM very similar to yours.
Output of
ℹ️ CPU flag Output of
@freijon's CPU seems to be a 64-bit Intel® Xeon® Processor "Nocona" from 2004. |
@freijon Could you please reproduce the following with your QEMU VM?: https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516480 Should, to our surprise, the output be Thank you for your feedback. |
QEMU emulator version 6.2.0 (v6.2.0-11889-g5b72bf03f5-dirty)
I don't use the
The Windows build doesn't support host passthrough, nor do most named models work. I tried a lot of them to resolve this issue. One of the reasons might be the
I compiled
Do I need to compile some Haskell libraries for this test? |
@freijon What surprises me is that the CPU supports P.S.: I am pretty sure the CPU is being passed through by the host. |
I wrote this test under the assumption of ghc-9.4 being used to compile it. The test requires bytestring >= 0.11.2.0 which ships with ghc-9.4 iirc. If you use an older ghc you will need to use/compile a newer bytestring library somehow (make it a cabal project or similar). |
I guess that could be the issue. We only check for avx2, but definitely use avx instructions. |
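The concern raised here can be illustrated with a small dispatch sketch: a code path should be gated on every feature it uses, not only on the newest one. This is not bytestring's actual code; `pick_implementation`, `CANDIDATES`, and the path names are hypothetical.

```python
def pick_implementation(cpu_flags, candidates):
    """Return the first candidate whose required flags are all present.

    candidates is a list of (name, required_flags), fastest first.
    """
    available = set(cpu_flags)
    for name, required in candidates:
        if set(required) <= available:
            return name
    raise LookupError("no candidate is supported by this CPU")


# Hypothetical code paths. The AVX2 path lists "avx" as well,
# because it also executes plain AVX instructions.
CANDIDATES = [
    ("avx2_path", {"avx", "avx2"}),
    ("sse2_path", {"sse2"}),
    ("scalar_path", set()),
]

if __name__ == "__main__":
    # A CPU (or emulator) that reports avx2 but not avx falls back.
    print(pick_implementation({"sse2", "avx2"}, CANDIDATES))
```

With the requirement sets written this way, an environment that reports `avx2` without `avx` falls back to a slower path instead of hitting an illegal instruction.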
I opened haskell/bytestring#603 for bytestring. |
@freijon Out of curiosity: What does Coreinfo64 return on the Windows host? If the output contain |
@benz0li: Here is the output: https://bpa.st/BUIA
I'm pretty sure this is not the case. My CPU is not from 2004. But if it helps, these are the arguments I use to launch the VM: When I use
|
No, it is not. But that is what I assumed by Both |
I'm not familiar with what cpu passthrough implies. But based on the comments here
And here: haskell/bytestring#603 (comment)
It seems @freijon is in a situation where cpuid returns avx2 support but the executable crashes if avx[2] is used. The only reasonable explanation for this in my mind is either a misconfigured qemu or a bug in qemu. Either way, it seems not to be an issue with bytestring/ghc, so I won't look much further into it. Hopefully the info collected here is enough for someone to figure out the configuration issue or qemu bug. |
Thank you everyone for the patience and effort you put into this, I really appreciate this. |
OK, I will close this but the information will remain here in case anyone else experiences the problem. Thanks everyone for working to figure out what is going on here! If I understand correctly, then, we can go back to using ghc 9.4 or 9.6 to compile future versions of pandoc, because this issue only affects use with QEMU? |
It also impacts old computers that don't support AVX. But I guess the number of impacted machines is shrinking every day... |
It should not, as there is a cpuid check for avx support. It's just that this check seems broken under qemu in your setup. |
@AndreasPK is right. This is a bug in WHPX acceleration ( |
OK, I'm going to switch back to using more modern ghc versions, then. |
Downstream bug (Gentoo): https://bugs.gentoo.org/910183 - Gentoo uses the binary provided in the released tar.gz
I installed pandoc on my VM. When I use the following command, I get the following error:
Command:
pandoc --pdf-engine=lualatex -H <preamble-file.tex> <input-file.md> -o <output-file.pdf>
Error:
Here is some additional information:
Output of resolve-march-native:
Versions tried:
After some initial debugging with gdb, I found:
This indicates that the binary appears to be using AVX, which isn't available on all 64-bit x86 CPUs.