Release amd64 binary: Illegal hardware instruction #8947

Closed
freijon opened this issue Jul 10, 2023 · 71 comments

@freijon

freijon commented Jul 10, 2023

Downstream bug (Gentoo): https://bugs.gentoo.org/910183 - Gentoo uses the binary provided in the released tar.gz

I installed pandoc on my VM. When I use the following command, I get the following error:

Command: pandoc --pdf-engine=lualatex -H <preamble-file.tex> <input-file.md> -o <output-file.pdf>

Error:

[1] 19126 illegal hardware instruction

Here is some additional information:

  • Arch: x86_64
  • Using QEMU on a Windows Host

Output of resolve-march-native:

-march=nocona -madx -mbmi -mbmi2 -mclflushopt -mfsgsbase -mrdseed -msahf -mxsave -mxsavec -mxsaveopt -mxsaves --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=12288

Versions tried:

  • 3.1.3
  • 3.1.5

After some initial debugging with gdb, I found:

Starting program: /usr/bin/pandoc --pdf-engine=lualatex -H preamble.tex dokumentation_cockpit.md -o dokumentation_cockpit.pdf
[New LWP 13421]
[New LWP 13422]
[New LWP 13423]
[New LWP 13424]

Thread 1 "pandoc" received signal SIGILL, Illegal instruction.
0x0000000006413259 in ?? ()

(gdb) x/i $pc
=> 0x6407fdc: vpxor %xmm5,%xmm5,%xmm5

This indicates that the binary uses AVX, which isn't available on all 64-bit x86 CPUs.
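To check from inside the guest whether AVX is usable at all, a minimal C sketch along the following lines may help. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available; the function name `cpu_reports_avx` is made up for illustration and is not part of pandoc or GHC.

```c
#include <stdbool.h>

/* Hypothetical helper: asks the compiler runtime whether AVX is usable.
 * On non-x86 targets, or with other compilers, it conservatively reports false. */
bool cpu_reports_avx(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                      /* harmless if already initialised */
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}
```

Calling `cpu_reports_avx()` from a small `main` on the affected VM and printing the result would show whether a VEX-encoded instruction such as `vpxor %xmm5,%xmm5,%xmm5` can be expected to run there.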

freijon added the bug label Jul 10, 2023
@alerque
Contributor

alerque commented Jul 10, 2023

Color me surprised to learn that Gentoo of all distros is relying on prebuilt upstream binaries for their packaging.

@freijon
Author

freijon commented Jul 10, 2023

I expected a comment like this :D
It's the solution for lazy people. The main package is built from source of course, but there is a binary package for people who don't want to compile and maintain 200+ Haskell packages just to run pandoc ;)

@jgm
Owner

jgm commented Jul 10, 2023

I'd like to be sure that this is coming from pandoc and not lualatex (which will be called given the command you've used). Can you reproduce this using a simpler command (not producing a PDF)? Also, could you try with this command, but with --verbose, which may give us a better indication of where this is occurring?

@jgm
Owner

jgm commented Jul 10, 2023

I don't know much about this, but it could be that ghc determines dynamically whether the processor it's running on supports AVX, and then uses these instructions if it does. (I'm guessing our build machine does.) I'm not (yet) seeing any way to tell it not to do this.

I haven't seen this reported before: is that because only fairly old machines don't support AVX at this point?

@jgm
Owner

jgm commented Jul 10, 2023

Actually there is a flag for avx (from ghc 9.6 manual):

-mavx
(x86 only) These SIMD instructions are currently not supported by the native code generator. Enabling this flag has no effect and is only present for future extensions.

The LLVM backend may use AVX if your processor supports it, but detects this automatically, so no flag is required.

My understanding is that ghc uses the native code generator by default.

@freijon
Author

freijon commented Jul 11, 2023

> I'd like to be sure that this is coming from pandoc and not lualatex (which will be called given the command you've used). Can you reproduce this using a simpler command (not producing a PDF)? Also, could you try with this command, but with --verbose, which may give us a better indication of where this is occurring?

Here are the results:

  • pandoc --verbose <old arguments> --> no additional output that would help...
  • pandoc --verbose in_file.md -o out_file.html --> same error "illegal hardware instruction"

EDIT:

  • pandoc --help displays the help correctly

@jgm
Owner

jgm commented Jul 11, 2023

OK, that's helpful. Does it matter what is in in_file.md? Can it be just one word, for example?

@freijon
Author

freijon commented Jul 11, 2023

I just tried it with only "test" in in_file.md - same result

@jgm
Owner

jgm commented Jul 11, 2023

@mpickering as a ghc dev I was hoping you might have insight into this?

@AndreasPK

As far as I'm aware, GHC's native backend can't emit this instruction. This means it was likely the result of misguided optimization either in a library, GHC's RTS, or through the LLVM backend.

For any more insight we would need to know which ghc version/libraries were used to build this release. A likely culprit seems to be the text library, which recently started using SIMD via C bindings for some functionality.

@jgm
Owner

jgm commented Jul 11, 2023

@AndreasPK thanks for commenting here. I don't have the exact list for that build, but I triggered a new release build and made it emit a cabal freeze. These should be roughly the same versions of packages, as the last release was just last week. text is version 2.0.2. Another place to look is the whole new crypton ecosystem, I suppose, since that is new in the last pandoc release; if the problem lies there, it would explain why I haven't gotten other reports like this. (On the other hand, it could just be that people are using pandoc on relatively recent hardware.)

ghc version: ghc 9.6.2, from Docker image glcr.b-data.ch/ghc/ghc-musl:9.6.2

Wrote freeze file: /tmp/cirrus-ci-build/cabal.project.freeze
active-repositories: hackage.haskell.org:merge
constraints: any.Cabal ==3.10.1.0,
             any.Cabal-syntax ==3.10.1.0,
             any.Diff ==0.4.1,
             any.Glob ==0.10.2,
             any.HUnit ==1.6.2.0,
             any.JuicyPixels ==3.3.8,
             JuicyPixels -mmap,
             any.OneTuple ==0.4.1.1,
             any.Only ==0.1,
             any.QuickCheck ==2.14.3,
             QuickCheck -old-random +templatehaskell,
             any.SHA ==1.6.4.4,
             SHA -exe,
             any.StateVar ==1.2.2,
             any.aeson ==2.1.2.1,
             aeson -cffi +ordered-keymap,
             any.aeson-pretty ==0.8.10,
             aeson-pretty -lib-only,
             any.alex ==3.4.0.0,
             any.ansi-terminal ==1.0,
             ansi-terminal -example,
             any.ansi-terminal-types ==0.11.5,
             any.appar ==0.1.8,
             any.array ==0.5.5.0,
             any.asn1-encoding ==0.9.6,
             any.asn1-parse ==0.9.5,
             any.asn1-types ==0.3.4,
             any.assoc ==1.1,
             assoc +tagged,
             any.async ==2.2.4,
             async -bench,
             any.attoparsec ==0.14.4,
             attoparsec -developer,
             any.attoparsec-aeson ==2.1.0.0,
             any.attoparsec-iso8601 ==1.1.0.0,
             any.auto-update ==0.1.6,
             any.base ==4.18.0.0,
             any.base-compat ==0.13.0,
             any.base-compat-batteries ==0.13.0,
             any.base-orphans ==0.9.0,
             any.base-unicode-symbols ==0.2.4.2,
             base-unicode-symbols +base-4-8 -old-base,
             any.base16-bytestring ==1.0.2.0,
             any.base64 ==0.4.2.4,
             any.base64-bytestring ==1.2.1.0,
             any.basement ==0.0.16,
             any.bifunctors ==5.6.1,
             bifunctors +tagged,
             any.binary ==0.8.9.1,
             any.bitvec ==1.1.4.0,
             bitvec -libgmp,
             any.blaze-builder ==0.4.2.2,
             any.blaze-html ==0.9.1.2,
             any.blaze-markup ==0.8.2.8,
             any.boring ==0.2.1,
             boring +tagged,
             any.bsb-http-chunked ==0.0.0.4,
             any.byteorder ==1.0.4,
             any.bytestring ==0.11.4.0,
             any.cabal-doctest ==1.0.9,
             any.call-stack ==0.4.0,
             any.case-insensitive ==1.2.1.0,
             any.cassava ==0.5.3.0,
             cassava -bytestring--lt-0_10_4,
             any.cereal ==0.5.8.3,
             cereal -bytestring-builder,
             any.citeproc ==0.8.1,
             citeproc -executable -icu,
             any.cmdargs ==0.10.22,
             cmdargs +quotation -testprog,
             any.colour ==2.3.6,
             any.commonmark ==0.2.3,
             any.commonmark-extensions ==0.2.3.4,
             any.commonmark-pandoc ==0.2.1.3,
             any.comonad ==5.0.8,
             comonad +containers +distributive +indexed-traversable,
             any.conduit ==1.3.5,
             any.conduit-extra ==1.3.6,
             any.constraints ==0.13.4,
             any.containers ==0.6.7,
             any.contravariant ==1.5.5,
             contravariant +semigroups +statevar +tagged,
             any.cookie ==0.4.6,
             any.crypton ==0.33,
             crypton -check_alignment +integer-gmp -old_toolchain_inliner +support_aesni +support_deepseq +support_pclmuldq +support_rdrand -support_sse +use_target_attributes,
             any.crypton-connection ==0.3.1,
             any.crypton-x509 ==1.7.6,
             any.crypton-x509-store ==1.6.9,
             any.crypton-x509-system ==1.6.7,
             any.crypton-x509-validation ==1.6.12,
             any.cryptonite ==0.30,
             cryptonite -check_alignment +integer-gmp -old_toolchain_inliner +support_aesni +support_deepseq -support_pclmuldq +support_rdrand -support_sse +use_target_attributes,
             any.data-default ==0.7.1.1,
             any.data-default-class ==0.1.2.0,
             any.data-default-instances-containers ==0.0.1,
             any.data-default-instances-dlist ==0.0.1,
             any.data-default-instances-old-locale ==0.0.1,
             any.data-fix ==0.3.2,
             any.dec ==0.0.5,
             any.deepseq ==1.4.8.1,
             any.digest ==0.0.1.3,
             digest -bytestring-in-base,
             any.digits ==0.3.1,
             any.directory ==1.3.8.1,
             any.distributive ==0.6.2.1,
             distributive +semigroups +tagged,
             any.dlist ==1.0,
             dlist -werror,
             any.doclayout ==0.4.0.1,
             any.doctemplates ==0.11,
             any.easy-file ==0.2.5,
             any.emojis ==0.1.2,
             any.exceptions ==0.10.7,
             any.fast-logger ==3.2.2,
             any.file-embed ==0.0.15.0,
             any.filepath ==1.4.100.1,
             any.generically ==0.1.1,
             any.ghc-bignum ==1.3,
             any.ghc-boot-th ==9.6.2,
             any.ghc-prim ==0.10.0,
             any.gridtables ==0.1.0.0,
             any.haddock-library ==1.11.0,
             any.happy ==1.20.1.1,
             any.hashable ==1.4.2.0,
             hashable +integer-gmp -random-initial-seed,
             any.haskell-lexer ==1.1.1,
             any.hourglass ==0.2.12,
             any.hsc2hs ==0.68.9,
             hsc2hs -in-ghc-tree,
             any.hslua ==2.3.0,
             any.hslua-aeson ==2.3.0.1,
             any.hslua-classes ==2.3.0,
             any.hslua-cli ==1.4.1,
             hslua-cli -executable,
             any.hslua-core ==2.3.1,
             any.hslua-list ==1.1.1,
             any.hslua-marshalling ==2.3.0,
             any.hslua-module-doclayout ==1.1.0,
             any.hslua-module-path ==1.1.0,
             any.hslua-module-system ==1.1.0.1,
             any.hslua-module-text ==1.1.0.1,
             any.hslua-module-version ==1.1.0,
             any.hslua-module-zip ==1.1.0,
             any.hslua-objectorientation ==2.3.0,
             any.hslua-packaging ==2.3.0,
             any.hslua-repl ==0.1.1,
             hslua-repl -executable,
             any.hslua-typing ==0.1.0,
             any.http-api-data ==0.5.1,
             http-api-data -use-text-show,
             any.http-client ==0.7.13.1,
             http-client +network-uri,
             any.http-client-tls ==0.3.6.2,
             any.http-date ==0.0.11,
             any.http-media ==0.8.0.0,
             any.http-types ==0.12.3,
             any.http2 ==4.1.4,
             http2 -devel -h2spec,
             any.indexed-traversable ==0.1.2.1,
             any.indexed-traversable-instances ==0.1.1.2,
             any.integer-gmp ==1.1,
             any.integer-logarithms ==1.0.3.1,
             integer-logarithms -check-bounds +integer-gmp,
             any.iproute ==1.7.12,
             any.ipynb ==0.2,
             any.isocline ==1.0.9,
             any.jira-wiki-markup ==1.5.1,
             any.libyaml ==0.1.2,
             libyaml -no-unicode -system-libyaml,
             any.lpeg ==1.0.4,
             lpeg -rely-on-shared-lpeg-library,
             any.lua ==2.3.1,
             lua +allow-unsafe-gc -apicheck -cross-compile +export-dynamic -lua_32bits -pkg-config -system-lua,
             any.lua-arbitrary ==1.0.1.1,
             any.memory ==0.18.0,
             memory +support_bytestring +support_deepseq,
             any.mime-types ==0.1.1.0,
             any.mmorph ==1.2.0,
             any.monad-control ==1.0.3.1,
             any.mono-traversable ==1.0.15.3,
             any.mtl ==2.3.1,
             any.network ==3.1.4.0,
             network -devel,
             any.network-byte-order ==0.1.6,
             any.network-uri ==2.6.4.2,
             any.old-locale ==1.0.0.7,
             any.old-time ==1.1.0.3,
             any.optparse-applicative ==0.18.1.0,
             optparse-applicative +process,
             any.ordered-containers ==0.2.3,
             pandoc +embed_data_files,
             pandoc-cli +lua -nightly +server,
             any.pandoc-lua-marshal ==0.2.2,
             any.pandoc-types ==1.23.0.1,
             any.parsec ==3.1.16.1,
             any.pem ==0.2.4,
             any.pretty ==1.1.3.6,
             any.pretty-show ==1.10,
             any.prettyprinter ==1.7.1,
             prettyprinter -buildreadme +text,
             any.prettyprinter-ansi-terminal ==1.1.3,
             any.primitive ==0.8.0.0,
             any.process ==1.6.17.0,
             any.psqueues ==0.2.7.3,
             any.random ==1.2.1.1,
             any.recv ==0.1.0,
             any.regex-base ==0.94.0.2,
             any.regex-tdfa ==1.3.2.1,
             regex-tdfa -force-o2,
             any.resourcet ==1.3.0,
             any.rts ==1.0.2,
             any.safe ==0.3.19,
             any.safe-exceptions ==0.1.7.4,
             any.scientific ==0.3.7.0,
             scientific -bytestring-builder -integer-simple,
             any.semialign ==1.3,
             semialign +semigroupoids,
             any.semigroupoids ==6.0.0.1,
             semigroupoids +comonad +containers +contravariant +distributive +tagged +unordered-containers,
             any.servant ==0.20,
             any.servant-server ==0.20,
             any.simple-sendfile ==0.2.32,
             simple-sendfile +allow-bsd -fallback,
             any.singleton-bool ==0.1.7,
             any.skylighting ==0.13.4,
             skylighting -executable,
             any.skylighting-core ==0.13.4,
             skylighting-core -executable,
             any.skylighting-format-ansi ==0.1,
             any.skylighting-format-blaze-html ==0.1.1,
             any.skylighting-format-context ==0.1.0.2,
             any.skylighting-format-latex ==0.1,
             any.socks ==0.6.1,
             any.some ==1.0.5,
             some +newtype-unsafe,
             any.sop-core ==0.5.0.2,
             any.split ==0.2.3.5,
             any.splitmix ==0.1.0.4,
             splitmix -optimised-mixer,
             any.stm ==2.5.1.0,
             any.streaming-commons ==0.2.2.6,
             streaming-commons -use-bytestring-builder,
             any.strict ==0.5,
             any.string-conversions ==0.4.0.1,
             any.syb ==0.7.2.3,
             any.tagged ==0.8.7,
             tagged +deepseq +transformers,
             any.tagsoup ==0.14.8,
             any.tasty ==1.4.3,
             tasty +unix,
             any.tasty-bench ==0.3.4,
             tasty-bench -debug +tasty,
             any.tasty-golden ==2.3.5,
             tasty-golden -build-example,
             any.tasty-hunit ==0.10.0.3,
             any.tasty-lua ==1.1.0,
             any.tasty-quickcheck ==0.10.2,
             any.template-haskell ==2.20.0.0,
             any.temporary ==1.3,
             any.texmath ==0.12.8,
             texmath -executable -server,
             any.text ==2.0.2,
             any.text-conversions ==0.3.1.1,
             any.text-short ==0.1.5,
             text-short -asserts,
             any.th-abstraction ==0.5.0.0,
             any.th-compat ==0.1.4,
             any.th-lift ==0.8.3,
             any.th-lift-instances ==0.1.20,
             any.these ==1.2,
             any.time ==1.12.2,
             any.time-compat ==1.9.6.1,
             time-compat -old-locale,
             any.time-manager ==0.0.0,
             any.tls ==1.7.0,
             tls +compat -hans +network,
             any.toml-parser ==1.2.0.0,
             any.transformers ==0.6.1.0,
             any.transformers-base ==0.4.6,
             transformers-base +orphaninstances,
             any.transformers-compat ==0.7.2,
             transformers-compat -five +five-three -four +generic-deriving +mtl -three -two,
             any.type-equality ==1,
             any.typed-process ==0.2.11.0,
             any.typst ==0.3.0.0,
             typst -executable,
             any.typst-symbols ==0.1.2,
             any.unicode-collation ==0.1.3.4,
             unicode-collation -doctests -executable,
             any.unicode-data ==0.4.0.1,
             unicode-data -ucd2haskell,
             any.unicode-transforms ==0.4.0.1,
             unicode-transforms -bench-show -dev -has-icu -has-llvm -use-gauge,
             any.uniplate ==1.6.13,
             any.unix ==2.8.1.0,
             any.unix-compat ==0.7,
             unix-compat -old-time,
             any.unix-time ==0.4.10,
             any.unliftio ==0.2.25.0,
             any.unliftio-core ==0.2.1.0,
             any.unordered-containers ==0.2.19.1,
             unordered-containers -debug,
             any.utf8-string ==1.0.2,
             any.uuid-types ==1.0.5,
             any.vault ==0.3.1.5,
             vault +useghc,
             any.vector ==0.13.0.0,
             vector +boundschecks -internalchecks -unsafechecks -wall,
             any.vector-algorithms ==0.9.0.1,
             vector-algorithms +bench +boundschecks -internalchecks -llvm +properties -unsafechecks,
             any.vector-stream ==0.1.0.0,
             any.wai ==3.2.3,
             any.wai-app-static ==3.1.7.4,
             wai-app-static +cryptonite -print,
             any.wai-cors ==0.2.7,
             any.wai-extra ==3.1.13.0,
             wai-extra -build-example,
             any.wai-logger ==2.4.0,
             any.warp ==3.3.28,
             warp +allow-sendfilefd -network-bytestring -warp-debug +x509,
             any.witherable ==0.4.2,
             any.word8 ==0.1.3,
             any.xml ==1.3.14,
             any.xml-conduit ==1.9.1.3,
             any.xml-types ==0.3.8,
             any.yaml ==0.11.11.2,
             yaml +no-examples +no-exe,
             any.zip-archive ==0.4.3,
             zip-archive -executable,
             any.zlib ==0.6.3.0,

@AndreasPK

AndreasPK commented Jul 12, 2023

Seems you depend on text >= 2.0 which comes with the new simd code.

One "easy" way to check if it's text would be to disable SIMD for text in a build using the simdutf cabal flag and see if the error still persists.

@jgm
Owner

jgm commented Jul 12, 2023

OK, I think I've built a version using the release build script with a constraint that forces text to use -simdutf.
@freijon could you try downloading the build artifact from here and see if you still get the error on your system?
https://cirrus-ci.com/task/4511237447352320

@freijon
Author

freijon commented Jul 13, 2023

I gave it a try, but unfortunately I still get the same error. I also tried --version and noticed that pandoc outputs some text and then fails:

/tmp/pandoc/pandoc-3.1.5/bin/pandoc --version --verbose

pandoc 3.1.5
Features: +server +lua
[1] 3102 illegal hardware instruction /tmp/pandoc/pandoc-3.1.5/bin/pandoc --version --verbose

Thank you for your patience and your efforts so far, I appreciate it!

@jgm
Owner

jgm commented Jul 13, 2023

OK, that is helpful information. It suggests that the culprit is not +simdutf in text. @AndreasPK any other ideas?

@jgm
Owner

jgm commented Jul 14, 2023

Actually I think this is a good clue, that --version emits those lines then stops.

versionInfo :: IO ()
versionInfo = do
  progname <- getProgName
  defaultDatadir <- defaultUserDataDir
  scriptingEngine <- getEngine
  putStr $ unlines
   [ progname ++ " " ++ showVersion pandocVersion ++ versionSuffix
   , flagSettings
   , "Scripting engine: " ++ T.unpack (engineName scriptingEngine)
   , "User data directory: " ++ defaultDatadir
   , copyrightMessage
   ]
  exitSuccess

That suggests that the error occurs in the "Scripting engine" part (so, getEngine).
That may implicate the Lua subsystem, which obviously has pieces in C. Maybe the C is being compiled with these optimizations; we just need to figure out how to turn that off.

To test this hypothesis I'll try making a build without lua support, which you can try.

@jgm
Owner

jgm commented Jul 14, 2023

OK, the following build disables both the server and the lua flags (as well as simdutf for text):
https://cirrus-ci.com/task/4556227045228544

@freijon It will be interesting to see if the problem can be reproduced with this binary.

@freijon
Author

freijon commented Jul 14, 2023

Thanks!

pandoc --version now works! I see the complete version info. Some progress!
Unfortunately, converting a .md to .html still fails with a SIGILL

@jgm
Owner

jgm commented Jul 14, 2023

Does your .md have YAML metadata? I ask because the yaml library embeds a C library.
Do you still get the problem when converting a minimal md file (one word)?

@freijon
Author

freijon commented Jul 14, 2023

My test .md did indeed have some special things like bullet lists and headings. I did another test with only one word inside. I still get a SIGILL.

@jgm
Owner

jgm commented Jul 16, 2023

Some notes:

We switched to ghc-musl 9.6.2 on June 26 (3.1.5 was built with this).
And to ghc-musl 9.4.5 on April 20 (3.1.3 and 3.1.4 were built with this).

I'm pinging @benz0li who maintains the ghc-musl images and might know something else that could be relevant to this issue.

We switched to the crypton ecosystem for the 3.1.4 build (but this doesn't affect 3.1.3).

@jgm
Owner

jgm commented Jul 16, 2023

I'll note that both this and the related Windows issue point to ghc 9.4 as a possible culprit:

  • The linux binaries are created on the ghc-musl docker image with cabal and latest packages.
  • The Windows binaries are created using stack and a curated list of packages.
  • We're getting the illegal instruction issues on Windows with pandoc versions 3.1.4 and later, and on linux with 3.1.3 and later.
  • We started using ghc 9.4 for version 3.1.4 on Windows and for version 3.1.3 on linux.

I guess there is an easy way to test this hypothesis. I can do a linux build using ghc 9.2, but otherwise the same as the last release.

@jgm
Owner

jgm commented Jul 16, 2023

Update: actually, it looks like ghc-musl-9.4.4 was used for release pandoc 3.1.2, and we switched to 9.4.5 for 3.1.3.

jgm added a commit that referenced this issue Jul 16, 2023
For background see #8947.
@benz0li
Contributor

benz0li commented Jul 16, 2023

> Actually there is a flag for avx (from ghc 9.6 manual):
>
> -mavx
> (x86 only) These SIMD instructions are currently not supported by the native code generator. Enabling this flag has no effect and is only present for future extensions.
>
> The LLVM backend may use AVX if your processor supports it, but detects this automatically, so no flag is required.
>
> My understanding is that ghc uses the native code generator by default.

ℹ️ glcr.b-data.ch/ghc/ghc-musl uses the LLVM backend.

@benz0li
Contributor

benz0li commented Jul 16, 2023

> I haven't seen this reported before: is that because only fairly old machines don't support AVX at this point?

Yes. Advanced Vector Extensions (AVX) were introduced 12 years ago.

https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

@jgm
Owner

jgm commented Jul 16, 2023

ghc 9.4.5 bumps text to 2.0.2 in core libraries.

@benz0li
Contributor

benz0li commented Jul 24, 2023

@AndreasPK

Could anyone who can reproduce this try to run pandoc under gdb to get a backtrace?

Alternatively, if someone can give me step-by-step instructions that allow me to reproduce this, I might be able to do so myself, depending on the requirements.

@AndreasPK

I downloaded the release in question and I can see the instruction in it (although my machine does support it). However it seems the release is naturally stripped of all symbols so that wasn't as informative as I had hoped.

@AndreasPK

I built pandoc myself and just grepped for the instruction in the assembly.
This seems to come from the function _hs_bytestring_long_long_uint_hex which is part of bytestring.

This function has been there "forever" and doesn't explicitly use SIMD. Rather, it seems auto-vectorization is triggered:

// unsigned long ints (64 bit words)
char* _hs_bytestring_long_long_uint_hex (long long unsigned int x, char* buf) {
    // write hex representation in reverse order
    char c, *ptr = buf, *next_free;
    do {
        *ptr++ = digits[x & 0xf];
        x >>= 4;
    } while ( x );
    // invert written digits
    next_free = ptr--;
    while(buf < ptr) {
        c      = *ptr;
        *ptr-- = *buf;
        *buf++ = c;
    }
    return next_free;
};

So it comes down to whatever flags the version of bytestring pandoc is linked against has been built with.
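To experiment with this outside of a pandoc build, a self-contained variant of the function can be compiled on its own. In this sketch the `digits` table and the name `uint64_hex` are assumptions made for illustration (the snippet above does not show how bytestring defines `digits`); the loop bodies follow the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed lookup table; the quoted snippet does not show its definition. */
static const char digits[] = "0123456789abcdef";

/* Writes the hex representation of x into buf (no NUL terminator) and
 * returns a pointer to the first free position after the digits. */
char *uint64_hex(unsigned long long x, char *buf)
{
    char c, *ptr = buf, *next_free;
    assert(buf != NULL);
    /* write hex digits in reverse order */
    do {
        *ptr++ = digits[x & 0xf];
        x >>= 4;
    } while (x);
    /* reverse the written digits in place */
    next_free = ptr--;
    while (buf < ptr) {
        c      = *ptr;
        *ptr-- = *buf;
        *buf++ = c;
    }
    return next_free;
}
```

Compiling such a file with optimization and AVX enabled (for example `cc -O3 -mavx -S`) and searching the generated assembly for VEX-encoded instructions is one way to see whether a particular compiler vectorizes these loops. Whether it does depends on the compiler and its version.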

@AndreasPK

AndreasPK commented Jul 24, 2023

https://gitlab.haskell.org/ghc/ghc/-/issues/23718

I can confirm it's an upstream issue. The libraries shipping with ghc seem to have avx enabled.

Edit: At the very least there are AVX instructions in the binary which, on my machine, get executed. However, I also have an AVX CPU and there seem to be runtime checks. So that's not necessarily wrong.

@freijon
Author

freijon commented Jul 25, 2023

> @freijon Does pandoc 3.1.6 work as expected on your old machine?

Indeed, I can confirm that the 3.1.6 release works perfectly on the machine in question! Thank you guys!

@AndreasPK

Update: https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516256

I suspect this might be an issue with the runtime CPU feature detection not working as expected, which might be our fault or the simulator's. Can someone confirm they have seen this happen outside of QEMU?

Additionally, can someone who has QEMU set up check if has_avx2 from https://gitlab.haskell.org/ghc/packages/bytestring/-/blob/81d041433341fea92605eb1440151d0ab4c9c85b/cbits/x86/is-valid-utf8.c returns true under QEMU?

@benz0li
Contributor

benz0li commented Jul 29, 2023

> Additionally, can someone who has QEMU set up check if has_avx2 from https://gitlab.haskell.org/ghc/packages/bytestring/-/blob/81d041433341fea92605eb1440151d0ab4c9c85b/cbits/x86/is-valid-utf8.c returns true under QEMU?

It (q35) does not. See https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516480.

@benz0li
Contributor

benz0li commented Jul 31, 2023

@freijon I am trying to reproduce this issue in order to get an answer to https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516638.

What exact system are you emulating with QEMU?

  1. OS (Arch): Which version?
  2. QEMU: Which version?
  3. Machine (-machine): pc or q35?
  4. CPU (-cpu): Host passthrough or a named model?
    • What is the output of /proc/cpuinfo from your QEMU guest VM?

@benz0li
Contributor

benz0li commented Jul 31, 2023

@freijon I cannot reproduce this with a QEMU VM very similar to yours.

  1. OS: macOS 13.5 (arm64)
  2. QEMU: v7.2.0
  3. Machine (-machine): pc,vmport=off,i8042=off
  4. CPU (-cpu): kvm64,+adx,+bmi1,+bmi2,+clflushopt,+fsgsbase,+rdseed,+xsave,+xsavec,+xsaveopt,+xsaves

Output of resolve-march-native:

-march=nocona -madx -mbmi -mbmi2 -mclflushopt -mfsgsbase -mxsave -mxsaveopt --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=16384

ℹ️ CPU flag sahf is not available in QEMU. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-msahf
ℹ️ CPU flags rdseed, xsavec and xsaves do not show up in this output although set with argument -cpu.

Output of cat /proc/cpuinfo:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 6
model name	: Common KVM processor
stepping	: 1
microcode	: 0x1
cpu MHz		: 999.998
cache size	: 16384 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc nopl xtopology cpuid pni cx16 xsave hypervisor pti fsgsbase bmi1 bmi2 adx clflushopt xsaveopt
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 1999.99
clflush size	: 64
cache_alignment	: 128
address sizes	: 40 bits physical, 48 bits virtual
power management:

@freijon's CPU seems to be a 64-bit Intel® Xeon® Processor "Nocona" from 2004.

@benz0li
Contributor

benz0li commented Jul 31, 2023

@freijon Could you please reproduce the following with your QEMU VM?: https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516480

Should the output, to our surprise, be 1 and you have ghc available, please compile and run the code according to https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516638.

Thank you for your feedback.

@freijon
Author

freijon commented Aug 3, 2023

> What exact system are you emulating with QEMU?

  1. OS (Arch): Which version?
    • Host: Windows 10 (64-bit) Version 2009 (OS build 19044)
    • Guest: Gentoo Linux x86_64
  2. QEMU: Which version?

QEMU emulator version 6.2.0 (v6.2.0-11889-g5b72bf03f5-dirty)

  3. Machine (-machine): pc or q35?

I don't use the -machine option, so the QEMU default is used.

  4. CPU (-cpu): Host passthrough or a named model?

The Windows build doesn't support host passthrough, and most named models don't work either. I tried many of them while trying to resolve this issue. One of the reasons might be the -accel whpx flag (Hyper-V acceleration). This flag is mandatory for me; without acceleration the VM is unbearably slow.

    • What is the output of /proc/cpuinfo from your QEMU guest VM?
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 107
model name	: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 1608.049
cache size	: 12288 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 21
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl cpuid pni cx16 hypervisor lahf_lm cmp_legacy svm fsgsbase bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit mmio_unknown retbleed
bogomips	: 3216.09
clflush size	: 64
cache_alignment	: 128
address sizes	: 39 bits physical, 48 bits virtual
power management:

@freijon Could you please reproduce the following with your QEMU VM?:
https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516480

❯ ./has_avx2
Output is: 1

@freijon Could you please reproduce the following with your QEMU VM?:
https://gitlab.haskell.org/ghc/ghc/-/issues/23718#note_516480

I compiled ghc-9.0.2 from source to test this. Unfortunately I get an error when trying to compile the example:

[1 of 1] Compiling Main             ( simple-repro.hs, simple-repro.o )

simple-repro.hs:6:16: error:
    Variable not in scope: isValidUtf8 :: t0 -> a0
  |
6 | main = print $ isValidUtf8 $ fromString ['a' .. 'Ê']
  |                ^^^^^^^^^^^

Do I need to compile some Haskell libraries for this test?

@benz0li
Contributor

benz0li commented Aug 3, 2023

@freijon What surprises me is that the CPU supports avx2 but not avx.

P.S.: I am pretty sure the CPU is being passed through by the host.
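
The avx2-without-avx mismatch noted above can be spotted mechanically from the guest's /proc/cpuinfo. The following is a minimal sketch, not a tool used anywhere in this thread: the function names and the small prerequisite table are assumptions made for illustration, and the table is deliberately not exhaustive.

```python
# Illustrative prerequisite table (an assumption of this sketch, not exhaustive):
# the feature on the left should only be advertised if the one on the right is too.
PREREQUISITES = {"avx2": "avx", "fma": "avx"}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def inconsistent_features(flags):
    """List features that are advertised although their prerequisite is missing."""
    return sorted(
        feature
        for feature, required in PREREQUISITES.items()
        if feature in flags and required not in flags
    )


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("Inconsistent features:", inconsistent_features(parse_flags(text)))
```

Run against the flags line posted above, which lists avx2 but not avx, this kind of check would report avx2 as inconsistent.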

@AndreasPK

Do I need to compile some Haskell libraries for this test?

I wrote this test under the assumption of ghc-9.4 being used to compile it. The test requires bytestring >= 0.11.2.0 which ships with ghc-9.4 iirc. If you use an older ghc you will need to use/compile a newer bytestring library somehow (make it a cabal project or similar).

@AndreasPK

@freijon What surprises me is that the CPU supports avx2 but not avx.

I guess that could be the issue. We only check for avx2, but definitely use avx instructions.
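
For illustration, here is a sketch of what a stricter runtime check could look like. It is not the code bytestring or ghc uses. Python cannot execute CPUID or XGETBV itself, so the register values are passed in as plain integers, and all function and constant names are hypothetical. The bit positions are the documented ones: CPUID leaf 1 ECX bit 27 (OSXSAVE) and bit 28 (AVX), CPUID leaf 7 subleaf 0 EBX bit 5 (AVX2), and XCR0 bits 1 and 2 (XMM and YMM state enabled by the OS).

```python
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE, so XGETBV may be used
CPUID1_ECX_AVX = 1 << 28      # CPU advertises AVX
CPUID7_EBX_AVX2 = 1 << 5      # CPU advertises AVX2
XCR0_SSE_AND_AVX = 0b110      # OS saves XMM (bit 1) and YMM (bit 2) state


def avx_usable(cpuid1_ecx, xcr0):
    """AVX is usable only if the CPU advertises it and the OS saves YMM state."""
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # without OSXSAVE the XCR0 value cannot be trusted
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    return (xcr0 & XCR0_SSE_AND_AVX) == XCR0_SSE_AND_AVX


def avx2_usable(cpuid1_ecx, cpuid7_ebx, xcr0):
    """AVX2 is treated as usable only when AVX is usable as well."""
    return avx_usable(cpuid1_ecx, xcr0) and bool(cpuid7_ebx & CPUID7_EBX_AVX2)


if __name__ == "__main__":
    # Hypothetical register values resembling the situation in this thread:
    # the AVX2 bit is set, but neither the AVX nor the OSXSAVE bit is.
    print(avx2_usable(cpuid1_ecx=0, cpuid7_ebx=CPUID7_EBX_AVX2, xcr0=0))
```

With a check shaped like this, a machine that reports the AVX2 bit alone would be routed to the non-AVX code path instead of hitting an illegal instruction.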

@AndreasPK

I opened haskell/bytestring#603 for bytestring.

@benz0li
Contributor

benz0li commented Aug 3, 2023

@freijon Out of curiosity: What does Coreinfo64 return on the Windows host?

If the output contains AVX, you are doing something odd with QEMU.

@freijon
Author

freijon commented Aug 3, 2023

@benz0li: Here is the output: https://bpa.st/BUIA

P.S.: I am pretty sure the CPU is being passed through by the host.

I'm pretty sure this is not the case. My CPU is not from 2004. But if it helps, these are the arguments I use to launch the VM:
-accel whpx -display none -m 16G -smp cores=12 -drive file=C:\Users\jonas.frei\Documents\VM\gentoo.cow,format=qcow2,if=virtio -drive file=C:\Users\jonas.frei\Documents\VM\swap.cow,format=qcow2,if=virtio -net user,hostfwd=tcp::2222-:22,hostfwd=tcp::8384-:8384,hostfwd=udp::21027-:21027,hostfwd=tcp::22000-:22000,hostfwd=udp::22000-:22000,hostfwd=tcp::6080-:6080 -net nic

When I use -cpu host (that would allow passthrough), I get:

C:\Program Files\qemu\qemu-system-x86_64.exe: unable to find CPU model 'host'

@benz0li
Contributor

benz0li commented Aug 3, 2023

My CPU is not from 2004.

No, it is not. But that is what I assumed from resolve-march-native returning -march=nocona (see #8947 (comment)).

Both /proc/cpuinfo (QEMU guest VM; Linux) and Coreinfo64 (QEMU host; Windows) report an Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz.
👉 This means that the CPU is being passed through to the guest VM by the host.

@AndreasPK

I'm not familiar with what CPU passthrough implies. But based on the comments here

#8947 (comment)

❯ ./has_avx2
Output is: 1

And here:

haskell/bytestring#603 (comment)

Compile with:
gcc -mavx -o hello_avx hello_avx.c

./hello_avx
[1] 879 illegal hardware instruction ./hello_avx

It seems @freijon is in a situation where cpuid reports avx2 support but the executable crashes if avx[2] is used. The only reasonable explanation for this in my mind is either a misconfigured QEMU or a bug in QEMU. Either way it does not seem to be an issue with bytestring/ghc, so I won't look much further into it. Hopefully the info collected here is enough for someone to figure out the configuration issue or QEMU bug.

@freijon
Author

freijon commented Aug 4, 2023

Thank you everyone for the patience and effort you put into this, I really appreciate this.
The latest pandoc release 3.1.6 which was built with ghc 9.2 works fine. At least for the time being the issue is solved for me. It will only come back when building future release binaries with more recent ghc versions. I don't know if / how long you can stick with ghc 9.2? Anyway, it seems I'm representing a small minority that is affected by this. I think it doesn't warrant any more of your valuable time to look into this. You can close this issue and I'll reference it in a new QEMU issue.

@jgm
Owner

jgm commented Aug 4, 2023

OK, I will close this but the information will remain here in case anyone else experiences the problem. Thanks everyone for working to figure out what is going on here!

If I understand correctly, then, we can go back to using ghc 9.4 or 9.6 to compile future versions of pandoc, because this issue only affects use with QEMU?

@jgm jgm closed this as completed Aug 4, 2023
@freijon
Author

freijon commented Aug 4, 2023

It also impacts old computers that don't support AVX. But I guess the number of impacted machines is shrinking every day...

@AndreasPK

It also impacts old computers that don't support AVX. But I guess the number of impacted machines is shrinking every day...

It should not as there is a cpuid check for avx support. It's just that this check seems broken under qemu under your setup.

@benz0li
Contributor

benz0li commented Aug 4, 2023

It also impacts old computers that don't support AVX. But I guess the number of impacted machines is shrinking every day...

It should not as there is a cpuid check for avx support. It's just that this check seems broken under qemu under your setup.

@AndreasPK is right. This is a bug in WHPX acceleration (-accel whpx) with QEMU and only affects Windows hosts (x86_64).

@jgm
Owner

jgm commented Aug 4, 2023

OK, I'm going to switch back to using more modern ghc versions, then.
