Stack smashing when building some packages #7456

Open
recursion-ninja opened this issue Jun 22, 2021 · 39 comments

Comments

@recursion-ninja
Contributor

recursion-ninja commented Jun 22, 2021

Describe the bug
When building some packages (text-show, mmark), the build fails with a -6 error code. Inspecting the logs generated by cabal shows "stack smashing detected", likely originating from either ghc or gcc.

To Reproduce
Steps to reproduce the behavior:

$ git clone https://github.com/recursion-ninja/cabal-build-failure-7456
$ cd cabal-build-failure-7456
$ git checkout e0007b2d7f8e7bf6455c930330a31bcc2f0c36fa
$ cabal build -v --with-compiler=ghc-9.0.1
### Expect a -6 error code
### Look for the file path of the text-show build log
$ tail ~/.cabal/logs/ghc-9.0.1/text-show-3.9-<some-hash-value>.log
[ 1 of 66] Compiling TextShow.Data.OldTypeable ( src/TextShow/Data/OldTypeable.hs, dist/build/TextShow/Data/OldTypeable.o, dist/build/TextShow/Data/OldTypeable.dyn_o )
[ 2 of 66] Compiling TextShow.GHC.Conc.Windows ( src/TextShow/GHC/Conc/Windows.hs, dist/build/TextShow/GHC/Conc/Windows.o, dist/build/TextShow/GHC/Conc/Windows.dyn_o )
[ 3 of 66] Compiling TextShow.GHC.Stats ( src/TextShow/GHC/Stats.hs, dist/build/TextShow/GHC/Stats.o, dist/build/TextShow/GHC/Stats.dyn_o )
[ 4 of 66] Compiling TextShow.Options ( src/TextShow/Options.hs, dist/build/TextShow/Options.o, dist/build/TextShow/Options.dyn_o )
[ 5 of 66] Compiling TextShow.TH.Names ( shared/TextShow/TH/Names.hs, dist/build/TextShow/TH/Names.o, dist/build/TextShow/TH/Names.dyn_o )
[ 6 of 66] Compiling TextShow.Utils   ( src/TextShow/Utils.hs, dist/build/TextShow/Utils.o, dist/build/TextShow/Utils.dyn_o )
[ 7 of 66] Compiling TextShow.Classes ( src/TextShow/Classes.hs, dist/build/TextShow/Classes.o, dist/build/TextShow/Classes.dyn_o )
[ 8 of 66] Compiling TextShow.TH.Internal ( src/TextShow/TH/Internal.hs, dist/build/TextShow/TH/Internal.o, dist/build/TextShow/TH/Internal.dyn_o )
[ 9 of 66] Compiling TextShow.Data.Tuple ( src/TextShow/Data/Tuple.hs, dist/build/TextShow/Data/Tuple.o, dist/build/TextShow/Data/Tuple.dyn_o )
*** stack smashing detected ***: terminated

Expected behavior
That text-show builds successfully.

System information

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.2 LTS
Release:	20.04
Codename:	focal
$ cabal --version
cabal-install version 3.4.0.0
compiled using version 3.4.0.0 of the Cabal library 
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.0.1
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Additional context
Moved from #7311
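
For readers unfamiliar with the -6 code: cabal appears to report a process killed by a signal as the negated signal number, whereas a POSIX shell reports 128 plus the signal number. Signal 6 is SIGABRT on Linux, which is what glibc raises after printing "*** stack smashing detected ***". The helper below is a hypothetical sketch (the name signal_for_exit_code does not come from cabal or GHC) that decodes either convention.

```shell
# Hypothetical helper: map a reported exit code to a signal name.
signal_for_exit_code() {
  code=$1
  if [ "$code" -lt 0 ]; then
    sig=$(( -code ))        # cabal-style report: negated signal number
  elif [ "$code" -gt 128 ]; then
    sig=$(( code - 128 ))   # shell-style report: 128 + signal number
  else
    echo "not a signal exit: $code" >&2
    return 1
  fi
  kill -l "$sig"            # print the signal name for that number
}

signal_for_exit_code -6
signal_for_exit_code 134
```

Both calls should name the same signal, which is consistent with the abort seen in the build log.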

@Mikolaj
Member

Mikolaj commented Jun 23, 2021

Amazing. No recent trace of "stack smashing" in relation to GHC on the web nor in source, except for https://gitlab.haskell.org/ghc/ghc/-/issues/16046#note_360930. Does it happen with 8.10 and 9.2, too?

@recursion-ninja
Contributor Author

recursion-ninja commented Jun 23, 2021

@Mikolaj, I ran the example repository with the following ghc releases:

for ver in "9.0.1" "8.10.4" "8.8.4" "8.6.5" "8.4.4" "8.2.2" "8.0.2" "7.10.3"; do cabal build --with-compiler=ghc-$ver | tail -n 1; done
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6
cabal: Failed to build text-show-3.9 (which is required by failing-0.0.0). The build process terminated with exit code -6

I also tested the ghc-9.2.0 release candidate, but base-compat had a compilation error, so the test was inconclusive:

cabal build --with-compiler=ghc-9.2.0.20210422 --allow-newer
Resolving dependencies...
Build profile: -w ghc-9.2.0.20210422 -O1
In order, the following will be built (use -v for more details):
 - base-compat-0.11.2 (lib) (requires build)
 - base-orphans-0.8.4 (lib) (requires build)
 - bytestring-builder-0.10.8.2.0 (lib) (requires build)
 - indexed-traversable-0.1.1 (lib) (requires build)
 - tagged-0.8.6.1 (lib) (requires build)
 - th-abstraction-0.4.2.0 (lib) (requires build)
 - transformers-compat-0.6.6 (lib) (requires build)
 - base-compat-batteries-0.11.2 (lib) (requires build)
 - distributive-0.6.2.1 (lib) (requires build)
 - th-lift-0.8.2 (lib) (requires build)
 - generic-deriving-1.14 (lib) (requires build)
 - comonad-5.0.8 (lib) (requires build)
 - bifunctors-5.5.11 (lib) (requires build)
 - text-show-3.9 (lib) (requires build)
 - failing-0.0.0 (lib) (first run)
Starting     base-orphans-0.8.4 (lib)
Starting     indexed-traversable-0.1.1 (lib)
Starting     bytestring-builder-0.10.8.2.0 (lib)
Starting     tagged-0.8.6.1 (lib)
Starting     transformers-compat-0.6.6 (lib)
Starting     th-abstraction-0.4.2.0 (lib)
Starting     base-compat-0.11.2 (lib)
Building     th-abstraction-0.4.2.0 (lib)
Building     indexed-traversable-0.1.1 (lib)
Building     base-orphans-0.8.4 (lib)
Building     bytestring-builder-0.10.8.2.0 (lib)
Building     transformers-compat-0.6.6 (lib)
Building     base-compat-0.11.2 (lib)
Building     tagged-0.8.6.1 (lib)
Installing   bytestring-builder-0.10.8.2.0 (lib)
Completed    bytestring-builder-0.10.8.2.0 (lib)
Installing   base-orphans-0.8.4 (lib)
Completed    base-orphans-0.8.4 (lib)
Installing   tagged-0.8.6.1 (lib)
Completed    tagged-0.8.6.1 (lib)
Installing   transformers-compat-0.6.6 (lib)
Completed    transformers-compat-0.6.6 (lib)
Installing   indexed-traversable-0.1.1 (lib)
Completed    indexed-traversable-0.1.1 (lib)
Installing   th-abstraction-0.4.2.0 (lib)
Completed    th-abstraction-0.4.2.0 (lib)

Failed to build base-compat-0.11.2.
Build log (
/home/washburn/.cabal/logs/ghc-9.2.0.20210422/base-compat-0.11.2-b856958f7a49838d97b17396af997221ce61ed53f61ef3e7967d99a698a5f106.log
):
Configuring library for base-compat-0.11.2..
Preprocessing library for base-compat-0.11.2..
Building library for base-compat-0.11.2..
<snip>
[ 63 of 118] Compiling Data.Semigroup.Compat ( src/Data/Semigroup/Compat.hs, dist/build/Data/Semigroup/Compat.o, dist/build/Data/Semigroup/Compat.dyn_o )

src/Data/Semigroup/Compat.hs:25:5: error:
    Not in scope: type constructor or class ‘Option’
   |
25 |   , Option(..)
   |     ^^^^^^^^^^

src/Data/Semigroup/Compat.hs:26:5: error: Not in scope: ‘option’
   |
26 |   , option
   |     ^^^^^^
cabal: Failed to build base-compat-0.11.2 (which is required by
failing-0.0.0). See the build log above for details.

@gbaz
Collaborator

gbaz commented Jun 23, 2021

I don't know if this ticket should be in the cabal repo -- shouldn't it be reproducible in an appropriate environment with just e.g. ghc --make?

That said, one hunch may be that there's no particular regression in ghc, but that default gcc flags put in more guards now against certain patterns that ghc generates?

I note that text-show also has cbits, which may be related (especially in combination with TH). MMark doesn't seem to directly, but it has a fair number of transitive deps, so it depends where it fails, I suppose?

@Mikolaj
Member

Mikolaj commented Jun 23, 2021

It's a great idea to try to extract a ghc --make invocation with the proper arguments from the logs and, if that reproduces the failure, to file it as a GHC ticket.

Regarding GHC 9.2, not sure if you are using https://ghc.gitlab.haskell.org/head.hackage

@recursion-ninja
Contributor Author

I added the reference to head.hackage to the cabal.project.local file and text-show failed to build with the same -6 error code.

@recursion-ninja
Contributor Author

recursion-ninja commented Jun 23, 2021

@gbaz, I attempted to isolate the issue with ghc --make as you suggested. I performed the following:

$ git clone https://github.com/RyanGlScott/text-show
$ cd text-show
$ cabal build . --allow-newer
$ mkdir -p dist/build/autogen
$ wget -P dist/build/autogen https://gitlab.scss.tcd.ie/hwarren/Software-Engineering/-/raw/117c96fd62f69f918375f59fcb5261a17d161c52/LCA-problem/.stack-work/dist/ca59d0ab/build/autogen/cabal_macros.h
$ /opt/ghc/bin/ghc-pkg-9.2.0.20210422 init dist/package.conf.inplace -v2
$ /opt/ghc/bin/ghc-9.2.0.20210422 --make -fbuilding-cabal-package -O -static -dynamic-too -dynosuf dyn_o -dynhisuf dyn_hi -outputdir dist/build -odir dist/build -hidir dist/build -stubdir dist/build -i -idist/build -isrc -ishared -idist/build/autogen -idist/build/global-autogen -Idist/build/autogen -Idist/build/global-autogen -Idist/build -Iinclude -Idist/build/include -optP-DNEW_FUNCTOR_CLASSES -optP-DNEW_FUNCTOR_CLASSES -optP-include -optPdist/build/autogen/cabal_macros.h -this-unit-id text-show-3.9-8fd643f614a4704a039416bcbb030fcb4d5a85bd923175650cce4ec5c2ef78cd -hide-all-packages -no-user-package-db -package-db /home/washburn/.cabal/store/ghc-9.2.0.20210422/package.db -package-db dist/package.conf.inplace -package-id array-0.5.4.0 -package-id base-4.16.0.0 -package-id base-compat-batteries-0.11.2-2d2e20f583262e5d46bffb8e916672c41856a1bf2f1bca2ea2fbffb0d4154eb2 -package-id bifunctors-5.5.11-8eed96c5d546353844e8a7dc7683ace2ad279e02231fa97c67cc76632b51605e -package-id bytestring-0.11.1.0 -package-id bytestring-builder-0.10.8.2.0-43380362051eaa6d5672945e48831e2427750f4925c37b2e1feb678137f83c4a -package-id containers-0.6.4.1 -package-id generic-deriving-1.14-215c1c32c0075030d223ca2e84d0da694650e7f839657c46bc440efb925f5265 -package-id ghc-boot-th-9.2.0.20210422 -package-id ghc-prim-0.8.0 -package-id integer-gmp-1.1 -package-id template-haskell-2.18.0.0 -package-id text-1.2.4.2 -package-id th-abstraction-0.4.2.0-0938d9eef6e8af848b09b2e6af35756298a3949c3e095ef28c47149a1864e98a -package-id th-lift-0.8.2-170640c603f00034203a57d1c28ab498f6b9e559df134c887194842a62b1cea1 -package-id transformers-0.5.6.2 -package-id transformers-compat-0.6.6-d45e2b0821499c6f4b71cf9bc368cee1200be5dabbe99db3af690d47ce471e5f -XHaskell2010 src/TextShow src/TextShow.Control.Applicative src/TextShow.Control.Concurrent src/TextShow.Control.Exception src/TextShow.Control.Monad.ST src/TextShow.Data.Array src/TextShow.Data.Bool src/TextShow.Data.ByteString src/TextShow.Data.Char 
src/TextShow.Data.Complex src/TextShow.Data.Data src/TextShow.Data.Dynamic src/TextShow.Data.Either src/TextShow.Data.Fixed src/TextShow.Data.Floating src/TextShow.Data.Functor.Compose src/TextShow.Data.Functor.Identity src/TextShow.Data.Functor.Product src/TextShow.Data.Functor.Sum src/TextShow.Debug.Trace src/TextShow.Debug.Trace.Generic src/TextShow.Debug.Trace.TH src/TextShow.Generic src/TextShow.Data.Integral src/TextShow.Data.List src/TextShow.Data.List.NonEmpty src/TextShow.Data.Maybe src/TextShow.Data.Monoid src/TextShow.Data.Ord src/TextShow.Data.Proxy src/TextShow.Data.Ratio src/TextShow.Data.Semigroup src/TextShow.Data.Text src/TextShow.Data.Tuple src/TextShow.Data.Typeable src/TextShow.Data.Version src/TextShow.Data.Void src/TextShow.Foreign.C.Types src/TextShow.Foreign.Ptr src/TextShow.Functions src/TextShow.GHC.Fingerprint src/TextShow.GHC.Generics src/TextShow.GHC.Stats src/TextShow.Numeric.Natural src/TextShow.System.Exit src/TextShow.System.IO src/TextShow.System.Posix.Types src/TextShow.Text.Read src/TextShow.TH src/TextShow.GHC.Conc.Windows src/TextShow.GHC.Event src/TextShow.GHC.TypeLits src/TextShow.Data.Type.Coercion src/TextShow.Data.Type.Equality src/TextShow.Data.OldTypeable src/TextShow.GHC.RTS.Flags src/TextShow.GHC.StaticPtr src/TextShow.GHC.Stack src/TextShow.Classes src/TextShow.Data.Typeable.Utils src/TextShow.FromStringTextShow src/TextShow.Instances src/TextShow.Options src/TextShow.TH.Internal src/TextShow.TH.Names src/TextShow.Utils -Wall -Wno-star-is-type -hide-all-packages

I got the following output:

< snip >
< some warnings related to cabal_macros.h of the form: >
< warning: "MACRO_NAME_HERE" redefined >
< snip >
[ 1 of 60] Compiling TextShow.Data.OldTypeable ( src/TextShow/Data/OldTypeable.hs, dist/build/TextShow/Data/OldTypeable.o, dist/build/TextShow/Data/OldTypeable.dyn_o )
[ 2 of 60] Compiling TextShow.GHC.Conc.Windows ( src/TextShow/GHC/Conc/Windows.hs, dist/build/TextShow/GHC/Conc/Windows.o, dist/build/TextShow/GHC/Conc/Windows.dyn_o )
[ 3 of 60] Compiling TextShow.GHC.Stats ( src/TextShow/GHC/Stats.hs, dist/build/TextShow/GHC/Stats.o, dist/build/TextShow/GHC/Stats.dyn_o )
[ 4 of 60] Compiling TextShow.Options ( src/TextShow/Options.hs, dist/build/TextShow/Options.o, dist/build/TextShow/Options.dyn_o )
[ 5 of 60] Compiling TextShow.TH.Names ( shared/TextShow/TH/Names.hs, dist/build/TextShow/TH/Names.o, dist/build/TextShow/TH/Names.dyn_o )
[ 6 of 60] Compiling TextShow.Utils   ( src/TextShow/Utils.hs, dist/build/TextShow/Utils.o, dist/build/TextShow/Utils.dyn_o )
[ 7 of 60] Compiling TextShow.Classes ( src/TextShow/Classes.hs, dist/build/TextShow/Classes.o, dist/build/TextShow/Classes.dyn_o )
[ 8 of 60] Compiling TextShow.TH.Internal ( src/TextShow/TH/Internal.hs, dist/build/TextShow/TH/Internal.o, dist/build/TextShow/TH/Internal.dyn_o )
[ 9 of 60] Compiling TextShow.Data.Tuple ( src/TextShow/Data/Tuple.hs, dist/build/TextShow/Data/Tuple.o, dist/build/TextShow/Data/Tuple.dyn_o )
*** stack smashing detected ***: terminated
Aborted (core dumped)

I interpret the outcome of this to mean that this is not a cabal issue, but a ghc issue. Is that correct?

@Mikolaj
Member

Mikolaj commented Jun 23, 2021

Cabal may be inventing wildly incorrect parameters to ghc, but even then, ghc should probably fail gracefully, not throw a fit.

Are the six lines enough to repro your example or does it depend on some context? If so, I may try to repro again, even though my ancient Ubuntu has gcc 5.4.0-6ubuntu1~16.04.12.

@recursion-ninja
Contributor Author

@Mikolaj I believe that the 6 lines above are sufficient to reproduce the error. I can only be certain that it is reproducible on my machine.

@Mikolaj
Member

Mikolaj commented Jun 23, 2021

I'm getting

<command line>: cannot satisfy -package-id base-compat-batteries-0.11.2-2d2e20f583262e5d46bffb8e916672c41856a1bf2f1bca2ea2fbffb0d4154eb2

so probably a few installed packages and their hashes are a necessary context. The context can be created by building the packages and copying their hashes, but I've not made the effort at this time; with my old gcc it would probably again go through fine. I guess the best chances are with the same version of gcc or at least a new one.

@recursion-ninja
Contributor Author

@Mikolaj I tried with gcc-7, the oldest version of gcc I have, and I still got the -6 exit code and the stack smashing warning.

I guess that the hashes being different makes sense. I did copy the whole monolithic command from the logs generated by cabal. Maybe the build plan wasn't exactly the same as whatever you have in your package-db.

@Mikolaj
Member

Mikolaj commented Jun 23, 2021

I had to add step 3

cabal build -j4 . --allow-newer

to have any base-compat-batteries built at all, but after it appeared, it got a different hash:

base-compat-batteries-0.11.2-5545666821d3122aa6ad8036f381b36d5f75ad1bf60a04bd36b898a577c70cf9

so I guess I'd need to copy over such hashes from my ~/.cabal/store/ghc-9.2.0.20210331. Not sure if there's an easier way.
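
One possible shortcut for finding the local hashes, sketched under the assumption that the cabal store's package.db directory holds one <unit-id>.conf file per installed unit: list the matching file names and strip the extension. The function name unit_ids_for is hypothetical.

```shell
# Hypothetical helper: print the unit IDs of a package found in a
# package database directory (assumed layout: <unit-id>.conf files).
unit_ids_for() {
  db=$1
  pkg=$2
  # Require a digit after "<pkg>-" so that "base-compat" does not
  # also match "base-compat-batteries".
  for conf in "$db/$pkg"-[0-9]*.conf; do
    [ -e "$conf" ] || continue   # glob matched nothing
    basename "$conf" .conf
  done
}

# Example (the path is taken from the comment above, adjust to your store):
# unit_ids_for "$HOME/.cabal/store/ghc-9.2.0.20210331/package.db" base-compat-batteries
```

The printed unit IDs could then be substituted for the -package-id arguments in the long ghc invocation.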

@recursion-ninja
Contributor Author

recursion-ninja commented Jun 24, 2021

I updated the steps above to include the invocation of cabal build.

@Mikolaj
Member

Mikolaj commented Jun 24, 2021

A GHC guru friend suggests "you should be able to infer which executable aborted from the core dump?".
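
A rough sketch of how that inference might be done, assuming a core file was actually written to disk: file(1) prints an execfn field for ELF core files that names the executable which crashed. The helper names below are hypothetical, and on Ubuntu the core may be intercepted by apport rather than written to the working directory (see /proc/sys/kernel/core_pattern and ulimit -c).

```shell
# Extract the "execfn: '...'" field from file(1) output read on stdin.
execfn_from_file_output() {
  sed -n "s/.*execfn: '\([^']*\)'.*/\1/p"
}

# Hypothetical wrapper: report which executable produced a core file.
core_executable() {
  file "$1" | execfn_from_file_output
}

# Example (assumes a core file named "core" in the current directory):
# core_executable ./core
```

If the reported path is the ghc binary itself rather than gcc or ld, that would point at GHC's own process rather than an external tool.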

@jneira

This comment has been minimized.

@recursion-ninja
Contributor Author

@jneira surely you jest!

@recursion-ninja
Contributor Author

recursion-ninja commented Jun 24, 2021

@Mikolaj Does your guru friend mean I should attempt to disentangle the "stack smashing" error from the Haskell build stack; determine if it is thrown by ghc, gcc, ld, etc?

@Mikolaj
Member

Mikolaj commented Jun 25, 2021

@recursion-ninja: yes, I understand that's what he'd do at that point. I guess, it would also be helpful to obtain the precise arguments of the implicated tool's invocation that resulted in the crash.

@recursion-ninja
Contributor Author

recursion-ninja commented Jul 8, 2021

Looks like it's coming from either gcc or ld based on this verbose output from ghc:

Invocation

(Note the -v3 flag on the ghc invocation)

$ git clone https://github.com/RyanGlScott/text-show
$ cd text-show
$ cabal build . --allow-newer
$ mkdir -p dist/build/autogen
$ wget -P dist/build/autogen https://gitlab.scss.tcd.ie/hwarren/Software-Engineering/-/raw/117c96fd62f69f918375f59fcb5261a17d161c52/LCA-problem/.stack-work/dist/ca59d0ab/build/autogen/cabal_macros.h
$ /opt/ghc/bin/ghc-pkg-9.2.0.20210422 init dist/package.conf.inplace -v2
$ /opt/ghc/bin/ghc-9.2.0.20210422 --make -v3 -fbuilding-cabal-package -O -static -dynamic-too -dynosuf dyn_o -dynhisuf dyn_hi -outputdir dist/build -odir dist/build -hidir dist/build -stubdir dist/build -i -idist/build -isrc -ishared -idist/build/autogen -idist/build/global-autogen -Idist/build/autogen -Idist/build/global-autogen -Idist/build -Iinclude -Idist/build/include -optP-DNEW_FUNCTOR_CLASSES -optP-DNEW_FUNCTOR_CLASSES -optP-include -optPdist/build/autogen/cabal_macros.h -this-unit-id text-show-3.9-8fd643f614a4704a039416bcbb030fcb4d5a85bd923175650cce4ec5c2ef78cd -hide-all-packages -no-user-package-db -package-db /home/washburn/.cabal/store/ghc-9.2.0.20210422/package.db -package-db dist/package.conf.inplace -package-id array-0.5.4.0 -package-id base-4.16.0.0 -package-id base-compat-batteries-0.11.2-2d2e20f583262e5d46bffb8e916672c41856a1bf2f1bca2ea2fbffb0d4154eb2 -package-id bifunctors-5.5.11-8eed96c5d546353844e8a7dc7683ace2ad279e02231fa97c67cc76632b51605e -package-id bytestring-0.11.1.0 -package-id bytestring-builder-0.10.8.2.0-43380362051eaa6d5672945e48831e2427750f4925c37b2e1feb678137f83c4a -package-id containers-0.6.4.1 -package-id generic-deriving-1.14-215c1c32c0075030d223ca2e84d0da694650e7f839657c46bc440efb925f5265 -package-id ghc-boot-th-9.2.0.20210422 -package-id ghc-prim-0.8.0 -package-id integer-gmp-1.1 -package-id template-haskell-2.18.0.0 -package-id text-1.2.4.2 -package-id th-abstraction-0.4.2.0-0938d9eef6e8af848b09b2e6af35756298a3949c3e095ef28c47149a1864e98a -package-id th-lift-0.8.2-170640c603f00034203a57d1c28ab498f6b9e559df134c887194842a62b1cea1 -package-id transformers-0.5.6.2 -package-id transformers-compat-0.6.6-d45e2b0821499c6f4b71cf9bc368cee1200be5dabbe99db3af690d47ce471e5f -XHaskell2010 src/TextShow src/TextShow.Control.Applicative src/TextShow.Control.Concurrent src/TextShow.Control.Exception src/TextShow.Control.Monad.ST src/TextShow.Data.Array src/TextShow.Data.Bool src/TextShow.Data.ByteString src/TextShow.Data.Char 
src/TextShow.Data.Complex src/TextShow.Data.Data src/TextShow.Data.Dynamic src/TextShow.Data.Either src/TextShow.Data.Fixed src/TextShow.Data.Floating src/TextShow.Data.Functor.Compose src/TextShow.Data.Functor.Identity src/TextShow.Data.Functor.Product src/TextShow.Data.Functor.Sum src/TextShow.Debug.Trace src/TextShow.Debug.Trace.Generic src/TextShow.Debug.Trace.TH src/TextShow.Generic src/TextShow.Data.Integral src/TextShow.Data.List src/TextShow.Data.List.NonEmpty src/TextShow.Data.Maybe src/TextShow.Data.Monoid src/TextShow.Data.Ord src/TextShow.Data.Proxy src/TextShow.Data.Ratio src/TextShow.Data.Semigroup src/TextShow.Data.Text src/TextShow.Data.Tuple src/TextShow.Data.Typeable src/TextShow.Data.Version src/TextShow.Data.Void src/TextShow.Foreign.C.Types src/TextShow.Foreign.Ptr src/TextShow.Functions src/TextShow.GHC.Fingerprint src/TextShow.GHC.Generics src/TextShow.GHC.Stats src/TextShow.Numeric.Natural src/TextShow.System.Exit src/TextShow.System.IO src/TextShow.System.Posix.Types src/TextShow.Text.Read src/TextShow.TH src/TextShow.GHC.Conc.Windows src/TextShow.GHC.Event src/TextShow.GHC.TypeLits src/TextShow.Data.Type.Coercion src/TextShow.Data.Type.Equality src/TextShow.Data.OldTypeable src/TextShow.GHC.RTS.Flags src/TextShow.GHC.StaticPtr src/TextShow.GHC.Stack src/TextShow.Classes src/TextShow.Data.Typeable.Utils src/TextShow.FromStringTextShow src/TextShow.Instances src/TextShow.Options src/TextShow.TH.Internal src/TextShow.TH.Names src/TextShow.Utils -Wall -Wno-star-is-type -hide-all-packages

Output

Loading package time-1.9.3 ... linking ... done.
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name librt.so
!!! systool:linker: finished in 0.20 milliseconds, allocated 0.064 megabytes
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libutil.so
!!! systool:linker: finished in 0.14 milliseconds, allocated 0.064 megabytes
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libdl.so
!!! systool:linker: finished in 0.14 milliseconds, allocated 0.064 megabytes
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libpthread.so
!!! systool:linker: finished in 0.14 milliseconds, allocated 0.064 megabytes
Loading package unix-2.7.2.2 ... *** stack smashing detected ***: terminated
Aborted (core dumped)

Perhaps there is some issue with the pthread system library or the way the Haskell unix package is being installed?

@Mikolaj
Member

Mikolaj commented Jul 8, 2021

Oh, great. So perhaps it's a gold linker bug? I know there are quite a few. Does it happen when you switch ld.gold for ld (or ld.bfd, there are various ways to do that)? What's the version of the gold linker that you use?

@recursion-ninja
Contributor Author

$ gold --version
GNU gold (GNU Binutils for Ubuntu 2.34) 1.16
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

@recursion-ninja
Contributor Author

I appended -optl-fuse-ld=lld to the ghc invocation and was met with this result:

Loading package time-1.9.3 ... linking ... done.
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=lld' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name librt.so
!!! systool:linker: finished in 0.25 milliseconds, allocated 0.067 megabytes
systool:linker: alloc=70264 time=0.253
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=lld' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libutil.so
!!! systool:linker: finished in 0.18 milliseconds, allocated 0.067 megabytes
systool:linker: alloc=70408 time=0.182
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=lld' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libdl.so
!!! systool:linker: finished in 0.27 milliseconds, allocated 0.067 megabytes
systool:linker: alloc=70264 time=0.271
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=lld' -B/home/washburn/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/unix-2.7.2.2 --print-file-name libpthread.so
!!! systool:linker: finished in 0.23 milliseconds, allocated 0.067 megabytes
systool:linker: alloc=70640 time=0.229
Loading package unix-2.7.2.2 ... *** stack smashing detected ***: terminated
Aborted (core dumped)

I'm not sure that it actually switched linkers. Any advice on passing this to ghc?

@Mikolaj
Member

Mikolaj commented Jul 9, 2021

From GHC gurus: "lld is the llvm linker, I think you want ld?"
From me: on my ancient Ubuntu it's also called ld.bfd, as I mentioned above.

@recursion-ninja
Contributor Author

Unfortunately, I get the same result with -optl-fuse-ld=bfd:

Loading package time-1.9.3 ... linking ... done.
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=bfd' -B/home/washburn/.cabal/store/ghc-9.0.1/unix-2.8.0.0-8276f5e709556e7c8d723731fa61d5b2aa6e8eec36238b2069b2feb47d1ad590/lib --print-file-name librt.so
!!! systool:linker: finished in 0.25 milliseconds, allocated 0.072 megabytes
systool:linker: alloc=74976 time=0.251
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=bfd' -B/home/washburn/.cabal/store/ghc-9.0.1/unix-2.8.0.0-8276f5e709556e7c8d723731fa61d5b2aa6e8eec36238b2069b2feb47d1ad590/lib --print-file-name libutil.so
!!! systool:linker: finished in 0.17 milliseconds, allocated 0.072 megabytes
systool:linker: alloc=75120 time=0.170
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=bfd' -B/home/washburn/.cabal/store/ghc-9.0.1/unix-2.8.0.0-8276f5e709556e7c8d723731fa61d5b2aa6e8eec36238b2069b2feb47d1ad590/lib --print-file-name libdl.so
!!! systool:linker: finished in 0.22 milliseconds, allocated 0.072 megabytes
systool:linker: alloc=74976 time=0.219
*** systool:linker:
*** gcc:
gcc '-fuse-ld=gold' '-fuse-ld=bfd' -B/home/washburn/.cabal/store/ghc-9.0.1/unix-2.8.0.0-8276f5e709556e7c8d723731fa61d5b2aa6e8eec36238b2069b2feb47d1ad590/lib --print-file-name libpthread.so
!!! systool:linker: finished in 0.18 milliseconds, allocated 0.072 megabytes
systool:linker: alloc=75352 time=0.176
Loading package unix-2.8.0.0 ... *** stack smashing detected ***: terminated
Aborted (core dumped)
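
Two observations on these logs, offered as a sketch rather than a diagnosis. First, every gcc call shown uses --print-file-name, which only asks gcc to resolve a library path and does not run a linker, so changing -fuse-ld may not touch the step that aborts. Second, the '-fuse-ld=gold' that always appears first most likely comes from GHC's own settings, which ghc --info prints as a list of key/value pairs. The helper below (ghc_info_field is a hypothetical name) pulls one field out of that output; the key "C compiler link flags" is assumed to match what this GHC version prints.

```shell
# Hypothetical helper: read `ghc --info`-style output on stdin and print
# the value stored under the given key, e.g. ("key","value").
ghc_info_field() {
  sed -n "s/^.*(\"$1\",\"\(.*\)\").*$/\1/p"
}

# Example (requires ghc on PATH):
# ghc --info | ghc_info_field "C compiler link flags"
# ghc --info | ghc_info_field "ld command"
```

Since the abort is printed right after "Loading package unix-...", it seems plausible that the failure happens while GHC itself loads the package, which would be consistent with the linker swap making no difference.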

@phadej
Collaborator

phadej commented Jul 9, 2021

  • Is the text-show the only package failing to build?
  • Could you try e.g. generics-sop, (which also uses TemplateHaskell)?
  • What kind of machine do you have, is it ordinary 64bit Intel?
  • Can you reproduce your issue in a clean docker container, e.g. phadej/ghc:9.0.1-focal (I cannot, it has the same system information as in your reproducer notes - though GHC is from hvr-ppa, not the official bindists, they may differ)?
  • How did you install GHC? Are all of these official bindists? They are not built for focal, so which one do you use?

I suspect that your system is somehow configured to be quite paranoid, and doesn't like something GHC does.

@recursion-ninja
Contributor Author

Is the text-show the only package failing to build?

@phadej It is far more pervasive than just text-show. Consider the following Haskell module Test.hs:

module Test where
import Control.Monad.Loops
$ rm -R ~/.cabal/store/ghc-9.0.1
$ rm -R ~/.ghc/x86_64-linux-9.0.1/environments/default
$ ghci Test.hs 
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from ~/.ghci
[1 of 1] Compiling Data.Test        ( Test.hs, interpreted )

Test.hs:4:1: error:
    Could not find module ‘Control.Monad.Loops’
    Perhaps you meant
      Control.Monad.Cont (from mtl-2.2.2)
      Control.Monad.List (from mtl-2.2.2)
      Control.Monad.Trans (from mtl-2.2.2)
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
2 | import Control.Monad.Loops
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.

$ cabal update
$ cabal install monad-loops --lib
Resolving dependencies...
Build profile: -w ghc-9.0.1 -O1
In order, the following will be built (use -v for more details):
 - monad-loops-0.4.3 (lib) (requires build)
Starting     monad-loops-0.4.3 (lib)
Building     monad-loops-0.4.3 (lib)
Installing   monad-loops-0.4.3 (lib)
Completed    monad-loops-0.4.3 (lib)

$ ghci Test.hs 
Loaded package environment from /home/washburn/.ghc/x86_64-linux-9.0.1/environments/default
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help
*** stack smashing detected ***: terminated
Aborted (core dumped)

What kind of machine do you have, is it ordinary 64bit Intel?

$ lshw | sed -n 26,37p
     *-cpu                  
          description: CPU
          product: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
          vendor: Intel Corp.
          physical id: 4
          bus info: cpu@0
          version: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
          slot: CPU0
          size: 3270MHz
          capacity: 4900MHz
          width: 64 bits
          clock: 100MHz

How did you install GHC? Are all of these official bindists? They are not built for focal, so which one do you use?

I used ghcup, I am not sure if the supplied binaries are from the official bindists.

$ ghcup install ghc 9.0.1
$ ghcup set ghc 9.0.1

Can you reproduce your issue in a clean docker container, e.g. phadej/ghc:9.0.1-focal (I cannot, it has the same system information as in your reproducer notes - though GHC is from hvr-ppa, not the official bindists, they may differ)?

I can try to do this tomorrow if you still think it is worthwhile.

@Mikolaj
Member

Mikolaj commented Jul 12, 2021

I can try to do this tomorrow if you still think it is worthwhile.

It totally is, because we need GHC devs to be able to repro, and so far nobody can except you. However, the test you outlined just now seems easier to do than the one involving copying hashes, so I will attempt to repro today. If I fail (that is, can't repro), the container may be the only way to either let GHC devs repro or let you troubleshoot your setup (and, e.g., find which version of which binutils tool on your system is buggy, if that's the case, or which version causes GHC to bug out due to an incompatibility that needs to be fixed in GHC).

@phadej
Collaborator

phadej commented Jul 12, 2021

Thanks for the additional info. So you have common hardware and ghcup uses official bindists.

I cannot reproduce with my docker image. It uses hvr-ppa bindists but I'm quite sure it's not the reason.
Please try the below on your machine.

If it fails on your docker, it would be very interesting!

% docker run --rm -ti phadej/ghc:9.0.1-focal /bin/bash
root@2b9acddb1f78:/# 
root@2b9acddb1f78:/# cd
root@2b9acddb1f78:~# cabal update
Config file path source is default config file.
Config file /root/.cabal/config not found.
Writing default configuration to /root/.cabal/config
Downloading the latest package list from hackage.haskell.org
Updated package list of hackage.haskell.org to the index-state 2021-07-12T08:44:51Z
root@2b9acddb1f78:~# cat > Test.hs
module Test where
import Control.Monad.Loops
root@2b9acddb1f78:~# cat Test.hs 
module Test where
import Control.Monad.Loops
root@2b9acddb1f78:~# ghci Test.hs 
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Test             ( Test.hs, interpreted )

Test.hs:2:1: error:
    Could not find module ‘Control.Monad.Loops’
    Perhaps you meant
      Control.Monad.Cont (from mtl-2.2.2)
      Control.Monad.List (from mtl-2.2.2)
      Control.Monad.Trans (from mtl-2.2.2)
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
2 | import Control.Monad.Loops
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.
ghci> 
Leaving GHCi.
root@2b9acddb1f78:~# cabal install monad-loops --lib
Resolving dependencies...
Build profile: -w ghc-9.0.1 -O1
In order, the following will be built (use -v for more details):
 - monad-loops-0.4.3 (lib) (requires download & build)
Downloading  monad-loops-0.4.3
Downloaded   monad-loops-0.4.3
Starting     monad-loops-0.4.3 (lib)
Building     monad-loops-0.4.3 (lib)
Installing   monad-loops-0.4.3 (lib)
Completed    monad-loops-0.4.3 (lib)
root@2b9acddb1f78:~# ghci Test.hs 
Loaded package environment from /root/.ghc/x86_64-linux-9.0.1/environments/default
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help
[1 of 1] Compiling Test             ( Test.hs, interpreted )
Ok, one module loaded.
ghci> 

GHCi is probably failing for the same reason Template Haskell fails, i.e. something related to dynamic linking.
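The interactive steps above can also be run without a terminal session, which makes the result easier to paste into a bug report. This is only a sketch: the function names `write_test_module` and `check_ghci` are made up for illustration, and it assumes `monad-loops` was already installed with `cabal install monad-loops --lib` as in the transcript. It relies on the shell convention that a process killed by SIGABRT (signal 6, the "-6" from the original report) shows up as exit status 134.

```shell
# Write the two-line module from the transcript into the given directory.
write_test_module() {
  printf 'module Test where\nimport Control.Monad.Loops\n' > "$1/Test.hs"
}

# Load the module in GHCi with stdin redirected from /dev/null, so GHCi
# exits as soon as loading is finished instead of waiting at the prompt.
# The shell reports death by SIGABRT (signal 6) as 128 + 6 = 134.
check_ghci() {
  (cd "$1" && ghci Test.hs < /dev/null)
  status=$?
  if [ "$status" -eq 134 ]; then
    echo "ghci aborted (SIGABRT), consistent with the stack smashing report"
  else
    echo "ghci exited with status $status"
  fi
}

# Example use (not executed here):
#   dir=$(mktemp -d)
#   write_test_module "$dir"
#   check_ghci "$dir"
```

Running the same two calls on the host and inside the container gives a direct side-by-side comparison.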

@Mikolaj
Member

Mikolaj commented Jul 12, 2021

I've just repeated @phadej's transcript both in docker and in my own OS and got the same results (can't repro the original issue).

@recursion-ninja
Contributor Author

@phadej , I have not used Docker before, can you tell me how to clone your image to try and replicate the results?

@Mikolaj
Member

Mikolaj commented Jul 12, 2021

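The first line, docker run --rm -ti phadej/ghc:9.0.1-focal /bin/bash, pulls the docker image if it is not already present locally and starts an interactive shell in a fresh container (--rm deletes the container again on exit). Just install docker on your Ubuntu first. https://www.docker.com/community-edition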

@recursion-ninja
Contributor Author

@Mikolaj , @phadej , I was able to get docker working! When I replicated the steps to reproduce the stack smashing error that occurs on my machine, the docker container did not exhibit the stack smashing behavior. I was able to load ghci successfully in the image.

Any ideas on how to rehabilitate my machine?

@Mikolaj
Member

Mikolaj commented Jul 12, 2021

That may still be a GHC bug, e.g., one that manifests only with new enough C toolchain libraries. But it may just as well be a corrupted file in your filesystem.

Do you have another partition or can you put another hard drive into your machine? If so, you can try to install Ubuntu 20.04 there. Or even run it from a live Ubuntu DVD and try to repro from that. This would sort of bisect the problem space. If the fresh system works, the issue may be written off as a fluke of your current installation: you can manually reinstall all apt packages (there are commands for that) and it should vanish, and if it does not, configurations possibly need to be wiped out. Another possibility is to create a new user and repro. Just find ways to ignore portions of your current hard drive and try to repro.

@Mikolaj
Member

Mikolaj commented Jul 12, 2021

I mean, it may be a GHC bug if, e.g., one of your C toolset libraries is newer than the one in the docker image. If you started from exactly the same version of the OS and never upgraded, this can't be the cause.

@recursion-ninja
Contributor Author

recursion-ninja commented Jul 12, 2021

@Mikolaj,
I think I have the same version of gcc:

My Machine:

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Docker Container

$ docker run --rm -ti phadej/ghc:9.0.1-focal /bin/bash
root@650eec3f6ab7:/# gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
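Matching gcc versions do not rule out a difference in the libraries around it, so it may be worth comparing more than one package. A rough sketch, assuming a Debian/Ubuntu system on both sides: `list_versions` and `version_diff` are hypothetical helper names, and the package list in `PKGS` is only a guess at what is relevant (and at what is installed in the image), so adjust it freely.

```shell
# Packages to compare. This list is an assumption, extend it as needed.
PKGS="libc6 binutils gcc-9 libgmp10"

# Print one "package version" line per installed package (Debian/Ubuntu).
# $PKGS is left unquoted on purpose so it splits into separate arguments.
list_versions() {
  dpkg-query -W -f='${Package} ${Version}\n' $PKGS
}

# Compare two such listings: $1 from the host, $2 from the container.
# Prints every package in the container listing whose version differs
# from, or is missing in, the host listing.
version_diff() {
  awk '
    NR == FNR { host[$1] = $2; next }
    {
      h = ($1 in host) ? host[$1] : "missing"
      if (h != $2) printf "%s host=%s container=%s\n", $1, h, $2
    }
  ' "$1" "$2"
}

# Example use (not executed here):
#   list_versions > host.txt
#   docker run --rm phadej/ghc:9.0.1-focal \
#     dpkg-query -W -f='${Package} ${Version}\n' $PKGS > container.txt
#   version_diff host.txt container.txt
```

An empty result means the listed packages are identical on both sides, which would point away from a toolchain version mismatch.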

@Mistuke
Collaborator

Mistuke commented Jul 25, 2021

Amazing. No recent trace of "stack smashing" in relation to GHC on the web nor in source, except for https://gitlab.haskell.org/ghc/ghc/-/issues/16046#note_360930. Does it happen with 8.10 and 9.2, too?

This is a very old feature of GCC to detect corrupted stacks, see -fstack-protector [1].

The mechanism is quite simple: the compiler places a canary value on the stack on function entry, and before returning from the function (while cleaning up the stack) it checks that the canary is still there and still has the correct value. See https://godbolt.org/z/rW74EzeKE
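To see the canary check fire outside of GHC, here is a small sketch that writes a deliberately broken C program and compiles it with the protector forced on. The helper names `write_canary_demo` and `run_canary_demo` and the file name `smash.c` are invented for this example. The overflow is undefined behaviour, so the exact outcome is not guaranteed, but on a glibc system one would expect an abort with a message like the one quoted in this thread.

```shell
# Write a C program that overflows an 8-byte stack buffer into "$1/smash.c".
write_canary_demo() {
  cat > "$1/smash.c" <<'EOF'
#include <string.h>

/* Copies far more than 8 bytes into buf, overwriting whatever follows it
   on the stack, including the canary when the protector is enabled. */
static void smash(const char *s) {
    char buf[8];
    strcpy(buf, s);
}

int main(void) {
    smash("this string is much longer than eight bytes");
    return 0;
}
EOF
}

# Compile without optimisation and with the protector enabled for every
# function, then run the result and report its exit status.
run_canary_demo() {
  gcc -O0 -fstack-protector-all -o "$1/smash" "$1/smash.c" && "$1/smash"
  echo "exit status: $?"
}

# Example use (not executed here):
#   dir=$(mktemp -d)
#   write_canary_demo "$dir"
#   run_canary_demo "$dir"
```

Swapping `-fstack-protector-all` for `-fno-stack-protector` removes the check, in which case the corruption may go unnoticed or crash somewhere less informative.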

This error means something along the way has corrupted the stack. Distros such as Ubuntu and Red Hat have been enabling more and more of GCC's security features by default; Ubuntu has it enabled in 20.04 and newer. The error is likely being generated by the libc, which would have been compiled with these security mitigations in place.

$ ghci Test.hs 
Loaded package environment from /home/washburn/.ghc/x86_64-linux-9.0.1/environments/default
GHCi, version 9.0.1: https://www.haskell.org/ghc/  :? for help
*** stack smashing detected ***: terminated

I strongly suspect GHC's dynamic linker has a bug here and has incorrectly overwritten the stack. This is commonly caused by a calling convention issue.

Just set

ulimit -c unlimited

then run the program and let it generate a core dump. Find out who's at fault by running

file <coredump>

which will say something like

core:     ELF 64-bit MSB core file <Arch> Version 1, from '<program>'

then open the coredump

gdb <program> <coredump>
bt full

which will say where it went wrong. But I suspect you have a GHC bug.

[1] https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
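The core dump steps above can be strung together roughly as follows. This is a sketch, not a tested recipe: `core_execfn` and `backtrace_from_core` are hypothetical helper names, and the parsing assumes that `file` prints an `execfn: '...'` field for the core, which not every version of `file` does. On Ubuntu, where the core ends up also depends on `/proc/sys/kernel/core_pattern`.

```shell
# Extract the executable path from the "execfn: '...'" field that file(1)
# prints for a core dump. Reads the file(1) output on stdin.
core_execfn() {
  sed -n "s/.*execfn: '\([^']*\)'.*/\1/p"
}

# Print a full backtrace for the core file "$1" without entering the
# interactive gdb prompt.
backtrace_from_core() {
  core=$1
  prog=$(file "$core" | core_execfn)
  gdb -batch -ex 'bt full' "$prog" "$core"
}

# Example use (not executed here):
#   ulimit -c unlimited            # allow core files in this shell
#   ghci Test.hs                   # or whatever command crashes
#   backtrace_from_core core
```

Using `execfn` rather than the `from '...'` field avoids picking up the truncated command line that `file` shows there.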

@recursion-ninja
Contributor Author

recursion-ninja commented Jul 26, 2021

@Mistuke I have tried to follow your instructions, but I believe that I have missed something near the end.

$ systemctl enable apport.service # Because Ubuntu
$ ulimit -c unlimited
$ git clone https://github.com/RyanGlScott/text-show
$ cd text-show
$ cabal build . # For dependencies
$ /home/recursion-ninja/.ghcup/bin/ghc --make -fbuilding-cabal-package -O -static -dynamic-too -dynosuf dyn_o -dynhisuf dyn_hi -outputdir /home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -odir /home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -hidir /home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -stubdir /home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -i -i/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -isrc -ishared -i/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/autogen -i/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/global-autogen -I/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/autogen -I/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/global-autogen -I/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build -Iinclude -I/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/include -optP-DNEW_FUNCTOR_CLASSES -optP-DNEW_FUNCTOR_CLASSES -optP-include -optP/home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/build/autogen/cabal_macros.h -this-unit-id text-show-3.9-inplace -hide-all-packages -Wmissing-home-modules -no-user-package-db -package-db /home/recursion-ninja/.cabal/store/ghc-9.0.1/package.db -package-db /home/recursion-ninja/Code/text-show/dist-newstyle/packagedb/ghc-9.0.1 -package-db /home/recursion-ninja/Code/text-show/dist-newstyle/build/x86_64-linux/ghc-9.0.1/text-show-3.9/package.conf.inplace -package-id array-0.5.4.0 -package-id base-4.15.0.0 -package-id 
base-compat-batteries-0.11.2-dc2f9df945acedea868b07c7137de52d26b56e86904a6943da4a304170856265 -package-id bifunctors-5.5.11-2a3841d53457ffc1384c87930764a68d23b0847db853b89257d3078136df7f30 -package-id bytestring-0.10.12.1 -package-id bytestring-builder-0.10.8.2.0-f1fc0db34ded57ac80a7a16014cd0244372b73c4501aa1b8f94948da8ee98daa -package-id containers-0.6.4.1 -package-id generic-deriving-1.14-60cf634e0d5e72a427899f654aec208994b91f68d01fb801568599238e6072e6 -package-id ghc-boot-th-9.0.1 -package-id ghc-prim-0.7.0 -package-id integer-gmp-1.1 -package-id template-haskell-2.17.0.0 -package-id text-1.2.4.1 -package-id th-abstraction-0.4.2.0-97bfd77d8a680b19c286c078730c0f3ae84c5d9a6ac20db1acc36cdd9250e6d4 -package-id th-lift-0.8.2-0fa84c263173fc179670fe14cfd1a7e9068a8c3619afca23dfafbdaecc45d44b -package-id transformers-0.5.6.2 -package-id transformers-compat-0.6.6-5b8dcd1799d8b869b303cd54c9e5f8e0e781d80c42b5cdde1476265310540ee2 -XHaskell2010 TextShow TextShow.Control.Applicative TextShow.Control.Concurrent TextShow.Control.Exception TextShow.Control.Monad.ST TextShow.Data.Array TextShow.Data.Bool TextShow.Data.ByteString TextShow.Data.Char TextShow.Data.Complex TextShow.Data.Data TextShow.Data.Dynamic TextShow.Data.Either TextShow.Data.Fixed TextShow.Data.Floating TextShow.Data.Functor.Compose TextShow.Data.Functor.Identity TextShow.Data.Functor.Product TextShow.Data.Functor.Sum TextShow.Debug.Trace TextShow.Debug.Trace.Generic TextShow.Debug.Trace.TH TextShow.Generic TextShow.Data.Integral TextShow.Data.List TextShow.Data.List.NonEmpty TextShow.Data.Maybe TextShow.Data.Monoid TextShow.Data.Ord TextShow.Data.Proxy TextShow.Data.Ratio TextShow.Data.Semigroup TextShow.Data.Text TextShow.Data.Tuple TextShow.Data.Typeable TextShow.Data.Version TextShow.Data.Void TextShow.Foreign.C.Types TextShow.Foreign.Ptr TextShow.Functions TextShow.GHC.Fingerprint TextShow.GHC.Generics TextShow.GHC.Stats TextShow.Numeric.Natural TextShow.System.Exit TextShow.System.IO 
TextShow.System.Posix.Types TextShow.Text.Read TextShow.TH TextShow.GHC.Conc.Windows TextShow.GHC.Event TextShow.GHC.TypeLits TextShow.Data.Type.Coercion TextShow.Data.Type.Equality TextShow.Data.OldTypeable TextShow.GHC.RTS.Flags TextShow.GHC.StaticPtr TextShow.GHC.Stack TextShow.Classes TextShow.Data.Typeable.Utils TextShow.FromStringTextShow TextShow.Instances TextShow.Options TextShow.TH.Internal TextShow.TH.Names TextShow.Utils -Wall -Wno-star-is-type -hide-all-packages
$ ls -t | head -n 1
core
$ file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc -B/home/recursion-ninja/.ghcup/g', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc', platform: 'x86_64'
$ gdb '/home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc...
(No debugging symbols found in /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc)
[New LWP 646507]
[New LWP 646511]
[New LWP 646509]
[New LWP 646508]
[New LWP 646510]

<just hangs here>

^C
(gdb) bt full
#0  0x00007f8ae457918b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) quit

I suspect something isn't quite lining up at the end. I'm not very proficient with gdb. Anyone able to give me a bit of advice so I can submit a well informed GHC defect report?

@Mistuke
Collaborator

Mistuke commented Jul 26, 2021

gdb '/home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc core

I assume in the actual command you ran you don't have that stray ' ?

<just hangs here>

^C

This is fine.

(No debugging symbols found in /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc)

Unfortunately, it looks like the symbols have been stripped. You'd need a debug build of GHC to see more.

(gdb) bt full
#0  0x00007f8ae457918b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x0000000000000000 in ?? ()
No symbol table info available.

This seems to confirm what I suspected: the abort is being triggered by something in the libc detecting the corruption.
The stack frame itself seems corrupt.
You can try to get more information by installing the debug symbols for the libc

sudo apt-get install libc6-dbg

and repeating it all, and use

thread apply all bt

to report where all threads are. Since you're on an x86 platform, one can use the LBR (Last Branch Record) to figure out where the original call came from, but that's a lot to explain, so it would be easier to just submit a small repro.

Unfortunately without having a debug version of ghc there's not much info you can get other than the above.
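For attaching the result to a bug report, the same gdb command can be run in batch mode and its output saved to a file. A sketch under assumptions: `save_all_backtraces` and `top_frames` are made-up helper names, and `top_frames` simply assumes the `Thread N (...)` / `#0 ...` layout that gdb prints for `thread apply all bt`.

```shell
# Save the backtraces of all threads to a text file.
# $1 = program, $2 = core file, $3 = output file.
save_all_backtraces() {
  gdb -batch -ex 'thread apply all bt' "$1" "$2" > "$3" 2>&1
}

# Print only the innermost frame (#0) of each thread, reading gdb's
# "thread apply all bt" output on stdin.
top_frames() {
  awk '/^Thread [0-9]+ / { t = $2 }
       /^#0 /            { print "thread " t ": " $0 }'
}

# Example use (not executed here):
#   save_all_backtraces /path/to/ghc core backtrace.txt
#   top_frames < backtrace.txt
```

The one-line-per-thread summary makes it quicker to spot which thread was inside libc when the abort happened.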

@recursion-ninja
Contributor Author

recursion-ninja commented Jul 26, 2021

@Mistuke , Thanks for the additional guidance. I was able to recover some more information after installing libc6-dbg. However, I have no idea what any of this information means:

recursion-ninja@kusabimaru:~/Code/text-show$ gdb /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc...
(No debugging symbols found in /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/ghc)
[New LWP 740319]
[New LWP 740323]
[New LWP 740320]
[New LWP 740321]
[New LWP 740322]
^CQuit
(gdb) thread apply all bt

Thread 5 (LWP 740322):
#0  0x00007f3c6a239aff in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000004200105c30 in ?? ()
#2  0x00000042001067b0 in ?? ()
#3  0x0000000000000002 in ?? ()
#4  0xffffffff6bacb6b0 in ?? ()
#5  0x0000000000000090 in ?? ()
#6  0x00007f3c6b4eca34 in ?? () from /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/../base-4.15.0.0/libHSbase-4.15.0.0-ghc9.0.1.so
#7  0x0000000000000000 in ?? ()

Thread 4 (LWP 740321):
#0  0x00007f3c6a2465ce in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000420037cc90 in ?? ()
#2  0x0000000402aa8038 in ?? ()
#3  0x0000004200106010 in ?? ()
#4  0xffffffff00000040 in ?? ()
#5  0x000000420037cab8 in ?? ()
#6  0x00007f3c6b4d8e65 in base_GHCziEventziEPoll_new10_info () from /home/recursion-ninja/.ghcup/ghc/9.0.1/lib/ghc-9.0.1/bin/../base-4.15.0.0/libHSbase-4.15.0.0-ghc9.0.1.so
#7  0x0000000000000000 in ?? ()

Thread 3 (LWP 740320):
#0  __libc_read (nbytes=8, buf=0x7f3c69a5cea8, fd=3) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=fd@entry=3, buf=buf@entry=0x7f3c69a5cea8, nbytes=nbytes@entry=8) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f3c6a697d6e in itimer_thread_func (_handle_tick=0x7f3c6a67f300 <handle_tick>) at rts/posix/itimer/Pthread.c:127
#3  0x00007f3c6a3a3609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007f3c6a246293 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 2 (LWP 740323):
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xa1ebd8) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xa1ebe0, cond=0xa1ebb0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0xa1ebb0, mutex=mutex@entry=0xa1ebe0) at pthread_cond_wait.c:647
#3  0x00007f3c6a697979 in waitCondition (pCond=pCond@entry=0xa1ebb0, pMut=pMut@entry=0xa1ebe0) at rts/posix/OSThreads.c:117
#4  0x00007f3c6a66d72b in waitForWorkerCapability (task=<optimized out>) at rts/Capability.c:706
#5  yieldCapability (pCap=pCap@entry=0x7f3c63ffee78, task=task@entry=0xa1eba0, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:999
#6  0x00007f3c6a678973 in scheduleYield (task=0xa1eba0, pcap=0x7f3c63ffee70) at rts/Schedule.c:721
#7  schedule (initialCapability=initialCapability@entry=0x7f3c6a6c8dc0 <MainCapability>, task=task@entry=0xa1eba0) at rts/Schedule.c:317
#8  0x00007f3c6a679cbc in scheduleWorker (cap=cap@entry=0x7f3c6a6c8dc0 <MainCapability>, task=task@entry=0xa1eba0) at rts/Schedule.c:2670
#9  0x00007f3c6a6753f8 in workerStart (task=0xa1eba0) at rts/Task.c:446
#10 0x00007f3c6a3a3609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007f3c6a246293 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 1 (LWP 740319):
#0  0x00007f3c6a16a18b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000000000 in ?? ()
(gdb)

Any insights?

@Mistuke
Collaborator

Mistuke commented Jul 27, 2021

Well, it looks like it was doing some kind of I/O action. Thread 2 is the RTS scheduler thread, which is fine. Thread 3 is the RTS timer thread (itimer_thread_func) blocked in the libc's read, which is fine. Thread 4 is the GHC I/O manager's thread that's polling for I/O completion, also fine.

I don't know about Thread 5. Thread 1 looks fishy, but there is no extra information.

You have enough information to submit a GHC bug report. It's much easier to debug with a debug build. They should be able to get the same error with valgrind if the crash doesn't happen when they test.
