Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malloc double free error when disconnecting in finalizer #35

Open
hesselink opened this issue May 14, 2014 · 15 comments
Open

Malloc double free error when disconnecting in finalizer #35

hesselink opened this issue May 14, 2014 · 15 comments

Comments

@hesselink
Copy link
Member

We're seeing a crash using HDBC-postgresql together with resource-pool 0.2.2.0. I originally filed this as bos/pool#18 but there it seemed the problem was in HDBC-postgresql and its handling of disconnect in finalizers. See that ticket for details, but in short it seems that if external code calls disconnect from a finalizer, HDBC-postgresql will try to disconnect again from its own finalizer.

@zenzike
Copy link
Member

zenzike commented May 14, 2014

Hm. Thanks for reporting this. By the way, the HDBC-odbc drivers are a lot more stable and performant than the HDBC-postgresql ones (even on postgresql databases): it's received a lot more attention and will soon have a GSOC student working on it. Is there a reason you're still using HDBC-postgresql?

@hesselink
Copy link
Member Author

We're using the error types defined in HDBC-postgresql in a few places, not sure if there are any alternatives for that.

If you need any help trying to debug this, just let me know. I've tried to come up with a small, reproducible test case, but I can't: it seems to be some weird interplay between resource-pool, HDBC-postgresql and, for some reason, aws.

@zenzike
Copy link
Member

zenzike commented May 14, 2014

In all honesty I won't be able to take a close look at this for quite some time. Any help would be greatly appreciated.

@hesselink
Copy link
Member Author

Hmm, I could try to get a good test case again (though I'm not hopeful), but I'm not really up for debugging refcounting C code...

@dmjio
Copy link
Contributor

dmjio commented Aug 23, 2015

By the way, the HDBC-odbc drivers are a lot more stable and performant than the HDBC-postgresql ones (even on postgresql databases): it's received a lot more attention and will soon have a GSOC student working on it

@zenzike are you referring to the HDBC-odbc package? I'm after a sane backend for relational-record for PostgreSQL that ideally has a thread pool. The hdbc-postgresql seems to work, but the hasql benchmarks report its 7x slower for SELECTs. https://nikita-volkov.github.io/hasql-benchmarks/

@zenzike
Copy link
Member

zenzike commented Aug 24, 2015

Yes, I was talking about the HDBC-odbc package. It was faster than hdbc-postgresql about 2 years ago, and I think things haven't moved on too much since then.

@anton-dessiatov
Copy link

Just wanted to say that there is a 2.5 branch with HDBC-odbc internals changed quite a bit. It doesn't use refcounting anymore (because that wasn't safe in multithreaded environments and programs crashed), instead it relies on GHC finalizers to do cleanups.

The problem with 2.5 branch is that it offers deterministic resources cleanup, which requires patched version of HDBC with this pull request: hdbc/hdbc#32

If that pull request would ever be accepted, I am ready to release new HDBC-odbc version. We are using it in production for a long time already (with MSSQL driver) and it seems to be very stable.

@dmjio
Copy link
Contributor

dmjio commented Aug 24, 2015

@zenzike can we quantify how much faster? Setting up odbc isn't that fun either, at least on OSX, compared to using postgresql-simple and HDBC-postgresql. But maybe I'm just complaining.
@anton-dessiatov I think this package uses ref counting as well :/ (at least, when looking at the C code included, it seems that way).

Have we thought about rewriting this package using postgresql-libpq ? And including a safe pooling abstraction? Nikita's work with hasql is good, but I like the auto-generated ADT's from a schema that the relational-query-HDBC packages gives with the defineTableFromDB function.

The IConnection typeclass seems kind of rigid, but we could probably integrate it somehow.

@dmjio
Copy link
Contributor

dmjio commented Aug 24, 2015

It seems like a lot of what I'm looking for has been done in the hdbi-postgresql package. (https://hackage.haskell.org/package/hdbi-postgresql). But this probably won't be usable w/ relational-record

@tfausak
Copy link
Contributor

tfausak commented May 26, 2020

I spent way too long debugging this before finding this issue and bos/pool#18. Luckily I was able to reproduce this behavior. Here's a tiny program that reliably crashes:

-- base-4.13.0.0, HDBC-2.4.0.3, HDBC-postgresql-2.3.2.7, resource-pool-0.2.3.2
module Main ( main ) where
import qualified Data.Pool as Pool
import qualified Database.HDBC as Sql
import qualified Database.HDBC.PostgreSQL as Postgres

main :: IO ()
main = do
  putStr $ unlines
    [ "This program will attempt to crash."
    , "It will print out attempt numbers as it goes."
    , "Press Ctrl-C to exit."
    ]
  attempt 1

attempt :: Int -> IO ()
attempt n = if n > 32 then fail "Failed to crash!" else do
  print n
  pool <- Pool.createPool (Postgres.connectPostgreSQL "") Sql.disconnect 1 1 1
  Pool.withResource pool . const $ pure ()
  attempt $ n + 1

I'm not familiar with the internals of HDBC-postgresql, HDBC, or resource-pool. And I've never written C in anger. But it seems like maybe resource-pool is calling disconnect after the connection has been garbage collected, or something like that. Regardless, I have found two workarounds:

  • Don't call disconnect. HDBC-postgresql will automatically disconnect when the connection is garbage collected, so you don't need to do it manually.

  • Call destroyAllResources. Usually resource-pool's reaper runs once every second. By manually destroying all of its resources, you can make sure that disconnect is called before they get garbage collected.

I don't have any insight into how to fix the root problem here, but hopefully that helps. Maybe someone who knows more about this stuff can use my minimal example here to fix things. Happy hacking!

Edited to add: When running these tests, the PostgreSQL outputs logs like this:

2020-05-26 20:35:02.536 UTC [1996] LOG:  unexpected EOF on client connection with an open transaction
2020-05-26 20:35:02.536 UTC [2001] LOG:  could not receive data from client: Connection reset by peer

@dmjio
Copy link
Contributor

dmjio commented Aug 28, 2020

@tfausak not sure the state of this issue, but I attempted to retry failure and could not reproduce with GHC 8.8.4 and resource-pool on Darwin.

[nix-shell:~/Desktop/hdbc-code]$ ghc Main.hs -o main && ./main
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking main ...
This program will attempt to crash.
It will print out attempt numbers as it goes.
Press Ctrl-C to exit.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
main: user error (Failed to crash!)

@tfausak
Copy link
Contributor

tfausak commented Aug 28, 2020

Huh, I can't seem to reproduce the error anymore either. No clue what changed 😕

@dmjio
Copy link
Contributor

dmjio commented Aug 28, 2020

@tfausak GHC 8.8.4 came out with "self-healing mode" it seems.

@dmjio
Copy link
Contributor

dmjio commented Aug 30, 2020

@tfausak the issues resurfaces when you compile with -threaded and run with +RTS -N -RTS.

@dmjio
Copy link
Contributor

dmjio commented Aug 30, 2020

@tfausak I can confirm now that it crashes give your example with -threaded.

[nix-shell:~/Desktop/hdbc-code]$ ghc -threaded Foo.hs -o foo && ./foo +RTS -N -RTS && echo $?
[1 of 1] Compiling Main             ( Foo.hs, Foo.o )
Linking foo ...
This program will attempt to crash.
It will print out attempt numbers as it goes.
Press Ctrl-C to exit.
1
2
3
4
5
6
7
8
9
10
11
foo(72181,0x700002f32000) malloc: *** error for object 0x7fc74dc09f60: pointer being freed was not allocated
foo(72181,0x700002f32000) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants