Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: retry connection on failed schema cache load #1685

Merged
merged 3 commits into from
Dec 9, 2020
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 26 additions & 11 deletions main/Main.hs
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,8 @@ import PostgREST.DbStructure (getDbStructure, getPgVersion)
import PostgREST.Error (PgError (PgError), checkIsFatal,
errorPayload)
import PostgREST.Types (ConnectionStatus (..), DbStructure,
PgVersion (..), minimumPgVersion)
PgVersion (..), SCacheStatus (..),
minimumPgVersion)
import Protolude hiding (hPutStrLn, head, toS)
import Protolude.Conv (toS)

Expand Down Expand Up @@ -177,7 +178,7 @@ _1s = 1000000 :: Int -- 1 second
1. Tries to connect to pg server and will keep trying until success.
2. Checks if the pg version is supported and if it's not it kills the main
program.
3. Obtains the dbStructure.
3. Obtains the dbStructure. If this fails, it goes back to 1.
-}
connectionWorker
:: ThreadId -- ^ Main thread id. Killed if pg version is unsupported
Expand All @@ -203,7 +204,11 @@ connectionWorker mainTid pool refConf refDbStructure refIsWorkerOn (dbChannelEna
NotConnected -> return () -- Unreachable because connectionStatus will keep trying to connect
Connected actualPgVersion -> do -- Procede with initialization
putStrLn ("Connection successful" :: Text)
fillSchemaCache pool actualPgVersion refConf refDbStructure
scStatus <- loadSchemaCache pool actualPgVersion refConf refDbStructure
case scStatus of
SCLoaded -> pure () -- do nothing and proceed if the load was successful
SCOnRetry -> work -- retry
SCFatalFail -> killThread mainTid -- die if our schema cache query has an error
liftIO $ atomicWriteIORef refIsWorkerOn False

{-|
Expand Down Expand Up @@ -244,21 +249,27 @@ connectionStatus pool =
putStrLn $ "Attempting to reconnect to the database in " <> (show delay::Text) <> " seconds..."
return itShould

-- | Fill the DbStructure by using a connection from the pool
fillSchemaCache :: P.Pool -> PgVersion -> IORef AppConfig -> IORef (Maybe DbStructure) -> IO ()
fillSchemaCache pool actualPgVersion refConf refDbStructure = do
-- | Load the DbStructure by using a connection from the pool.
loadSchemaCache :: P.Pool -> PgVersion -> IORef AppConfig -> IORef (Maybe DbStructure) -> IO SCacheStatus
loadSchemaCache pool actualPgVersion refConf refDbStructure = do
conf <- readIORef refConf
result <- P.use pool $ HT.transaction HT.ReadCommitted HT.Read $ getDbStructure (toList $ configDbSchemas conf) (configDbExtraSearchPath conf) actualPgVersion (configDbPreparedStatements conf)
case result of
Left e -> do
-- If this error happens it would mean the connection is down again. Improbable because connectionStatus ensured the connection.
-- It's not a problem though, because App.postgrest would retry the connectionWorker or the user can do a SIGSUR1 again.
hPutStrLn stderr . toS . errorPayload $ PgError False e
putStrLn ("Failed to load the schema cache" :: Text)
let err = PgError False e
putErr = hPutStrLn stderr . toS . errorPayload $ err
case checkIsFatal err of
Just _ -> do
hPutStrLn stderr ("A fatal error ocurred when loading the schema cache" :: Text) >> putErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding wording along the lines of "this is probably a bug in PostgREST. Please report it at "?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds nice for the user.

"This is probably a bug in PostgREST. Please report it at https://github.com/PostgREST/postgrest/issues"?

Is that good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can be sure that I will then report my syntax errors during development to the issue tracker :D

return SCFatalFail
Nothing -> do
hPutStrLn stderr ("An error ocurred when loading the schema cache" :: Text) >> putErr
return SCOnRetry

Right dbStructure -> do
atomicWriteIORef refDbStructure $ Just dbStructure
putStrLn ("Schema cache loaded" :: Text)
return SCLoaded

{-|
Starts a dedicated pg connection to LISTEN for notifications.
Expand All @@ -274,8 +285,12 @@ listener dbUri dbChannel pool refConf refDbStructure mvarConnectionStatus connWo
Connected actualPgVersion -> void $ forkFinally (do -- forkFinally allows to detect if the thread dies
dbOrError <- C.acquire dbUri
-- Debounce in case too many NOTIFYs arrive. Could happen on a migration(assuming a pg EVENT TRIGGER is set up).
-- This might not be needed according to pg docs https://www.postgresql.org/docs/12/sql-notify.html:
-- "If the same channel name is signaled multiple times from the same transaction with identical payload strings, the database server can decide to deliver a single notification only."
-- But we do it to be extra safe.
scFiller <- mkDebounce (defaultDebounceSettings {
debounceAction = fillSchemaCache pool actualPgVersion refConf refDbStructure,
-- It's not necessary to check the loadSchemaCache success here. If the connection drops, the thread will die and proceed to recover below.
debounceAction = void $ loadSchemaCache pool actualPgVersion refConf refDbStructure,
debounceEdge = trailingEdge, -- wait until the function hasn’t been called in _1s
debounceFreq = _1s })
case dbOrError of
Expand Down
6 changes: 4 additions & 2 deletions src/PostgREST/Error.hs
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ compressedRel rel =
(_, _, _) ->
mempty

data PgError = PgError Authenticated P.UsageError
data PgError = PgError Authenticated P.UsageError deriving Show
type Authenticated = Bool

instance PgrstError PgError where
Expand Down Expand Up @@ -216,7 +216,9 @@ checkIsFatal (PgError _ (P.ConnectionError e))
| isAuthFailureMessage = Just $ toS failureMessage
| otherwise = Nothing
where isAuthFailureMessage = "FATAL: password authentication failed" `isPrefixOf` toS failureMessage
failureMessage = fromMaybe "" e
failureMessage = fromMaybe mempty e
-- Chek for a syntax error(42601 is the pg code). This would mean the error is on our part somehow, so we treat it as fatal.
checkIsFatal (PgError _ (P.SessionError (H.QueryError _ _ (H.ResultError (H.ServerError "42601" e _ _))))) = Just $ toS e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'm not sure if the json cast error mentioned here has the same number, but this will allow me to easily extend it to cover that error as well.

checkIsFatal _ = Nothing


Expand Down
7 changes: 7 additions & 0 deletions src/PostgREST/Types.hs
Original file line number Diff line number Diff line change
Expand Up @@ -560,6 +560,13 @@ data ConnectionStatus
| FatalConnectionError Text
deriving (Eq, Show)

-- | Schema cache status
data SCacheStatus
= SCLoaded
| SCOnRetry
| SCFatalFail
deriving (Eq, Show)

data LogLevel = LogCrit | LogError | LogWarn | LogInfo deriving (Eq)

instance Show LogLevel where
Expand Down