Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: retry connection on failed schema cache load #1685

Merged
merged 3 commits into from
Dec 9, 2020

Conversation

steve-chavez
Copy link
Member

When the "Failed to load the schema cache" error happens, the connection is not retried. This can be reproduced on master...steve-chavez:failed-to-load-schema.

This error is actually a regression that happened on #1542(check this deleted comment).

When the "Failed to load the schema cache" error happens,
the connection is not retried.
@steve-chavez
Copy link
Member Author

Ideally we'd have "recovery tests" that could detect these errors. But they're a bit complicated - need to stop/start pg, recover with SIGUSR1 or NOTIFY, check pgrest is alive, etc. Maybe we can look at how to do them after #1682 is merged.

So, I think this one should be good to merge as it is.

@wolfgangwalther
Copy link
Member

In #1632 (comment) I assumed that there is no error message when one of the db schema scripts fails. I just tested with this branch by introducing a syntax error in the pfkSourceColumns query - the error is reported. It doesn't look like this changed here, so I am not sure what happened back when I did not get any errors.

But this also leads to flooding my console:

Attempting to connect to the database...
Connection successful
{"hint":null,"details":null,"code":"42601","message":"syntax error at or near \"contype\""}
Failed to load the schema cache
Attempting to connect to the database...
Connection successful
{"hint":null,"details":null,"code":"42601","message":"syntax error at or near \"contype\""}
Failed to load the schema cache
Attempting to connect to the database...
Connection successful
{"hint":null,"details":null,"code":"42601","message":"syntax error at or near \"contype\""}
Failed to load the schema cache
Attempting to connect to the database...
Connection successful
{"hint":null,"details":null,"code":"42601","message":"syntax error at or near \"contype\""}
Failed to load the schema cache
Attempting to connect to the database...
...

and so on.

I'm not sure we actually want to retry like this? This seems like an error that it's not recoverable from. I think all SQL errors in dbStructure are errors that should not be there at all and are not caused by the user or the environment. They are bugs in PostgREST, so unlikely to change by restarting.

Are there any errors that can be recovered from? I think connection errors are caught before even trying the query, right?

I feel like we should exit on an error like this. Or at least debounce the retry (we do this for the database connection, right?).

@steve-chavez
Copy link
Member Author

@wolfgangwalther I think a check for a syntax error can be added here:

checkIsFatal :: PgError -> Maybe Text
checkIsFatal (PgError _ (P.ConnectionError e))
| isAuthFailureMessage = Just $ toS failureMessage
| otherwise = Nothing
where isAuthFailureMessage = "FATAL: password authentication failed" `isPrefixOf` toS failureMessage
failureMessage = fromMaybe "" e
checkIsFatal _ = Nothing

I agree it'd be useful, reminds me of an error that happened some time ago with FDWs: #1352 (not a syntax error though).

I'll check it out.

@steve-chavez
Copy link
Member Author

We should be careful in quitting on retrying the schema cache query.

So for now I've only added the strict case of dying on a syntax error.

@wolfgangwalther WDYT?

failureMessage = fromMaybe "" e
failureMessage = fromMaybe mempty e
-- Chek for a syntax error(42601 is the pg code). This would mean the error is on our part somehow, so we treat it as fatal.
checkIsFatal (PgError _ (P.SessionError (H.QueryError _ _ (H.ResultError (H.ServerError "42601" e _ _))))) = Just $ toS e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'm not sure if the json cast error mentioned here has the same number, but this will allow me to easily extend it to cover that error as well.

main/Main.hs Outdated
putErr = hPutStrLn stderr . toS . errorPayload $ err
case checkIsFatal err of
Just _ -> do
hPutStrLn stderr ("A fatal error ocurred when loading the schema cache" :: Text) >> putErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding wording along the lines of "this is probably a bug in PostgREST. Please report it at "?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds nice for the user.

"This is probably a bug in PostgREST. Please report it at https://github.com/PostgREST/postgrest/issues"?

Is that good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can be sure that I will then report my syntax errors during development to the issue tracker :D

@wolfgangwalther
Copy link
Member

We should be careful in quitting on retrying the schema cache query.

* On #1287, theres an `XX000` error that looks like it might be fatal to our query. But according to https://stackoverflow.com/a/2447576/4692662, that could be fixed with a reconnection.

* On #1352, an undefined function(42883) error shows. But that could also be fixed by creating the function? postgrest would reconnect when that's done.

So for now I've only added the strict case of dying on a syntax error.

@wolfgangwalther WDYT?

Of course - if there is any chance to recover from the error without user interaction, then we should retry.

If the user needs to create a function to make this work, they could also trigger a restart of PostgREST? This seems to be a case that's somewhere in the middle. Not sure about that. However both your examples seem to be about fdws - it seems reasonable to restart for fdw stuff, because we can't tell why the fdw failed. It could very well be just a connection loss or whatever, so easily retryable.

Dying has the advantage that the user needs to take action and it's more likely that they will report a bug to us - so for those cases where it's highly likely that we have a bug, this should be the best thing to do.

Starting with the syntax error seems reasonable, even if it's just for convenience while developing (with still hypothetical postgrest-watch postgrest-run --dump-schema, I'm sure we will get to that point). We can add more errors along the way, when we have other cases where it makes sense to die.

If in doubt, retrying should be the safe choice.

@steve-chavez
Copy link
Member Author

Dying has the advantage that the user needs to take action and it's more likely that they will report a bug to us

True. Now that you mention it, I think it would be fit even on fdws. As it is, the inifinite loop can discourage the user in trying to make postgrest work.

We can add more errors along the way, when we have other cases where it makes sense to die.

Yeah. When trying to get a syntax error I got a "42703"(undefined column), we can refine it progressively.

@steve-chavez
Copy link
Member Author

The code coverage job is taking some time, but I believe this should be good to merge.

@steve-chavez steve-chavez merged commit ebd474a into PostgREST:master Dec 9, 2020
monacoremo pushed a commit to monacoremo/postgrest that referenced this pull request Jul 17, 2021
Retry the connection when the
"Failed to load the schema cache" error happens.

Also die if the schema cache query has a syntax error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants