Fix initial database connection issue #348
Comments
@wadeking98, The changes in PR #351 don't seem to be working as expected. The initial server selection timeout does not appear to be configurable. I can set the Error message is:
@WadeBarnes a custom
@wadeking98, @esune's suggestion is a good one. It would resolve other issues, like the pod being made available before the database connection is actually established, and would give k8s more control over the app in general. The current solution helps, but the pod will still appear to be running before a database connection is available. On startup the This signaling would replace the app shutdown on initial database connection failure. An easy feature for you to add?
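One way to picture the signaling discussed here is a readiness check that reports "not ready" until the initial database connection succeeds. The sketch below is only an illustration under assumptions: `ReadinessState`, `readinessProbe`, and the response bodies are hypothetical names that do not come from the issuer-kit code base, and wiring the result into an HTTP route and a k8s readiness probe is left out.

```typescript
// Hypothetical sketch: tracks whether the database connection is available.
class ReadinessState {
  private dbConnected = false;

  // Call once the initial database connection has succeeded.
  markDatabaseConnected(): void {
    this.dbConnected = true;
  }

  // Call if the connection is known to be lost.
  markDatabaseDisconnected(): void {
    this.dbConnected = false;
  }

  isReady(): boolean {
    return this.dbConnected;
  }
}

interface ProbeResponse {
  status: number;
  body: string;
}

// Maps the readiness state to an HTTP-style response.
// A 503 tells the platform not to route traffic to this pod yet.
function readinessProbe(state: ReadinessState): ProbeResponse {
  return state.isReady()
    ? { status: 200, body: "ready" }
    : { status: 503, body: "database connection not established" };
}
```

With something like this in place, the pod would only be reported as available after `markDatabaseConnected()` has been called, rather than as soon as the process starts.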
@WadeBarnes did you still want this feature now that we've fixed the timeout issue?
Yes please
If the issuer kit database (mongo) is unavailable for a period of time when the issuer kit api attempts to connect (https://github.com/bcgov/issuer-kit/blob/main/api/src/app.ts#L38), a connection error can be thrown by the MongoClient driver (https://github.com/bcgov/issuer-kit/blob/main/api/src/mongodb.ts), resulting in the following log message:
At this stage the driver has failed to make an initial connection to the database and will never attempt to connect to the database again. This renders the api completely inoperable. It manifests as an inability to issue credentials to a wallet: the initial connections are made, but the credential issuing flow appears to hang on the wallet and issuer-web side due to this error. In contrast, when the api is able to make the initial connection, the database can become unavailable and available again and the driver is able to reconnect. It is only the initial connection that appears critical.
This issue occurs in our OpenShift environments during rollouts and evacuations because the API startup process is much faster than the database's startup process. Therefore, in the majority of cases the API has started while the database is still unavailable.
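The comments on this issue mention shutting the app down when the initial database connection fails, so that the platform can start a fresh pod that tries again. A minimal sketch of that idea follows. `connectOrExit` is a hypothetical helper, not the project's actual code; the `connect` and `exit` callbacks are injected so the sketch does not depend on a particular driver or runtime.

```typescript
// Hypothetical helper: attempts the initial connection once and, if it fails,
// asks the process to exit with a non-zero code so the pod can be restarted.
async function connectOrExit<T>(
  connect: () => Promise<T>,
  exit: (code: number) => void
): Promise<T | undefined> {
  try {
    // Resolves with whatever the connect callback returns (e.g. a client).
    return await connect();
  } catch (err) {
    // The initial connection failed; signal failure to the platform.
    exit(1);
    return undefined;
  }
}
```

In an application this might be called with a function that wraps the driver's connect call and with `process.exit` as the `exit` callback; both are assumptions about how it would be wired in.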
There are a couple of ways we can deal with this:
The serverSelectionTimeoutMS parameter (which defaults to 30 seconds) could be made configurable so it can be adjusted to give the connection attempt a bit more time.
In OpenShift this will cause a new pod to be started which will try to connect to the database.
Therefore it looks like the most appropriate way to handle this issue is using the first option.
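To illustrate the configurable timeout, the sketch below reads the value from the environment and falls back to the 30 second default when the variable is missing or invalid. The variable name `DB_SERVER_SELECTION_TIMEOUT_MS` and the helper `buildConnectionOptions` are assumptions made for this example, not names taken from the issuer-kit code.

```typescript
// Hypothetical environment variable name; not taken from the issuer-kit code base.
const TIMEOUT_ENV_VAR = "DB_SERVER_SELECTION_TIMEOUT_MS";
// Default mentioned in the issue: 30 seconds.
const DEFAULT_TIMEOUT_MS = 30000;

interface ConnectionOptions {
  serverSelectionTimeoutMS: number;
}

// Builds the connection options from an environment-like object.
// Missing, non-numeric, or non-positive values fall back to the default.
function buildConnectionOptions(
  env: Record<string, string | undefined>
): ConnectionOptions {
  const raw = env[TIMEOUT_ENV_VAR];
  const parsed = raw === undefined ? NaN : Number(raw);
  const valid = Number.isInteger(parsed) && parsed > 0;
  return { serverSelectionTimeoutMS: valid ? parsed : DEFAULT_TIMEOUT_MS };
}
```

The result of `buildConnectionOptions(process.env)` could then be passed as the options argument when creating the MongoClient connection, assuming the driver version in use honors `serverSelectionTimeoutMS`.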