
Fix failover process after disk crash #637

Merged (1 commit into sorintlab:master on May 27, 2019)

Conversation

@maksm90 (Contributor) commented Apr 29, 2019

I have encountered a problem on a test mini-installation (a cluster based on Raspberry Pi): autofailover was not performed after manually isolating the disk (an SD card, in the Raspberry case).

The problem lies in the PostgresKeeper.updatePGState routine (cmd/keeper/cmd/keeper.go:561): the Postgres state is not updated in the DCS (Consul or etcd) when PostgresKeeper.GetPGState returns a non-nil error. Such an error is emitted by the check of whether PGDATA is initialized (the Manager.IsInitialized routine, cmd/keeper/cmd/keeper.go:641), and that check fails after a disk file system operation failure (my case).

The current pull request fixes this issue.
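
For readers without the source open, here is a minimal, self-contained sketch of the failure path described above. Everything in it (State, isInitialized, getPGState, updatePGState, publish) is a stand-in written for this illustration, not the real stolon code; the real routines live in cmd/keeper/cmd/keeper.go as cited above.

```go
package main

import (
	"errors"
	"fmt"
)

// State stands in for cluster.PostgresState; Healthy is the only field needed here.
type State struct{ Healthy bool }

// isInitialized stands in for Manager.IsInitialized: with a dead disk every
// filesystem access under PGDATA fails, so the check returns an error.
func isInitialized() (bool, error) {
	return false, errors.New("stat PGDATA: input/output error")
}

// getPGState mimics the pre-fix PostgresKeeper.GetPGState: the error from the
// initialization check is propagated instead of being turned into an unhealthy state.
func getPGState() (*State, error) {
	if _, err := isInitialized(); err != nil {
		return nil, err
	}
	return &State{Healthy: true}, nil
}

// updatePGState mimics the pre-fix caller: it bails out on error, so nothing is
// published and the store keeps the last (healthy) state it has seen.
func updatePGState(publish func(*State)) {
	st, err := getPGState()
	if err != nil {
		fmt.Println("failed to get pg state:", err)
		return // stale state stays in the DCS, so no failover is triggered
	}
	publish(st)
}

func main() {
	updatePGState(func(st *State) {
		fmt.Printf("published state: healthy=%v\n", st.Healthy)
	})
	// Only the error line is printed; an unhealthy state is never published.
}
```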

@sgotti (Member) left a comment

@maksm90 Thanks for your PR! Some comments inline.

@@ -641,7 +640,7 @@ func (p *PostgresKeeper) GetPGState(pctx context.Context) (*cluster.PostgresStat
 	initialized, err := p.pgm.IsInitialized()
 	if err != nil {
-		return nil, err
+		return pgState, err
sgotti (Member):

We could just do what the rest of the function does: return the current pgState (which will have healthy set to false) without returning an error. For the same reason we should change the function signature and remove the error return value, since it will never return an error:

func (p *PostgresKeeper) GetPGState(pctx context.Context) *cluster.PostgresState
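
A minimal, self-contained sketch of what that suggested shape could look like. State and isInitialized below are stand-ins for cluster.PostgresState and p.pgm.IsInitialized, condensed so the example compiles on its own; it is not the actual stolon implementation.

```go
package main

import (
	"errors"
	"log"
)

// Stand-in for cluster.PostgresState, reduced to the fields this sketch needs.
type State struct {
	Initialized bool
	Healthy     bool
}

// Stand-in for p.pgm.IsInitialized, simulating a failed disk.
func isInitialized() (bool, error) {
	return false, errors.New("open PGDATA: input/output error")
}

// getPGState follows the suggested shape: no error return at all. Any internal
// failure is logged and reflected in the returned state (Healthy stays false),
// so the caller always has something to publish to the DCS.
func getPGState() *State {
	st := &State{}
	initialized, err := isInitialized()
	if err != nil {
		log.Printf("cannot check the data directory: %v", err)
		return st // Healthy=false is exactly what should trigger a failover
	}
	st.Initialized = initialized
	if initialized {
		// ... connect to the instance and fill in the remaining fields ...
		st.Healthy = true
	}
	return st
}

func main() {
	st := getPGState()
	log.Printf("state to publish: initialized=%v healthy=%v", st.Initialized, st.Healthy)
}
```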

maksm90 (Contributor, Author):

OK, done below. But I'm confused: GetPGState is also called during node resync, and there the returned error obviously interrupts the resync process. How should this be dealt with? For now, I don't interrupt the resync process anyway.

sgotti (Member):

@maksm90 you're right. Currently the purpose of GetPGState is confusing. We want to gather all the available information and return it even on failure, except for the cases where we want to handle the error.

So, sorry, just ignore my previous request. The first version, where you just ignored the error, was better...

maksm90 (Contributor, Author):

@sgotti I have reverted the last commit and kept the first version.

@@ -558,7 +558,6 @@ func (p *PostgresKeeper) updatePGState(pctx context.Context) {
 	pgState, err := p.GetPGState(pctx)
 	if err != nil {
 		log.Errorw("failed to get pg state", zap.Error(err))
-		return
sgotti (Member):

As in the other comment, just remove the error check, since GetPGState will become:

func (p *PostgresKeeper) GetPGState(pctx context.Context) *cluster.PostgresState
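
Continuing the stand-in sketch from the previous example (same hypothetical State and getPGState), the caller would then shrink to the few lines below. Note that, per the exchange earlier in this thread, this signature change was ultimately withdrawn; the merged fix keeps the error return and simply stops bailing out on it.

```go
// updatePGState under the (later withdrawn) error-free signature: with no
// error left to check, whatever state was gathered, healthy or not, is
// always published to the store.
func updatePGState(publish func(*State)) {
	publish(getPGState())
}
```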

@sgotti (Member) commented May 23, 2019

@maksm90 Thanks. Can you please squash into a single commit so I can merge this PR?

@maksm90 (Contributor, Author) commented May 27, 2019

Can you please squash into a single commit so I can merge this PR?

Done

@sgotti (Member) commented May 27, 2019

@maksm90 Thanks a lot! Merging.

@sgotti sgotti merged commit 984dc87 into sorintlab:master May 27, 2019
@maksm90 (Contributor, Author) commented May 27, 2019

@maksm90 Thanks a lot! Merging.

Thanks too!

@sgotti sgotti added this to the v0.14.0 milestone Jun 6, 2019
@maksm90 maksm90 deleted the fix_disk_crash_failover branch January 25, 2020 10:35