This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

transaction: Resilient Txn Engine #1268

Merged
merged 7 commits into from
Dec 11, 2018

Conversation

oshankkumar

The transaction engine executes the given transaction
across the cluster. The engine is designed to make use
of etcd as the means of communication between peers.

Please refer to design document #1003

Signed-off-by: Oshank Kumar <okumar@redhat.com>

@ghost ghost assigned oshankkumar Oct 8, 2018
@ghost ghost added the in progress label Oct 8, 2018
@kshlm kshlm requested review from prashanthpai and kshlm October 8, 2018 14:00
@oshankkumar oshankkumar force-pushed the newTxnFrmwrk branch 2 times, most recently from 55bb33c to 89a0b73 Compare October 8, 2018 14:37
@atinmu
Contributor

atinmu commented Oct 8, 2018

@oshankkumar Can we please have a 'Fixes' or 'Updates' tag in the commit message?

@atinmu atinmu changed the title from "transaction: Implemented Txn Engine" to "transaction: Resilient Txn Engine" Oct 8, 2018
@atinmu
Contributor

atinmu commented Oct 8, 2018

@oshankkumar It might be worth mentioning which parts of #919 are covered here and what's pending. This will give the reviewers a good heads-up.

@oshankkumar oshankkumar force-pushed the newTxnFrmwrk branch 3 times, most recently from 847ff5f to c537254 Compare October 10, 2018 12:37
@oshankkumar
Author

oshankkumar commented Oct 11, 2018

Task List

  • Transaction Engine
  • Creating and running a transaction.
  • Synchronized step execution
  • Modify global data structures
  • Handling Failure Txn
  • Cleanup Leader
  • Handling peer restart during transaction (partially Done)

@oshankkumar oshankkumar force-pushed the newTxnFrmwrk branch 5 times, most recently from 8fe9508 to 1733ca1 Compare October 17, 2018 08:39
@oshankkumar oshankkumar force-pushed the newTxnFrmwrk branch 11 times, most recently from 210a04f to 1055b69 Compare October 26, 2018 03:31
Member

@kshlm kshlm left a comment

This is great. The logic and structure look good overall. I have several general comments about style. In addition, please try to add more comments throughout the code, especially for the unexported helper functions; this will make the code much easier to understand in the future.

@@ -33,29 +33,32 @@ type TxnCtx interface {
// Logger returns the Logrus logger associated with the context
Logger() log.FieldLogger

-// commit writes all locally cached keys and values into the store using
+// Commit writes all locally cached keys and values into the store using
Member

Just to be clear here. You're exporting this and the other types and functions in this package, because of the new package, right? Once we refactor out the old package, these can go back to being unexported.

Author

Yes Kaushal, I have exported a few things because I wanted to use them in the new package.


const (
leaderKey = "leader"
cleanupTimerDur = time.Second * 5
Member

Let's make the timer durations longer. I'm thinking in minutes rather than seconds here.


// StartElecting triggers a new election after every `electionTimerDur`.
// If it succeeded then it assumes the leader role and returns
func (c *CleanupHandler) StartElecting() {
Member

For leader election, I'd prefer if we use github.com/coreos/etcd/clientv3/concurrency.Election, instead of rolling our own.

The concurrency package is already used elsewhere in the store package for concurrency.Session.


// TxnEngine executes the given transaction across the cluster.
// It makes use of etcd as the means of communication between nodes.
type TxnEngine struct {
Member

Just a comment on the naming. You can have this be just Engine. The rest of the names with txn prefix can also do without it. For things like the StepManager you can have the prefix.

)

// TransactionEngine is responsible for executing newly added txn
var TransactionEngine *TxnEngine
Member

Do you plan to use this variable outside this package? I don't think you really need to export this.

}

if err := txnEng.stepManager.RunStep(ctx, step, txn.Ctx); err != nil {
txn.Ctx.Logger().WithError(err).Errorf("failed in executing step %+v", step)
Member

Avoid using log.*f variants of the log functions. Instead add context to the log message using WithField or WithFields.
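
To illustrate the suggestion above, here is a minimal sketch of moving the variable parts of a log line into fields while keeping the message constant. It assumes nothing about the actual glusterd2 code: the `entry` type is a hypothetical stand-in that only mimics the `WithField`/`WithError` chaining style of Logrus so that the snippet runs with the standard library alone, and `logStepFailure` and the step name are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical stand-in for a Logrus-style field logger.
// It only mimics the WithField/WithError chaining; it is not Logrus.
type entry struct {
	fields map[string]interface{}
}

// WithField returns a copy of the entry with one more key/value attached.
func (e entry) WithField(key string, value interface{}) entry {
	next := entry{fields: make(map[string]interface{}, len(e.fields)+1)}
	for k, v := range e.fields {
		next.fields[k] = v
	}
	next.fields[key] = value
	return next
}

// WithError attaches the error under the "error" key.
func (e entry) WithError(err error) entry {
	return e.WithField("error", err)
}

// Error renders the constant message followed by the fields in sorted
// key order. It returns the line instead of writing it, to stay testable.
func (e entry) Error(msg string) string {
	keys := make([]string, 0, len(e.fields))
	for k := range e.fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{"level=error", fmt.Sprintf("msg=%q", msg)}
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%v", k, e.fields[k]))
	}
	return strings.Join(parts, " ")
}

// logStepFailure keeps the message text fixed and carries the step name
// and the error as fields, rather than formatting them into the message.
func logStepFailure(logger entry, stepName string, err error) string {
	return logger.WithError(err).WithField("step", stepName).Error("failed to execute step")
}

func main() {
	err := errors.New("peer unreachable")
	fmt.Println(logStepFailure(entry{}, "vol-create.Store", err))
}
```

With a constant message, log lines for the same failure can be grouped by message and filtered by field, which is harder when the step is interpolated into the text.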

txn.Ctx.Logger().WithError(err).Errorf("encounter an error in synchronizing txn step %+v", step)
return err
}
txn.Ctx.Logger().Info("transaction got synchronized")
Member

This should be a debug log. Other than transaction started and finished logs, everything else should be debug. If needed any error logs should contain the full context to help identify the error cause.

@oshankkumar oshankkumar force-pushed the newTxnFrmwrk branch 7 times, most recently from a866695 to 526cb04 Compare November 5, 2018 13:15
@oshankkumar
Author

retest this please

@Madhu-1
Member

Madhu-1 commented Nov 5, 2018

@nigelbabu we have a CI failure, any hint why it's failing?

13:16:00 [gluster_glusterd2] $ /bin/sh -xe /tmp/jenkins82820089879639684.sh
13:16:00 + SSID_FILE=/home/gluster/workspace/gluster_glusterd2/cico-ssid
13:16:00 ++ cat /home/gluster/workspace/gluster_glusterd2/cico-ssid
13:16:00 cat: /home/gluster/workspace/gluster_glusterd2/cico-ssid: No such file or directory

Member

@kshlm kshlm left a comment

Looks good. I'll approve once I test this out.

@atinmu
Contributor

atinmu commented Nov 16, 2018

@kshlm Did you get a chance to test this? Can we please expedite to get this PR in?

Member

@kshlm kshlm left a comment

Sorry for the delay in my response here. I tested this out, and it mostly works. However, the existing transactions still need to be updated to sync the required steps. This mostly means synchronizing the 'store' steps. A few other steps also require synchronization, like the barrier activate step in snapshot. Please complete this before we move on to merging.
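
As a rough illustration of what synchronizing a step means: every peer runs the step locally, records its completion in shared state, and no peer moves on to the next step until all peers have recorded theirs. The sketch below models that with an in-memory map standing in for etcd. `syncStore`, `markDone`, `waitAll`, `runSynced`, and the peer names are all hypothetical and are not the engine's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// syncStore is an in-memory stand-in for the shared etcd keyspace.
type syncStore struct {
	mu   sync.Mutex
	cond *sync.Cond
	done map[string]map[string]bool // step name -> peer -> finished
}

func newSyncStore() *syncStore {
	s := &syncStore{done: make(map[string]map[string]bool)}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// markDone records that peer finished step and wakes up all waiters.
func (s *syncStore) markDone(step, peer string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done[step] == nil {
		s.done[step] = make(map[string]bool)
	}
	s.done[step][peer] = true
	s.cond.Broadcast()
}

// waitAll blocks until every peer in peers has marked step as done.
func (s *syncStore) waitAll(step string, peers []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for !s.allDone(step, peers) {
		s.cond.Wait()
	}
}

// allDone must be called with s.mu held.
func (s *syncStore) allDone(step string, peers []string) bool {
	for _, p := range peers {
		if !s.done[step][p] {
			return false
		}
	}
	return true
}

// runSynced runs a synchronized "store" step on every peer, followed by
// a second step. It returns the events in the order they were recorded.
func runSynced(peers []string) []string {
	store := newSyncStore()
	var (
		mu     sync.Mutex
		events []string
		wg     sync.WaitGroup
	)
	record := func(event string) {
		mu.Lock()
		events = append(events, event)
		mu.Unlock()
	}
	for _, p := range peers {
		wg.Add(1)
		go func(peer string) {
			defer wg.Done()
			record("store:" + peer)       // run the step locally
			store.markDone("store", peer) // publish completion
			store.waitAll("store", peers) // wait for every peer
			record("start:" + peer)       // only now begin the next step
		}(p)
	}
	wg.Wait()
	return events
}

func main() {
	fmt.Println(runSynced([]string{"peer1", "peer2", "peer3"}))
}
```

The order within each group varies between runs, but every `store:` event is recorded before any `start:` event, which is the guarantee a synchronized step is meant to give.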

return ErrLockExists
}

logger := t.Ctx.Logger().WithField("lockID", lockID)
Member

Restore the logs here. They can be helpful.

Author

Restored the previous logs.

Oshank Kumar and others added 5 commits December 11, 2018 15:44
The transaction engine executes the given transaction
across the cluster. The engine is designed to make use
of etcd as the means of communication between peers.

Please refer to design document gluster#1003

cleanup leader: added a cleanup leader which will perform
all cleanup operations.

A leader is elected among the peers in the cluster to
clean up stale transactions. The leader periodically
scans the pending transaction namespace for failed
and stale transactions, and cleans them up if rollback
is completed by all peers involved in the transaction.

Signed-off-by: Oshank Kumar <okumar@redhat.com>
Signed-off-by: Oshank Kumar <okumar@redhat.com>
Signed-off-by: Oshank Kumar <okumar@redhat.com>
Signed-off-by: Oshank Kumar <okumar@redhat.com>
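
The cleanup rule described in the commit message above can be sketched as a single scan pass. This is an in-memory model under stated assumptions, not the merged code: `pendingTxn` and its fields, the `staleAfter` threshold, `scanAndClean`, and `demo` are hypothetical names. In the real engine the pending transactions live in etcd, and only the elected leader would run the scan periodically.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pendingTxn is a hypothetical record of one pending transaction.
type pendingTxn struct {
	Failed     bool
	StartedAt  time.Time
	Peers      []string        // peers involved in the transaction
	RolledBack map[string]bool // peers that have completed rollback
}

// needsCleanup reports whether the transaction is failed or stale.
func (t pendingTxn) needsCleanup(now time.Time, staleAfter time.Duration) bool {
	return t.Failed || now.Sub(t.StartedAt) > staleAfter
}

// rollbackComplete reports whether every involved peer has rolled back.
func (t pendingTxn) rollbackComplete() bool {
	for _, p := range t.Peers {
		if !t.RolledBack[p] {
			return false
		}
	}
	return true
}

// scanAndClean performs one pass over the pending transactions, deletes
// those that are failed or stale and fully rolled back, and returns the
// removed IDs in sorted order.
func scanAndClean(pending map[string]pendingTxn, now time.Time, staleAfter time.Duration) []string {
	var cleaned []string
	for id, txn := range pending {
		if txn.needsCleanup(now, staleAfter) && txn.rollbackComplete() {
			delete(pending, id) // deleting during range is safe in Go
			cleaned = append(cleaned, id)
		}
	}
	sort.Strings(cleaned)
	return cleaned
}

// demo builds a small sample and runs one scan with a 5 minute threshold.
func demo() (cleaned, remaining []string) {
	now := time.Date(2018, time.December, 11, 15, 44, 0, 0, time.UTC)
	peers := []string{"p1", "p2"}
	all := map[string]bool{"p1": true, "p2": true}
	pending := map[string]pendingTxn{
		"txn-failed-done":    {Failed: true, StartedAt: now, Peers: peers, RolledBack: all},
		"txn-failed-partial": {Failed: true, StartedAt: now, Peers: peers, RolledBack: map[string]bool{"p1": true}},
		"txn-stale-done":     {StartedAt: now.Add(-10 * time.Minute), Peers: peers, RolledBack: all},
		"txn-active":         {StartedAt: now.Add(-time.Minute), Peers: peers},
	}
	cleaned = scanAndClean(pending, now, 5*time.Minute)
	for id := range pending {
		remaining = append(remaining, id)
	}
	sort.Strings(remaining)
	return cleaned, remaining
}

func main() {
	cleaned, remaining := demo()
	fmt.Println("cleaned:", cleaned)
	fmt.Println("remaining:", remaining)
}
```

In this sample the failed transaction with a partial rollback is left in place, so a later scan can remove it once the remaining peer has rolled back.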
@ghost ghost assigned kshlm Dec 11, 2018
@kshlm kshlm merged commit 5f88917 into gluster:master Dec 11, 2018
@ghost ghost removed the in progress label Dec 11, 2018
4 participants