This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

doc: Add a design document for the transaction framework #1003

Merged (2 commits) on Dec 11, 2018

kshlm
Member

@kshlm kshlm commented Jul 12, 2018

TODO:

  • Add more examples

Closes #919

This document contains the design for the proposed update to the
transaction framework to make it more reliable and distributed.

Closes gluster#919
@kshlm
Member Author

kshlm commented Jul 12, 2018

Use the rich-diff view to get the rendered document. Or view the document directly

@phlogistonjohn
Contributor

I finally sat down and found time to read through the doc. This is really good!
I especially like the "timeline tables".

Here are a couple of thoughts that occurred to me as I worked my way through the doc:

  • Early on, the document talks about "verify if all the listed peers are online", which makes it sound like the initiator needs to know the state of peers. Is this a "soft" state?
  • This also makes me think about transactions that don't "know" what peers might be involved at the start. For example, I'm thinking of a scenario like "device replace" where all bricks that reside on device x must be relocated to new devices [a, b, c...] dynamically.
  • I'd love to see a little high-level overview of how the data structures would look in etcd, and what the communication channels and results reporting would look like.
  • A couple of times the phrase "timer expires" appears. Can I assume that this is an etcd-level construct that persists in the store rather than a per-process timer?
  • To confirm something I'm thinking: would etcd dying look like a failure to make progress, and eventually get into the cleanup state once etcd comes back?

All in all, this jibes with a lot of things I was thinking before I read it, so I can only think this is heading in a great direction! :-)

@kshlm
Member Author

kshlm commented Jul 24, 2018

Early on, the document talks about "verify if all the listed peers are online", which makes it sound like the initiator needs to know the state of peers. Is this a "soft" state?

All peers involved in the transaction are checked to see if they're online. This is just to ensure that the transaction has a chance of succeeding when we launch it. The initiator needs to know that the peers involved are connected to the store. This check already exists and isn't something new that we need to add.
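
As a rough illustration of the pre-flight check described above, the sketch below reports which of a transaction's peers are not currently connected to the store. The function name `offlinePeers` and the map used as the set of connected peers are assumptions made for this example, not the existing glusterd2 check.

```go
package main

import "fmt"

// offlinePeers returns the peers a transaction needs that are missing
// from the set of peers currently connected to the store.
// Hypothetical helper: the real check in glusterd2 may look different.
func offlinePeers(required []string, online map[string]bool) []string {
	var missing []string
	for _, p := range required {
		if !online[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// Assumed snapshot of peers connected to the store.
	online := map[string]bool{"peer-a": true, "peer-b": true}

	missing := offlinePeers([]string{"peer-a", "peer-c"}, online)
	if len(missing) > 0 {
		fmt.Println("not starting transaction, offline peers:", missing)
		return
	}
	fmt.Println("all peers online, starting transaction")
}
```

The check is only a snapshot: a peer can still go offline after the transaction starts, which is why the timers and cleanup discussed later are still needed.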

This also makes me think about transactions that don't "know" what peers might be involved at the start. For example, I'm thinking of a scenario like "device replace" where all bricks that reside on device x must be relocated to new devices [a, b, c...] dynamically.

Transactions always need to know where they need to run. The transaction itself and the transaction framework are not intelligent, and cannot decide where a transaction needs to run. It's the upper-layer callers into the transaction framework that have the intelligence, and they would craft a transaction that performs the actions they want. So, for the "device replace" case, the API handler for the request would identify the new devices and nodes, and prepare a transaction with that information. The transaction framework would then execute the transaction.
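
To make the division of labour concrete, here is a minimal sketch of a caller that does the planning up front and only then hands a fully specified description to the framework. Everything in it is hypothetical: the `brick` and `txnSpec` types, the `planDeviceReplace` function, and the round-robin choice of target devices are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// brick is a hypothetical record of where a brick lives.
type brick struct {
	Name, Node, Device string
}

// txnSpec is a hypothetical description handed to the transaction
// framework: the nodes to run on and the brick moves to perform.
type txnSpec struct {
	Nodes []string
	Moves map[string]string // brick name -> target device
}

// planDeviceReplace plays the role of the API handler: it decides which
// bricks move where and which nodes are involved, before any transaction
// exists. Targets are assigned round-robin purely for illustration.
func planDeviceReplace(bricks []brick, oldDevice string, targets []string) (txnSpec, error) {
	if len(targets) == 0 {
		return txnSpec{}, fmt.Errorf("no target devices given")
	}
	spec := txnSpec{Moves: map[string]string{}}
	seen := map[string]bool{}
	i := 0
	for _, b := range bricks {
		if b.Device != oldDevice {
			continue
		}
		spec.Moves[b.Name] = targets[i%len(targets)]
		i++
		if !seen[b.Node] {
			seen[b.Node] = true
			spec.Nodes = append(spec.Nodes, b.Node)
		}
	}
	sort.Strings(spec.Nodes)
	return spec, nil
}

func main() {
	bricks := []brick{
		{"b1", "node2", "x"},
		{"b2", "node1", "x"},
		{"b3", "node1", "y"},
	}
	spec, err := planDeviceReplace(bricks, "x", []string{"a", "b"})
	if err != nil {
		fmt.Println("planning failed:", err)
		return
	}
	fmt.Println("nodes:", spec.Nodes, "moves:", spec.Moves)
}
```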

I'd love to see a little high-level overview of how the data structures would look in etcd, and what the communication channels and results reporting would look like.

I'd need a little more time to think about the actual internal structures that would be implemented. I'll try to add what is possible to this document.

A couple of times the phrase "timer expires" appears. Can I assume that this is an etcd level construct that persists in the store rather than a per-process timer?

The timers are all local Go timers started by each peer. They are not provided by etcd. We would most likely use timeout contexts for the transaction timer. The cleanup timer is just a periodic timer that would call the cleanup function at a fixed interval. To help the cleanup function, we would also need to add a CreatedAt field to the transaction when we add it to the pending namespace.

I'll clarify this in the document.

To confirm something I'm thinking: would etcd dying look like a failure to make progress, and eventually get into the cleanup state once etcd comes back?

Yes, this would be correct. It would be similar to the transaction leader going offline. Any held global locks would be lost, so any transactions in progress cannot be resumed. All of them will need to be cleaned up.

@phlogistonjohn
Contributor

So, for the "device replace" case, the API handler for the request would identify the new devices and nodes, and prepare a transaction with that information. The transaction framework would then execute the transaction.

I'm not 100% sold on this and I'll try to explain why. Please feel free to try and convince me it is not an issue. :-)
In heketi we have some architectural problems around device remove and node remove (these generally boil down to evacuating bricks from devices A, B, C...). What comes in from the client is the request "remove device A", and so we do our best to record that action to be taken. However, this ends up turning into an iteration over the "ivp algorithm" for one brick in many volumes. I have been thinking about it as one operation (device remove) that spawns a number of sub-operations (brick replace) that are semi-independent.
Would it be reasonable for gd2 to have transaction X (replace device) spawn multiple "helper" transactions? One of my concerns comes from the fact that these actions like IVP/device remove take complex planning and need to take many locks or do other complex things with etcd that I'm not sure should happen outside of a transaction.

To help the cleanup function, we would also need to add a CreatedAt field to the transaction when we add it to the pending namespace.

Could etcd provide a way to help with this? I worry about timestamps in distributed systems because of clock skew issues.

@kshlm
Member Author

kshlm commented Jul 26, 2018

I'm not 100% sold on this and I'll try to explain why. Please feel free to try and convince me it is not an issue. :-)
In heketi we have some architectural problems around device remove and node remove (these generally boil down to evacuating bricks from devices A, B, C...). What comes in from the client is the request "remove device A", and so we do our best to record that action to be taken. However, this ends up turning into an iteration over the "ivp algorithm" for one brick in many volumes. I have been thinking about it as one operation (device remove) that spawns a number of sub-operations (brick replace) that are semi-independent.

I still see this as possibly doable with the proposed framework, but I may not be seeing it as you see it. We should work together and write down the workflow for this. It'll help us understand whether this can reasonably be done with the framework, or whether we need more changes, like the transactions spawning transactions discussed below.

Would it be reasonable for gd2 to have transaction X (replace device) spawn multiple "helper" transactions? One of my concerns comes from the fact that these actions like IVP/device remove take complex planning and need to take many locks or do other complex things with etcd that I'm not sure should happen outside of a transaction.

We could possibly do this, or something that provides this desired effect.
I recently made changes to the current transaction framework to allow global transaction locks to be taken before transaction steps are executed or even populated. This also changed the lifetime of a transaction. Instead of beginning when the first step is executed and ending when the last step is executed, a transaction now begins when it is created with transaction.NewTxn and ends with the cleanup call transaction.(*Txn).Done. The steps are populated and executed between these two calls. At the moment there is just one list of steps. But we could create multiple lists of steps and execute them one after the other, with the results of one list's execution used to prepare the next list. The proposed framework can be built to handle this.

@kshlm
Member Author

kshlm commented Jul 26, 2018

Could etcd provide a way to help with this? I worry about timestamps in distributed systems because of clock skew issues.

Not as far as I know. Etcd provides revisions, but not any sort of timestamps.

oshankkumar pushed a commit to oshankkumar/glusterd2 that referenced this pull request Oct 8, 2018
The transaction engine executes the given transaction
across the cluster. The engine is designed to make use
of etcd as the means of communication between peers.

Please refer to design document gluster#1003

Signed-off-by: Oshank Kumar <okumar@redhat.com>
oshankkumar pushed a commit to oshankkumar/glusterd2 that referenced this pull request Oct 22, 2018
The transaction engine executes the given transaction
across the cluster. The engine is designed to make use
of etcd as the means of communication between peers.

Please refer to design document gluster#1003

Signed-off-by: Oshank Kumar <okumar@redhat.com>

transaction: added a cleanup leader which will perform
all cleanup operations.

A leader is elected among the peers in the cluster to
clean up stale transactions. The leader periodically
scans the pending transaction namespace for failed
and stale transactions, and cleans them up if rollback
is completed by all peers involved in the transaction.
@atinmu
Contributor

atinmu commented Nov 10, 2018

@kshlm @oshankkumar can we please get the agreed-upon pieces of the design that we're implementing as part of #1268? This PR has been open for a long time now and we should bring it to closure.

@ghost ghost added the in progress label Dec 11, 2018
@atinmu atinmu merged commit 0a44716 into gluster:master Dec 11, 2018
@ghost ghost removed the in progress label Dec 11, 2018
kshlm pushed a commit that referenced this pull request Dec 11, 2018
The transaction engine executes the given transaction
across the cluster. The engine is designed to make use
of etcd as the means of communication between peers.

Please refer to design document #1003

cleanup leader: added a cleanup leader which will perform
all cleanup operations.

A leader is elected among the peers in the cluster to
clean up stale transactions. The leader periodically
scans the pending transaction namespace for failed
and stale transactions, and cleans them up if rollback
is completed by all peers involved in the transaction.

Signed-off-by: Oshank Kumar <okumar@redhat.com>