Issue 48 atomic updates #69
Conversation
I added a test for the scenario mentioned in the TODO section and fixed it via a double delete.
new MessageIterator(persistenceId, fromSequenceNr, toSequenceNr, max).foreach( msg => {
  replayCallback(msg)
})
}
Unintended changes left over from adding and then removing logging; I'll remove them.
Hi @chbatey, thanks for the pull request and sorry for the late reply. The mechanism for having varied partition sizes so that a batch fits into a single partition looks good to me. However, ensuring that only …

The …
Regarding the name …
preparedDeletePermanent.bind(mid.persistenceId, partitionNr(mid.sequenceNr): JLong, mid.sequenceNr: JLong)
val firstPnr: JLong = partitionNr(mid.sequenceNr)
val stmt = preparedDeletePermanent.bind(mid.persistenceId, firstPnr: JLong, mid.sequenceNr: JLong)
// the message could be in next partition as a result of an AtomicWrite, alternative is a read before write
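For illustration, here is a minimal sketch of the "double delete" idea referred to above: delete the row from both the partition computed from the sequence number and the partition after it, so it is removed wherever an AtomicWrite may have placed it. The `MessageId`, `partitionNr` and `preparedDeletePermanent` names are taken from the diff; the method itself is hypothetical and not the PR's actual code (later in the thread this approach is replaced by a single read per partition).

```scala
import java.lang.{Long => JLong}
import com.datastax.driver.core.{BatchStatement, PreparedStatement}

// MessageId mirrors the name used in the diff above; defined here only to make
// the sketch self-contained.
final case class MessageId(persistenceId: String, sequenceNr: Long)

def addRedundantDeletes(
    batch: BatchStatement,
    preparedDeletePermanent: PreparedStatement,
    partitionNr: Long => Long,
    mid: MessageId): Unit = {
  val firstPnr = partitionNr(mid.sequenceNr)
  val nextPnr  = firstPnr + 1 // the row may live here if an AtomicWrite crossed the partition boundary
  // delete from both candidate partitions; one of the two deletes is redundant
  batch.add(preparedDeletePermanent.bind(mid.persistenceId, firstPnr: JLong, mid.sequenceNr: JLong))
  batch.add(preparedDeletePermanent.bind(mid.persistenceId, nextPnr: JLong, mid.sequenceNr: JLong))
}
```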
Would a read-before-delete be done for every message or can this be implemented more efficiently? If we could implement that with a single read per partition (or n reads where n is a small constant) I'd prefer going for that approach. We only append data, so race conditions shouldn't be an issue. If we need a read for every message, we should go for the redundant-delete approach (as users might delete millions of old messages with a single request).
Agreed - let me change this and maybe put another static field we can read if we've moved sequences into the next partition
+1 for the schema changes.
Regarding the property name: I think the normal case is not using atomic writes, and this only happens when an atomic write spans a partition boundary, so we should use a config name that describes the case when the span doesn't happen, e.g. minPartitionSize or perhaps targetPartitionSize.
My interpretation is that atomic writes are what happens when the actor uses persistAll, and we're only meant to guarantee atomicity for those messages. The use case is that one command produces multiple events, so recovering only a subset of the events from a persistAll would leave the actor in an inconsistent state. The Seq is an internal optimisation for persisting many calls to persistAsync in a single trip to the journal plugin. The language changed quite a bit for M2; here is the snippet for AtomicWrite:
And from the ScalaDoc of AsyncWriteJournal:
WDYT? Worth checking with @ktoso
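For context, a small hypothetical example of the interpretation above: one persistAll call on the actor side is delivered to the journal plugin as a single AtomicWrite inside the Seq passed to asyncWriteMessages (Akka 2.4 API). The actor and event types below are made up for illustration.

```scala
import akka.persistence.PersistentActor

sealed trait Event
final case class ItemAdded(name: String) extends Event
final case class OrderPlaced(orderId: String) extends Event

class OrderActor extends PersistentActor {
  override def persistenceId: String = "order-1"

  private var state: List[Event] = Nil

  override def receiveCommand: Receive = {
    case "place" =>
      // One persistAll call: on recovery either both events are replayed or
      // neither is, and the journal plugin receives them as a single
      // AtomicWrite inside the Seq passed to asyncWriteMessages.
      persistAll(List[Event](ItemAdded("book"), OrderPlaced("order-1"))) { event =>
        state = event :: state
      }
  }

  override def receiveRecover: Receive = {
    case event: Event => state = event :: state
  }
}
```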
You're right. I wrongly assumed that we have logged C* batches but that's not the case for a single-partition batch-write (which is then atomic and isolated). Great implementation then, and sorry for the noise 😉.
+1 for …
/me catching up with the discussion here, I like what I see! :-)
Yeah, I don't think that will happen, and if it happens it's a user error - feel free to reject such large atomic writes with "hey, that's crazy large!" ;-)
Correct. The atomicity guarantee is only about that group of events. The …
Correct again, the seq is there to spare multiple round trips to the journal plugin, instead we batch them up and send them together (this is to allow batch inserts in certain datastores).
I see what edge case is meant there, but I don't think it relates to this PR, since here we're talking about the AtomicWrite, which we can keep in one partition (as I understood). I'll dive into the code later on; looks great from what I've skimmed so far!
@ktoso @chbatey I agree with all statements made regarding atomicity, but after thinking more about the specification, I see an issue with the assumption made in the …
This is actually not the case. From a business logic perspective, later messages in the …

Consequently, this means that these messages must not be written independently to Cassandra (btw, this problem is unrelated to the atomicity of …).

To avoid this, we need to ensure that earlier messages have been successfully written to C* before newer messages are written. The easiest way to achieve this is writing the whole …

Thoughts?
Good catch - indeed that would be a problem. I'll open a ticket to change the wording in the docs as well. The solution you propose looks good.
Right, same algorithm applied to the whole …

I should mention that we may run into these problems only when using … We also do not run into these problems when using …

In my opinion, the best solution would be to get rid of persistAsync …

tl;dr: for the majority of use cases the current PR is good enough. If …
Great point @krasserm. We can either do as you said, or chain atomic writes for the same persistenceId to ensure a batch isn't executed until the previous one succeeds. Or, at the akka-persistence level, loosen the constraints for persistAsync (but I don't think that is a useful programming model). I'll update this PR to merge atomic writes for the same persistenceId in the meantime.
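A rough sketch of what "merging atomic writes for the same persistenceId" could look like, using the Akka 2.4 AtomicWrite API; the function name is illustrative and this is not the PR's actual code. A real implementation would also have to preserve the per-persistenceId ordering and still report one write result per original AtomicWrite.

```scala
import scala.collection.immutable.Seq
import akka.persistence.AtomicWrite

// Merge all AtomicWrites that target the same persistenceId into one AtomicWrite,
// so a single C* batch per persistenceId covers the whole incoming Seq.
// groupBy keeps the element order within each group.
def mergePerPersistenceId(writes: Seq[AtomicWrite]): Seq[AtomicWrite] =
  writes
    .groupBy(_.persistenceId)
    .map { case (_, group) => AtomicWrite(group.flatMap(_.payload)) }
    .toVector
```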
I agree that batching across different persistenceIds is very interesting and should be possible to add with the current API. I don't think we even specify how the batching is performed or that all messages in the batch come from the same persistenceId, so if a journal is basing something on that, it is only based on vague implementation assumptions. That kind of batching should increase the overall system throughput when using many persistent actors, which is also what we think is most important. However, it is not a replacement for persistAsync in high-throughput use cases for a single persistent actor. I don't fully understand your concern about persistAsync and why we should remove it. The tentative update of the state is indeed exotic. The new failure handling that stops the actor unconditionally in case of storage failures might influence the design of such usage?
Akka Persistence should definitely offer functionality to allow batching (high throughput) for a single writer. I just think that persistAsync … for writing events to the journal. To derive state from the written events, applications should use a …

In my opinion, the main idea behind …

A separate PersistentWriter …

I think the main reason that …
Yes it would. And it would be great if you could make the unconditional stop a conditional one, to make migration of older apps possible.

Anyway, that sort of failure handling is most useful if actor state is in sync with the journal, i.e. for a … However, it doesn't really make sense for …

This is another reason why a separation of concerns would make sense. Having a separate PersistentWriter …
I think chaining would be even better, as we may run into the very same issue for multiple …
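For comparison, a minimal sketch of the chaining alternative: start the write for batch N+1 only after the write for batch N has succeeded. The writeBatch parameter is a placeholder for whatever executes one C* batch; the names are hypothetical.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Execute writes strictly one after another: the write for batch N+1 is only
// started once the write for batch N has completed successfully; a failure
// fails the whole chain and skips the remaining batches.
def writeChained[A](batches: List[A])(writeBatch: A => Future[Unit])
                   (implicit ec: ExecutionContext): Future[Unit] =
  batches.foldLeft(Future.successful(())) { (previous, batch) =>
    previous.flatMap(_ => writeBatch(batch))
  }
```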
See my previous comment for a detailed description of the motivation why …
Martin, thanks for the detailed clarification. PersistentWriter is an interesting idea. Perhaps it could be stream-based? I will discuss this topic with Konrad in the next few days.
Yeah, that would really make sense.
In this case, …

With this combination of …
After some thinking and discussions we have concluded that for 2.4 we will not touch persistAsync. The idea of the PersistentWriter can be explored more later. Since streams are still experimental, we cannot incorporate them into standard akka-persistence now anyway.
Reminder of what needs doing on this: …
Regarding 3), @krasserm: for this PR I am planning on just putting them in the same partition. When we do M3 and have the persistenceId in the AtomicWrite I'll change it. I also want to add some extra features to stubbed cassandra (discussed here: scassandra/scassandra-server#103) so we can actually test that batch N+1 isn't executed until N is complete. At the same time I'll add tests for all the failure scenarios, as stubbed cassandra can deterministically create them all.
@chbatey +1 to your plan. Also, would love to see stubbed cassandra being used for testing.
Hi @krasserm, I've updated this to do a single read per partition before starting deletes, i.e.

`select inuse, sequence_nr from messages where persistence_id = ? and partition_nr = ? order by sequence_nr desc limit 1`

which results in no unnecessary tombstones. Each partition is then split into delete batches based on the max delete size. Given this is quite a significant change, I'll wait for you to get back to review.
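Roughly, the per-partition read could look like the following sketch. The CQL mirrors the query quoted above; the method wiring is a simplified, hypothetical stand-in (the PR presumably does this asynchronously inside the journal).

```scala
import java.lang.{Long => JLong}
import com.datastax.driver.core.Session

// Read the highest stored sequence number of one partition with a single query;
// returns None if the partition has never been used.
def highestSequenceNrInPartition(session: Session,
                                 persistenceId: String,
                                 partitionNr: Long): Option[Long] = {
  val row = session.execute(
    "SELECT inuse, sequence_nr FROM messages " +
      "WHERE persistence_id = ? AND partition_nr = ? " +
      "ORDER BY sequence_nr DESC LIMIT 1",
    persistenceId, partitionNr: JLong).one()
  Option(row).map(_.getLong("sequence_nr"))
}
```

The deletes for a partition would then only go up to the sequence number found this way, so no tombstones are written beyond existing rows.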
This version of `akka-persistence-cassandra` depends on Akka 2.3.9 and is cross-built against Scala 2.10.4 and 2.11.6. It is compatible with Cassandra 2.1.0 or higher. Versions of the Cassandra plugins that are compatible with Cassandra 1.2.x are maintained on the [cassandra-1.2](https://github.com/krasserm/akka-persistence-cassandra/tree/cassandra-1.2) branch.
This version of `akka-persistence-cassandra` depends on Akka 2.4.M2 and is cross-built against Scala 2.10.4 and 2.11.6. It is compatible with Cassandra 2.1.0 or higher. Versions of the Cassandra plugins that are compatible with Cassandra 1.2.x are maintained on the [cassandra-1.2](https://github.com/krasserm/akka-persistence-cassandra/tree/cassandra-1.2) branch.
Akka 2.4-RC1
Great work, thanks a lot @chbatey. Most of my comments are only minor ones except the general comment on atomicity of deletes (which has already been an issue in all previous versions). Curious what you think ...
Updated based on your comments @krasserm. Let me know if it is okay and I'll do some squashing.
LGTM @chbatey
… they span logical partitions
Squashed into a commit for AtomicWrites and a commit for merging AtomicWrites for the same persistence id.
Do you plan further updates on this PR or is it ready to merge (tracking further work by separate tickets)?
Ready for merge. I'll raise a new issue to look at deletes, which I'll do before the 2.4 release. When are you planning on making master 2.4?
We should merge to master when the tickets in the Ready and InProgress columns are done (except #64, #77 and #79). WDYT?
Can you please close all tickets that are covered by this PR?
+1
…triknw make C* test server more lightweight
Hi @krasserm - this will need some clean-up; I just wanted some feedback / early discussion on the schema changes. I'll remove all the logging; I was just checking that I understood how/when akka-persistence calls into the plugin.
I removed the marker column and added a static boolean column called inUse. It is set on every write and also when we skip an entire partition due to a large persistAll. It is used for scans (like the marker used to be) to make sure we keep scanning even if entire partitions have been deleted or skipped.
I removed COMPACT STORAGE, as it would limit our future schema changes, and the space issue is being addressed in the next version of Cassandra (https://issues.apache.org/jira/browse/CASSANDRA-8099).
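To make the schema changes concrete, here is a hedged sketch of what the revised messages table might look like after the changes described above (marker column removed, static inuse column added, no COMPACT STORAGE). Column names and types are inferred from this description and the query quoted later in the thread; the actual schema in the PR may differ.

```scala
// CQL as it might be used when the plugin creates its table; the keyspace/table
// names follow common plugin defaults, everything else is inferred from the
// description above and is an assumption.
val createMessagesTable: String =
  """CREATE TABLE IF NOT EXISTS akka.messages (
    |  persistence_id text,
    |  partition_nr bigint,
    |  sequence_nr bigint,
    |  message blob,
    |  inuse boolean static,
    |  PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
    |)""".stripMargin
```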
For atomicity I went with an approach similar to the one discussed on the issue: …
So the trade-off is that we may end up with slightly varied partition sizes, but I don't see that as an issue, and it will only be noticeable if people have huge AtomicWrites.
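A small sketch of that trade-off, assuming a targetPartitionSize setting (one of the names floated earlier in the thread): if the whole AtomicWrite does not fit into the current partition, it is written entirely to the next one, so the current partition ends up slightly smaller than the target but the batch never spans two partitions. The function and its parameters are illustrative and not the PR's actual bookkeeping.

```scala
// Decide where a new AtomicWrite goes. `highestSeqNr` is the highest sequence
// number already written, `firstSeqNrOfPartition` is the first sequence number
// stored in the current partition; both would come from the journal's own
// bookkeeping and are hypothetical parameters here.
def partitionForBatch(currentPartition: Long,
                      highestSeqNr: Long,
                      firstSeqNrOfPartition: Long,
                      batchSize: Long,
                      targetPartitionSize: Long): Long = {
  val used = highestSeqNr - firstSeqNrOfPartition + 1
  if (used + batchSize <= targetPartitionSize) currentPartition // batch fits into the current partition
  else currentPartition + 1 // write the whole batch to the next partition; the current one stays a bit short
}
```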
TODO: …

A couple of other things: …