Maintain a time index to support an akka read journal #103
base: master
@jypma thanks for your pull request. I'll review and comment in the next couple of days. Cheers, Martin
Thanks a lot. I think initial feedback could cover the following:
@jypma we designed writes to Cassandra so that they always go to a single partition, in order to avoid the issues discussed in #48. With your addition, writes may again go to different partitions, resulting in a logged batch that suffers from the problems described in #48. We are currently discussing a general architecture for creating indices and supporting akka-persistence-query in #77 (/cc @zapletal-martin). In that design, the index is created asynchronously so that writing additional tables is not on the fast write path of akka-persistence. It would be great to also implement a time index based on this architecture. WDYT?
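The asynchronous approach described above can be pictured with a minimal in-memory sketch, assuming a hypothetical `Journal` that only appends to the events table (so each write stays on one partition) and a hypothetical `BackgroundIndexer` that catches up afterwards. None of these classes exist in akka-persistence-cassandra or in #77; they only illustrate how the index write leaves the journal's write path, at the cost of the index lagging behind. Concurrency and persistence of the indexer's position are not modelled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event row: only the fields the indexer needs.
final class StoredEvent {
    final String persistenceId;
    final long timestampMillis;

    StoredEvent(String persistenceId, long timestampMillis) {
        this.persistenceId = persistenceId;
        this.timestampMillis = timestampMillis;
    }
}

// Hypothetical write path: appends to the events "table" only, no index write here.
final class Journal {
    final List<StoredEvent> events = new ArrayList<>();

    void write(String persistenceId, long timestampMillis) {
        events.add(new StoredEvent(persistenceId, timestampMillis));
    }
}

// Hypothetical background indexer: reads events after the fact and maintains the index.
final class BackgroundIndexer {
    private final long windowMillis;
    private int processed = 0; // number of journal entries already indexed
    final Set<String> indexRows = new HashSet<>(); // keys of the form "windowStart/persistenceId"

    BackgroundIndexer(long windowMillis) {
        if (windowMillis <= 0) throw new IllegalArgumentException("windowMillis must be positive");
        this.windowMillis = windowMillis;
    }

    /** Indexes everything written since the last call; returns how many events were processed. */
    int catchUp(Journal journal) {
        int count = 0;
        while (processed < journal.events.size()) {
            StoredEvent e = journal.events.get(processed++);
            // The window is derived from the event's own timestamp, so re-indexing is repeatable.
            long window = Math.floorDiv(e.timestampMillis, windowMillis) * windowMillis;
            indexRows.add(window + "/" + e.persistenceId);
            count++;
        }
        return count;
    }
}
```

Between `write` and the next `catchUp`, queries against the index do not yet see the new events; that lag is the latency trade-off raised later in the thread.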
Just commented over there. I fully agree this should go in the same direction. However, our project timelines might require us to continue with this forked branch for the moment. I'll at least add some test cases where the query side sees index values and main-table values out of order. Second, I'll experiment with extracting the time window from the main event itself (as an offline indexer would have to do). That would make a later upgrade or transition easier.
@jypma I fully understand that waiting for #77 to be ready conflicts with your project timelines. I'm willing to merge your contribution as a temporary solution to support your query plugin, but it needs to be modified so that writes go to a single partition. Could you imagine creating the time index with a background indexer running concurrently with the journal actor? Or do you plan to continue with the
@krasserm I understand your concerns. I've changed the PR so that it at least no longer touches the main events table, and it now derives an event's timestamp from the event itself (which would be needed anyway to allow async, replayable indexing). This should make upgrading a little easier. I'm undecided whether to make the actual indexing async at this point: I expect I'd run into some of the same challenges that #77 tries to address. Plus, our particular application is somewhat latency-sensitive (the time from an event being emitted to any real-time source picking it up shouldn't be more than a second or so). Let's keep the PR open for now, for reference, and come back to it once an async indexer is in place. By the way, since akka 2.4 targets Java 8+, can this plugin do so as well? I prefer to use
@jypma ok, let's keep the PR open for now. Thanks anyway for your contribution. Regarding the Java version, we should of course also target Java 8+.
This adds an index table to cassandra, so events can be queried "roughly" by time. The akka journal query plugin implementation is in a separate library.
It works as follows: for every time window (say, 1 minute), a persistenceId is added to the index table once if it changed during that window. Because only the first change to a persistenceId within a window is indexed, the index grows with the number of distinct (window, persistenceId) pairs rather than with the number of events, which keeps its size somewhat limited.
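The indexing rule above can be illustrated with a minimal in-memory sketch. The `TimeWindowIndex` class is hypothetical: a `TreeMap` stands in for the Cassandra index table, and the window is derived from the event's own timestamp, as discussed in the conversation. `record` returns `true` only when a new (window, persistenceId) row would be written, so further changes or replays within the same window add nothing.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the index table:
// window start (epoch millis) -> persistenceIds that changed in that window.
final class TimeWindowIndex {
    private final long windowMillis;
    private final TreeMap<Long, Set<String>> rows = new TreeMap<>();

    TimeWindowIndex(long windowMillis) {
        if (windowMillis <= 0) throw new IllegalArgumentException("windowMillis must be positive");
        this.windowMillis = windowMillis;
    }

    /** Start of the window containing the timestamp (floorDiv also aligns negative values). */
    long windowStart(long timestampMillis) {
        return Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
    }

    /**
     * Records that persistenceId changed at the given time. Returns true only for the
     * first change in a window, i.e. when an index row would actually be written.
     */
    boolean record(String persistenceId, long eventTimestampMillis) {
        long window = windowStart(eventTimestampMillis);
        return rows.computeIfAbsent(window, w -> new HashSet<>()).add(persistenceId);
    }

    /** Total number of (window, persistenceId) rows, i.e. the index size. */
    int rowCount() {
        int n = 0;
        for (Set<String> ids : rows.values()) n += ids.size();
        return n;
    }
}
```

With a 1-minute window, events for the same persistenceId at 0:01 and 0:59 produce one row, while an event at 1:00 starts a new window and produces a second row.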
The query API can then find which persistenceIds changed when, with the accuracy of one time window. This allows remote or distributed views to resume the event stream where they left off, instead of restarting from 0.
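A view resuming from a known point in time could query such an index roughly as in the following sketch. `TimeIndexQuery` and `changedSince` are hypothetical names, not the query plugin's actual API. Because the index is only accurate to one window, the scan starts at the window containing `sinceMillis` and may therefore return persistenceIds whose change happened slightly earlier; the view is assumed to replay each returned persistenceId from its own last-seen sequence number, so such extra hits are harmless.

```java
import java.util.LinkedHashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class TimeIndexQuery {
    private final long windowMillis;
    // Hypothetical index contents: window start (epoch millis) -> persistenceIds changed in it.
    private final NavigableMap<Long, Set<String>> index;

    TimeIndexQuery(long windowMillis, NavigableMap<Long, Set<String>> index) {
        if (windowMillis <= 0) throw new IllegalArgumentException("windowMillis must be positive");
        this.windowMillis = windowMillis;
        this.index = new TreeMap<>(index); // shallow copy of the window map
    }

    /** persistenceIds changed at or after sinceMillis, accurate to one time window. */
    Set<String> changedSince(long sinceMillis) {
        // Round down to the containing window: the index cannot resolve finer than that.
        long fromWindow = Math.floorDiv(sinceMillis, windowMillis) * windowMillis;
        Set<String> result = new LinkedHashSet<>(); // keeps window order, drops duplicates
        for (Set<String> ids : index.tailMap(fromWindow, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

Returning a set means a persistenceId that changed in several windows since the resume point is reported only once.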
There are working integration tests in the query implementation repository.