Akka.Persistence.Sql.Common: add separate tags table for indexing individual tags #5296
One thing to note about how tags are currently stored: they are unordered within the semicolon-delimited list. The tags "A", "B", and "C" may appear in any order in the stored string.
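A minimal sqlite sketch of the point above (the table and column names are illustrative, not the plugin's actual schema): because tag order within the delimited string is unstable, matching a tag means a substring search, which cannot use an ordinary index.

```python
import sqlite3

# Illustrative sketch, NOT the real Akka.Persistence schema: tags stored as a
# single semicolon-delimited string, in whatever order they were written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (ordering INTEGER PRIMARY KEY, tags TEXT)")
conn.execute("INSERT INTO journal (tags) VALUES (';B;A;C;'), (';C;B;A;')")

# Matching tag "A" requires a leading-wildcard LIKE, regardless of tag order.
rows = conn.execute(
    "SELECT ordering FROM journal WHERE tags LIKE '%;A;%'"
).fetchall()
print(rows)  # → [(1,), (2,)] — both rows match, whatever order the tags are in
```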
@malclear and that works today because the SQL we use matches the tag anywhere within that string. What I'd propose as part of an ETL process is parsing that field, splitting each tag into its own row.
Would that mean duplicating the payload for each tag? If so, that's not insignificant. I think a many-to-many join table, or a tag-specific child table, would be better for us.
@malclear yeah, after thinking about it - for SQL it has to be less expensive to just rely on an index rather than denormalization.
This was eventually addressed in the Linq2Db plugin. IMO, we should put all the effort into that plugin.
Yes. We should.
I wish we had this thread a few months ago when this was fresher in my head... The nice thing about doing this all in Persistence.Linq2Db is that we will only have to do it once for everyone. However, there are some caveats to keep in mind:
If we don't do any of the above and go the 'lazy' route (individual writes for everything, but at least transacted) we'll see a huge perf drop on scale-out. Mind you, I still think it would be faster than if done in Sql.Common, but we are probably talking a 50% or more drop in performance.
Will there be any performance hit if you don't use tags?
If we adopt a "blind write" approach and have the primary key be (Tag, PersistentId, SeqNo), with a secondary index on Ordering (which can be a server-side timestamp), would that work? All of that should be predictable and sorted on the index.
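The layout proposed above can be sketched in sqlite (table and column names here are hypothetical, chosen only to mirror the comment): a composite primary key of (tag, persistence_id, sequence_nr), a secondary index on ordering, and "blind" inserts with no read-before-write.

```python
import sqlite3

# Sketch of the proposed "blind write" tag table. Names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tag_table (
        tag            TEXT    NOT NULL,
        persistence_id TEXT    NOT NULL,
        sequence_nr    INTEGER NOT NULL,
        ordering       INTEGER NOT NULL,
        PRIMARY KEY (tag, persistence_id, sequence_nr)
    )
""")
conn.execute("CREATE INDEX ix_tag_ordering ON tag_table (ordering)")

# Blind writes: each (tag, id, seqno) row is inserted independently.
conn.execute("INSERT INTO tag_table VALUES ('A', 'pid-1', 1, 100)")
conn.execute("INSERT INTO tag_table VALUES ('A', 'pid-1', 2, 101)")

# A tag lookup becomes a prefix search on the primary key, replayed in
# write order via the secondary index on ordering.
hits = conn.execute(
    "SELECT persistence_id, sequence_nr FROM tag_table "
    "WHERE tag = 'A' ORDER BY ordering"
).fetchall()
print(hits)  # → [('pid-1', 1), ('pid-1', 2)]
```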
Yes; how much depends on the route chosen. In addition to the write aspect I mentioned, there is also the read side. Adding the tag table means we either need to do two DB queries and combine the results, or do a single left-join query. Either is easy, but if you are in a 'mixed mode' (some events with tags, some without), there could be a read-side impact as the tagged events grow. I'd expect that if you never use tags in a design, the impact on reads should be negligible in any of these cases. As far as writes go:
'Work', yes. But then you'd need to include all three columns on every tag row for a given record, and it's a harder join on read.
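The two read strategies mentioned a couple of comments up (two queries combined client-side vs. one join) can be sketched as follows; the journal/tag schema and column names are assumptions for illustration only.

```python
import sqlite3

# Hypothetical minimal journal + tag tables for comparing read strategies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal (ordering INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE tag_table (tag TEXT, ordering INTEGER);
    INSERT INTO journal (payload) VALUES ('ev1'), ('ev2');
    INSERT INTO tag_table VALUES ('A', 1);
""")

# Strategy 1: two queries, results combined in application code.
orderings = [r[0] for r in conn.execute(
    "SELECT ordering FROM tag_table WHERE tag = 'A'")]
placeholders = ",".join("?" * len(orderings))
events = conn.execute(
    f"SELECT payload FROM journal WHERE ordering IN ({placeholders})",
    orderings).fetchall()

# Strategy 2: a single join query filtered by tag.
joined = conn.execute("""
    SELECT j.payload FROM journal j
    JOIN tag_table t ON t.ordering = j.ordering
    WHERE t.tag = 'A'
""").fetchall()
print(events, joined)  # both return only the tagged event: [('ev1',)]
```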
I'm going to be annoying here, but for your use case at work, which of these three options would make you feel least concerned? When the technical merits don't present a clear winner, I like to use a "regret minimization" framework :p
I've never used tags in any capacity with Akka.Persistence. So I'm trying to approach the question from a balance of 'what would make a DBA happy' vs 'what would be most efficient'. I think Option A is going to provide the best balance of:
One thing to keep in mind with row-per-tag: if there is an identity primary key, there is a chance that workloads with a LOT of events with a LOT of tags could conceivably overflow a bigint. We can make a primary key on the tags table optional, but that is something a lot more contentious in DBA circles.
@Aaronontheweb I can write this up as an issue in Akka.Persistence.Linq2Db, but a couple of questions/double-checks came to mind:
Yep, that's always preferable - we don't want to treat a production user's database roughly without their consent. Feature flags that are opt-in are a low-friction way to implement this.
Yes, we should do an exact text match.
Yep, fine with it.
Can you open an issue with a checklist of specific things to look at or do you want an Akka.Persistence.Query veteran (i.e. @ismaelhamed or @Arkatufus ) to just give it a look over?
I like the idea of spitting out a SQL script that the DBA can squint at. Happy to create a repo for a utility tool for this if you think it's wise to separate it. Will the migration path include current Akka.Persistence.SqlServer users too? I think that'd be a good idea to help jumpstart the migration over.
We already handle migration from Akka.Persistence.SqlServer in a way similar to what I describe above (in 'pure' mode we don't use Journal Metadata for tracking sequence numbers on deletes). This change shouldn't be an issue for the compatibility mode switches we already have in place. IOW, yes. There's even a migration guide.
@Aaronontheweb the more I think about this, the more I think the right thing to do is NOT do the Ulid bit, and instead take the performance hit on tagged writes until linq2db/linq2db#2960 is resolved. We can give guidance to users who use tags that a different configuration (with more parallel writers) may be wanted to help mitigate the impact.
Done as part of Akka.Persistence.Sql in August of this year.
Is your feature request related to a problem? Please describe.
Right now all of our tags in Akka.Persistence.SqlServer and other related plugins are simply stored as semicolon-delimited text that is, effectively, not indexed:
https://github.com/akkadotnet/Akka.Persistence.SqlServer/blob/ccf62119cf99daf9ea77edd1f7596d74f53ea5a6/src/Akka.Persistence.SqlServer/Journal/SqlServerQueryExecutor.cs#L56-L78
When we perform a "read by tags" query, it performs a full table scan of all items in the Akka.Persistence.Journal:
https://github.com/akkadotnet/Akka.Persistence.SqlServer/blob/ccf62119cf99daf9ea77edd1f7596d74f53ea5a6/src/Akka.Persistence.SqlServer/Journal/SqlServerQueryExecutor.cs#L35-L41
For any reasonably large journal (i.e. millions of events or more) this is going to be non-performant.
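A minimal sqlite sketch of why this scan happens (the table and column names are illustrative, not the plugin's): a `LIKE` with a leading wildcard cannot use an index, so the engine scans the whole table, whereas an equality match against a dedicated tag column can seek on an index. Sqlite's `EXPLAIN QUERY PLAN` makes the difference visible.

```python
import sqlite3

# Hypothetical schemas: delimited tags column vs. a normalized tag table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal (ordering INTEGER PRIMARY KEY, tags TEXT);
    CREATE TABLE tag_table (tag TEXT, ordering INTEGER,
                            PRIMARY KEY (tag, ordering));
""")

# Leading-wildcard LIKE: the planner has no choice but a full table scan.
scan_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM journal WHERE tags LIKE '%;A;%'"
).fetchone()[3]

# Equality on a dedicated tag column: the planner seeks on the index.
seek_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ordering FROM tag_table WHERE tag = 'A'"
).fetchone()[3]
print(scan_plan)  # a SCAN of the journal table
print(seek_plan)  # a SEARCH using the primary-key index
```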
Describe the solution you'd like
I'm trying to decide if it would be better to:
The idea in either case is to avoid full table scans and use an index to query events by tag.
Describe alternatives you've considered
One alternative is to simply leave things as-is and have users implement their own custom journals if they need indexed tag queries.
Additional context
One nasty thing about introducing this change is how we'd make it backwards compatible with current journal implementations - there would, essentially, need to be an ETL process or tool of some kind introduced.
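The core of that ETL step can be sketched as a small transform, splitting the delimited tags column into one row per tag. The delimiter and the (ordering, tags) row shape are assumptions for illustration.

```python
# Minimal sketch of the tag-splitting ETL step. Delimiter and row shape
# are assumptions, not the plugin's actual format.
def explode_tags(journal_rows, delimiter=";"):
    """Yield one (tag, ordering) pair per tag from (ordering, tags) rows."""
    for ordering, tags in journal_rows:
        # Empty segments from leading/trailing delimiters are dropped.
        for tag in filter(None, (tags or "").split(delimiter)):
            yield (tag, ordering)

rows = [(1, ";A;B;"), (2, ";B;"), (3, None)]  # None: untagged event
print(list(explode_tags(rows)))  # → [('A', 1), ('B', 1), ('B', 2)]
```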