-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider not doing a roundtrip for BeginTransaction #1554
Comments
/cc @scoriani |
I had a brief look around and we don't seem to need the transaction id in order to send an rpc to the server. The transaction is part of session state. So this might be possible. It is worth noting though that since the transaction itself isn't an rpc it isn't something you'd send batched. You'd just have to run the existing code to get the transaction started and then batch up rpc afterwards. I think what you're asking for is to not stall the sending of the rpc until the transaction has returned which again might be possible but would probably be a bit of a headache to implement. I think it would be confusing to have successfully called BeginTransaction and then have a transaction based error occur later when you attempt to do some other work, that would leave people needing to handle transaction failures everywhere any query can occur rather than at the point where they ask for it. |
Yeah, error handling is what I mentioned above. Though can you give and example or two of errors that can happen when starting a transaction? |
Thanks for looping me in @roji. My concern for SQL Server is that BEGIN TRANSACTION increments @@TRANCOUNT system variable when invoked (while not really touching the log until an actual command is executed) so, if BeginTransaction() is not calling the T-SQL command as expected, it could break some logic. Also, while the impact of the additional roundtrip can be significant for a transaction containing a single command, in real world it is very likely that between BeginTransaction() and CommitTransaction() there will be multiple command executions, so the impact of the additional roundtrip will be less critical. |
The question is, would the later increment of @@TRANCOUNT be visible (or because logic breakage) if it would still be guaranteed to be done before the next command is executed... Do you have an example scenario in mind?
It's true that the more roundtrips are done in the transaction, the less an impact this has. However, even with two roundtrips in the transaction, the full thing currently takes 4 roundtrips in total (one for BEGIN, two for each actual command, one for COMMIT). Reducing that from 4 to 3 seems like it could be quite significant. Also, importantly, multiple commands in the transaction do not necessarily imply multiple roundtrips. For example, when SaveChanges is invoked in EF Core, we batch all changes as much as possible, frequently achieving a single batch/roundtrip with multiple commands (my work on this in EF is what prompted this issue). The transaction overhead still adds 2 roundtrips to that, and reducing that to one could be very valuable. Finally, IMHO roundtrips are particularly important to perf since their impact increases with the latency between the application and the database. In cloud scenarios, latency is typically a more significant factor than on-premises, so I'd say this optimization is particularly valuable for Azure SQL. If needed, this is something that could be measured. |
The only consequence I can see is that the server won't know that there is an open transaction on the connection. From what I can find the client side of the connection doesn't change much in the presence of a transaction. The transaction is just some state the actual rpc that are executes don't refer to the transaction. I don't know if it will improve performance dramatically. I suspect not. We will still have to open request the transaction, request the rpc, process transaction session update, process rpc result. It'll just mean we do them interleaved and temporally localiized instead of doing them spread over two different methods. The time taken shouldn't be any shorter you just do more work in a one chunk. Pay now or pay later. |
The idea here isn't to refactor some code in SqlClient - it's to actually batch the BEGIN TRANSACTION with the next command being executed. In other words, you wouldn't wait for confirmation that BEGIN TRANSACTION completed before executing the next command. If you do this, you will have eliminated a full network roundtrip - otherwise this has no value. |
Not completely sure, but I was thinking about things like statement interleaving in a MARS-enabled connection or such, when that concept of "next command" can be less obvious.
You're preaching to the choir on this one 😊 was just worried that in this very specific scenario worth the (potential) risk. Not mentioning that, if you're truly latency sensitive, I would probably recommend to wrap your SQL commands in batch with transactional logic controlled by explicit BEGIN/COMMIT TRAN, possibly in a stored procedure, rather than execute multiple individual commands from the client wherever possible. |
Eugh, MARS is why we can't have nice things. Can we just delete the feature entirely? |
It's an interesting point, though I'd want to understand this fully... It's true that MARS allows multiple resultsets to be open (and consumed) at the same time, but you still have sequential, fully-ordered API calls into SqlClient in .NET (and concurrent usage is not supported). In other words, at some point in the code you have a BeginTransaction(), and AFAIK any Execute* that happens after that is inside the transaction until the commit. So I'm not sure exactly how MARS complicates the situation here. I do agree this would need to be thought about more thoroughly - but at the moment I'm not seeing any specific issues which would result with deferring the BEGIN TRANSACTION to the next execution.
I don't disagree, but it's definitely not always possible (or even desirable) to do everything via stored procedures... They come with their own set of problems (e.g. code in the database that isn't managed in the application's source control), and data layers such as EF Core simply don't work this way. The reason I think this is appealing is that it would cut down a full round trip for any application doing transactions with SqlClient (should be a lot!), without any action/change in the applications themselves... It's an optimization that's effortless on the application side and whose value increases with the latency to the database. |
I guess as long as in case of overlapping commands we can always determine which one is first to flip the switch and propend the BEGIN TRAN to the following statements making sure other commands will enroll in the transaction, we're good. Any potential race condition in the Execute* code that needs to check if the "transaction bit" was flipped previously for that connection? |
I'm not sure how mars works with transactions. Does each mux get a separated spid or do they present as a single spid and share a transaction? If they share a transaction then we'd have to plumb the transaction pending flag all the way down into the physical object layer which would be a bit messy, you're effectively needing to know if anything you share a muxer with has set that flag and then co-operatively make sure only one mux tries to execute the transaction start request while holding up execution on any other mux that tries to start an rpc after it until the begin transaction has been sent (but not received). As I said, can we ditch mars? it'll make a lot of things much easier |
Yeah, I wish... but it was introduced to mimic "some other database engines" 😊 (especially those in the mid-range/DB2-400-ish space) and help with cross-platform migrations of ERP/Accounting software originally designed around that pattern, so hard to ditch |
Not that I love MARS or think it's a good idea, but unless I'm mistaken, most of the issues we're seeing are implementation problems in the managed implementation of MARS; in other words, it's not the concept itself which necessarily causes the problems (the native windows implementation doesn't seem to suffer from the same problems). So it should be possible to fix the managed implementation (e.g. stop doing sync-over-async), even if doing that probably wouldn't be easy. But what to do about MARS seems of topic in the specific context here. |
When it comes down to it almost everything is just a question of implementation. |
@roji I am adding this proposal as Performance enhancement feature in our backlog for future semester planning. |
A thought. We could add the feature and just turn it off if the connection is mars enabled. Users will get what they pay for, either they get perf or mars, it's already the case that you can't have both. |
Nice idea - it's certainly better than not doing it at all... |
In Npgsql, calling DbConnection.BeginTransaction doesn't actually do anything - it simply sets an in-memory state flag. When a command is next actually executed on the connection, the begin transaction is prepended to it, so they are delivered as a single batch. This eliminates an additional roundtrip to the database, which can be quite significant. SqlClient could do the same.
Note that this may have visible side-effects, e.g. if BeginTransaction() can throw because of any sort of incorrect state (I'm not sure this can actually happen in SQL Server, but it does need to be taken into account). Even if that's the case, the elimination of a full roundtrip for the happy path would still likely justify this, IMHO.
The text was updated successfully, but these errors were encountered: