-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A transport-level error has occurred when receiving results from the server if query length exceeds 956 characters #2141
Comments
Same error in Microsoft.Data.SqlClient version 5.2.0-preview3.23201.1 |
@guyvaio thanks for reporting this. We will check this and will get back to you. |
@guyvaio just to confirm that you are not using any stored procedures in the command? It is just passing actual text as command? |
No, reproduced with just n spaces followed by select 1. |
I reproduced the bug. |
Thanks for the repo. |
The same error occurs everywhere the query sent to SQL Server exceeds a certain size:
|
I tried a similar repro on Windows and all query sizes execute fine. So this appears to be limited to Android and/or .NET MAUI. We'll need to debug in that environment when someone gets time to set one up. There could just be in issue in one of the underlying libraries. |
Yes, initial issues are observed in Android. After reading the MDS source code with my dev team (many time lost !) we deduce this error occurs in all « unix » based app. The win-nt (!) compilation still uses native code. The error occurs in the new TDS parser that, obviously, is not finished nor stable. |
The place where it goes wrong is in src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SNI\SNIPacket.cs, line 258 If sqlCommand.Text has an accepted length, _dataLength is non zero. I guess the problem is in the Read method of underlying SslStream. Hope this can help. |
Further investigations have shown that _dataLength is zero (see above comment) because SQL Server closes the connexion. When the query exceeds 956 (or more) characters, the request sent to SQL Server is malformed, SQL Server detects it and closes the connexion. The error occurs in external code (seen in the stack) between line 318 of SNIPacket.cs
and line 183 of SslOverTdsStream.NetCoreApp.cs
We guess that the SSL encryption occurs between these 2 lines and that the encryption produces a wrong message. If in the connection string, Encrypt = false is forced and SQL Server does not require encryption, the error does not occur. The stream of the SNIPacket is of type SNINetworkStream (not a SNISslStream), so encryption does not occur anymore and everything goes wel. Hope this can help. Credits to @jFrancoisH for joined efforts to understand this mess. |
I do confirm there is no problem in iOS. |
@guyvaio Could there be a device between the client and server that is inspecting or manipulating the network packets? Maybe a security device trying to identify or sanitize potentially malicious SQL? I've seen it before. They can be very difficult to troubleshoot. Or maybe some security software? I could imagine that if something was modifying the encrypted data, the server would reject it and close the connection. |
@David-Engel Thanks for your comment. All tests have been done in a private LAN.
It seems to me that the conclusion we must draw is that there is a bug in the Android platform specific code that performs encryption. |
We have recently upgraded to .NET 8 for Remote Desktop Manager (RDM) on Android after TLS-related issues were finally fixed, only to start getting error reports from different customers for Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - Success). I took the time to set up RDM Android with the latest Visual Studio 2022 Preview to launch it in WSA, and used a test VM with SQL Server as my RDM SQL Server data source with TLS encryption enforced server-side. I then used instructions for TLS pre-master secret dumping from LSASS to decrypt traffic on port 1433 from the server in Wireshark. RDM Windows can connect to the same SQL Server data source without problem - only RDM Android is affected, and it's very easy to reproduce, as we do a bunch of queries after the initial connection that go over the weird limit observed. At first sight, there doesn't seem to be an obvious issue with the query size, or even the response size: However, looking at the last query sent from the client to the server, we can observe an important difference: the query was fragmented in two TLS records (2048+307 bytes). This isn't TCP fragmentation of a single large TLS record like the previous query, and the fact that it has been split in two TLS records appears to have triggered a server-initiated disconnection upon reception: Normally, an application should be able to handle this nicely, but there's potentially something unhandled there. We do know that the issues only occur with TLS, and that TLS issues were only recently fixed. In my past experience with FreeRDP, I encountered similar weirdness between OpenSSL and SChannel - for instance, zero-length TLS records had to be disabled. Here we seem to be hitting a problem related to the way a single query message is fragmented over multiple TLS records. What about platforms where the issue doesn't happen, like RDM Windows? The equivalent query is sent as a single TLS record, and the server doesn't drop the connection! Applications normally handle such fragmentation differences correctly, but I really wouldn't be surprised to learn that SQL Server + SChannel doesn't handle this properly. I've had issues in the past with FreeRDP for CredSSP+OpenSSL interoperability where we had to disable zero-length TLS records because SChannel couldn't handle them. SQL Server probably has similar weird unhandled cases. I have no idea what triggers the fragmentation at the TLS record level in .NET 8 for Android, but the fragment size appears to be set at 2048, which is close-enough to a query of 956 characters encoded in UTF-16. The numbers fit, now we just need someone more knowledgeable of the stack to take a closer look at why it's behaving the way it is right now, and see how queries could be sent as a single TLS record fragmented at the TCP level instead of sending it as multiple TLS records, and I think it should resolve the issue. |
@awakecoding if I got it correctly, Server Hello is requesting Certificate from the client. What are pre-login messages sent to the server? |
Also, what connection string properties are being used? |
@JRahnama I can send decrypted Wireshark captures showing the actual traffic without TLS record fragmentation that works from Windows, and TLS record fragmentation causing SQL Server to disconnect the Android client, GitHub just doesn't allow .pcapng file attachments. There is no client certificate authentication involved, and the connection string doesn't matter, what matters is the issue always happens with TLS and query strings around 956 characters because they produce messages over 2048 bytes that somehow only get sent as multiple TLS records on Android, and no other platform. We suspect this would be an issue within the .NET runtime combined with SQL Server being unable to handle query messages split in multiple TLS records. I've been digging all morning trying to get SQL Server to produce traces to get a hint as to why it disconnects after receiving the query in split TLS records, but I couldn't get it to be verbose enough for that. This is a critical issue for us and we're looking for a swift response - and pretty much everyone else in this thread has been able to reproduce it easily: .NET 8, SQL Server with TLS, and a long query string. That's it, boom. We were able to hit the issue with a smaller sample like the one that was shared by @jFrancoisH, and our analysis now points in the same direction as @guyvaio - #2141 (comment) Knowing that at the protocol level, the fragmentation happens at 2048 bytes, the original magical value of 956 characters becomes less relevant. All you really need is a query resulting in a protocol message going over 2048 bytes, for which the end result is a single query message sent over multiple TLS records rather than a single TLS record fragmented at the TCP level. I could not observe the same behavior as Android from any other platform, and only Android is affected. I tried finding a hint in the SqlClient source code, and I don't see an obvious link with the fragmentation. However, there are many hints that point towards the .NET runtime implementation of SslStream: as @guyvaio reported earlier, it happens in external code (the .NET runtime here), in between lines from SqlClient. This makes a lot of sense given the poor state of SslStream for Android prior to .NET 8, it only just got fixed. This is not battle-tested code, there's been churn in that area very recently, so bugs like this would not be a surprise. In the .NET runtime, we can see the value 2048 used as the InitialBufferSize for SslStream. When _stream.Write() is called from SqlClient, it calls SSLStreamWrite from pal_sslstream.c. The managed wrapper of the native Android SSLStreamWrite is in Interop.Ssl.cs. I've tried to look and compare the various platform-specific implementations to find a hint on what might cause the difference in behavior, but I could not find it yet. However, Wireshark doesn't lie: those long queries are only split in multiple TLS records with .NET on Android, and all other platforms work. I'm still looking for undeniable proof that this is the reason SQL Server drops the connection, but from observation, there's a direct cause and effect relationship: if you stick to queries that result in messages smaller than 2048 bytes over the wire, the queries get sent as a single TLS record, and everything works fine. Other platforms send queries over 2048 as a single TLS record, and they work fine. Would it be possible to increase the priority on this issue and maybe get people from the .NET runtime involved? If you know people from the SQL Server team, maybe this could help confirm the theory about queries sent over multiple TLS records not being handled properly? |
@awakecoding we can look into this, but there is a release coming up on December 7th and have limited time up until that time. @wfurt could this be something related to your team? |
@JRahnama @wfurt here are the Wireshark captures (the link is only good for 24 hours), they have the TLS pre-master secrets embedded into the .pcapng, so Wireshark will decrypt TLS traffic. It's all test data so don't worry, but we can see that from Windows, we don't get queries split in multiple TLS records, while from Android, the last message before SQL Server disconnects the client is a query split in two TLS records. One thing I noticed is that SQL Server simply closes the TCP connection without even bothering to send a TLS alert, but it still closes it nonetheless: https://wormhole.app/5E9L3#EETArv5SLuAlwOy74lWqMw |
I've got further proof of the direct relationship between long queries split multiple TLS records and the server-initiated TCP disconnection: I caught it in WinDBG while inspecting traffic in Wireshark. The moment I resumed execution to let sqllang!Tcp::FCloseTransport complete, I saw the TCP FIN packet sent by the server. This is definitely the call stack in SQL Server. During my testing, I sometimes saw the split TLS records sent in multiple packets - sometimes they're batched together, but the end result is still the same. I can put a breakpoint on either sqllang!Tcp::FCloseTransport or sqllang!Tcp::FProcessSessionKill to break earlier on the call stack leading to the disconnection. There's some check done somewhere in sqllang!process_messages that results in an error condition, but with WinDBG disassembly I couldn't figure out much, and in IDA, it's a bit hard to follow: |
cc @simonrozsival for the Android angle. (it may be worth of trying .NET 8 as there were many Android improvements) We could dump more data from schannel or SslStream on the server to see if there is any TLS processing error or if the close comes from the SQL itself. |
While I agree that SQL Server should handle TDS RPC requests split into multiple TLS records or "messages", I think I just found the corresponding hint that it is not handled in MS-TDS:
Seeing how the protocol security is heavily tied to SSPI for non-TLS and then with SChannel SSPI on Windows for TLS, to me this is indicative that it is expecting requests to come in 1) individually and 2) as a complete message. It doesn't say it explicitly but it is most likely the byproduct of this condition. Of course the final confirmation would need to come from the SQL Server team, but in the meantime, what we need is the quickest workaround to avoid the fragmentation of TDS RPC messages (type 3) into multiple TLS records. I don't know if the 2048 initial buffer size pointed to earlier is truly responsible for the fragmentation, we'd need a way to at least double it to work around the problem until a more permanent solution can be found. |
Success! I finally managed to prove my point by increasing the value of InitialBufferSize from 2048 to 4096, rebuilt the .NET runtime from the v8.0.0 tag, then manually copied over my patched System.Net.Security.dll in the following locations, which is where it was picked up by Visual Studio according to ProcMon, and surprise, surprise: Remote Desktop Manager works again, no issue resulting from broken SQL queries, because the long TDS RPC requests no longer get fragmented. Of course, the issue would still happen for requests longer than 4096 bytes, but I feel like just doubling the buffer size is good enough to fix this blocking issue (some of our customers are unable to use RDM Android anything until we fix this). I don't really know if there's a better way to swap .NET runtime assemblies for a custom build, but here are the paths on Windows where I replaced System.Net.Security.dll with my locally built patched copy:
Despite having a separate copy for reach CPU architecture, the DLLs are exactly the same in all of those folders. Here's a zipped copy of my patched DLL for convenience: System.Net.Security.dll.zip And here's a patch file, but it's just a matter of changing I hadn't realized that we could attach zip files to GitHub comments until now, so here are the decrypted Wireshark captures mentioned earlier for my initial analysis of the issue: TL;DR: SQL Server requires TDS RPC requests to be sent as a single TLS record, and the protocol specification hints about this limitation. The initial buffer size of 2048 bytes used in the .NET 8 runtime SslStream stack results in fragmentation of messages into multiple TLS records, breaking SqlClient on Android when sending queries of about 956 characters (header size + UTF-16 string encoding makes it close to 2048 bytes). The only solution is to avoid splitting SQL queries (TDS RPC requests) in multiple TLS records, of face immediate disconnection from SQL Server. Increasing the initial buffer size to 4096 is sufficient to work around the issue for Remote Desktop Manager on Android: Since we need it a fix now, I'll look into overwriting System.Net.Security.dll in our CI build runners. I would be fine with just increasing the initial buffer size, but I somehow doubt the .NET runtime time will accept this as a proper fix. We should look into the best way to work around the issue in the short term until a more permanent solution can be found. |
@awakecoding Thank you the time you have taken into analysing this and finding the temporary fix. How does this works iOS? Any ideas?. |
As I tested it (with SQL Server) on Sep 25 there was no problem with iOS. |
@last-Programmer The issue is in the dotnet TLS stack on Android, not on the SQL side. So iOS is unaffected. |
My colleague @thenextman may have found the reason why the issue only happens with Android, from the docs of the Android SSLEngine APIs used by the .NET runtime:
If we look at the .NET runtime DoWrap() implementation using the Android SSLEngine APIs (simplified and commented in-line here):
TL;DR: the Android SSLEngineWrap API return STATUS__BUFFER_OVERFLOW to tell that it could not write all the data bytes to the output buffer. The end result is a TLS record which holds only part of the data we wanted to send. Here we just catch the condition to increase the buffer size for the next call to SSLEngineWrap, but not the current one, which is already too late for SQLClient. Ideally, we should increase the buffer size before calling SSLEngineWrap to avoid getting the overflow status code, and get a single, clean TLS record. |
I created a detailed issue report in the .NET runtime for the underlying problem that affects SqlClient: dotnet/runtime#95295 |
@awakecoding i need to set it to 8192 to get it working for my android app. my application uses SMO and it requires 8192 to get it working. It may break some other area of my app. |
I would recommend adding this information to the issue on the .NET runtime side just so they know what minimum size should be required in practice, which is higher in your case: dotnet/runtime#95295 |
@awakecoding so we can confirm that this is not SqlClient related issue? If the answer is yes, can we close the issue here and follow up in runtime repo please? |
The root cause of the problem is inside the .NET runtime and I do not know of a way to fix it within SqlClient, but I would like to keep this issue open here until it is fixed. Closing it would just make it look like it has been handled when it is not the case yet. However, let's follow up on the other issue, and come back to close it once the .NET runtime has an upstream fix. |
I'm wondering if we should push issue to the server. While we may improve the |
@wfurt I’m not certain who I should contact from the SQL Server team. |
@wfurt On the other hand, all clients on all other platforms work with the current server implementation |
@wfurt we can certainly try to push the issue to the SQL Server team, but only as an improvement on their side, and something we cannot expect to be fixed in a timely manner. Also, it works for all platforms other than Android, and MS-TDS documents this finicky behavior as a MUST. I feel like the best we'd get from the SQL Server team might be that they ask the protocol docs team to make the finicky behavior more clearly specified instead of improving the server. |
It still may work by accident ;( I can understand that the query is requested in single message but that should be SQL/TDS layer. Making assumptions about underlying TLS or TCP breaks protocol layering IMHO. I do think we should address dotnet/runtime#95295. But that may not be quick fix either, likely surfacing in 9.0 unless we can bring good case to servicing committee. |
@wfurt this isn't the first Microsoft protocol I encounter that's picky like this, this is a prime example of the legacy of SSPI-based security being transposed to TLS: the old SSPI message wrapping is just replaced by TLS records, and someone 20 years ago didn't feel like refactoring the code to handle TLS record fragmentation, so here we are, looking forward to hot-patching the .NET 8 SDK in our builds until this is fixed upstream. As a matter of fact we had to patch it again to increase the initial buffer size to 8192 as we did hit a special case where the query resulted in packets larger than 4096 bytes: System.Net.Security.dll.zip. As for fixing this on the SQL Server side, unless you have a direct point of contact within the development team, I really doubt they'll prioritize fixing something that worked by accident for decades. To add on top of this, the simple fact that it's in the protocol specification shows that someone must have hit this very issue in the past, prompting them to document the fragile behavior instead of fixing the implementation. |
This seems to be fixed in .net9 rc. |
Based on above discussion, closing this issue as external. |
In MAUI Android using Microsoft.Data.SqlClient, it's not possible to run a query exceeding 956 characters.
Exception message:
Stack trace:
Microsoft.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - Success)
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action
1 wrapCloseInAction) in D:\a\_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 2010 at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action
1 wrapCloseInAction) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 770at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParser.cs:line 1404
at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 734
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 1783
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 1256
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 1181
at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() in D:\a_work\1\s\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 904
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs:line 446
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParser.cs:line 2019
at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData() in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDataReader.cs:line 1142
at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData() in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDataReader.cs:line 258
at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 5157
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 4951
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 4621
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 4491
at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 2050
at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior) in D:\a_work\1\s\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 2000
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet)
at MercatorApi.Api.Zselect(SqlCommand cmd, Boolean withSchema, MercatorSqlParam[] mercatorSqlParams) in M:\Mercator_dotnet\MercatorTunnel\MercatorTunnel\Api\MercatorApiZselectSqlClientCore.cs:line 133
ClientConnectionId:7f5e8352-c020-4aaa-9a88-42ca61cfaa28
To reproduce
sqlCommand.CommandText = new string(' ', 948) + "select 1";
works fine.
sqlCommand.CommandText = new string(' ', 949) + "select 1";
gives the error.
The text was updated successfully, but these errors were encountered: