Set up build and perf testing infrastructure #10
The best way to start is probably to gradually build up a benchmark project with BenchmarkDotNet - Npgsql has a few which could be used as a starting point. Who knows, maybe these could evolve into a provider-agnostic benchmark suite. Once there's something meaningful, we should be able to use crank to execute it in the perf lab etc.
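As a rough illustration of the suggestion above, a minimal BenchmarkDotNet project could take a shape like the following. This is only a sketch: the class, query, and connection string are illustrative, not taken from this repo or from Npgsql's benchmarks.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;

[MemoryDiagnoser] // track allocations alongside timings - boxing shows up here
public class ReadBenchmarks
{
    private SqlConnection _conn = null!;

    [GlobalSetup]
    public void Setup()
    {
        // Hypothetical local benchmark database.
        _conn = new SqlConnection("Server=localhost;Database=Bench;Integrated Security=true");
        _conn.Open();
    }

    [Benchmark]
    public int ReadRows()
    {
        using var cmd = new SqlCommand("SELECT IdUser, [Key], IsAdmin, Name FROM DBUser", _conn);
        using var reader = cmd.ExecuteReader();
        var count = 0;
        while (reader.Read()) count++;
        return count;
    }

    [GlobalCleanup]
    public void Cleanup() => _conn.Dispose();
}

public class Program
{
    public static void Main(string[] args) => BenchmarkRunner.Run<ReadBenchmarks>();
}
```

A project like this is also what crank could later drive in the perf lab, since crank can run arbitrary dotnet projects against lab hardware.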
Just for reference and comparison: this example is the fastest code I could come up with to read records with ADO.NET; I tried a lot of variants. I post this because everybody seems to ignore it. Each record is read with:

```csharp
private static DBUser ReadSequential(SqlDataReader dr)
{
    var data = new object[4];
    dr.GetValues(data);
    var obj = new DBUser
    {
        IdUser = (int)data[0],
        Key = (string)data[1],   // a non-nullable string
        IsAdmin = (bool)data[2],
        Name = data[3] as string // a nullable string
    };
    return obj;
}
```
@thargol1 I don't see any immediate reason why GetValues should be faster than calling GetInt32 or GetString... It's a single call per row rather than multiple calls per column - so less virtual dispatch - but if you reference SqlDataReader directly (and that type is sealed) that shouldn't matter. GetValues may simply be implemented in a more efficient way in today's SqlClient, but that would be an implementation detail that wouldn't necessarily be relevant for a new, perf-oriented ADO.NET provider. Aside from that, GetValues causes all value types to get boxed, which causes a lot of heap allocations - this is why I wouldn't recommend this API. This may not be an issue with the current SqlClient, since value types probably get boxed anyway - but I'd expect any efficient ADO.NET provider to implement
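For comparison, the typed-accessor pattern being contrasted with GetValues here would look like the following sketch (column ordinals and the DBUser shape follow the earlier example). It makes one virtual call per column, but avoids the `object[]` and the boxing of the int/bool values:

```csharp
private static DBUser ReadTyped(SqlDataReader dr)
{
    return new DBUser
    {
        IdUser  = dr.GetInt32(0),                          // value type, no boxing
        Key     = dr.GetString(1),                         // non-nullable string
        IsAdmin = dr.GetBoolean(2),                        // value type, no boxing
        Name    = dr.IsDBNull(3) ? null : dr.GetString(3)  // nullable string
    };
}
```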
This code was generated by a T4 template; I did a lot of testing to find the optimal generated code. I tested GetString, GetInt32 etc., but they were all a little bit slower. I don't have an explanation for it; it's just the result of my tests. My tests were not low-level BenchmarkDotNet runs - I used an RPS test on a website reading the database. If you create performance tests, I think the fastest possible solution with the currently available clients should be included as a reference, and in my use case this type of code is the fastest.
I completely believe you, and I think it's good to know this - but this simply measures the SqlClient implementation of the various APIs. If you tried the same on Npgsql, for instance, I strongly suspect you'd see different results. At the end of the day, purely as an API, GetValues is neither modern nor perf-oriented... IIRC it originated from before generics were introduced to .NET, when representing a row as an array of object was reasonable; it isn't really nowadays. And once again, the boxing of value types is problematic as it adds needless GC pressure.
Sure, if and when we get to the point where we pit a new SqlClient.Core against SqlClient, each driver should use whatever coding pattern is fastest.
I'm sorry... my previous tests were using an old version. I recreated my tests with BenchmarkDotNet instead of an RPS counter. Using
The slowest part in my code seems to be `Name = reader.IsDBNull(3) ? null : reader.GetString(3)`, as it requires two calls into the data reader. Perhaps a
Or is it the aim to keep the current API intact and not introduce new methods?
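The two-call pattern can at least be collapsed at the call site today with an extension method. The helper below is hypothetical - it is not part of ADO.NET - and it still performs two virtual calls internally, so a genuine single-call API would have to be implemented inside the provider itself:

```csharp
using System.Data.Common;

public static class DataReaderExtensions
{
    // Hypothetical convenience wrapper: one call site instead of two,
    // but still IsDBNull + GetString underneath.
    public static string? GetStringOrNull(this DbDataReader reader, int ordinal)
        => reader.IsDBNull(ordinal) ? null : reader.GetString(ordinal);
}
```

Usage would then read `Name = reader.GetStringOrNull(3)`.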
Everything is open at this point - some people believe that this new driver shouldn't adhere to the ADO.NET API surface at all. I personally believe we should stick to the current ADO.NET API until it's proven to be a significant perf blocker; a new SqlClient.Core implementation is enough work without also having to invent a new database access API etc. Regardless, note that once again your tests are checking the IsDBNull and GetString implementations of the current SqlClient - so they mean very little in general, e.g. for what the API should look like in SqlClient.Core. It's very possible that SqlClient has some unnecessary overhead here.
Well, if everything is 'open', I think source generators should also be considered. In situations where the query is known and the desired resulting class type is known, it should be possible to generate very low-level, fast code to convert the data stream into a list/enumerable of the requested type. It may require the programmer to add extra data annotations on the class, but if it brings performance I'm up for it. With source generators you only need a minimal driver with some low-level API functions, which may result in a much smaller footprint. I don't know how the internals of TDS or SqlPipe work; if you can point me to good examples and documentation I can do some experiments with source generators.
Everything is indeed open, but some comments on your suggestions:
At least at the moment, the goal really is to concentrate on the basics: execute a query, get back the results, and do it in the fastest way possible. I do think we could explore all kinds of creative directions - possibly like what you propose above - but I think it's way too early for that.
SQL Server performance for parameterized queries is also sensitive to the data types sent over the wire. Different drivers map native language types to something appropriate for SQL Server, and they don't always make good choices. The go-mssqldb driver, for example, sends all integer types as BigInt, causing unnecessary conversions on the server, which are slow. I'm fixing that in the Microsoft fork. As part of setting up a performance infrastructure, we should consider how to record the TDS packets and the query plans for our benchmark queries and use such recordings to flag regressions or improvements. They can also be used to compare driver performance across languages/platforms, like Go vs .NET vs JDBC.
Closing issues as part of the repo archiving, see #22 for a summary of Woodstar. |
We need a build/test environment that will allow us to run perf tests easily. We may also want simple functional/unit tests.