Set up build and perf testing infrastructure #10

ajcvickers · 2021-04-23T15:05:02Z

We need a build/test environment that will allow us to run perf tests easily. We may also want simple functional/unit tests.

roji · 2021-04-23T15:08:35Z

The best way to start is probably to gradually build up a benchmark project with BenchmarkDotNet - Npgsql has a few which could be used as a starting point. Who knows, maybe these could evolve into a provider-agnostic benchmark suite.

Once there's something meaningful, we should be able to use crank to execute this in the perf lab etc.

thargol1 · 2021-06-04T10:09:19Z

Just for reference and comparison. This example is the fastest code I could think of the read records with Ado.net. I tried a lot of variants. I post this because everybody seems to ignore GetValues() on SqlDataReader.

The query is executed with CommandBehavior.SequentialAccess

private static DBUser ReadSequential(SqlDataReader dr)
{
	var data = new object[4];
	dr.GetValues(data);
	var obj = new DBUser
	{
		IdUser = (int)data[0],
		Key = (string)data[1], // a non-nullable string
		IsAdmin = (bool)data[2],
                Name = data[3] as string // a nullable string
	};
	return obj;
}

roji · 2021-06-04T10:18:50Z

@thargol1 I don't see any immediate reason why GetValues should be faster than calling GetInt32 or GetString... It's a single call-per-row rather multiple call-per-columns - so less virtual dispatch - but if you reference SqlDataReader directly (and that type is sealed) that shouldn't matter. GetValues may simply be implemented in a more efficient way in today's SqlClient, but that would be an implementation detail that wouldn't necessarily be relevant for a new, perf-oriented ADO.NET provider.

Aside from that, GetValues causes all value types to get boxed, which causes a lot of heap allocations - this is why I wouldn't recommend this API. This may not be an issue with the current SqlClient since value types probably get boxed anyway - but I'd expect any efficient ADO.NET provider to implement GetInt32 and indeed GetFieldValue<T> in a way which doesn't box (Npgsql does).

thargol1 · 2021-06-04T10:37:59Z

This code was generated by T4-template. I did a lot of tests in generating the optimal code. I tested GetString, GetInt32 etc, but they all were a little bit slower. I don't have an explanation for it, it's just the results of my tests.

My tests where not low-level benchmarkdotnet, but I used an RPS-test on a website reading the database.

If you create performance tests I think the fastest possible solution with currently available clients should be included as a reference. And in my uses case this type of code is the fastest.

roji · 2021-06-04T11:03:43Z

This code was generated by T4-template. I did a lot of tests in generating the optimal code. I tested GetString, GetInt32 etc, but they all were a little bit slower. I don't have an explanation for it, it's just the results of my tests.

I completely believe you, and I think it's good to know this - but this simply measure the SqlClient implementation of the various APIs. If you tried to do the same on Npgsql, for instance, I strongly suspect you'd see different results.

At the end of the day, purely as an API, GetValues is neither modern nor perf-oriented... IIRC it originated from before the time generics were introduced to .NET, so representing a row as an array of object was reasonable; it isn't really nowadays. And once again, the boxing of value types is problematic as it adds needless GC pressure.

If you create performance tests I think the fastest possible solution with currently available clients should be included as a reference. And in my uses case this type of code is the fastest.

Sure, if and when we get the point where we pit a new SqlClient.Core against SqlClient, each driver should use whatever coding pattern is fastest.

thargol1 · 2021-06-04T12:00:50Z

I'm sorry.. my previous tests were using an old version. I recreated my tests with BenchMarkDotNet instead of an RPS counter. Using System.Data.SqlClient my solution is a little bit faster, but within the margin of error. With Microsoft.Data.SqlClient the version with GetString() etc is a bit faster.

Method	Mean	Error	StdDev	Median	Ratio	RatioSD
'with SqlDataReader.GetValue()'	6.664 ms	0.2394 ms	0.6555 ms	6.426 ms	1.00	0.00
'with SqlDataReader.GetString()'	6.468 ms	0.2591 ms	0.7265 ms	6.352 ms	0.98	0.13

The slowest part in my code seems to be:

Name = reader.IsDBNull(3) ? null : reader.GetString(3)

as it requires 2 calls in to the datareader. Perhaps a reader.GetNullableString() and similar for other types would be an small performance addition as it could skip a few internal checks on the index etc. Or a reader.Get<T>(int ordinal) which would not throw on null-strings, but could also work with int? etc.

Or is it the aim to keep the current API intact and not introduce new methods?

roji · 2021-06-04T14:07:35Z

Everything is open at this point - some people believe that this new driver shouldn't at all adhere to the ADO.NET API surface. I personally believe we should stick to the current ADO.NET until proven that it's a significant perf blocker; a new SqlClient.Core implementation is work enough without having to invent a new database access API etc.

Regardless, note that once again, your tests are checking the IsDBNull and GetString implementation on the current SqlClient - so they mean very little in general, e.g. on what the API should look like on SqlClient.Core. It's very possible that SqlClient has some unnecessary overheads here.

thargol1 · 2021-06-04T15:39:35Z

Well if everything is 'open' I think Source Generators should also be considered. In situations where the query is known and the desired resulting class type is known, it should be possible to generate very low level and fast code to convert the data stream to a list/enumerable of the requested type.

It may require for the programmer to use extra data annotations on the class, but if it brings performance I'm up for it.

With source generators you only need a minimum driver with some low level functions API functions. This may result in a much smaller footprint.

I do not know how the internals of TDS or SqlPipe work. If you can point me to good examples and documentation I can do some experiments with source generators.

roji · 2021-06-04T18:46:58Z

Everything is indeed open, but some comments on your suggestions:

At least IMHO we're still trying to do a low-level SQL Server driver rather than an ORM or object mapper... Returning a list/enumerable of a user POCO goes beyond what database drivers typically, and is in the domain of something like Dapper or EF Core.
Not everyone wants to have a POCO - some people really just want to read primitive values directly from the database. Not everyone even can have a POCO, e.g. in dynamic scenarios you don't even know what query comes back.
One good reason this doesn't necessarily make sense inside the driver, is that it would have to be reimplemented in each driver (e.g. for SQL Server, for PostgreSQL, for MySQL...). IMHO mapping to an object is a concern for an upper, optional layer, which would work across multiple DB drivers.
It's not clear that reading back a POCO is actually going to be that good for perf: at the very least we'd be allocating an object, so increasing GC pressure. And I don't see a fundamental reason to assume that the code populating the POCO would be significantly more efficient than methods like GetInt32, GetString. Both would be decoding the database wire protocol, extracting variable-width data (e.g. strings) and passing along the value.

At least at the moment, the goal really is to concentrate on the basics: execute a query, get back the results, and do it in the fastest way possible. I do think we could explore all kinds of creative directions - possibly like what you propose above - but I think it's way too early for that.

shueybubbles · 2022-05-27T17:22:39Z

SQL Server performance for parameterized queries is also sensitive to data types sent over the wire. Different drivers map native language types to something appropriate for SQL Server and they don't always make good choices. The go-mssqldb driver, for example, sends all integer types as BigInt, causing unnecessary conversions on the server which are slow. I'm fixing that in the Microsoft fork.

As part of setting up a performance infrastructure, we should consider how to record the TDS packets and the query plans for our benchmark queries and use such recordings to flag regressions or improvements. They can also be used to compare driver performance from different languages/platforms like Go vs .Net vs JDBC.

roji · 2024-07-01T21:49:44Z

Closing issues as part of the repo archiving, see #22 for a summary of Woodstar.

ajcvickers added the project-woodstar label Apr 23, 2021

smitpatel mentioned this issue Apr 23, 2021

SqlServer.Core: Performance-oriented SQL Server .NET driver #6

Closed

roji mentioned this issue Jun 4, 2021

What code is generated to process records from SqlClient? dotnet/efcore#25035

Closed

roji closed this as not planned Won't fix, can't repro, duplicate, stale Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set up build and perf testing infrastructure #10

Set up build and perf testing infrastructure #10

ajcvickers commented Apr 23, 2021

roji commented Apr 23, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021 •

edited

Loading

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021 •

edited

Loading

shueybubbles commented May 27, 2022

roji commented Jul 1, 2024

Set up build and perf testing infrastructure #10

Set up build and perf testing infrastructure #10

Comments

ajcvickers commented Apr 23, 2021

roji commented Apr 23, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021 • edited Loading

roji commented Jun 4, 2021

thargol1 commented Jun 4, 2021

roji commented Jun 4, 2021 • edited Loading

shueybubbles commented May 27, 2022

roji commented Jul 1, 2024

thargol1 commented Jun 4, 2021 •

edited

Loading

roji commented Jun 4, 2021 •

edited

Loading