Skip to content
This repository has been archived by the owner on Jul 7, 2024. It is now read-only.

Set up build and perf testing infrastructure #10

Closed
ajcvickers opened this issue Apr 23, 2021 · 11 comments
Closed

Set up build and perf testing infrastructure #10

ajcvickers opened this issue Apr 23, 2021 · 11 comments

Comments

@ajcvickers
Copy link
Member

We need a build/test environment that will allow us to run perf tests easily. We may also want simple functional/unit tests.

@roji
Copy link
Member

roji commented Apr 23, 2021

The best way to start is probably to gradually build up a benchmark project with BenchmarkDotNet - Npgsql has a few which could be used as a starting point. Who knows, maybe these could evolve into a provider-agnostic benchmark suite.

Once there's something meaningful, we should be able to use crank to execute this in the perf lab etc.

@thargol1
Copy link

thargol1 commented Jun 4, 2021

Just for reference and comparison. This example is the fastest code I could think of the read records with Ado.net. I tried a lot of variants. I post this because everybody seems to ignore GetValues() on SqlDataReader.

The query is executed with CommandBehavior.SequentialAccess

private static DBUser ReadSequential(SqlDataReader dr)
{
	var data = new object[4];
	dr.GetValues(data);
	var obj = new DBUser
	{
		IdUser = (int)data[0],
		Key = (string)data[1], // a non-nullable string
		IsAdmin = (bool)data[2],
                Name = data[3] as string // a nullable string
	};
	return obj;
}

@roji
Copy link
Member

roji commented Jun 4, 2021

@thargol1 I don't see any immediate reason why GetValues should be faster than calling GetInt32 or GetString... It's a single call-per-row rather multiple call-per-columns - so less virtual dispatch - but if you reference SqlDataReader directly (and that type is sealed) that shouldn't matter. GetValues may simply be implemented in a more efficient way in today's SqlClient, but that would be an implementation detail that wouldn't necessarily be relevant for a new, perf-oriented ADO.NET provider.

Aside from that, GetValues causes all value types to get boxed, which causes a lot of heap allocations - this is why I wouldn't recommend this API. This may not be an issue with the current SqlClient since value types probably get boxed anyway - but I'd expect any efficient ADO.NET provider to implement GetInt32 and indeed GetFieldValue<T> in a way which doesn't box (Npgsql does).

@thargol1
Copy link

thargol1 commented Jun 4, 2021

This code was generated by T4-template. I did a lot of tests in generating the optimal code. I tested GetString, GetInt32 etc, but they all were a little bit slower. I don't have an explanation for it, it's just the results of my tests.

My tests where not low-level benchmarkdotnet, but I used an RPS-test on a website reading the database.

If you create performance tests I think the fastest possible solution with currently available clients should be included as a reference. And in my uses case this type of code is the fastest.

@roji
Copy link
Member

roji commented Jun 4, 2021

This code was generated by T4-template. I did a lot of tests in generating the optimal code. I tested GetString, GetInt32 etc, but they all were a little bit slower. I don't have an explanation for it, it's just the results of my tests.

I completely believe you, and I think it's good to know this - but this simply measure the SqlClient implementation of the various APIs. If you tried to do the same on Npgsql, for instance, I strongly suspect you'd see different results.

At the end of the day, purely as an API, GetValues is neither modern nor perf-oriented... IIRC it originated from before the time generics were introduced to .NET, so representing a row as an array of object was reasonable; it isn't really nowadays. And once again, the boxing of value types is problematic as it adds needless GC pressure.

If you create performance tests I think the fastest possible solution with currently available clients should be included as a reference. And in my uses case this type of code is the fastest.

Sure, if and when we get the point where we pit a new SqlClient.Core against SqlClient, each driver should use whatever coding pattern is fastest.

@thargol1
Copy link

thargol1 commented Jun 4, 2021

I'm sorry.. my previous tests were using an old version. I recreated my tests with BenchMarkDotNet instead of an RPS counter. Using System.Data.SqlClient my solution is a little bit faster, but within the margin of error. With Microsoft.Data.SqlClient the version with GetString() etc is a bit faster.

Method Mean Error StdDev Median Ratio RatioSD
'with SqlDataReader.GetValue()' 6.664 ms 0.2394 ms 0.6555 ms 6.426 ms 1.00 0.00
'with SqlDataReader.GetString()' 6.468 ms 0.2591 ms 0.7265 ms 6.352 ms 0.98 0.13

The slowest part in my code seems to be:

Name = reader.IsDBNull(3) ? null : reader.GetString(3)

as it requires 2 calls in to the datareader. Perhaps a reader.GetNullableString() and similar for other types would be an small performance addition as it could skip a few internal checks on the index etc. Or a reader.Get<T>(int ordinal) which would not throw on null-strings, but could also work with int? etc.

Or is it the aim to keep the current API intact and not introduce new methods?

@roji
Copy link
Member

roji commented Jun 4, 2021

Everything is open at this point - some people believe that this new driver shouldn't at all adhere to the ADO.NET API surface. I personally believe we should stick to the current ADO.NET until proven that it's a significant perf blocker; a new SqlClient.Core implementation is work enough without having to invent a new database access API etc.

Regardless, note that once again, your tests are checking the IsDBNull and GetString implementation on the current SqlClient - so they mean very little in general, e.g. on what the API should look like on SqlClient.Core. It's very possible that SqlClient has some unnecessary overheads here.

@thargol1
Copy link

thargol1 commented Jun 4, 2021

Well if everything is 'open' I think Source Generators should also be considered. In situations where the query is known and the desired resulting class type is known, it should be possible to generate very low level and fast code to convert the data stream to a list/enumerable of the requested type.

It may require for the programmer to use extra data annotations on the class, but if it brings performance I'm up for it.

With source generators you only need a minimum driver with some low level functions API functions. This may result in a much smaller footprint.

I do not know how the internals of TDS or SqlPipe work. If you can point me to good examples and documentation I can do some experiments with source generators.

@roji
Copy link
Member

roji commented Jun 4, 2021

Everything is indeed open, but some comments on your suggestions:

  • At least IMHO we're still trying to do a low-level SQL Server driver rather than an ORM or object mapper... Returning a list/enumerable of a user POCO goes beyond what database drivers typically, and is in the domain of something like Dapper or EF Core.
  • Not everyone wants to have a POCO - some people really just want to read primitive values directly from the database. Not everyone even can have a POCO, e.g. in dynamic scenarios you don't even know what query comes back.
  • One good reason this doesn't necessarily make sense inside the driver, is that it would have to be reimplemented in each driver (e.g. for SQL Server, for PostgreSQL, for MySQL...). IMHO mapping to an object is a concern for an upper, optional layer, which would work across multiple DB drivers.
  • It's not clear that reading back a POCO is actually going to be that good for perf: at the very least we'd be allocating an object, so increasing GC pressure. And I don't see a fundamental reason to assume that the code populating the POCO would be significantly more efficient than methods like GetInt32, GetString. Both would be decoding the database wire protocol, extracting variable-width data (e.g. strings) and passing along the value.

At least at the moment, the goal really is to concentrate on the basics: execute a query, get back the results, and do it in the fastest way possible. I do think we could explore all kinds of creative directions - possibly like what you propose above - but I think it's way too early for that.

@shueybubbles
Copy link

SQL Server performance for parameterized queries is also sensitive to data types sent over the wire. Different drivers map native language types to something appropriate for SQL Server and they don't always make good choices. The go-mssqldb driver, for example, sends all integer types as BigInt, causing unnecessary conversions on the server which are slow. I'm fixing that in the Microsoft fork.

As part of setting up a performance infrastructure, we should consider how to record the TDS packets and the query plans for our benchmark queries and use such recordings to flag regressions or improvements. They can also be used to compare driver performance from different languages/platforms like Go vs .Net vs JDBC.

@roji
Copy link
Member

roji commented Jul 1, 2024

Closing issues as part of the repo archiving, see #22 for a summary of Woodstar.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants