-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QueryUnbufferedAsync with FirstAsync is slow if the query returns lots of records #2115
Comments
That's curious, because it should "just work"; while I try to investigate, however, a pragmatic workaround may be to use |
Using QueryFirstAsync has the same effect. |
I believe this may simply be the server latency, i.e. the server is taking a long time to fetch the first row; does the same query in, for example, SSMS (or the relevant tool for your database) behave similarly? |
Yes it's slow in SSMS because it's loading millions of rows. It will be loading the first row quickly but the query is not finished. Using FirstAsync is having to wait for all rows to be loaded. If I write it using DbCommand it's much faster when the command is cancelled before the reader is disposed.
|
hm; OK, that's interesting; I'll see what I can do with that |
So; I can't quite repro as stated, but: I can see a pain point that is specific to async - specifically it is async and single-row specified and we read the primary grid to completion. This might also be specific to the ADO.NET provider - which exact provider are you using? is it Microsoft.Data.SqlClient? or System.Data.SqlClient? Here's my test rig with what I see: Found 100000 rows...
Reading one row, badly, via async Dapper...
Got row 1039451802 in 2534ms
Reading one row, badly, via async Dapper...
Got row 1039451802 in 23ms
Reading one row, badly, via sync Dapper + LINQ...
Got row 1039451802 in 21ms
Reading one row, badly, via sync Dapper + LINQ...
Got row 1039451802 in 2ms
Reading one row, badly, via async DapperAOT...
Got row 1039451802 in 2373ms
Reading one row, badly, via async DapperAOT...
Got row 1039451802 in 22ms
Reading one row, badly, via sync DapperAOT + LINQ...
Got row 1039451802 in 20ms
Reading one row, badly, via sync DapperAOT + LINQ...
Got row 1039451802 in 21ms
Reading one row, badly, via async ADO.NET...
Got row 1039451802 in 20ms
Reading one row, badly, via sync ADO.NET...
Got row 1039451802 in 20ms
Reading one row, badly, via async ADO.NET; read-grid...
Got row 1039451802 in 41ms
Reading one row, badly, via sync ADO.NET; read-grid...
Got row 1039451802 in 20ms
Reading one row, badly, via async ADO.NET; single-row...
Got row 1039451802 in 20ms
Reading one row, badly, via sync ADO.NET; single-row...
Got row 1039451802 in 21ms
Reading one row, badly, via async ADO.NET; single-row; read-grid...
Got row 1039451802 in 2363ms
Reading one row, badly, via sync ADO.NET; single-row; read-grid...
Got row 1039451802 in 20ms The main rows of interest are: Reading one row, badly, via async Dapper...
Got row 1039451802 in 2534ms
Reading one row, badly, via async DapperAOT...
Got row 1039451802 in 2373ms
Reading one row, badly, via async ADO.NET; single-row; read-grid...
Got row 1039451802 in 2363ms Do you see the same? I'm using Microsoft.Data.SqlClient; if you're using System.Data.SqlClient, does changing to Microsoft.Data.SqlClient make what you see more consistent with what I see? code: using Dapper;
using Microsoft.Data.SqlClient;
using System.Data;
using System.Diagnostics;
static class Program
{
static async Task<int> Main()
{
var builder = new SqlConnectionStringBuilder()
{
TrustServerCertificate = true,
DataSource = ".",
InitialCatalog = "master",
IntegratedSecurity = true,
};
using var conn = new SqlConnection(builder.ConnectionString);
await conn.OpenAsync();
const int Expected = 100_000;
int count;
try
{
count = conn.QuerySingle<int>("""
select count(1)
from LoadTest;
""");
Console.WriteLine($"Found {count} rows...");
}
catch (SqlException ex) when (ex.Number == 208)
{
Console.WriteLine("Creating table...");
await conn.ExecuteAsync("""
create table LoadTest(id int not null, name nvarchar(100) not null);
""");
count = 0;
}
catch (Exception ex)
{
Console.Error.WriteLine(ex.Message);
return -1;
}
if (count < Expected)
{
int need = Expected - count;
Console.WriteLine($"Inventing {need} rows...");
var rand = Random.Shared;
var data = from id in Enumerable.Range(1, need)
select new MyRow { Id = rand.Next(), Name = CreateName(rand) };
using var bcp = new SqlBulkCopy(conn);
bcp.DestinationTableName = "LoadTest";
bcp.EnableStreaming = true;
await bcp.WriteToServerAsync(TypeAccessor.CreateDataReader(data));
Console.WriteLine($"Copied: {bcp.RowsCopied}");
}
await RunDapper(conn);
await RunDapperAot(conn);
await RunAdoNet(conn, false, false);
await RunAdoNet(conn, false, true);
await RunAdoNet(conn, true, false);
await RunAdoNet(conn, true, true);
return 0;
}
const string Sql = """
select id, name from LoadTest;
""";
public static async ValueTask<T> FirstAsync<T>(this IAsyncEnumerable<T> source)
{
await using var iter = source.GetAsyncEnumerator();
if (!await iter.MoveNextAsync()) throw new EndOfStreamException("oops");
return iter.Current;
}
[DapperAot(false)]
async static Task RunDapper(SqlConnection conn)
{
Console.WriteLine("Reading one row, badly, via async Dapper...");
var timer = Stopwatch.StartNew();
var row = await conn.QueryFirstAsync<MyRow>(Sql);
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via async Dapper...");
timer = Stopwatch.StartNew();
row = await conn.QueryUnbufferedAsync<MyRow>(Sql).FirstAsync();
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via sync Dapper + LINQ...");
timer = Stopwatch.StartNew();
row = conn.QueryFirst<MyRow>(Sql);
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via sync Dapper + LINQ...");
timer = Stopwatch.StartNew();
row = conn.Query<MyRow>(Sql, buffered: false).First();
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
}
[DapperAot(true)]
async static Task RunDapperAot(SqlConnection conn)
{
Console.WriteLine("Reading one row, badly, via async DapperAOT...");
var timer = Stopwatch.StartNew();
var row = await conn.QueryFirstAsync<MyRow>(Sql);
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via async DapperAOT...");
timer = Stopwatch.StartNew();
row = await conn.QueryUnbufferedAsync<MyRow>(Sql).FirstAsync();
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via sync DapperAOT + LINQ...");
timer = Stopwatch.StartNew();
row = conn.QueryFirst<MyRow>(Sql);
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine("Reading one row, badly, via sync DapperAOT + LINQ...");
timer = Stopwatch.StartNew();
row = conn.Query<MyRow>(Sql, buffered: false).First();
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
}
async static Task RunAdoNet(SqlConnection conn, bool singleRow, bool readGrid)
{
var suffix = (singleRow ? "; single-row" : "") + (readGrid ? "; read-grid" : "");
Console.WriteLine($"Reading one row, badly, via async ADO.NET{suffix}...");
var timer = Stopwatch.StartNew();
using var cmd = conn.CreateCommand();
cmd.CommandType = CommandType.Text;
cmd.CommandText = Sql;
MyRow row;
var Flags = (CommandBehavior.SequentialAccess | CommandBehavior.SingleResult) | (singleRow ? CommandBehavior.SingleRow : 0);
await using (var reader = await cmd.ExecuteReaderAsync(Flags))
{
if (!await reader.ReadAsync())
{
throw new EndOfStreamException("no row!");
}
row = new()
{
Id = reader.GetInt32(0),
Name = reader.GetString(1)
};
if (readGrid)
{
while (await reader.ReadAsync()) { }
}
while (await reader.NextResultAsync())
{
}
}
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
Console.WriteLine($"Reading one row, badly, via sync ADO.NET{suffix}...");
timer = Stopwatch.StartNew();
using (var reader = cmd.ExecuteReader(Flags))
{
if (!reader.Read())
{
throw new EndOfStreamException("no row!");
}
row = new()
{
Id = reader.GetInt32(0),
Name = reader.GetString(1)
};
if (readGrid)
{
while (reader.Read()) { }
}
while (reader.NextResult())
{
}
}
timer.Stop();
Console.WriteLine($"Got row {row.Id} in {timer.ElapsedMilliseconds}ms");
}
static string CreateName(Random rand)
{
const string Alphabet = "abcdefghijklmnopqrstuvwxyz 0123456789,;-";
var len = rand.Next(5, 101);
Span<char> name = stackalloc char[len];
foreach (ref char c in name)
{
c = Alphabet[rand.Next(Alphabet.Length)];
}
return new string(name);
}
public class MyRow
{
public int Id { get; set; }
public string Name { get; set; } = "";
}
} |
I am using System.Data.SqlClient. Switching to Microsoft.Data.SqlClient has no improvement. I've ran your code and the speeds match. I tried my table and it takes 20 seconds for QueryUnbufferedAsync. My table has an int primary key, another int column, a smallint column, a decimal column, three datetime columns and 3 other int columns. My table has over 10 million records. If I just select the Id column for example, it works in under a second. Selecting all columns in SSMS takes 20 seconds. If you create a table with enough columns and rows it will be slow in SSMS and hence slow with QueryUnbufferedAsync/FirstAsync. Calling reader.DisposeAsync will wait for the command to finish (all rows). https://stackoverflow.com/questions/66771460/sqldatareader-command-timing-out-on-close-dispose |
See, the tricky thing here is: TDS. In TDS, errors can follow the data, and people usually want to see errors even if they occur after the data - we've had multiple error reports about when this hasn't been the case. While I'm sympathetic to making this as fast as it can be, ultimately
linking to https://aot.dapperlib.dev/rules/DAP231 If we cancel the command, we don't correctly capture error metadata |
To be explicit: my plan is to see whether we can get what we need with the change, in {
...
- while (await reader.ReadAsync(cancel).ConfigureAwait(false)) { /* ignore rows after the first */ }
...
}
while (await reader.NextResultAsync(cancel).ConfigureAwait(false)) { /* ignore result sets after the first */ } |
OK, so: good news first - we can trivially fix Random context: I haven't changed |
typical test output now: bulk-insert complete; 500000 rows in 18ms
bad things
bad things
sync top 1 read first complete; row 2119983792 in 6ms
async top 1 read first complete; row 2119983792 in 1ms
sync read first complete; row 2119983792 in 407ms
async read first complete; row 2119983792 in 320ms
sync read with error complete in 268ms; bad things
async read with error complete in 314ms; bad things
sync unbuffered LINQ read first complete; row 2119983792 in 1ms
async unbuffered LINQ read first complete; row 2119983792 in 3ms The bottom row is where we started here, so: that's fixed. The others are larger than I'd like, but as shown in the first 2 rows: the answer there is "add |
Using the System.Linq.Async NuGet Package.
If "table" has millions of rows this is slow. IAsyncEnumerable is designed to be more efficient, at least from a memory point of view. But when calling FirstAsync you would expect it to run quickly as each row is returned to the calling code when it is read.
The reader is being disposed without cancelling the command first. I did look at using ExecuteReaderAsync but I can't get access to the command.
The text was updated successfully, but these errors were encountered: