-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Parameter Usage on MultipleRowsCopy/MultipleRowsCopyAsync #2975
Allow Parameter Usage on MultipleRowsCopy/MultipleRowsCopyAsync #2975
Conversation
helper.LastRowParameterIndex = helper.ParameterIndex; | ||
helper.LastRowStringIndex = helper.StringBuilder.Length; | ||
addFunction(helper, item!, from); | ||
|
||
if (helper.CurrentCount >= helper.BatchSize || helper.Parameters.Count > maxParameters || helper.StringBuilder.Length > maxSqlLength) | ||
var needRemove = helper.Parameters.Count > maxParameters || | ||
helper.StringBuilder.Length > maxSqlLength; | ||
if (helper.CurrentCount >= helper.BatchSize || needRemove) | ||
{ | ||
if (needRemove) | ||
{ | ||
helper.Parameters.RemoveRange(helper.LastRowParameterIndex, helper.ParameterIndex-helper.LastRowParameterIndex); | ||
helper.StringBuilder.Length = helper.LastRowStringIndex; | ||
} | ||
finishFunction(helper); | ||
if (!helper.Execute()) | ||
return helper.RowsCopied; | ||
if (needRemove) | ||
{ | ||
addFunction(helper, item!, from); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I think this could -possibly- be simplified a little bit. I put LastRowParameterIndex
and LastRowStringIndex
into MultipleRowsHelper
since .Execute()
is what is clearing out the other bits we are comparing to. I'm not certain whether that is considered better or worse than just tracking those values locally in these functions.
other option to clean up would be if(CurrentCount > rowcount )
->else if (needRemoveCondition)
.
…ase where a single row takes us over a limit; we should try anyway. Made MaxParameters and MaxSqlLength virtual properties so we can set it per-provider bulk copy. Set for Postgres, oracle, sqlite, sqlserver
49fe3c5
to
9caf4b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more comments since I have questions before I do much more.
Hoping for feedback before I go too far down a rabbit hole:
Should we limit Parameters to 8000 even if provider supports more (for LOH concerns)?
Also, Whether it's OK to move Implementations of MultipleRowsHelper/(Async)
methods in BasicBulkCopy
into Instance of MultipleRowsHelper
, Open to better names than Bind
/BindAsync
. I tried doing another branch where we cached parameter arrays, and the code got pretty ugly and a bit hard to understand when kept in BasicBulkCopy
.
private readonly OracleDataProvider _provider; | ||
|
||
private const int _maxParameters = 32766; | ||
private const int _maxSqlLength = 327670; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really, Oracle allows much longer SQL length; I'm not sure what number really 'makes sense' for batching however, huge strings probably aren't great for a number of reasons. These consts are here because this class has static functions so they need a const or static passed in.
@@ -8,6 +8,9 @@ namespace LinqToDB.DataProvider.SQLite | |||
|
|||
class SQLiteBulkCopy : BasicBulkCopy | |||
{ | |||
protected override int MaxParameters => 999; | |||
protected override int MaxSqlLength => 1000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went higher on this number since SQLite is going to be parsed locally and we won't have a network hop.
@@ -15,6 +15,9 @@ namespace LinqToDB.DataProvider | |||
|
|||
public class BasicBulkCopy | |||
{ | |||
protected virtual int MaxParameters => 999; | |||
protected virtual int MaxSqlLength => 100000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting this as a virtual -feels- better, but I still ask whether we should let Max Length be tunable by the user or not (For Network round-trip purposes). I decided not, since MaxBatchSize
is good enough to do the same most likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COuld be usful. In baselines I see increase in number of roundtrips e.g. for DB2
int maxParameters = 10000, | ||
int maxSqlLength = 100000) | ||
int maxParameters, | ||
int maxSqlLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth moving these, even with a simple Redirection for now, into a 'Bind' like method on MultipleRowsHelper
. Over 15 helper.
in a method feels like a smell to me.
@@ -12,7 +12,9 @@ namespace LinqToDB.DataProvider.SqlServer | |||
|
|||
class SqlServerBulkCopy : BasicBulkCopy | |||
{ | |||
private readonly SqlServerDataProvider _provider; | |||
protected override int MaxParameters => 2099; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK if this is the value we -really- want to use, because of dotnet/SqlClient#974. Large number of parameters is actually counter-productive for performance.
if (helper.CurrentCount >= helper.BatchSize || helper.Parameters.Count > maxParameters || helper.StringBuilder.Length > maxSqlLength) | ||
var needRemove = helper.Parameters.Count > maxParameters || | ||
helper.StringBuilder.Length > maxSqlLength; | ||
var isSingle = helper.CurrentCount == 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this check because there may be an edge case where even a single row was putting us over Max SQL limit.
Since we are now trying to be more correct here, we now 'back off' the last row when we go over maxSqlLength
, rather than just continuing and hoping for the best. That said, we should still try to insert in case of Single row, and let provider fail if it really is an issue.
if (needRemove && !isSingle) | ||
{ | ||
helper.Parameters.RemoveRange(helper.LastRowParameterIndex, helper.ParameterIndex-helper.LastRowParameterIndex); | ||
helper.StringBuilder.Length = helper.LastRowStringIndex; | ||
helper.RowsCopied.RowsCopied--; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do this before finishFunction, in case finishFunction
decides to add a parameter in future.
@viceroypenguin If you get a chance can you beat me up on this one please? |
First thing I would suggest is move the The final concern will be the text length. For that, your current solution will work, though I think you can track |
Thank you for the feedback! I'll give that a try. I was trying not to be too aggressive in this PR since next round (i.e. another PR) I'd like to optimize the build stage so that we don't re-build the string/parameter lists. I think to do it well we'll want to re-do the loop.
We might be able to pull out at least some of the logic, challenge is each one is just different enough where more indirection feels like as much cognitive load as the duped code. But, I think overall the loop structure can be improved at least; Rather than Adding the item, and then checking length to decide whether to back off, we can check length at the start of the iteration and decide what to do. I think it will make the code a little cleaner, and will probably be cleaner as it will have less jumps. |
Definitely agree with re-working to check the length before adding. That would simplify all three back to reasonable levels again and don't need to abstract the three versions of the loop into one. Personally, I'd say go ahead and do it, rather than trying to piece-meal in two separate PRs, but that's a personal opinion. On the other hand, if you're going to redo it in a second PR, then what you've got works well enough for now I think. |
If I wasn't about to deal with a bunch of release stuff at work that's gonna have me busy for a while I'd agree. :) But I think second pr is going to be a bigger refactoring of things, I'm still thinking of doing #2960 and that will mean a decent amount of change to this again most likely. Question though, based on your other feedback; Should I put the |
I looked into setting MaxBatchSize based on param usage, it unfortunately gets a little hacky to work-around the behavior of Oracle's Array-parameter copy. Might be better to get what is here into 3.4.0 for now, so I am marking ready to see if there is other feedback or if we must make more changes first. |
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide sources for limits for various providers in comments? I see some of them specified but some not
@@ -88,5 +88,10 @@ public BulkCopyOptions(BulkCopyOptions options) | |||
/// This callback will not be used if <see cref="NotifyAfter"/> set to 0. | |||
/// </summary> | |||
public Action<BulkCopyRowsCopied>? RowsCopiedCallback { get; set; } | |||
|
|||
/// <summary> | |||
/// Gets or sets whether to Always use Parameters for MultipleRowsCopy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to add default value information and behavior in this mode like splitting operation into batches if parameters limit reached for operation.
Tests/Linq/Update/BulkCopyTests.cs
Outdated
@@ -276,5 +276,22 @@ public void ReuseOptionTest([DataSources(false, ProviderName.DB2)] string contex | |||
db.Child. BulkCopy(options, new[] { new Child { ParentID = 111001 } }); | |||
} | |||
} | |||
|
|||
[Test] | |||
public void UseParametersTest([DataSources(false, ProviderName.DB2)] string context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with DB2?
@to11mtm, we plan to release 3.4 next thursday, it would be nice if we can include this one. |
…hub.com/to11mtm/linq2db into Allow-Parameter-Use-On-Bulk-Copy-Part-1
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
I'll try to find my notes again. Most came from https://www.jooq.org/doc/3.12/manual/sql-building/dsl-context/custom-settings/settings-inline-threshold/ but for the strings I had to look in some other places. |
Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed. |
Yeah I found FirebirdSqlOptimizer.WrapParameters and cried for a few minutes. I'm giving it a try with just casts on all rows. Ironically, after our other optimizations to SQL building (i.e. #2774) It appears that at least for SQL Server and maybe postgres, Parameters are slower than our string literals at this point. my guess/assumption is that the cost of marshaling the parameters is eating away any benefits, and because we aren't treating this as a prepared statement there's no gains to be had there either. My hope is that when myself or someone else does a more thorough refactoring of |
/azp run test-firebird |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -77,7 +79,27 @@ public virtual void BuildColumns(object item, Func<ColumnDescriptor, bool>? skip | |||
{ | |||
var name = ParameterName == "?" ? ParameterName : ParameterName + ++ParameterIndex; | |||
|
|||
StringBuilder.Append(name); | |||
if (castParameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAST needed only for first select in union. For other rows db will use type information from first type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in retrospect I don't think it would hurt to make this if (castParameters && CurrentCount==0)
. I was over-thinking the jump cost lol.
StringBuilder.Append(name); | ||
StringBuilder.Append(" AS "); | ||
StringBuilder.Append(column.DataType); | ||
if (!string.IsNullOrEmpty(column.DbType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use sql builder here as it already have db-specific type builder method that takes into account db-specific stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could give some pointers on which method to try to use in SQLBuilder I can give it a shot after work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BuildDataType https://github.com/linq2db/linq2db/blob/master/Source/LinqToDB/SqlProvider/BasicSqlBuilder.cs#L2716
but you need to make it protected internal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into some challenges here (we are using ISqlBuilder
, can't cleanly internal on an interface,) so tried to solve for this by adding to ISqlBuilder Interface, StringBuilder BuildDataType (StringBuilder sb, SqlDataType dataType);
. I went with returning the stringbuilder because that was the convention for BuildTableName, Convert, and others.
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run test-all |
@to11mtm, if you are interested in working on this area more I have several suggestions for improvements:
|
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed. |
Yeah, I want to do more work on this in future; My big goal is to have version of BulkCopy that will return inserted Identity columns (issue #2960 ), but it is a bigger refactor because every DB that supports returning on MultipleRows has different rules on how to do this right. I started looking into re-using the generated SQL, I think for it to be of actual benefit we will need to do more refactoring. Our stringbuild is pretty efficent as-is, to get a worthwhile benefit I think we would have to also |
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run test-all |
Azure Pipelines successfully started running 1 pipeline(s). |
* [Windows / NET472 / SQLite.MS] baselines * [Windows / NET472 / SQL CE] baselines * [Windows / NET472 / Access ODBC MDB] baselines * [Windows / NET472 / SQL Server 2012 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2005 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2008 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2016 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2014 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2017 (System.Data.SqlClient)] baselines * [Windows / NET472 / Access Jet] baselines * [Windows / NET472 / SQLite] baselines * [Windows / NETCOREAPP2.1 / SQL CE] baselines * [Windows / NETCOREAPP2.1 / Access ODBC ACE x64] baselines * [Windows / NETCOREAPP2.1 / SQLite.MS] baselines * [Windows / NET472 / SQL Server 2019 (System.Data.SqlClient)] baselines * [Windows / NET472 / SQL Server 2019 (Microsoft.Data.SqlClient)] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2016 (System.Data.SqlClient)] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2005 (System.Data.SqlClient)] baselines * [Linux / NETCOREAPP2.1 / Informix 14.10] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2008 (System.Data.SqlClient)] baselines * [Linux / NETCOREAPP2.1 / DB2 LUW 11.5] baselines * [Windows / NET 5.0 / SQL Server 2019 (Microsoft.Data.SqlClient)] baselines * [Linux / NET5.0 / PostgreSQL 13] baselines * [Linux / NETCOREAPP3.1 / PostgreSQL] baselines * [Linux / NETCOREAPP3.1 / MySQL] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2012 (System.Data.SqlClient)] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2017 (System.Data.SqlClient)] baselines * [Linux / NET5.0 / Sybase ASE 16] baselines * [Linux / NET5.0 / SQLite] baselines * [Linux / NETCOREAPP2.1 / MariaDB] baselines * [Windows / NETCOREAPP2.1 / SQL Server 2014 (System.Data.SqlClient)] baselines * [Linux / NETCOREAPP2.1 / Firebird 3.0] baselines * [Linux / NETCOREAPP2.1 / Firebird 2.5] baselines * [Linux / NETCOREAPP2.1 / MySQL 5.5] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 10] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 11] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 12] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 9.2] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 9.3] baselines * [Linux / NETCOREAPP2.1 / PostgreSQL 9.5] baselines * [Linux / NETCOREAPP2.1 / Oracle 11g XE] baselines * [Linux / NETCOREAPP2.1 / Oracle 12c] baselines * [Linux / NETCOREAPP2.1 / SAP HANA 2] baselines * [Linux / NETCOREAPP2.1 / Firebird 4.0 (RC1)] baselines Co-authored-by: Azure Pipelines Bot <azp@linq2db.com>
Allow Parameter usage on MultipleRowsCopy/MultipleRowsCopyAsync via
BulkCopyOptions
. Fix #2144