Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DbDataReader.GetSchemaTable without a resultset should return empty table instead of null #509

Closed
roji opened this issue Dec 4, 2019 · 23 comments
Assignees

Comments

@roji
Copy link
Member

roji commented Dec 4, 2019

While working on nullability annotation for System.Data.Common, I ran across some odd and inconsistent behavior when GetSchemaTable and GetColumnSchema are called and a resultset isn't present (e.g. a non-SELECT statement was executed, or NextResult was called and returned false).

Provider behavior

  • GetSchemaTable: SqlClient and Npgsql return null when there is no resultset, Sqlite returns an table (may be empty?), MySQL throws. No doc/spec info exists on this.
  • GetColumnSchema: SqlClient and Npgsql return an empty list of columns when there is no resultset. Sqlite returns a non-empty column list (?), MySQL throws. No doc/spec info exists on this.

Notes

  • Other reader metadata methods which require a resultset - GetName, GetDataTypeName, etc. - throw InvalidOperationException, so this method we have a behavior inconsistency.
  • There are some rare dynamic scenarios (especially with stored procedures) where a user legitimately cannot be expected to know in advance whether the reader has a resultset or not. User can check FieldCount == 0 to identify whether a resultset exists or not.

Options

  1. We could make GetSchemaTable return a nullable DataTable, but that would make it harder to use for everyone in the 99% case.
  2. We could make SqlClient and Npgsql to throw for this scenario, aligning with GetName and other metadata methods (minor breaking change). We would want to do this for GetColumnSchema as well to make sure they behave the same way. This would require a non-breaking change from MySqlConnector and possibly Sqlite.
  3. We could make SqlClient and Npgsql return an empty DataTable (minor breaking change), aligning with GetColumnSchema. GetSchemaTable/GetColumnSchema are different from GetName/GetDataType since they can return an empty table/list, which clearly expresses the lack of a resultset.

I vote for option 3. It would allow the method to cleanly return a non-nullable DataTable

Test code

[Fact]
public virtual void GetSchemaTable_returns_null_when_no_resultset()
{
	using var connection = CreateOpenConnection();
	using var command = connection.CreateCommand();
	command.CommandText = "SELECT 1";
	using var reader = command.ExecuteReader();
	reader.NextResult();
	Assert.Null(reader.GetSchemaTable());
}

/cc @cheenamalhotra @David-Engel @saurabh500 @Wraith2 @bgrainger @bricelam @ajcvickers @AndriySvyryd

@roji roji added design-discussion Ongoing discussion about design without consensus area-System.Data labels Dec 4, 2019
@roji roji added this to the 5.0 milestone Dec 4, 2019
@roji roji self-assigned this Dec 4, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2019
@roji
Copy link
Member Author

roji commented Dec 4, 2019

/cc @ErikEJ

@roji
Copy link
Member Author

roji commented Dec 4, 2019

/cc @FransBouma

@bgrainger
Copy link
Contributor

The current behaviour in MySqlConnector was due to mysql-net/AdoNetApiTest#28; see also mysql-net/MySqlConnector#678.

@roji
Copy link
Member Author

roji commented Dec 4, 2019

Forgot about that conversation :)

How would you feel about my suggestion of returning an empty DataTable/column collection for these methods? It wouldn't be a breaking change for you (as the methods currently throw), and it seems consistent with the idea of no resultset having "zero columns" (as expressed by FieldCount).

@David-Engel
Copy link
Contributor

I vote for option 3, too. Seems cleaner, more logical, and easier to code against.

@bgrainger
Copy link
Contributor

  1. We could make SqlClient and Npgsql return an empty list

Do you mean "DataTable", not "list"?

How would you feel about my suggestion of returning an empty DataTable/column collection for these methods?

I have no objection; there's value in consistency and I doubt anyone is relying on the current behaviour (of throwing an exception).

@roji
Copy link
Member Author

roji commented Dec 4, 2019

Do you mean "DataTable", not "list"?

Yeah, thanks - corrected it (thinking about GetSchemaTable and GetColumnSchema at the same time).

@saurabh500
Copy link
Contributor

I like the idea of an empty DataTable instead of a null. However I wonder if any code takes a decision based on the value of the schema object returned like don't show a UI grid if a null value is received. @roji has already mentioned that it is a breaking change.
I don't have data though. Putting a hypothetical situation here.
It might be worth following up with SMO.

@saurabh500
Copy link
Contributor

SMO - SQL management objects library

@FransBouma
Copy link
Contributor

If you execute queries without knowing whether they will return a resultset, you in general will do that with ExecuteReader(), and first examine the GetSchemaTable. If it's null, you can assume there wasn't a resultset returned, so the query likely was a DML query.
If there is a datatable with columns, you can assume there's a resultset.

But indeed the 'null' isn't reliable as there's no documentation that this should happen (in my code I assume it's null, but perhaps some obscure providers fail here). So I opt for option 3: the empty datatable is a good solution, as its emptiness (no columns nor rows) is a reliable indicator no resultset will be returned by the reader.

It would be a breaking change for my desktop facing designer system but it's a simple change to make. What's more important tho is: how to enforce all ADO.NET providers to do the right thing here. As there's code out there relying on this (has to be) for e.g. 'mysql's behavior', it won't be easy to persuade these provider writers to change their API too (as it enforces a breaking change onto their users they might not want to do that)

@roji
Copy link
Member Author

roji commented Dec 4, 2019

Thanks for your feedbacks everyone!

@saurabh500

It might be worth following up with SMO.

Sure. But it's important to note that this would only be a documented, breaking change in .NET 5 - a major part of the appeal of .NET Core is the side-by-side nature, which would only affect a product such as SMO if and when they actually switch to .NET 5. This is in contrast with the old, in-place .NET Framework dates, where we could probably never even have this conversation. But regardless it would be great to get more feedback from any other interested parted here.

@FransBouma

If you execute queries without knowing whether they will return a resultset, you in general will do that with ExecuteReader(), and first examine the GetSchemaTable. If it's null, you can assume there wasn't a resultset returned, so the query likely was a DML query.

Yes, anyone doing this would be broken by this change (if and when they switch to .NET 5). Interestingly I tend to think about the resultset checking more via FieldCount being zero - I was somewhat surprised to find that GetSchemaTable returns null.

What's more important tho is: how to enforce all ADO.NET providers to do the right thing here

One important resource here is @bgrainger's ADO.NET spec tests, which provide a relatively easy way to check behavior across many providers. In fact, as I'm working through the ADO.NET API surface for nullability annotation (which is what spawned this issue), I'm using that project and adding tests to it. Of course, not all providers are represented there.

@bgrainger
Copy link
Contributor

Yes, anyone doing this would be broken by this change (if and when they switch to .NET 5).

I'm not following how the behaviour change would be linked with / limited to .NET 5. DbDataReader.GetSchemaTable is virtual, so the actual behaviour (returning null or an empty DataTable) would depend on the specific ADO.NET provider being used (and its version).

Unless you're proposing/assuming multitargeting where ADO.NET providers only implement the new behaviour for builds specifically targeting netcoreapp5.0 (or whatever the proper TFM is), which seems even harder to enforce.

@roji
Copy link
Member Author

roji commented Dec 4, 2019

You're partially right, I'm currently investigating other breaking changes in S.D.C itself that I'm thinking about (will come in a separate issue). However, since SqlClient specifically has been part of the Framework, side-by-side did not apply to it (and breaking changes could therefore almost never be introduced). Regardless, as .NET itself has (thankfully) is now side-by-side, it seems there's more general openness towards introducing some breaking changes in major versions, compared to the earlier reluctance.

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 4, 2019

Only newly compiled consumers will take notice of the nullable annotations and care if an old version of the library returns a null contrary to the annotation. It seems unlikely that anyone would try to backdate the in-box version of sqlclient specifically in that scenario. As long as the new expectation is documented for provider authors I don't see it as a big problem.

@roji
Copy link
Member Author

roji commented Dec 5, 2019

@Wraith2 the issue here specifically is that we're also proposing modifying the behavior of SqlClient and other providers to return an empty DataTable from GetSchemaTable - that's a visible behavior change regardless of whether you're opted into the new nullable feature.

To be clear, yeah - the idea is to annotate the return value of DbConnection.GetSchemaTable as non-nullable. But to have SqlClient and Npgsql compatible with that annotation they would also need to change behavior.

/cc @mgravell who I left out before (sorry!)

@ajcvickers
Copy link
Contributor

I'm also in favor of option 3, although I am concerned about the breaking change. Any option for consistency results in at least one provider being broken--if the standard behavior becomes "return null" then SQLite needs to make a breaking change. That being said, breaking SQLite is not as bad as breaking SQLClient because it has a lot lower usage.

So I think this boils down to whether or not SQLClient is okay with the break. If so, then 3 seems to clearly win.

@roji
Copy link
Member Author

roji commented Mar 7, 2020

@roji roji closed this as completed Mar 7, 2020
@roji roji removed design-discussion Ongoing discussion about design without consensus untriaged New issue has not been triaged by the area owner labels Mar 7, 2020
@roji
Copy link
Member Author

roji commented Mar 7, 2020

@terrajobst @stephentoub this discussion was the basis of runtime changes in ADO providers but not in System.Data itself (though it will influence how we null-annotate it). OK to leave the labels/milestone like this or should we remove from the milestone as no actual change happened in the runtime?

@roji
Copy link
Member Author

roji commented Aug 20, 2020

Everyone, unfortunately I have to propose to revert this change for 5.0. When we originally had the discussion above, I didn't sufficiently consider the wider effect of the breaking change... since then, I've done a lot of work on System.Data nullability, and have come to better understand the bar for breaking changes in System.Data and their overall impact. I apologize for the mess this is creating.

Some reasoning:

  • Since this is an ADO.NET change, we'd also need to change the behavior of System.Data.Odbc and System.Data.OleDb. These are old providers, and we have very little knowledge on who is using them or how. I'm now more familiar with the difficulty of pushing breaking changes in the BCL, and this is a potentially risky one.
  • There's also System.Data.SqlClient. I think it would be out of the question to introduce such a breaking change there, with the very high backwards compat bar.
  • If we indeed leave older providers with the old, null-returning behavior, then we've fragmented the behavior - newer providers would return an empty table, while older ones would return null. Anyone coding against ADO.NET would therefore have to handle both behaviors, which is definitely worse than having just one (regardless of what we think of it).
  • At the end of the day, the change simply doesn't seem to justify the issues and risks it creates - requiring users to check GetSchemaTable for null isn't ideal, but it isn't the end of the world either. Had I designed a new API today, I'd definitely have adopted the proposed new behavior, but changing it at this point is something entirely different.
  • Unfortunately, SqlClient 2.0.0 has changed its behavior (Change DbDataReader.{GetSchemaTable,GetColumnSchema} to return empty results when there is no resultset SqlClient#417), as well as MySqlConnector (GetSchemaTable/GetColumnSchema should return empty object mysql-net/MySqlConnector#744). I'm hoping that since these are relatively recent changes, reverting the behavior back would impact few users (the lack of any feedback after introducing the original break in these two providers may support this). Also, considering the age of ADO.NET and old applications/libraries relying on it, in the long run re-adopting the previous behavior may spare you and your users more headaches than it saves.

Once again, apologies for this.

PS The PR making GetSchemaTable nullable in System.Data (and Odbc/OleDb) is #41082.

@bgrainger
Copy link
Contributor

To confirm: the desired new behaviour is that when there is no result set, GetSchemaTable will return null (and GetColumnSchema will continue returning an empty collection)?

Since MySqlConnector changed from throwing an exception to returning an empty DataTable, I don't think there would be many users relying on the new return value, and changing it to null shouldn't have a significant impact.

@roji
Copy link
Member Author

roji commented Aug 20, 2020

To confirm: the desired new behaviour is that when there is no result set, GetSchemaTable will return null (and GetColumnSchema will continue returning an empty collection)?

Yeah, that's correct. This includes the case where NextResult has been called on the reader, and it returned false (i.e. consumed all result sets). You should see this behavior with System.Data.SqlClient, and with Microsoft.Data.SqlClient prior to 2.0.0; if you're seeing anything different please let me know...

Since MySqlConnector changed from throwing an exception to returning an empty DataTable, I don't think there would be many users relying on the new return value, and changing it to null shouldn't have a significant impact.

That's good to hear.. I now understand the extent to which behavioral changes in the public-facing abstraction methods of ADO.NET should be done really, really carefully.

@ErikEJ
Copy link

ErikEJ commented Aug 20, 2020

Or not at all? 😀

@roji
Copy link
Member Author

roji commented Aug 20, 2020

A certain number of "reallys" starts to be equivalent to "not at all" for sure ☹️

roji added a commit that referenced this issue Aug 21, 2020
roji added a commit to dotnet/dotnet-api-docs that referenced this issue Aug 21, 2020
carlossanlop pushed a commit to carlossanlop/runtime that referenced this issue Aug 28, 2020
@roji roji removed this from the 5.0.0 milestone Sep 9, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants