-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLite RevEng: Sample data to determine CLR type #8824
Comments
@ErikEJ was that a 👎 to doing this? or because SQLite is weird. |
It was actually to doing it, it will give users a false sense of strong typing |
I imagine something like this to get the average field type: SELECT typeof(column1)
FROM table1
WHERE typeof(column1) != 'null'
GROUP BY typeof(column1)
ORDER BY count() DESC
LIMIT 1; If |
Hello @bricelam, question regarding where this is going please. If I have the following table definition: CREATE TABLE [Schema] ( In EF6 EF Core Are you saying this is how it will be in the future, or is it going to get fixed post 2.1 preview 1? I'm assuming EF Core would follow the same rules as EF6, which seemed to work well in this regard: https://system.data.sqlite.org/index.html/artifact/18f60c317bbdfb16 /// /// SQLite is inherently un-typed. All datatypes in SQLite are natively strings. The definition of the columns of a table /// and the affinity of returned types are all we have to go on to type-restrict data in the reader. |
No. In SQLite, column type names mean nothing*. System.Data.SQLite added additional semantics to them to map them CLR types. Early on, we decided that Microsoft.Data.Sqlite and EF Core's SQLite provider should not add these additional semantics instead exposing the underlying SQLite behavior more directly. This issue is about looking the actual data to determine the CLR type. The column type names would continue to just be application-defined strings with application-only semantics. To illustrate the problem, EF6 and System.Data.SQLite map columns with type By continuing to ignore the column type and going directly to the data, we could see that a column contains only * Column type names are actually used by SQLite to determine type affinity, but that really only affects the on-disk format. |
For additional reference, System.Data.SQLite's column to EDM type mapping is specified here. |
The Microsoft.Data.Sqlite Data Type Mappings might also be interesting. |
Hi Brice. Thank you for the reply. Please bear with me I am new to EF Core and I just spent a couple of days attempting to port a relatively simple data model from EF6 to EF Core 2.1, just to see where the pain points will be. I was expecting a simpler port, but found that it was more difficult due not being able to be involved in the code generation (solved with EF Core Power Tools) and the changes to how data types are being presented by EF Core.
So I can only reverse engineer a database that is fully populated? The old way of defining datatypes within the table definition seems less error-prone and more declarative. If I define a column as , [ModifiedDateTime] datetime NOT NULL If I use Dapper, EF6, or any other micro-ORM I get a datetime. Why would I want the data to come back as a string and write code to make the conversion manually? Also, according the Sqlite documentation, datetimes can be stored as REAL, INTEGER, or STRING in the database. How are you going to determine that a REAL, or INTEGER is really meant to be a datetime?
What would the application want to do with the column type names? I'm pretty sure they want their ORM to use it to interpret the types for them rather than having to write the conversion code themselves? One last point on this. With EF6, when I moved my application from SqlServerCe to Sqlite, I did not have to change anything. I was pretty much insulated by EF to the specifics of how things were stored in the database. Now what I hear is that this is no longer the case. We need to change our applications to conform to the semantics of the database. Is this correct? |
You can change the CLR types after reverse engineering.
We won't. You'll have to update the type manually if you know they're datetime values.
We could apply heuristics, but since the type names could literally be anything (including no type name), we would constantly be getting issues to update the heuristics to match/not match every new type name somebody's application decided to use. SQLite does not apply semantics to these names so we don't want to either. Reverse Engineering is a best guess at what we think you want your domain model to look like. Since SQLite only has four primitive data types, there is A LOT of information lost when looking just at the database. This issue is about making a more educated guess, but at the end of the day it's still just a guess and you'll probably have to provide the missing information by manually updating the model after it's generated. This is one place where reverse engineer templates can help. If you have the column type names, you can map to the appropriate CLR types in the template. |
Thank you for your patience on this Brice. I think we have been talking at cross-purposes here. I have no problem modifying my model after the reverse engineering. That makes sense to me. What I thought was been said is that I could not in general use a bool with SQLite because it doesn't support that type. So here is what happens if I use a bool in my model. Perhaps it is because I am configuring things wrong. SQLite Column Definition [ReadOnly] bit NOT NULL SELECT typeof(Readonly) FROM Schema returns 'integer' Reverse engineering turns this to a string (not sure why its not an integer), I edit it and turn it to a bool public bool ReadOnly {get;set;} Configured in the context as:
When I attempt to access this entity from the data model, I get the following: System.InvalidOperationException: 'An exception occurred while reading a database value for property 'Schema.ReadOnly'. The expected type was 'System.Boolean' but the actual value was of type 'System.Boolean'.' |
Oh weird. Yes, that should definitely work. Could you submit a new issue with a repro? It may be hitting a bug somewhere in query. |
Not sure why it is being reverse engineered as a string when the type in Sqlite is an integer. I will log a new issue with a sample project. Thanks for your help. |
Some notes on the table above:
So just to confirm... If I create a column with type DATETIME in SQLite, I get an affinity of NUMERIC affinity (rules), which means SQLite tries to convert inputs to NUMERIC (but stores them as-is if it can't convert). But according to the above we'd scaffold it as a .NET DateTime. That sounds OK (not pushing back) - just making sure. BTW we should probably exempt STRICT tables from data sampling? Type ANY in strict table is particularly interesting, it really seems to correspond to CLR object, since it's meant to hold any type (vs. non-strict tables, where there's still a specific "preferred" column type, even if anything can still be stored...). This is all quite intricate 😅 |
Add DateOnly/TimeOnly? |
Absolutely! |
No. They get mapped to the NUMERIC affinity which results in a mix of INTEGER, REAL, and TEXT values.
I don't think we should add explicit .NET type names here. There should be good precedence for an existing database using the type names before we think about handling them. Remember that looking at the actual data will catch most cases. These are just a last ditch effort to give you something better than the affinity--especially where the affinity for these types would be objectively wrong. It's also worth noting that all the type names here are handled by System.Data.SQLite and related .NET drivers.
I don't think it should. No other floating-point column would result in
...yes, good so far.
There is no NUMERIC type--it's just an affinity. It results in INTEGER, REAL, or TEXT values depending on which one is lossless.
Yes. If there is no data, and the column type is DATETIME.
Probably.
Yes. We need to handle this in more than just scaffolding. See #28628
Very intricate. I'll try to present everything in a design meeting so we can really understand and discuss it. It's worth re-stating that the goal in all of this is just to produce better, more expected EF models. All of it is guesswork. We'll never be perfect; we're just trying to be better. At the end of the day, SQLite truly only supports four types. |
Here are the initial rules based on the actual value (not column) types.
Notes:
|
@bricelam no need to bring to design just on my account - I don't want to hold this back. As you say, this is only about improving scaffolding, and nothing we do here will be perfect anyway.
BTW what would we do here if the column contains mixed values? |
I don't think we support |
@bricelam Just FYI. for SQL Server, we have special type mapping code to allow object properties mapped to SqlVariant types. |
📝 Design meeting notes
|
Add some opt out/override? |
…er CLR type Here's a table highlighting some of the improvements. SQL Type | Example Value | Old .NET Type | New .NET Type -------- | -------------------------------------- | ------------- | ------------- BOOLEAN | 0 | long | bool SMALLINT | 0 | long | short INT | 0 | long | int BIGINT | 0 | long | long TEXT | '0.0' | string | decimal TEXT | '1970-01-01' | string | DateOnly TEXT | '1970-01-01 00:00:00' | string | DateTime TEXT | '00:00:00' | string | TimeSpan TEXT | '00000000-0000-0000-0000-000000000000' | string | Guid STRING | 'ABC' | byte[] | string Resolves dotnet#8824
No good way to do this, but I'll discuss with the team. |
…er CLR type Here's a table highlighting some of the improvements. SQL Type | Example Value | Old .NET Type | New .NET Type -------- | -------------------------------------- | ------------- | ------------- BOOLEAN | 0 | long | bool SMALLINT | 0 | long | short INT | 0 | long | int BIGINT | 0 | long | long TEXT | '0.0' | string | decimal TEXT | '1970-01-01' | string | DateOnly TEXT | '1970-01-01 00:00:00' | string | DateTime TEXT | '00:00:00' | string | TimeSpan TEXT | '00000000-0000-0000-0000-000000000000' | string | Guid STRING | 'ABC' | byte[] | string Resolves dotnet#8824
…R type Here's a table highlighting some of the improvements. Column type | Sample value | Before | After ----------- | -------------------------------------- | ------ | ----- BOOLEAN | 0 | byte[] | bool SMALLINT | 0 | long | short INT | 0 | long | int BIGINT | 0 | long | long TEXT | '0.0' | string | decimal TEXT | '1970-01-01' | string | DateOnly TEXT | '1970-01-01 00:00:00' | string | DateTime TEXT | '00:00:00' | string | TimeSpan TEXT | '00000000-0000-0000-0000-000000000000' | string | Guid STRING | 'ABC' | byte[] | string Resolves dotnet#8824
…R type Here's a table highlighting some of the improvements. Column type | Sample value | Before | After ----------- | -------------------------------------- | ------ | ----- BOOLEAN | 0 | byte[] | bool SMALLINT | 0 | long | short INT | 0 | long | int BIGINT | 0 | long | long TEXT | '0.0' | string | decimal TEXT | '1970-01-01' | string | DateOnly TEXT | '1970-01-01 00:00:00' | string | DateTime TEXT | '00:00:00' | string | TimeSpan TEXT | '00000000-0000-0000-0000-000000000000' | string | Guid STRING | 'ABC' | byte[] | string Resolves dotnet#8824
Reverse engineering SQLite column types is tricky due to it's dynamic typing. The column type hold no real meaning.
Today, we apply the same type affinity rules that SQLite does to the specified column type name to put it into one of the four primitive types (which map to
long
,double
,string
&byte[]
).I think we could do a much better job by sampling the data to determine a suitable CLR type. This would allow us to reverse engineer types that need to be coerced like
decimal
,DateTime
andGuid
. We could also provide a more natural numeric type likeint
if the data fits.The text was updated successfully, but these errors were encountered: