SQLite RevEng: Sample data to determine CLR type #8824

bricelam · 2017-06-12T17:08:19Z

Reverse engineering SQLite column types is tricky due to its dynamic typing. The column type holds no real meaning.

Today, we apply the same type affinity rules that SQLite does to the specified column type name to put it into one of the four primitive types (which map to long, double, string & byte[]).
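For reference, the affinity rules mentioned above can be sketched as follows. This is a minimal illustration of the ordered substring rules described in the SQLite datatype documentation, not EF Core's actual code; the function name `column_affinity` is hypothetical.

```python
def column_affinity(declared_type: str) -> str:
    """Return the SQLite type affinity for a declared column type name.

    Applies the five substring rules from the SQLite documentation in order.
    """
    name = (declared_type or "").upper()
    if "INT" in name:
        return "INTEGER"
    if any(token in name for token in ("CHAR", "CLOB", "TEXT")):
        return "TEXT"
    if "BLOB" in name or name.strip() == "":
        return "BLOB"  # also used when no type name is given at all
    if any(token in name for token in ("REAL", "FLOA", "DOUB")):
        return "REAL"
    return "NUMERIC"


for type_name in ("nvarchar(16)", "int", "bit", "datetime", "FLOATING POINT", ""):
    print(repr(type_name), "->", column_affinity(type_name))
```

Names such as `bit` and `datetime` match none of the substrings and fall through to NUMERIC, while `FLOATING POINT` lands on INTEGER because it contains `INT`. The rules yield five affinities, whereas non-NULL values are stored in only four storage classes (INTEGER, REAL, TEXT, BLOB).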

I think we could do a much better job by sampling the data to determine a suitable CLR type. This would allow us to reverse engineer types that need to be coerced like decimal, DateTime and Guid. We could also provide a more natural numeric type like int if the data fits.


bricelam · 2017-06-19T22:44:48Z

@ErikEJ, was that a 👎 to doing this, or because SQLite is weird?

ErikEJ · 2017-06-20T16:08:16Z

It was actually to doing it; it will give users a false sense of strong typing.

bricelam · 2017-09-25T20:57:32Z

I imagine something like this to get the most common field type:

SELECT typeof(column1)
FROM table1
WHERE typeof(column1) != 'null'
GROUP BY typeof(column1)
ORDER BY count() DESC
LIMIT 1;

If INTEGER or REAL, use max() and min() to get the range.
If BLOB, use length() to see if it's a GUID.
If TEXT, use datetime() and time() to see if it parses. Maybe length() can indicate a GUID. Maybe CAST..AS REAL can indicate a decimal.
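A minimal sketch of the first two steps, run through Python's built-in `sqlite3` module against an in-memory database. The table contents are made up for illustration, and the query uses `count(*)` as the more common spelling of `count()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1)")
conn.executemany(
    "INSERT INTO table1 (column1) VALUES (?)",
    [(1,), (0,), (42,), (None,), ("n/a",)],  # illustrative data only
)

# Most common storage class among the non-NULL values.
dominant_type = conn.execute(
    """
    SELECT typeof(column1)
    FROM table1
    WHERE typeof(column1) != 'null'
    GROUP BY typeof(column1)
    ORDER BY count(*) DESC
    LIMIT 1
    """
).fetchone()[0]

# Range of the values that actually have that storage class.
low, high = conn.execute(
    "SELECT min(column1), max(column1) FROM table1 WHERE typeof(column1) = ?",
    (dominant_type,),
).fetchone()

print(dominant_type, low, high)
```

Here the three integers outnumber the single text value, so the range check only looks at the integer rows.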

jonreis · 2018-03-12T20:03:20Z

Hello @bricelam, question regarding where this is going please. If I have the following table definition:

CREATE TABLE [Schema] (
[SchemaId] INTEGER NOT NULL
, [SchemaVersion] nvarchar(16) NOT NULL
, [DatabaseVersion] int NOT NULL
, [Expression] ntext NULL
, [Description] nvarchar(1024) NULL
, [ReadOnly] bit NOT NULL
, [ModifiedDateTime] datetime NOT NULL
, CONSTRAINT [sqlite_master_PK_Schema] PRIMARY KEY ([SchemaId])
);

In EF6
bit => bool
int => int
datetime => datetime

EF Core
bit => string
int => long
datetime => string

Are you saying this is how it will be in the future, or is it going to get fixed post 2.1 preview 1?

I'm assuming EF Core would follow the same rules as EF6, which seemed to work well in this regard:

https://system.data.sqlite.org/index.html/artifact/18f60c317bbdfb16

///

/// SQLite is inherently un-typed. All datatypes in SQLite are natively strings. The definition of the columns of a table
/// and the affinity of returned types are all we have to go on to type-restrict data in the reader.

bricelam · 2018-03-12T22:57:51Z

> I'm assuming EF Core would follow the same rules as EF6

No. In SQLite, column type names mean nothing*. System.Data.SQLite added additional semantics to them to map them to CLR types.

Early on, we decided that Microsoft.Data.Sqlite and EF Core's SQLite provider should not add these additional semantics, instead exposing the underlying SQLite behavior more directly.

This issue is about looking at the actual data to determine the CLR type. The column type names would continue to just be application-defined strings with application-only semantics.

To illustrate the problem, EF6 and System.Data.SQLite map columns with type FLOAT to System.Double, but most databases created outside of EF6 probably intended those columns to map to System.Single. For further discussion, see aspnet/Microsoft.Data.Sqlite#457

By continuing to ignore the column type and going directly to the data, we could see that a column contains only 1s and 0s and map it to System.Boolean. The column type could be whatever the application thinks is best: BIT, BOOL, BOOLEAN, FLAG, YESNO, SWITCH, etc.

* Column type names are actually used by SQLite to determine type affinity, but that really only affects the on-disk format.

bricelam · 2018-03-12T23:01:06Z

For additional reference, System.Data.SQLite's column to EDM type mapping is specified here.

bricelam · 2018-03-12T23:04:10Z

The Microsoft.Data.Sqlite Data Type Mappings might also be interesting.

jonreis · 2018-03-12T23:45:53Z

Hi Brice. Thank you for the reply. Please bear with me; I am new to EF Core and I just spent a couple of days attempting to port a relatively simple data model from EF6 to EF Core 2.1, just to see where the pain points will be. I was expecting a simpler port, but found that it was more difficult due to not being able to be involved in the code generation (solved with EF Core Power Tools) and the changes to how data types are being presented by EF Core.

> This issue is about looking at the actual data to determine the CLR type. The column type names would continue to just be application-defined strings with application-only semantics.

So I can only reverse engineer a database that is fully populated? The old way of defining datatypes within the table definition seems less error-prone and more declarative. If I define a column as

, [ModifiedDateTime] datetime NOT NULL

If I use Dapper, EF6, or any other micro-ORM I get a datetime. Why would I want the data to come back as a string and write code to make the conversion manually? Also, according to the SQLite documentation, datetimes can be stored as REAL, INTEGER, or TEXT in the database. How are you going to determine that a REAL or INTEGER is really meant to be a datetime?
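To make the ambiguity concrete, here is a small sketch using Python's built-in `sqlite3` module that produces the same instant in all three representations with SQLite's own date functions. The timestamp is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
stamp = "2018-03-12 20:03:20"  # arbitrary example timestamp

# The same instant as ISO-8601 text, a Julian day number, and Unix seconds.
as_text, as_real, as_integer = conn.execute(
    """
    SELECT datetime(:stamp),
           julianday(:stamp),
           CAST(strftime('%s', :stamp) AS INTEGER)
    """,
    {"stamp": stamp},
).fetchone()

for value in (as_text, as_real, as_integer):
    storage_class = conn.execute("SELECT typeof(?)", (value,)).fetchone()[0]
    print(storage_class, value)

# The integer form converts back to text with the 'unixepoch' modifier, but
# nothing in the stored number itself marks it as a date.
round_trip = conn.execute(
    "SELECT datetime(?, 'unixepoch')", (as_integer,)
).fetchone()[0]
print(round_trip)
```

Only the text form carries a recognizable shape; the REAL and INTEGER forms look like any other number unless the application knows how to interpret them.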

> The column type names would continue to just be application-defined strings with application-only semantics.

What would the application want to do with the column type names? I'm pretty sure they want their ORM to use it to interpret the types for them rather than having to write the conversion code themselves?

One last point on this. With EF6, when I moved my application from SqlServerCe to Sqlite, I did not have to change anything. I was pretty much insulated by EF from the specifics of how things were stored in the database. Now what I hear is that this is no longer the case. We need to change our applications to conform to the semantics of the database. Is this correct?

bricelam · 2018-03-13T15:54:37Z

You can change the CLR types after reverse engineering.

> How are you going to determine that a REAL, or INTEGER is really meant to be a datetime?

We won't. You'll have to update the type manually if you know they're datetime values.

> they want their ORM to use it to interpret the types for them

We could apply heuristics, but since the type names could literally be anything (including no type name), we would constantly be getting issues to update the heuristics to match/not match every new type name somebody's application decided to use. SQLite does not apply semantics to these names so we don't want to either.

Reverse Engineering is a best guess at what we think you want your domain model to look like. Since SQLite only has four primitive data types, there is A LOT of information lost when looking just at the database. This issue is about making a more educated guess, but at the end of the day it's still just a guess and you'll probably have to provide the missing information by manually updating the model after it's generated.

This is one place where reverse engineer templates can help. If you have the column type names, you can map to the appropriate CLR types in the template.

jonreis · 2018-03-13T17:55:50Z

Thank you for your patience on this, Brice. I think we have been talking at cross-purposes here. I have no problem modifying my model after the reverse engineering. That makes sense to me. What I thought was being said is that I could not in general use a bool with SQLite because it doesn't support that type.

So here is what happens if I use a bool in my model. Perhaps it is because I am configuring things wrong.

SQLite Column Definition

[ReadOnly] bit NOT NULL

SELECT typeof(Readonly) FROM Schema returns 'integer'

Reverse engineering turns this into a string (not sure why it's not an integer). I edit it and turn it into a bool:

public bool ReadOnly {get;set;}

Configured in the context as:

            entity.Property(e => e.ReadOnly)
                .IsRequired()
                .HasColumnType("bit");

When I attempt to access this entity from the data model, I get the following:

System.InvalidOperationException: 'An exception occurred while reading a database value for property 'Schema.ReadOnly'. The expected type was 'System.Boolean' but the actual value was of type 'System.Boolean'.'

bricelam · 2018-03-13T19:40:52Z

Oh weird. Yes, that should definitely work. Could you submit a new issue with a repro? It may be hitting a bug somewhere in query.

jonreis · 2018-03-13T19:50:39Z

Not sure why it is being reverse engineered as a string when the type in Sqlite is an integer.

I will log a new issue with a sample project.

Thanks for your help.

jonreis · 2018-03-13T21:18:26Z

@bricelam logged as #11258

roji · 2022-12-11T11:53:36Z

Some notes on the table above:

CURRENCY, DECIMAL, MONEY, NUMBER, NUMERIC should get mapped to decimal anyway via type affinity (so they don't need to be listed in the table)?
Maybe add DateTimeOffset?
Should SINGLE be mapped to .NET double (as opposed to float)?
Support TIMESPAN for .NET TimeSpan alongside TIME?

So just to confirm... If I create a column with type DATETIME in SQLite, I get NUMERIC affinity (rules), which means SQLite tries to convert inputs to NUMERIC (but stores them as-is if it can't convert). But according to the above we'd scaffold it as a .NET DateTime. That sounds OK (not pushing back) - just making sure.

BTW we should probably exempt STRICT tables from data sampling? Type ANY in strict table is particularly interesting, it really seems to correspond to CLR object, since it's meant to hold any type (vs. non-strict tables, where there's still a specific "preferred" column type, even if anything can still be stored...).

This is all quite intricate 😅

ErikEJ · 2022-12-11T11:58:38Z

Add DateOnly/TimeOnly?

roji · 2022-12-11T11:59:13Z

Absolutely!

bricelam · 2022-12-13T22:32:13Z

> CURRENCY, DECIMAL, MONEY, NUMBER, NUMERIC should get mapped to decimal anyway via type affinity (so they don't need to be listed in the table)?

No. They get mapped to the NUMERIC affinity which results in a mix of INTEGER, REAL, and TEXT values.

> Maybe add DateTimeOffset?

> Add DateOnly/TimeOnly?

> Support TIMESPAN for .NET TimeSpan alongside TIME?

I don't think we should add explicit .NET type names here. There should be good precedent for an existing database using the type names before we think about handling them. Remember that looking at the actual data will catch most cases. These are just a last-ditch effort to give you something better than the affinity, especially where the affinity for these types would be objectively wrong. It's also worth noting that all the type names here are handled by System.Data.SQLite and related .NET drivers.

> Should SINGLE be mapped to .NET double (as opposed to float)?

I don't think it should. No other floating-point column would result in float. This would make for an odd case where only when you have no data in a column would it potentially get mapped to float (as opposed to double).

> I create a column with type DATETIME in SQLite, I get NUMERIC affinity...

...yes, good so far.

> ...which means SQLite tries to convert inputs to NUMERIC (but stores them as-is if it can't convert).

There is no NUMERIC type--it's just an affinity. It results in INTEGER, REAL, or TEXT values depending on which one is lossless.
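This behavior is easy to observe with Python's built-in `sqlite3` module. The sketch below binds three text values into a column declared as DATETIME (which gets NUMERIC affinity) and asks SQLite how each was stored; the table name `t` and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value DATETIME)")  # DATETIME resolves to NUMERIC affinity

# All three inputs are bound as text; the affinity decides what gets stored.
conn.executemany(
    "INSERT INTO t (value) VALUES (?)",
    [("123",), ("1.5",), ("2018-03-12 20:03:20",)],
)

stored = conn.execute(
    "SELECT value, typeof(value) FROM t ORDER BY rowid"
).fetchall()

for value, storage_class in stored:
    print(repr(value), storage_class)
```

The well-formed integer and real literals are converted, while the date string cannot be converted losslessly and stays TEXT.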

> But according to the above we'd scaffold it as a .NET DateTime.

Yes. If there is no data, and the column type is DATETIME.

> we should probably exempt STRICT tables

Probably.

> Type ANY in strict table is particularly interesting, it really seems to correspond to CLR object

Yes. We need to handle this in more than just scaffolding. See #28628

> This is all quite intricate

Very intricate. I'll try to present everything in a design meeting so we can really understand and discuss it. It's worth re-stating that the goal in all of this is just to produce better, more expected EF models. All of it is guesswork. We'll never be perfect; we're just trying to be better. At the end of the day, SQLite truly only supports four types.

bricelam · 2022-12-13T22:50:04Z

Here are the initial rules based on the actual value (not column) types.

INTEGER
- When min() = 0 and max() = 1 and count() > 2
  - bool
- When max() > int.MaxValue
  - long
- Otherwise
  - int
REAL
- double
TEXT
- When format is 'yyyy-MM-dd'
  - DateOnly
- When format is 'yyyy-MM-dd HH:mm:ss[.FFFFFFF]'
  - DateTime
- When format is 'yyyy-MM-dd HH:mm:ss[.FFFFFFF]zzz'
  - DateTimeOffset
- When format is '0.0###########################'
  - decimal
- When format is '00000000-0000-0000-0000-000000000000'
  - Guid
- When format is '[-][d.]hh:mm:ss[.fffffff]'
  - TimeSpan
- Otherwise
  - string
BLOB
- byte[]

Notes:

~~We'll never scaffold byte, sbyte, short, or ushort preferring int instead~~
~~We'll never scaffold uint preferring long instead~~
~~We'll never scaffold ulong because values just overflow in the database~~
~~We'll never scaffold float preferring double instead~~
We'll never scaffold char preferring string instead
We'll never scaffold TimeOnly because its format is ambiguous with TimeSpan
We'll never scaffold Guid for BLOB values
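The rules above can be sketched roughly as follows. This is an illustration only, not the EF Core implementation: the function name `infer_clr_type` is hypothetical, the regular expressions are approximations of the listed .NET format strings, and the input is assumed to be non-NULL values that all share one storage class.

```python
import re

INT_MAX = 2**31 - 1  # int.MaxValue in .NET

# Approximate regex equivalents of the TEXT formats listed above (assumptions).
TEXT_FORMATS = [
    ("DateOnly", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("DateTime", re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{1,7})?")),
    ("DateTimeOffset",
     re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{1,7})?[+-]\d{2}:\d{2}")),
    ("decimal", re.compile(r"-?\d+\.\d{1,28}")),
    ("Guid", re.compile(r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")),
    ("TimeSpan", re.compile(r"-?(\d+\.)?\d{2}:\d{2}:\d{2}(\.\d{1,7})?")),
]


def infer_clr_type(values):
    """Guess a CLR type name from non-NULL values of a single storage class."""
    if not values:
        return None  # nothing to sample, so no guess
    first = values[0]
    if isinstance(first, int):  # INTEGER
        if min(values) == 0 and max(values) == 1 and len(values) > 2:
            return "bool"
        return "long" if max(values) > INT_MAX else "int"
    if isinstance(first, float):  # REAL
        return "double"
    if isinstance(first, bytes):  # BLOB
        return "byte[]"
    # TEXT: pick the first format that every sampled value matches in full.
    for clr_type, pattern in TEXT_FORMATS:
        if all(pattern.fullmatch(value) for value in values):
            return clr_type
    return "string"


print(infer_clr_type([0, 1, 1]))        # enough 0/1 values for bool
print(infer_clr_type(["1970-01-01"]))   # matches the date-only format
print(infer_clr_type(["ABC"]))          # no format matches
```

Note that `[0, 1]` yields `int` rather than `bool`, because the rule requires more than two values before trusting the 0/1 pattern.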

roji · 2022-12-14T10:44:07Z

@bricelam no need to bring to design just on my account - I don't want to hold this back. As you say, this is only about improving scaffolding, and nothing we do here will be perfect anyway.

> Here are the initial rules based on the actual value (not column) types.

BTW what would we do here if the column contains mixed values?

bricelam · 2022-12-15T18:09:15Z

I don't think we support object properties today on SQLite (but we should as part of #28628), so I was just planning to pick a type.

ajcvickers · 2022-12-15T18:26:30Z

@bricelam Just FYI, for SQL Server we have special type mapping code to allow object properties mapped to SqlVariant types.

bricelam · 2022-12-16T19:26:08Z

📝 Design meeting notes

Require at least three non-NULL values before choosing bool
After looking at the data, use the column name to further refine the type to things like short, float, etc.
Use .NET type names too if they don't conflict with known SQL types
Avoid aggregate (i.e. table scan) queries as much as possible for perf
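One possible reading of the last note, sketched with Python's built-in `sqlite3` module: read a handful of non-NULL values with LIMIT instead of running min(), max(), or count() over the whole table. The helper name `sample_values` and the default sample size are assumptions, not what EF Core does.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote a table or column name for use in SQLite SQL text."""
    return '"' + name.replace('"', '""') + '"'


def sample_values(conn, table, column, limit=3):
    """Fetch up to `limit` non-NULL values without aggregating over the table."""
    sql = (
        f"SELECT {quote_identifier(column)} "
        f"FROM {quote_identifier(table)} "
        f"WHERE {quote_identifier(column)} IS NOT NULL "
        "LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (limit,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (flag)")
conn.executemany(
    "INSERT INTO t (flag) VALUES (?)", [(None,), (0,), (1,), (1,), (2,)]
)
print(sample_values(conn, "t", "flag"))
```

The trade-off is visible in the example data: the first three non-NULL values are all 0 or 1, so a sample this small would miss the later 2 that an aggregate query would have caught.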

ErikEJ · 2022-12-17T06:49:48Z

Add some opt out/override?

…er CLR type

Here's a table highlighting some of the improvements.

SQL Type | Example Value                          | Old .NET Type | New .NET Type
-------- | -------------------------------------- | ------------- | -------------
BOOLEAN  | 0                                      | long          | bool
SMALLINT | 0                                      | long          | short
INT      | 0                                      | long          | int
BIGINT   | 0                                      | long          | long
TEXT     | '0.0'                                  | string        | decimal
TEXT     | '1970-01-01'                           | string        | DateOnly
TEXT     | '1970-01-01 00:00:00'                  | string        | DateTime
TEXT     | '00:00:00'                             | string        | TimeSpan
TEXT     | '00000000-0000-0000-0000-000000000000' | string        | Guid
STRING   | 'ABC'                                  | byte[]        | string

Resolves dotnet#8824

bricelam · 2023-05-02T22:08:48Z

> Add some opt out/override?

No good way to do this, but I'll discuss with the team.


…R type

Here's a table highlighting some of the improvements.

Column type | Sample value                           | Before | After
----------- | -------------------------------------- | ------ | -----
BOOLEAN     | 0                                      | byte[] | bool
SMALLINT    | 0                                      | long   | short
INT         | 0                                      | long   | int
BIGINT      | 0                                      | long   | long
TEXT        | '0.0'                                  | string | decimal
TEXT        | '1970-01-01'                           | string | DateOnly
TEXT        | '1970-01-01 00:00:00'                  | string | DateTime
TEXT        | '00:00:00'                             | string | TimeSpan
TEXT        | '00000000-0000-0000-0000-000000000000' | string | Guid
STRING      | 'ABC'                                  | byte[] | string

Resolves dotnet#8824

bricelam added the type-enhancement label Jun 12, 2017

ajcvickers added this to the Backlog milestone Jun 12, 2017

bricelam mentioned this issue Jun 19, 2017

Sqlite: scaffolding: a nullable bool is created as string in the POCO entity #8725

Closed

AndriySvyryd added the propose-close label Aug 30, 2017

bricelam mentioned this issue Sep 25, 2017

SqliteDataReader's GetFieldType is not right. aspnet/Microsoft.Data.Sqlite#433

Closed

bricelam mentioned this issue Mar 12, 2018

EF Core Power Tools - Sqlite data types different than those created in EF 6.0 ErikEJ/SqlCeToolbox#654

Closed

bricelam added the consider-for-current-release label Apr 13, 2018

ajcvickers removed the propose-close label Apr 23, 2018

bricelam mentioned this issue Apr 23, 2018

Reverse Engineering bricelam/EFCore.SqliteEx#13

Closed

bricelam self-assigned this May 16, 2018

ajcvickers removed the consider-for-current-release label May 21, 2018

bricelam mentioned this issue Sep 7, 2018

SQLite: Consider changing ultimate fallback type to BLOB #13253

Closed

bricelam assigned bricelam and unassigned bricelam Oct 12, 2018

bricelam added the consider-for-current-release label Oct 12, 2018

ajcvickers assigned bricelam Jan 30, 2019

bricelam mentioned this issue May 2, 2023

SQLite Scaffolding: Use column type and values to provide a better CLR type #30816

Merged

bricelam added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label May 2, 2023

bricelam modified the milestones: Backlog, 8.0.0 May 2, 2023

bricelam closed this as completed in 2d7c4a4 May 10, 2023

ajcvickers modified the milestones: 8.0.0, 8.0.0-preview4, 8.0.0-preview5 May 26, 2023

ajcvickers mentioned this issue Jun 23, 2023

Latest news and progress on .NET 8 and EF8 #29989

Closed

bricelam mentioned this issue Oct 2, 2023

SQLite affinity fallback is wrong: fallback to blob instead of numeric #31915

Closed

ajcvickers removed the consider-for-current-release label Oct 11, 2023

ajcvickers modified the milestones: 8.0.0-preview5, 8.0.0 Nov 14, 2023
