Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializing reference navigation in constructor causes incorrect fixup #18007

Closed
RickStrahl opened this issue Sep 24, 2019 · 31 comments · Fixed by #21045
Closed

Initializing reference navigation in constructor causes incorrect fixup #18007

RickStrahl opened this issue Sep 24, 2019 · 31 comments · Fixed by #21045
Labels
area-change-tracking closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-bug
Milestone

Comments

@RickStrahl
Copy link

RickStrahl commented Sep 24, 2019

I have a very simple sample application (my AlbumViewer Sample. It has Albums and Artists (and also Tracks).

In upgrading to the .NET Core 3.0 RTM bits today I noticed that the DB was no longer returning the related data properly.

Specifically I'm querying a list of Albums, which should return Artist records as part of the query results. But there's a problem with the Artist data with only the first resulting album getting the related Artist data - all subsequent Albums with the same entity get a null reference for the Artist:

image

The model is very simple:

    [DebuggerDisplay("{Title} {Artist.ArtistName}")]
    public class Album
    {           
        public int Id { get; set; }
        public int ArtistId { get; set; }        
        public string Title { get; set; }
        public string Description { get; set; }
        public int Year { get; set; }
        public string ImageUrl { get; set; }
        public string AmazonUrl { get; set; }
        public string SpotifyUrl { get; set; }

        public virtual Artist Artist { get; set; }
        public virtual IList<Track> Tracks { get; set; }

        public Album()
        {
            Artist = new Artist();
            Tracks = new List<Track>();
        }
    }

and this is a basic 1-1 relation ship.

The query runs:

 public async Task<List<Album>> GetAllAlbums(int page = 0, int pageSize = 15)
        {
            IQueryable<Album> albums = Context.Albums
                .Where(alb=> alb.ArtistId == 8)   // for debug only retrieve one artist set of records
                .Include(ctx => ctx.Tracks)
                .Include(ctx => ctx.Artist)
                .OrderBy(alb => alb.Title);

            if (page > 0)
            {
                albums = albums
                                .Skip((page - 1) * pageSize)
                                .Take(pageSize);
            }

            var list = await albums.ToListAsync();
            return list;
        }

In this query 3 records are returned each of which have the same artist. The first record correctly holds the the Artist data. Subsequent matches have the .Artist=null.

Note it appears that this affects any Album records where there are multiple albums per artist. The first match always has the Artist filled, but any records later in the resultset have the Artist empty.

EF Core version:
Database provider: 3.0 RTM
Target framework: .NET Core 3.0 RTM
Operating system: Windows 10
IDE: VS 2019 16.3

@RickStrahl
Copy link
Author

RickStrahl commented Sep 24, 2019

Some additional points about the results in this query:

To demonstrate more clearly I added another subquery to eek out all the entries that have an empty album, which results in 43 of 90 - basically all albums that have more than one album for a single artist. First one has Artist, subsequent ones do not.

image

There should never be any entries in that second list as there are not empty artists and no orphaned artists in the db.

So just to be sure there's not something wrong with my own assumptions I removed the EF Core 3.0 assemblies and rolled them back to the EF Core 2.2, then re-ran the same application.

image

As you can see 2.2 correctly returns the proper results with no empty Artist objects.

@divega
Copy link
Contributor

divega commented Sep 24, 2019

@smitpatel, can you please do the initial investigation?

@smitpatel smitpatel self-assigned this Sep 24, 2019
@smitpatel
Copy link
Member

I am not able to reproduce this error
Repro code used

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

namespace EFSampleApp
{
    public class Program
    {
        public static void Main(string[] args)
        {
            using (var db = new MyContext())
            {
                // Recreate database
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();

                // Seed database
                var artist = new Artist();
                db.Add(artist);

                db.AddRange(
                    new Album
                    {
                        Artist = artist,
                        Title = "Album1",
                        Tracks =
                        {
                            new Track(),
                            new Track(),
                            new Track(),
                        }
                    },
                    new Album
                    {
                        Artist = artist,
                        Title = "Album2",
                        Tracks =
                        {
                            new Track(),
                            new Track(),
                            new Track(),
                        }
                    },
                    new Album
                    {
                        Artist = artist,
                        Title = "Album3",
                        Tracks =
                        {
                            new Track(),
                            new Track(),
                            new Track(),
                        }
                    }
                    );

                db.SaveChanges();
            }

            using (var db = new MyContext())
            {
                // Run queries
                var result = db.Albums.Where(alb => alb.ArtistId == 1)
                    .Include(ctx => ctx.Tracks)
                    .Include(ctx => ctx.Artist)
                    .OrderBy(alb => alb.Artist)
                    .ToList();

                Console.WriteLine(result.Where(a => a.Artist == null).Count());
            }
            Console.WriteLine("Program finished.");
        }
    }


    public class MyContext : DbContext
    {
        private static ILoggerFactory ContextLoggerFactory
            => LoggerFactory.Create(b =>
            {
                b
                .AddConsole()
                .AddFilter("", LogLevel.Debug);
            });

        // Declare DBSets
        public DbSet<Album> Albums { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            // Select 1 provider
            optionsBuilder
                .UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=_ModelApp;Trusted_Connection=True;Connect Timeout=5;ConnectRetryCount=0")
                //.UseSqlite("filename=_modelApp.db")
                //.UseInMemoryDatabase(databaseName: "_modelApp")
                //.UseCosmos("https://localhost:8081", @"C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==", "_ModelApp")
                .EnableSensitiveDataLogging()
                .UseLoggerFactory(ContextLoggerFactory);
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Configure model
        }
    }

    public class Album
    {
        public int Id { get; set; }
        public int ArtistId { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
        public int Year { get; set; }
        public string ImageUrl { get; set; }
        public string AmazonUrl { get; set; }
        public string SpotifyUrl { get; set; }

        public virtual Artist Artist { get; set; }
        public virtual IList<Track> Tracks { get; set; }

        public Album()
        {
            Artist = new Artist();
            Tracks = new List<Track>();
        }
    }

    public class Artist
    {
        public int Id { get; set; }
    }
    public class Track
    {
        public int Id { get; set; }
        public int AlbumId { get; set; }
    }
}

@smitpatel
Copy link
Member

Further, the other 2 screenshots in 2nd comment should throw Null Ref is artist is not loaded when comparing album.Artist.Id == 0.

@RickStrahl
Copy link
Author

RickStrahl commented Sep 24, 2019

True but there are no orphaned entities so Artist should never be null - that code is there to demonstrate the issue - it's not in the actual code. Also I'm not sure that it should through since the constructor is in fact creating the instance.

Again note that this works fine in 2.2.

I'll take a look later if I can simplify this down to a simpler repo but my suggestion is maybe look at the repo and run the ASP.NET application and hit /api/albums then look at the data (or step in). The data autoloads in SqLite from a JSON file so you should be able to see the same thing I'm running here.

I don't doubt that this may not fail in a no data or memory db driver scenario, but with either SQL Server or SQLite in the above scenario it definitely fails.

@smitpatel
Copy link
Member

I used SqlServer as database & with adding seed data as evident in repro code above.

@rellis-of-rhindleton
Copy link

Forked the AlbumViewer repo to see if I could spot anything. Checked out the EF 3.0 commit. Using the SqlLite option, I'm seeing similar behavior. For artist id 12 (Accept), I'm getting three albums. The first has everything right; the second has an instance of Artist with none of its properties set.

locals

Additionally - in GetAlbumsForArtist, although the query includes Artist, none of the returned albums have the right data. Again it looks like they all have a new instance of Artist with default property values.

With this query code, breakpoint on the return statement:

public async Task<List<Album>> GetAlbumsForArtist(int artistId)
{
	var albums = await Context.Albums
		.Include(a => a.Tracks)
		.Include(a => a.Artist)
		.Where(a => a.ArtistId == artistId)
		.ToListAsync();

	return albums;
}

This is the value of albums:

locals2

I don't have any deeper insights here, just figured I'd see if I could reproduce the problem.

(Thanks for the list @RickStrahl ... now I have some new albums to listen to. 🥇 )

@RickStrahl
Copy link
Author

@rellis-of-rhindleton Thanks for confirming that it's not a machine specific thing.

@RickStrahl
Copy link
Author

@smitpatel - I've created a simpler test project that you can test with here:

https://github.com/RickStrahl-Content/EfCore30QueryBug

This gives you the data set I'm working with which is un-interesting but consistently demonstrates the failing behavior. Give that a shot and let me know what you find.

@alexwiese
Copy link

alexwiese commented Sep 25, 2019

@RickStrahl @smitpatel
I've run Rick's repro on my machine and it fails with both SQL and SQLLite.

X EfCoreQueryWithSqLiteDataBug [687ms]
  Error Message:
   Assert.IsTrue failed. There are 38 uninitialized Artist records
  Stack Trace:
     at EFCoreQueryBug.EfCoreQueryBug.EfCoreQueryWithSqLiteDataBug() in C:\git-other\EfCore30QueryBug\EfCoreQueryBug.cs:line 33


  X EfCoreQueryWithSqlDataBug [256ms]
  Error Message:
   Assert.IsTrue failed. There are 38 uninitialized Artist records
  Stack Trace:
     at EFCoreQueryBug.EfCoreQueryBug.EfCoreQueryWithSqlDataBug() in C:\git-other\EfCore30QueryBug\EfCoreQueryBug.cs:line 54

Interestingly if I comment out the line that sets the Artist property in the Album constructor it works fine.
This leads me to believe EF is getting tripped up by the presence of the "default" Artist somehow.

        public Album()
        {
           // Artist = new Artist();
            Tracks = new List<Track>();
        }

Further to this setting the ID of the "default" Artist to a different value such as -1 or 1 it also works (except the results are wrong, it seems the first instance has the Artist set correctly, but each subsequent instance having that ID will not be set correctly).

Artist = new Artist { Id = 1 };

@smitpatel
Copy link
Member

smitpatel commented Sep 25, 2019

There is possibly issue in seeding. Did you check data on server side?

@smitpatel
Copy link
Member

Or fixup is not running properly since Album is not null.

@alexwiese
Copy link

alexwiese commented Sep 25, 2019

@smitpatel
The data is fine, if I comment out that line that sets the "default" Artist then the data loads correctly.
There is not issue with the data in the database.

Ie. this works fine

        {
           // Artist = new Artist();
            Tracks = new List<Track>();
        }

This causes the issue where the non-first Album having that ArtistId does not have the Artist set

        public Album()
        {
            Artist = new Artist();
            Tracks = new List<Track>();
        }

Should be easier to narrow down the issue by looking at the behaviour with that line commented/uncommented.

@smitpatel
Copy link
Member

My theory is statemanager is not working correctly.
Step by step

  • Album is materialized and state manager starts tracking it.
  • Artist is materialized and state manager starts tracking it. This also causes fix up with initial album.
  • Album 2 is materialized and state manager starts tracking it. Since artist is not null & Artist Id is already tracking, it probably assumed that Artist got replaced (navigation changed which would propagate to FK property).

In EF Core 2.2, query also did fix-up but in 3.0 we decided to make it centralized and let state manager always do the fix-up so behavior could arise.

@rellis-of-rhindleton
Copy link

Is behavior defined somewhere for what to do if a newly materialized entity already has a nav prop set? It seems like a different situation than, say, a default collection property. Not suggesting it should change from 2.x, just looking for insight.

@RickStrahl
Copy link
Author

@smitpatel So are you saying this is by design and as Alex suggests the Artist child should be left at null?

@smitpatel
Copy link
Member

@RickStrahl - It seems that something is happening in state manager rather than query pipeline itself. Now that we know what is the root cause here (initializing reference navigation), I can debug this in EF Core codebase to see what state manager does (as noted above, still just a theory).

Though there are some obvious issue with initializing reference navigation and it could be very well by design that you cannot initialize reference navigation

  • Adding Album would add default artist, rather than throwing that related artist does not exist.
  • Forgetting to add include would never throw, rather give incorrect result.
  • Lazy loading may not work since the navigation is already non-null

Is there a particular reason for initializing reference navigation? That is certainly something which we don't recommend.

@rellis-of-rhindleton
Copy link

From a domain design standpoint, it seems like it'd be more intuitive to either leave the property null or require a non-null instance in the constructor parameters.

But without knowing what the framework expects it gets left to personal tastes. There isn't much guidance on this as far as I can find. If it's not recommended then maybe a page should be added to the docs about best practices for constructors/initialization, and Entity Framework's expectations? Materialization of entities is kind of a black box in EF, unless I'm missing something recent.

@jonsagara
Copy link

I always write my entity collections to be initialized so that I never have to worry about the collection being null:

public List<Thing> Things { get; private set; } = new List<Thing>();

@smitpatel
Copy link
Member

To keep the discussion in correct direction, collection navigations which are initialized are still supported and works correctly. Collection navigations can get away without having a setter too.

For reference navigation, you require a setter property.
I would agree with @rellis-of-rhindleton on the fact that if you are initializing something in ctor, then it is more intuitive for it to be either some collection initialization or initialize based on parameters of ctor.
We would be looking at some docs. We don't have so far because this is first time, we saw that it started causing issue. I believe it stems from the fact that state manager always had this behavior but query itself was doing identity resolution which was hiding it. And query behavior turned out to be different (but worked for the case). In past, Inserting just Album still adds empty Artist to database.

From customer perspective, at least as a work-around, don't initialize reference navigations to some value in constructor which would be used by EF.

We will discuss in team, what is the most useful user experience here. According to discussion, we would add documentation on recommendation or update the state manager to work in this case.

@RickStrahl
Copy link
Author

RickStrahl commented Sep 25, 2019

The more I think about I see that the object initialization shouldn't be necessary, but it is more consistent since you can initialize a collection. When you're talking about object initialization in the terms you do above you are talking about an implementation detail in the framework that is interfering with normal object behavior you would expect.

The use case used to be that you can create a new Album and have an artist that you can start punching values into. EF 6 was smart enough to figure out that the Artist was a new Artist and automatically added it. From a domain design perspective I never want to see a null instance on an object. If I new up a new Album then by default the Artist is an unassigned empty object which may later be filled with data assumed to be a new record. If the entity is loaded I would expect it to get overwritten with the value from the Db.

From my domain perspective I never want Artist to ever be null.

The old EF6 behavior is the behavior I would like to see actually, but that hasn't ever worked in EF Core where you had to explicitly add anyway.

Again for the most part I think this is a consistency issue and if I understand the problem correctly it shouldn't be expensive to check whether the attached property is tracked or not. After all this worked in in 2.2 since that returns the correct results so at the very least this is a breaking change.

@smitpatel
Copy link
Member

Reference navigations which does not have setters are not mapped by default whereas collection navigations without setters are mapped. There is already inconsistency between how reference vs collection navigations are treated. So consistency argument is very weak.
Setting the Artist to null is still possible because you have a setter available. Moreover, a collection initialized still adds whole entity to collection. Whereas reference is assigned the object rather than individual property. Initializing artist and punching values in is not ORM would assign reference navigation (to preserve semantics of materialization of Artist itself.

It's an unexpected breaking change for us because we did not expect or see user initialize reference navigation. We look into either reverting this breaking change or document it.

@rellis-of-rhindleton
Copy link

@smitpatel - for my own understanding, and to make sure we are hearing you right, you're saying that EF ideally deals in whole entities; getting into pieces (properties) of entities is not something the team wants to mess with...?

However you handle the breaking change, it'd be great if this process ends with something written up that explains what the framework expects and how it handles materialization.

@smitpatel
Copy link
Member

That is wrong interpretation of what I said. If you want EF to assign a reference navigation or add something to collection navigation as part of Include, EF will materialize entity instance from server side using materializer and will assign (or add) entity instance. Of course you are free to do whatever if you assign yourself. EF does not use pattern where it assigns album.Artist.ArtistName = "something", when it is asked to Include Artist on album instance it materialized. If you manually write that in your query, we will do that but EF is not going to be tracking that Album or Artist.

@divega divega assigned ajcvickers and unassigned smitpatel Sep 26, 2019
@divega divega added this to the 3.1.0 milestone Sep 26, 2019
@ajcvickers
Copy link
Member

@jeroenheijmans I think you are making very valid points. To me, the distinction comes down to how the child object fits into the domain/data model.

If we consider simple properties of intrinsic types, then it's clear they should behave as you are suggesting. For example, consider this:

public class HumanBody
{
  public int Id { get; set; }
  public string Name { get; set; }

  public HumanBody()
  {
    Name = "John Doe";
  }
}

If I query for this, then EF will first construct the object, and then set the Name property to the value from the database*. So "John Doe" really is just a default.

Now, let's say that I have something a bit more complex than an intrinsic type, but still something that in DDD terminology should be treated as value object:

public class PersonalInfo
{
  public string FirstName { get; set; }
  public string LastName { get; set; }
  public long SSN { get; set; }
}

public class HumanBody
{
  public int Id { get; set; }
  public PersonalInfo Info { get; set; }

  public HumanBody()
  {
    PersonalInfo = new PersonalInfo
    {
      FirstName = "John",
      LastName = "Doe",
      SSN = -1;
    }
  }
}

In this case I think it should still work like the intrinsic type. That is, the value set in the constructor is a default only, and it should be replaced with whatever was in the database when materializing a query. (I'll come back to what this means in EF shortly.)

But now let's say I have something which is itself an entity type with identity. (That is, it's specifically not a value object.) For example:

public class Hat
{
  public int HatId { get; set; }
  public string Color { get; set; }
  public string Style { get; set; }
}

public class HumanBody
{
  public int Id { get; set; }
  public Hat Info { get; set; }

  public HumanBody()
  {
    Hat = new Hat
    {
      Id = 0,
      Style = "Fedora"
    }
  }
}

Consider that in the general case, an entity like Hat can be re-parented from one human body to another. It might also be able to exist without any parent at all. It has its own identity.

So this could mean two things:

  • The Hat set in the constructor is just a default and should be replaced, like with the other cases above
  • Or, the Hat represents an actual Hat entity with its own identity that is referenced by the parent entity.

The intention in EF Core (as in EF6) was to treat it as the latter. That is, if you have an instance of an entity type then it is treated as an entity with its own identity, not a value object. (The bug meant we weren't doing this correctly.) Otherwise, what if this Hat wasn't set in the constructor and isn't a default? Does this mean if the identity is different from what is in the database? What about other property values? And as you get into what to do with graphs it gets more ambiguous.

So what does this mean for EF Core? Fundamentally, it means using entities in place of value objects is a leaky abstraction that doesn't work too well when you get down to the details. This in turn means that owned entity types (which are still entity types even though the key can be in shadow state) should not be used when the semantics of value objects are needed.

Instead, value objects should be mapped as normal properties with value converters to transform to and from the database. This works well when the value object can be converted or serialized into a single column, but we are missing the feature to split this over multiple columns: #13947

Anyway, I hope this explains a bit about the way we're thinking about these scenarios. It's certainly not a cut-and-dry decision. I hope and expect we will continue to learn in this area.


Footnotes

* Note that EF can also use the constructor for cases like this. For example, given this:

public class HumanBody
{
  public int Id { get; set; }
  public string Name { get; set; }

  public HumanBody(string name)
  {
    Name ??= "John Doe";
  }
}

EF will construct the object by calling the constructor and passing in the name from the database. So in this case there is no second step where EF calls the property setter.

ajcvickers added a commit that referenced this issue May 25, 2020
Fixes #18007

Previously it was possible for the existing instance to be replaced when a query brings back a new instance, but only for the first time the new instance is returned. This change prevents the new instance from being used in all cases, which is the agreed behavior here, and also now matches what EF6 does consistently.
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label May 25, 2020
ajcvickers added a commit that referenced this issue May 25, 2020
Fixes #18007

Previously it was possible for the existing instance to be replaced when a query brings back a new instance, but only for the first time the new instance is returned. This change prevents the new instance from being used in all cases, which is the agreed behavior here, and also now matches what EF6 does consistently.
ajcvickers added a commit that referenced this issue May 26, 2020
Fixes #18007

Previously it was possible for the existing instance to be replaced when a query brings back a new instance, but only for the first time the new instance is returned. This change prevents the new instance from being used in all cases, which is the agreed behavior here, and also now matches what EF6 does consistently.
ajcvickers added a commit that referenced this issue May 31, 2020
Fixes #18007

Previously it was possible for the existing instance to be replaced when a query brings back a new instance, but only for the first time the new instance is returned. This change prevents the new instance from being used in all cases, which is the agreed behavior here, and also now matches what EF6 does consistently.
ajcvickers added a commit that referenced this issue Jun 1, 2020
…21045)

Fixes #18007

Previously it was possible for the existing instance to be replaced when a query brings back a new instance, but only for the first time the new instance is returned. This change prevents the new instance from being used in all cases, which is the agreed behavior here, and also now matches what EF6 does consistently.
@ajcvickers ajcvickers modified the milestones: 5.0.0, 5.0.0-preview6 Jun 1, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0-preview6, 5.0.0 Nov 7, 2020
@ajcvickers ajcvickers removed their assignment Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-change-tracking closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants