Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing docs #3539

Merged
merged 8 commits into from
Jan 24, 2022
Merged

Improve testing docs #3539

merged 8 commits into from
Jan 24, 2022

Conversation

roji
Copy link
Member

@roji roji commented Nov 7, 2021

Closes #2971
Closes #3039
Closes #2699
Closes #2478
Closes #1308

@roji roji force-pushed the Testing branch 5 times, most recently from adb9547 to 98dffc3 Compare November 8, 2021 16:53
@roji roji requested a review from a team November 8, 2021 16:59
@roji roji marked this pull request as ready for review November 8, 2021 16:59
@roji
Copy link
Member Author

roji commented Nov 10, 2021

/cc @JonPSmith - you probably have some thoughts on all of this :)

@JonPSmith
Copy link

Happy to give feedback, but how do I see the updated docs, or should I check back when this goes live?

@JonPSmith
Copy link

PS. I created Five rules for testing with EF Core - you might like to add something like that to your testing docs.

@roji
Copy link
Member Author

roji commented Nov 10, 2021

@JonPSmith unfortunately preview rendering of doc PRs isn't available externally... You can review directly on the markdown changes here if you want (that's what we usually do), or you can wait for this to go live if that's painful.

I'll definitely take a look at your blog posts and the link above!

@JonPSmith
Copy link

JonPSmith commented Nov 11, 2021

I found the testing pages in the roji:Testing branch and looked at the three pages. I assume the index page replaces the current page and the other two pages are new. Here are my comments

Index page (I assume this replaces the current index page)

To be honest I like the current index page with the 3 approaches more than your new index. Manning (the publisher of my books) were clear that you need to quickly explain the key points and then expand on it. Your new index page has some good stuff in it but its a bit complicated for the index page.

Also I wouldn't push the unit test / integration test differences. I find the unit test / integration test split on using a database is a bit false because accessing a test database is so easy. For me an integration test is uses an external service, or running up a complete ASP.NET Core to test end-to-end.

What I really like is you comparison table, and I would definitely keep that and place it earlier. In my book I have a summary image - This article starts with a similar image (in the book I changed "Mocking the database" to "Stubbing the database"). Things like your table and my figure helps people quickly understand and decide what they should do.

integration-testing (new)

I really like this page because it has nice step-by-step explanation.

Personally I use database-per-test class because its quicker.

unit-testing (new)

Interesting, you consider SQLite in-memory as a unit test. I agree but then again I don't see the unit test / integration test divide when using a database.

I hope that helps.

@roji
Copy link
Member Author

roji commented Nov 11, 2021

Also I wouldn't push the unit test / integration test differences. I find the unit test / integration test split on using a database is a bit false because accessing a test database is so easy.

Interesting... I agree that accessing a test database (on the same database system used on production, e.g. SQL Server) is very easy, and should be the "default" thing to do. But we've been seeing a lot of users going down the InMemory path, precisely because they're under the impression that testing against SQL Server is tough. In fact, a lot of the new docs are oriented precisely to dispel that myth, both by sharpening the unit vs. integration testing divide, and by showing practical approaches to successful integration tests.

For me an integration test is uses an external service [...]

I don't think that's part of the accepted definition, at least not the way I understand it... IMHO integration testing is simply about testing the actual components without swapping any of them with a test double.

Think of it this way: if my production database happens to be SQLite (not in-memory), I may still want to isolate my application code and unit-test it without the database (e.g. via a repository pattern). The fact that SQLite happens to be an in-process database rather than something running on some remote server shouldn't be relevant.

[...] or running up a complete ASP.NET Core to test end-to-end.

There's indeed a distinction here between end-to-end and integration testing. End-to-end testing would indeed imply firing up the ASP.NET application and talking to it via HttpClient; or if it's a user-facing website rather than a web API, driving a browser instead, to include the client side (e.g. Javascript) in the test.

But that's another beast altogether. I do think it's useful to cut out certain parts of your application (like ASP.NET, client-side Javascript), while keeping other parts in and not swapping them with test doubles (your production database); it's not necessarily all-or-nothing.

Interesting, you consider SQLite in-memory as a unit test [...]

Well, if your production database is SQL Server, then SQLite in-memory is a testing fake - it's something you use to remove SQL Server from the mix in order to focus on your application code. In that sense it's really no different from the EF Core InMemory provider - both are fake databases in the testing context; even if SQLite is a superior choice, in both cases you're excluding SQL Server, and have no guarantee that what worked on your fake will work in production.


Testing code that accesses a database requires either:
Testing is an important concern to almost all application types - testing allows you to be sure your application work correctly, and makes it instantly known if its behavior regresses in the future. Since testing may affect how your code is architected, it's highly recommended to plan for a testing early, and to ensure good coverage as your application evolves. This section provides an overview of various testing strategies for applications using EF Core, shows how each one can be implemented and discusses best practices.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar: "your application work correctly"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Oxford comma missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very opinionated statement about testing applications. Are we really the ones who should be saying. "you must test!" as opposed to saying, "if you want to test, this might help?"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's "programmatic", though I wouldn't say this is very controversial :) I think it's OK as a general introductory blurb, but if you believe this is too much I can tone it down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Oxford comma missing.

Do we have a committee standard on the Oxford comma question? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked a scholar from Oxford, and they said the comma must be used. I then asked a scholar from Cambridge, and they agreed. If scholars from Oxford and Cambridge agree on something, then it must be right.


This document outlines the trade-offs involved in each of these choices and shows how EF Core can be used with each approach.
Although there are many kinds of testing, in this section we'll concentrate on two types of automated testing: unit and integration testing. Without going into formal definitions, unit tests are about isolating a single component and testing against it, with all related components (or dependencies) stubbed out. Importantly, unit tests never interact with a real database, but instead replace the database with a [test double](https://martinfowler.com/bliki/TestDouble.html) which fulfills the same contracts. Integration tests, in contrast, test actual components working together - including your production database system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I predict you're going to have a fun time arguing with people who disagree with these statements. Again, I wonder if we should be making these distinctions, as opposed to just showing techniques for different types of tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - these seem like pretty uncontroversial definitions to me... and I even explicitly say we're avoiding formal definitions (and link to Martin Fowler, who specifically talks about replacing real databases)... Do you think it's controversial to say that integration tests include your real database, whereas unit tests do not?

The thing is that within these docs, it's important to make the distinction between these two testing types, since the whole docs are split along those lines. I also think lots of users are a bit blurry on what kind of testing they want (and also what it is they're trying to test), so making this clearer may help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tweaked things a bit around this. I'm still using integration and unit tests in the same way (i.e. with or without the production database system), but I introduce these terms as definitions for simplicity's sake in the new overview page. Hope that helps.

Copy link

@JonPSmith JonPSmith Jan 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @roji,

I also use the phase automated testing when talking about testing EF Core code. The problem with the term integration testing is people thinks that hard when in fact using EF Core is much easier that mocking or stubbing a repository.


## Approach 1: Production database system
Because of the above difficulties with integration tests, developers are frequently urged to concentrate on unit tests first, and have a robust unit test suite which they can run frequently on their machines; integration tests, in contrast, are supposed to be executed much less frequently, and in many cases also provide far less coverage. We recommend giving more thought to integration tests, and that that databases are actually far less affected by the above problems as much as people tend to think:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"We recommend giving more thought to integration tests, and that that databases are actually far less affected by the above problems as much as people tend to think" doesn't parse for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Changed to "We recommend giving more thought to integration tests, and suggest that databases may actually be far less affected by the above problems as much as people tend to think", better?


As described in the previous section, the only way to be sure you are testing what runs in production is to use the same database system.
For example, if the deployed application uses SQL Azure, then testing should also be done against SQL Azure.
1. Most databases can nowadays be easily installed on the developer's machine. Container-based technologies such as Docker can make this very easy, and technologies such as [Github Workspaces](https://docs.github.com/en/codespaces/overview) and [Dev Container](https://code.visualstudio.com/docs/remote/create-dev-container) set up your entire development environment for you (including the database). When using SQL Server, it's also possible to test against [LocalDB](/sql/database-engine/configure-windows/sql-server-express-localdb).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalDb is Windows-only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But SQL Server Express is xplat, and available as a docker image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing last sentence to: "When using SQL Server, it's also possible to test against LocalDB on Windows, or easily set up a Docker image on Linux."


This approach typically uses a mock framework to create a test double of `DbContext` and `DbSet`, and tests against those doubles. Mocking `DbContext` can be a good approach for testing various *non-query* functionality, such as calls to <xref:Microsoft.EntityFrameworkCore.DbContext.Add%2A> or <xref:Microsoft.EntityFrameworkCore.DbContext.SaveChanges>, allowing you to verify that your code called them in write scenarios.

However, properly mocking `DbSet` *query* functionality is not possible, since queries are expressed via LINQ operators, which are static extension method calls over `IQueryable`. As a result, when some people talk about "mocking `DbSet`", what they really mean is that they stub `DbSet` out with an in-memory collection, and then evaluate query operators against that collection in memory, just like a simple `IEnumerable`. This isn't the same as actually mocking or stubbing the database, which would involve stubbing out query **end results**, rather than query inputs (i.e. the database table, as represented by a `DbSet`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
However, properly mocking `DbSet` *query* functionality is not possible, since queries are expressed via LINQ operators, which are static extension method calls over `IQueryable`. As a result, when some people talk about "mocking `DbSet`", what they really mean is that they stub `DbSet` out with an in-memory collection, and then evaluate query operators against that collection in memory, just like a simple `IEnumerable`. This isn't the same as actually mocking or stubbing the database, which would involve stubbing out query **end results**, rather than query inputs (i.e. the database table, as represented by a `DbSet`).
However, properly mocking `DbSet` *query* functionality is extremely difficult, since queries are expressed via LINQ operators, which are static extension method calls over `IQueryable`. As a result, when some people talk about "mocking `DbSet`", what they really mean is that they stub `DbSet` out with an in-memory collection, and then evaluate query operators against that collection in memory, just like a simple `IEnumerable`. This isn't the same as actually mocking or stubbing the database, which would involve stubbing out query **end results**, rather than query inputs (i.e. the database table, as represented by a `DbSet`).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe, "practically impossible". You can do it--you basically write your own LINQ provider for your tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I wrote above - I think we're jumping between two definitions of what it means to mock DbSet/query.

I think most people understand this to mean having a DbSet backed by an in-memory collection, and applying Enumerable operators to it (technically faking rather than mocking). That's easy to do, but not really more useful than the in-memory provider.

The 2nd meaning (use a mock/fake LINQ provider) is sort-of what we provide with the in-memory provider; sure, we provide a fake EF provider rather than non-EF LINQ provider, but that's quite similar. Obviously users writing their own LINQ provider makes no sense, and I'm not sure what benefits that would bring over our in-memory provider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users writing their own LINQ provider makes no sense

Exactly.

Obviously.

To you... :-D


Since LINQ queries are no longer part of testing, you can directly provide query results to your application. Put another way, the previous approaches roughly allow stubbing out *query inputs* (e.g. replacing SQL Server tables with an in-memory one), but then still execute the actual query operators in-memory; the repository pattern, in contrast, allows you to stub out *query outputs* directly, allowing for far more powerful and focused unit testing.

One possible disadvantage of the repository pattern is that it requires you to architect your application accordingly: application code can no longer execute LINQ queries directly, but must instead call into a repository method, which executes the query (and is stubbed in testing). This can make for a somewhat heavier application architecture, but can also have other advantages: all data access code is concentrated in one place rather than being spread around the application, and if your application needs to support more than one database, the repository abstraction can be very helpful for tweaking queries across providers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to call out even more explicitly the disadvantages of a repository pattern. Maybe something like:

LINQ is a very powerful feature of .NET. It allows applications to compose complex queries that are translated and executed against the database without the need to shape the data layer to explicitly support these queries. In addition to being powerful, this also provides a great deal of flexibility for the evolution of applications over time.

Using a repository pattern to facilitate unit testing as described here necessarily prevents applications from using LINQ in this way, since if the repository itself exposes LINQ, then the testing problem has not been solved. This means that there is a significant cost involved in implementing and maintaining a repository. This cost should not be discounted when making a choice on how to test an application, especially given that integration tests for the queries generated by the repository will still be needed anyway.

It's worth noting that repositories do have advantages outside of just testing. They ensure all data access code is concentrated in one place rather than being spread around the application, and if your application needs to support more than one database, then the repository abstraction can be very helpful for tweaking queries across providers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth mentioning that leaking IQueryable while sometimes frowned upon, allows querying to become a minor cross-cutting concern. To me, this is actually quite powerful, especially given that IQueryable is part of the language itself, and so leaking it doesn't create any unwanted dependencies.

( https://chillicream.com/blog/2020/03/18/entity-framework )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atrauzzi the moment you leak IQueryable, you lose the ability to stub/mock your repository layer - which at least in this context is the main reason for having a repository in the first place... I get what you're saying: if you have a complex query used in multiple places, and you want to compose further operators over it in those places, then yeah, you can wrap it in a method returning IQueryable for further composition. But if you want to preserve the ability to stub out your repository, each use of that query itself must be implemented in a method which doesn't return IQueryable, so that it can be stubbed/mocked.

Bottom line, I think this would be more confusing than helpful in this specific conversation, which is about testing/stubbing/mocking (GraphQL is a very specific case which again is somewhat orthogonal to testing).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajcvickers I get what you're saying: it's indeed important to stress that testable repositories can't return IQueryable. But I wouldn't say a repository pattern prevents composing complex queries with different result shapes, or any other composable operators (I think that's what you're saying). The basic limitation is that whatever API your product code uses must itself not return IQueryable; but a repository can have a private Queryable-returning method representing a complex query, and multiple public methods which compose over that (projections, filters, whatever) to return IEnumerable.

You could definitely argue that this is more laborious, as each data shape requires another IEnumerable-returning repository method, but that's just the regular price of the pattern (and different from saying that a repository prevents anything, no?)

I've integrated parts of your suggested text above, and hopefully made it clearer that the repository pattern has a significant implementation/maintenance cost - let me know what you think, am open for further changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atrauzzi I'm in favor of using IQueryable throughout application code. Hence I'm not in favor of the repository pattern. Something that looks like a repository but exposes IQueryable...is DbContext/DbSet. :-)

@roji

But I wouldn't say a repository pattern prevents composing complex queries with different result shapes, or any other composable operators

I'm explicitly not saying this. I'm saying you have to write and maintain code in the repository to do this. There are good ways to do this. However, they are not IQueryable, and are (necessarily and by-design) more restrictive than IQueryable. This is what you lose/gain from having the repository; not the ability to have parameterized/extendable queries.

## Summary

* We highly recommend that developers have good test coverage of their application running against their actual production database system. This provides confidence that the application actually works in production, and with proper design, tests can execute reliably and quickly. Since integration tests are required in any case, it's a good idea to start there, and add unit tests later, based on need.
* For unit testing, we recommend implementing the repository pattern, allowing you to properly stub or mock out your data access layer above EF Core, rather than via a fake EF Core provider (Sqlite/InMemory) or by mocking `DbSet`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For unit testing, we recommend implementing the repository pattern, allowing you to properly stub or mock out your data access layer above EF Core, rather than via a fake EF Core provider (Sqlite/InMemory) or by mocking `DbSet`.
* If unit tests for queries are absolutely required, then we recommend implementing the repository pattern, allowing you to properly stub or mock out your data access layer above EF Core, rather than via a fake EF Core provider (Sqlite/InMemory) or by mocking `DbSet`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really strong - I tried to not be that categorical about rejecting unit testing. Can we compromise and remove the "absolutely"? 🤣


SQLite can easily be configured as the EF Core provider for your test suite instead of your production database system (e.g. SQL Server); consult the [SQLite provider docs](xref:core/providers/sqlite/index) for details. However, it's usually a good idea to use SQLite's [in-memory database](https://sqlite.org/inmemorydb.html) feature when unit testing, since it provides easy isolation between tests, and does not require dealing with actual SQLite files.

To use in-memory SQLite, it's important to understand that a new database is created whenever a low-level connection is opened, and that it's deleted what that connection is closed. In normal usage, EF Core's `DbContext` opens and closes database connections as needed - every time a query is executed - to avoid keeping connection for unnecessarily long times. However, with in-memory SQLite this would lead to resetting the database every time; so as a workaround, we open the connection before passing it to EF Core, and arrange for it to be closed only when the test completes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this behavior changed in any way due to pooling? /cc @bricelam

Copy link
Member

@ajcvickers ajcvickers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

roji and others added 5 commits January 23, 2022 21:31
Co-authored-by: Arthur Vickers <ajcvickers@hotmail.com>
Co-authored-by: Arthur Vickers <ajcvickers@hotmail.com>
Co-authored-by: Arthur Vickers <ajcvickers@hotmail.com>
Co-authored-by: Arthur Vickers <ajcvickers@hotmail.com>
@roji
Copy link
Member Author

roji commented Jan 23, 2022

@ajcvickers pushed fixes and tweaks - as there are some unresolved discussions above, we can briefly discuss tomorrow during the day and merge in the evening?

@ajcvickers
Copy link
Member

I think the fundamental problem here is that when I read this document end-to-end and then use principled reasoning, I come to the conclusion that I must use a repository layer. More specifically:

  • Principle: Unit tests are extremely important and I must have them.
  • Principle: I want to follow best-practice guidance from the team.
  • Assertion from this document: To unit test an EF Core application, it is highly recommended I use a repository layer.
  • Conclusion: I must use a repository layer in my EF Core application.

But I don't think this is the correct conclusion. For most applications, I don't think adding a repository layer is necessary or beneficial. So either I'm wrong about this, or this document is leading me to the wrong conclusion. I think it would be a good idea to discuss with @bricelam @AndriySvyryd, @maumar, @smitpatel, and @JeremyLikness whether or not my view (and my understanding of the team view) is wrong. But assuming I am not wrong, then why does this document lead me to the wrong conclusion? Some possibilities:

  1. My principle is at odds with the intent of this document. Specifically, I would use a real database for my tests, but I don't know that I would call those tests "integration tests". This is why I am somewhat uncomfortable using the terms "integration testing" and "unit testing". They are highly-charged and yet ambiguous terms, and I don't think using them here necessarily results in clarity.
  2. My principle is wrong. What I really mean is, "Unit Tests are extremely important and I must have them." This is essentially the same as the first point, but the principle I stated including the word "unit" is very widely and very strongly held, and we're not going to change people's mind on it.
  3. My principle is too strong. I need unit tests for everything except interactions with the database. Again, a hard sell for a widely and strongly held belief.

So, fundamentally, if we're going to make the distinction between "integration testing" and "unit testing", then I think we have to make it absolutely clear that we're saying what we define as "unit testing" is not something that is always needed. We have to be very strong on this point, because we're going against a widely and strongly held belief.

This means we could resolve this by simply removing the distinction and labeling of the different kinds of tests. This will leave the door open for questions like, "but how do I unit test?" I think this is okay; the answer is that you don't need to, or that, if you prefer, you can still think of your tests as unit tests even though they use a real database. But, if we are going to make this distinction, then we also have to be absolutely clear that we are not, in general, recommending the repository pattern.

@ajcvickers
Copy link
Member

Re: the LINQ/IQueryable issue. When we talk about a repository pattern, what we are talking about is not exposing LINQ/IQueryable to most of the application. I think it's very important that we emphasize this because:

  • It reduces flexibility and creates significant overhead. This is not immediately obvious.
  • Many people believe that a repository that exposes IQueryable is an acceptable (even good) pattern. I don't agree with this, but that doesn't really matter. What matters is that people understand that this kind of repository is not what we're talking about here and won't help.

@roji
Copy link
Member Author

roji commented Jan 24, 2022

OK, it seems like the best course of action is to get rid of the terms "unit" and "integration" tests. I'll change the text.

Re: the LINQ/IQueryable issue. When we talk about a repository pattern, what we are talking about is not exposing LINQ/IQueryable to most of the application. I think it's very important that we emphasize this because:

I agree, and I think that's already mentioned both in the testing strategies page and in the implementation page. If this still seems insufficient to you I can make this into a more sizable (and scary) paragraph.

Copy link
Member

@ajcvickers ajcvickers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! With the change not to say "unit" or "integration" I think this is now excellent and clear guidance. Bravo!

@roji
Copy link
Member Author

roji commented Jan 24, 2022

Thank you! I even grudgingly agree that this will head off some worthless discussions and possible confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants