Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning to countDocuments() #6713

Closed
bazineta opened this issue Jul 12, 2018 · 10 comments
Closed

Add warning to countDocuments() #6713

bazineta opened this issue Jul 12, 2018 · 10 comments
Labels
docs This issue is due to a mistake or omission in the mongoosejs.com documentation
Milestone

Comments

@bazineta
Copy link

Do you want to request a feature or report a bug?

Request a feature.

What is the current behavior?

The close similarity of count and countDocuments, and the seemingly benign nature of the countDocuments documentation, i.e., Counts number of matching documents in a database collection., will, I suspect, lead users to believe that they are functionally equivalent.

They are not; countDocuments performs a scan.

I suspect that in the majority of cases, estimatedDocumentCount is the appropriate replacement function.

A common situation is that one will replace the deprecated count with countDocuments and verify against a test environment. Presuming the test environment is, as is common, one with collections of a much lower document count than the corresponding production environment, there'll then be a false sense of security imparted. When the code is then promoted to production, perhaps with gigantic collections, the joy of watching the primary server page itself into the ground and a secondary takeover will be yours.

If the current behavior is a bug, please provide the steps to reproduce.

What is the expected behavior?

I think the documentation for countDocuments should be much, much more proactive in warning about this, and directing the user that estimatedDocumentCount is probably what they want instead.

Please mention your node.js, mongoose and MongoDB version.

Mongoose 5.2.3. Node and MongoDB, any supported version.

@simllll
Copy link
Contributor

simllll commented Jul 13, 2018

Maybe just a check if "countDocuments" has a query parameter..if not, warn about it that estimatedDocumentCount is probabaly the right way to go?

@bazineta
Copy link
Author

@simllll Good point. However, is there even a valid use case for countDocuments without a query parameter? I mean, under what circumstances would one actually want to do a full scan when just wanting the document count metadata from the collection?

I really think this is going to hurt a lot of people as it stands. In the documentation, there's the count method that they're used to and familiar with, but they're told it's deprecated. Right below it in the documentation, there's countDocuments, with word-for-word the same documentation as count. They're used to running a no-argument count to get the document count from some very large (for argument's sake, let's say billions of documents) collection.

The documentation doesn't give any indication that a no-argument countDocuments is going to be absolutely disastrous for them.

@vkarpov15 vkarpov15 added this to the 5.2.4 milestone Jul 15, 2018
@vkarpov15 vkarpov15 added the docs This issue is due to a mistake or omission in the mongoosejs.com documentation label Jul 15, 2018
@vkarpov15
Copy link
Collaborator

That's a fair point. We'll add a note to the docs that if you're using count() for a full collection scan, you should replace it with estimatedDocumentCount(). Do you think that's enough?

@bazineta
Copy link
Author

@vkarpov15 It's certainly a step in the right direction.

However, I'm a a loss to find a reasonable use case for an unparameterized countDocuments(). It might be reasonable to push that down the code path for the previous behavior of an unparameterized count(), i.e., reference the metadata so as to avoid the most likely cause of disaster.

That is, unless there's really a valid use case for determining the total number of documents in a collection by iterating through all of them, one by one.

@vkarpov15 vkarpov15 modified the milestones: 5.2.4, 5.2.5 Jul 16, 2018
@vkarpov15
Copy link
Collaborator

Reasonable point but I would like to avoid magically converting countDocuments -> estimatedDocumentCount, and countDocuments scanning the whole collection is useful for testing, so I think we should just add a warning but leave the functionality as is.

@bazineta
Copy link
Author

bazineta commented Jul 18, 2018

I can fully understand your reluctance to magically convert things.

I think my issue is more with the radical nature of the API change in a minor release (and yes, I know it's 'deprecated', but let's be serious, the console is spammed with messages about the deprecation; no one is going to run a production server like that).

I'll provide a sample scenario for you to contemplate, and then I'll shut up about this.

You're giving a talk to a bunch of Mongo newbies about the database and the API. They're excited about the ability of the product to store vast numbers of documents in a collection. You get to the portion of the API that pertains to determining the size of the collection.

Where you say, see this method, the one that returns the count of documents in the collection? Never call that method; in fact, we usually have a lint rule that looks for it. It'll perform a full collection scan, bringing every document in the collection into memory, and the server to its knees...to return a scalar value.

Not a talk I'd want to be giving.

@vkarpov15
Copy link
Collaborator

That's a fair concern, I'm not thrilled with the distinction as well. I imagine the distinction is because the collection metadata might somehow be out of sync, and its worth warning about how this can impact large collections. We'll consider what we can do about this going forward.

@blumentali
Copy link

So what exactly does countDocuments do when it is called with a query parameter? Is it sate to use in this case (and does not do extra scanning / loading of unnecessary things to memory)?

@vkarpov15
Copy link
Collaborator

@blumentali it does an aggregation. It should be at least as safe as calling find() with the same query parameter - it may cause performance issues if your collection is huge and you have an index miss, but outside of that case countDocuments() is safe to use.

@shuwenhe
Copy link

CountDocuments

@Automattic Automattic locked as resolved and limited conversation to collaborators Aug 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
docs This issue is due to a mistake or omission in the mongoosejs.com documentation
Projects
None yet
Development

No branches or pull requests

5 participants