Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse 4: Case-insensitive indexes cause huge performance problems for regex queries #6559

Closed
rdhelms opened this issue Apr 2, 2020 · 20 comments

Comments

@rdhelms
Copy link

rdhelms commented Apr 2, 2020

Issue Description


TL;DR

  • Should we add a hint to regex queries to make sure they don't attempt to use the case-insensitive index?
  • When will parse-server with parse@2.12.0 be published to npm (the change is already on master)?

TS;WM

As documented, Parse 4 now creates case_insensitive_email and case_insensitive_username indexes in addition to the email_1 and username_1 indexes that it has always created. These case-insensitive indexes significantly help the performance for queries that use the MongoCollection.caseInsensitiveCollation() 👍

However...

The MongoDB docs for regex queries mention that regex queries are not collation-aware and unable to utilize case-insensitive indexes:

Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.

At first this doesn't seem to be a problem...because why would we tell the regex query to use the case-insensitive index? Shouldn't the regex query just continue to use the case-sensitive index, or no index at all? Well...sadly, for whatever reason, when both a case-insensitive index and a case-sensitive index exist and a regex query runs, the MongoDB Query Plan seems to select the case-insensitive index, and the resulting query time is significantly worse. We were seeing regex queries running ~4x slower, and then as CPU use and query backlog increased this effectively crashed our server.

We're in conversation with the Mongo community about why this is the case, as it doesn't seem like the intended behavior. One possibility is that the regex query is selecting the most recently created index for the key being queried, regardless of its collation. We attempted to change the selected index by specifying a collation manually on the regex query, but this didn't seem to have any effect. Whether the query optimizer is behaving as expected is still unknown.

Some good news...

We do seem to have found a workaround. Using the hint() cursor method, we were able to tell our regex queries to use the email_1 index instead of the case_insensitive_email index. On the parse side, this looks like:

const query = new Parse.Query(Parse.User)
query.hint('email_1')

This returned our regex query times back to normal and everything was great 👍

So...I'm opening this issue to

  1. Make everyone aware that this seemed to be a huge performance regression, and what a possible workaround might be
  2. Discuss the possibility of setting the hint() within parse-server for regex queries, so that users don't have to do it themselves

Finally, as a corollary to this, we also wanted to ask about the status of parse-server's version of parse being updated to 2.12.0...we see that the dependency has been updated, but it hasn't been published to npm, yet? Where could we find information about when to expect changes merged to master getting published to npm? The reason we ask is that our workaround using hint() can only work with parse@2.12.0 because that is the version that added support for Parse.Query.hint.

Steps to reproduce

  • Spin up a server using Parse 4 pointing to a database with a large number of users (we were using over 4 million when doing our tests)
  • After the indexes have been created, run a find query using regex against the username or email fields.
  • Delete the case_insensitive_email and case_insensitive_username indexes
  • Run the same query again

For us, doing the above led to an almost 4x improvement in query time.

Expected Results

We expected the regex queries to completely ignore the case-insensitive indexes, and to have the same performance as in parse-server@3.9.0

Actual Outcome

The regex queries are attempting to use the case-insensitive indexes, and are performing ~4x worse.

Environment Setup

  • Server

    • parse-server version : 4.1.0
    • Operating System: various
    • Hardware: various
    • Localhost or remote server?: localhost and Heroku
  • Database

    • MongoDB version: 3.6.12
    • Storage engine: WiredTiger
    • Hardware: unknown
    • Localhost or remote server?: mLab

Logs/Trace

Can add more detail here if there are issues reproducing the issue

@rdhelms
Copy link
Author

rdhelms commented Apr 2, 2020

@acinader

@acinader
Copy link
Contributor

acinader commented Apr 2, 2020

thanks for the great issue!

  1. I think that adding a hint is reasonable if the query is:
  • regex
  • on user table
  • would otherwise use the wrong index for email or username

Is this something you'd be willing to take on?

  1. I'll look at publishing parse-server today. We don't have a formal timeline or process for release timing.....

@MobileVet
Copy link

@acinader It has been an interesting couple days of learning for us and we definitely wanted to pass on the knowledge to the community.

Regarding 1, since the hint() option is now available... it does seem like we are mostly there.

  • Does it make sense that in the Parse server code we 'trap' the condition of someone attempting to search w/ $regex on _User.email and _User.username and force those to use the email_1 index?

@acinader
Copy link
Contributor

acinader commented Apr 2, 2020

I don't have an opinion yet, but probably, right?

cc: @davimacedo

@MobileVet
Copy link

MobileVet commented Apr 2, 2020

Cool. We will definitely handle creating the PR once everyone agrees on the correct approach.

@acinader
Copy link
Contributor

acinader commented Apr 2, 2020

@MobileVet do you have a strong opinion?

After having thought about it for an hour, I think that adding a hint in the limited number of cases where it'll be important would be a good idea -- especially if you guys think you can bite it off ;).

@MobileVet
Copy link

@acinader I am never a fan of 'special cases' or 'automagical' decisions as it were, but we have had 3 people thinking about this for the last 2 days and that is the best solution we could come up with. One clearly NEVER wants to use the new case insensitive index for a regex... the results are abysmal. Without forcing the hint you will certainly get the wrong index.

There are only 2 keys where this matters so that limits the scope some as you said.

My only question, do you think we should still allow a passed in hint to override this special case or is that too much flexibility?

@acinader
Copy link
Contributor

acinader commented Apr 2, 2020

I agree with you.

I definitely think that we should let a passed in hint override. It doesn't introduce a security weakness (that we know of) and there's no other reason not to let the user shoot herself in the foot if wanted (or, more optimistically, there is a good reason, like another index that should be used and has been reasoned about.)

Lmk if you disagree.

@davimacedo
Copy link
Member

Hi guys, since #6322, it is already possible to pass a hint to the query and avoid the case insensitive index, right? Or am I missing something?

@rdhelms
Copy link
Author

rdhelms commented Apr 14, 2020

Hi @davimacedo - I think the issue boils down to

  1. The performance of a regex query attempting to use a case insensitive index is SO bad that updating to parse 4 resulted in major service degradation for us, and we can only assume that this could happen to others as well.
  2. Are we really expecting everyone else who might be impacted by this to have to go through this same difficult process to find the same annoying workaround, or can we successfully manage this for them in one place?

In my mind, the only reason NOT to improve parse-server’s handling of regex queries when indexes are involved would be if for some reason there is a case where a user would prefer to have the regex query use the case-insensitive index, but the MongoDB docs seem to strongly discourage that pretty clearly.

@rdhelms
Copy link
Author

rdhelms commented Apr 14, 2020

Also a suggested implementation of this change has been made here:

#6600

@davimacedo
Copy link
Member

I understand the problem but I am not super fan of the solution as well. In https://github.com/parse-community/parse-server/pull/6600/files#diff-271c623e43af83bc3d858c5ad248451aR117 and https://github.com/parse-community/parse-server/pull/6600/files#diff-271c623e43af83bc3d858c5ad248451aR122 we are doing two more db queries just to know if the indexes exist. In the case I am doing a case insensitive regex, wouldn't the insensitive index perform better? Maybe it would be better just remove this insensitive index creation from Parse Server (since it is not recommended by MongoDB) and let the developers to choose which indexes they want to create.

@rdhelms
Copy link
Author

rdhelms commented Apr 14, 2020

  1. Regarding the extra db queries - those should be extremely fast since they are simply checking whether indexes exist. We can try to collect some data for you to demonstrate the performance improvement. I think you may be underestimating just how slow the regex queries were with case insensitive indexes 🙂 As mentioned above, when the regex query used the wrong index for us with a db of >4.4 million users, the (already slow) query was at least 4 times slower and frequently over 5 times slower.

  2. Case insensitive regex queries are very much NOT faster when using the case insensitive index. See the mongodb docs referenced above for a note about this (but note that even though $regex is "unable to utilize" case-insensitive indexes, mongo does not prevent the query from trying from their end):

https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use
Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.

  1. Honestly our first gut reaction to all of this was "why is parse creating indexes for us?" But it seemed like the pattern of parse creating indexes on server start had been around so long that simply managing the use of those indexes seemed simpler. We would welcome discussion about whether it's a good idea for parse to be creating indexes, but I do think that would be a much bigger discussion with a much more significant impact and a more significant deployment/update effort. Also, I think it's worth noting that MongoDB does not recommend against case-insensitive indexes in general, they just say that regex queries will not successfully use them.

@MobileVet
Copy link

MobileVet commented Apr 14, 2020

Maybe it would be better just remove this insensitive index creation from Parse Server (since it is not recommended by MongoDB) and let the developers to choose which indexes they want to create.

I see the pros and cons of creating this index, I think the issue is that it becomes the index of choice by Mongo when it should not be.

Stepping back just a bit, I think it is important to recognize that there is NO situation where using the case insensitive index is the correct solution when doing a $regex based query... but that is the result of the current production code for Parse Server.

@acinader and our team agreed on that and if we need to we should revisit that discussion first.

Once we are on the same page regarding the issue, then we can discuss the solution more clearly. I could definitely see the potential for skipping the index queries since we know they exist (as we create them on startup) and we know that using the regular index is ALWAYS preferable to the case insensitive index when doing $regex.

@davimacedo
Copy link
Member

I really don't like adding corner cases like this on the code base even more when we already have the hint option through which the developers can create their own corner cases by themselves. I thought more about the the case insensitive index and I believe it is not a good idea to remove it as well. Since we have a huge performance problem here, I'd vote to add the corner case, but:

  • Do we really need the two additional index verifications since we ensure the existence of both indexes in Parse Server startup?
  • Would it be possible to make the condition even more restrictive, maybe Object.keys(query).length === 1 && (query?.username?.$regex || query?.email?.$regex) && !hint? I am afraid that a query { booleanField: true, username: { $regex: 'something' } } would go with username_1 index while a booleanField_1_username_1 index could exist and be better.

@pocketcolin
Copy link

@davimacedo I agree that corner cases are not ideal and we definitely want to make sure that we don't create additional issues when trying to patch this one.

In response to your two suggestions:

  • I could go either way on this one. Parse Server does ensure that all of those indexes exist on startup, but checking that they exist is extremely efficient (the index table is very small) and it's possible that a user may be deleting/modifying the indexes on startup.
  • Given what we know about Mongo's index selection method, I believe a user trying to use a specific index with a regex query when multiple exist should always explicitly pass the desired index with hint(). Using your example query, if they didn't pass a hint, it's my understanding that the user could not guarantee that Mongo would choose their custom index (booleanField_1_username_1) over the case insensitive index despite the fact that it may be better. As @MobileVet stated, there is NO situation where using the case insensitive index is the correct solution when doing a $regex based query. Thus, I think it's better to force the use of the username_1 index anytime a regex query occurs with username or email (except of course when a custom hint is passed).

@davimacedo
Copy link
Member

@pocketcolin thanks for your comments. @dplewis @acinader any additional thoughts here?

@MobileVet
Copy link

Thanks for the continued discussion on this everyone. Our goal is to improve Parse Server and ensure that others don't experience the pain we did when updating from 3.9 to 4.x. That pain was a result of the new index and our use of $regex in a very limited case... but it cascaded due to the extreme inefficiency of that query. We have made changes on our side and can also use the hint option but it does seem prudent that this failure case is explicitly avoided by the code.

We would really appreciate further thoughts from everyone, including @acinader who responded extremely quickly to our initial post.

Hopefully we can all agree on the appropriate changes soon and get them pulled in ASAP so that the entire community can benefit.

@acinader
Copy link
Contributor

Sorry to have missed out on this for so long...

My first thought is: is there a better solution to #5634 that would allow us to remove the case insensitive index (ci) that is causing the problem. Fixing the underlying issue was important to me as it was causing a set of end-user problems.

After reading Davi's comments, I'm left with the biggest concern of how do we know we can safely 'auto-hint' to the ci index.

Would it be possible to make the condition even more restrictive, maybe Object.keys(query).length === 1 && (query?.username?.$regex || query?.email?.$regex) && !hint? I am afraid that a query { booleanField: true, username: { $regex: 'something' } } would go with username_1 index while a booleanField_1_username_1 index could exist and be better.

Which, if I am reading it right, will fail if you add anything other than email or username.

My uncertain inclination at this point, which contradicts my earlier discussion with Rob, is to document the Query's regex method to explain the problem and suggest the appropriate hint to use rather than introducing the proposed corner-case fix?

@stale
Copy link

stale bot commented May 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants