Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Compound Indexes #812

Closed
wants to merge 15 commits into from
Closed

Conversation

malarzm
Copy link
Member

@malarzm malarzm commented Feb 26, 2014

Fixes #761 , depends on doctrine/mongodb#154

Exceptions I have left Query::getUnindexedFields untouched and adjusted Exception Message (instead of showing unindexed fields there's a list of used ones. I was thinking about comparing existing indexes with fields from query and suggesting developer fields to add/remove to query to match index requiring least changes but finally gave up on this - however if you think it's good idea I'll implement it)

TODO:

  • adjust index checking
  • tests (without getUnindexedFields thing yet)
  • adjust thrown exceptions
  • add check for Sort Order
  • document CompoundIndex
  • document allowLessEfficientIndexes in @document annotation

AFTER REVIEW:

  • remove allowLessEfficientIndexes feature (including docs)
  • get index information from ClassMetadata instead of DB (code & docs)
  • consider text and geospatial indexes
  • consider index intersection ([WIP] Compound Indexes #812 (comment))

cc @dwieeb

@malarzm
Copy link
Member Author

malarzm commented Mar 12, 2014

I had way less time than I expected but finally this is finished :)

@malarzm
Copy link
Member Author

malarzm commented Mar 14, 2014

Not finished, will write why in doctrine/mongodb#154

@malarzm
Copy link
Member Author

malarzm commented Mar 19, 2014

I've added IndexChecker class with logic ported from doctrine/mongodb#154, also added new methods to FieldExtractor and covered them with tests. Still need to rewrite tests from doctrine/mongodb#154 for IndexChecker and add new ones covering $or clauses behavior

@malarzm
Copy link
Member Author

malarzm commented Mar 21, 2014

Ok this time it's finished, feedback would be appreciated :)

ping @jmikola @jwage

@malarzm malarzm changed the title [WIP] Compound Indexes Compound Indexes Mar 21, 2014
* @param \Doctrine\MongoDB\Collection $collection
* @param boolean $allowLessEfficientIndexes
*/
public function __construct(Query $query, Collection $collection, $allowLessEfficientIndexes = true)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame on PHP for no friendly classes

@imhoffd
Copy link

imhoffd commented Mar 21, 2014

Thank you for all your hard work! I left a few line notes after looking over it.

@jmikola jmikola added this to the 1.0.0-BETA10 milestone Apr 30, 2014
@jmikola
Copy link
Member

jmikola commented Apr 30, 2014

@malarzm: Thanks for taking the time for this PR, and likewise to @dwieeb for reporting/reviewing. I'm going to take a look through this now and hopefully get it merged shortly.

$orClauses = $this->query->getFieldExtractor()->getOrClauses();
foreach ($orClauses as $or) {
$orExtractor = new FieldExtractor($or);
if (!$this->getIndexesIncludingFields($orExtractor->getFields(), true)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the $or clause inspection use true for $findAny, while handling of the main query does not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looks like it's because you're just checking if at least one index exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, I don't need to check anything more for $or so just knowing that there is index is enough :)

@jmikola jmikola modified the milestones: 1.0.0, 1.0.0-BETA10 Apr 30, 2014
@jmikola
Copy link
Member

jmikola commented May 1, 2014

@malarzm: Can you elaborate on "fixed issue was a bug but introduced changes might be worth a note in CHANGELOG", which is the outstanding todo item in the PR description?

@malarzm
Copy link
Member Author

malarzm commented May 1, 2014

@jmikola: I'm not sure right now what exactly I had in mind at the time... probably that Exception messages has changed (and that I wanted to change Query::getUnindexedFields as described in Exceptions in PR description but gave up on that). I think that that TODO point can be removed (shall I?) :)

@jmikola
Copy link
Member

jmikola commented May 2, 2014

It it's just a matter of exception messages changing (and not the class), I don't think there's a problem. Feel free to remove the todo item.

@malarzm
Copy link
Member Author

malarzm commented May 2, 2014

Removed

@jmikola jmikola modified the milestones: 1.0.0-BETA11, 1.0.0 May 6, 2014
@jmikola jmikola added the feature label May 6, 2014
return $index;
} else {
$matchingIndexes[] = $index;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you would want to continue after this, as the foreach below is redundant. We've already matched an index (possibly less efficient) by this point, so there is nothing left to do. I'm having difficulty tracking the entire code flow, but is there a chance we might add the index to $matchingIndexes twice with the code as-is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about this one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line wasn't changed but issue is fixed along with removing allowLessEfficientIndexes

@jmikola
Copy link
Member

jmikola commented Jun 3, 2014

After giving this PR some more thought, I have more new feedback:

allowLessEfficientIndexes doesn't seem all that useful and I think the code complexity could be reduced quite a bit without this option. As I understand it, the functionality for this option is "require all query fields to be present in the index". I don't think that's something to encourage, as users are better served creating a modest amount of indexes to support as many queries as possible (there is a point of diminishing return when you need to maintain one index for each query). Most importantly, simply matching field names between queries and indexes doesn't at all tell us that the index actually is efficient in practice (i.e. selective).

I think the motivation behind the enforcement level where allowLessEfficientIndexes is false is achieving something like covered index queries, which is typically not something you expect when using an ODM.

Originally in #761, @dwieeb's bug report was that the index checker was giving a false positive by not respecting the field order (e.g. ODM would assume {a:1, b:1, c:1} was an acceptable index for a query on c). Fixing that is certainly a priority, but I think this PR may go too far.

Lastly, we should consider how this PR will work with special index types such as geo-spatial and full-text. Those are also cases where even a single-field geo-spatial or full-text index could prove to be very selective even if the query includes other fields. For instance, if the geo-spatial index can narrow the number of documents we must consider to one, the cost of checking the additional fields in our query is fairly negligible.

On a side note, a great way to require indexes at the server level is the notablescan option. I will make a note to reference this in the ODM documentation, but I'm aware that it's not for everyone since it is a global option that would affect all databases.

private function getIndexesIncludingFields($fieldsNames, $findAny = false)
{
$matchingIndexes = array();
$indexes = $this->collection->getIndexInfo();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up doing a round-trip to the database server, as it needs to query the system.indexes collection. I realize that is the only authoritative way to determine which indexes are in MongoDB, but the ODM might be better served by relying on the indexes defined in ClassMetadata (and we should be clear about that documentation).

In a production deployment, the overhead of this round-trip is undesirable and users should be expected to ensure that the indexes in their metadata are synced with the database (the schema update command does this).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but also I think whole requireIndexes() is undesirable behavior in production and should be considered dev feature (maybe it's worth pointing this out in docs as well?) - I'd rather have slow query than 500 page :)

@malarzm
Copy link
Member Author

malarzm commented Jun 4, 2014

allowLessEfficientIndexes intent wasn't to encourage covered queries but was inspired by "only the item and stock fields; however, this index would be less efficient than an index on only item and stock." point from http://docs.mongodb.org/manual/core/index-compound/#prefixes and this is main reason I wanted to introduce that feature. I agree that removing feature will reduce complexity and it will be removed since it's unwanted.

As for text indexes: right now requireIndexes works only in QueryBuilder and before doctrine/mongodb#162 is done there is now way to check it, right?

Last (in self defense) this PR as you said goes further than original bug report required but I thought it was worth it (and also was great opportunity for me to learn few things :) )

I'll update TODO list with everything pointed out.

@jmikola
Copy link
Member

jmikola commented Jun 5, 2014

Last (in self defense) this PR as you said goes further than original bug report required but I thought it was worth it (and also was great opportunity for me to learn few things :) )

I definitely understood the intent of allowLessEfficientIndexes and appreciate you taking the time to implement the PR. My apologies if the last comments came across as too critical, as I didn't want to put you on the defensive.

The MongoDB documentation you cited makes a solid point and those indexes generally will be more efficient; however, I just had reservations about it being an option for ODM to enforce, since we really don't have enough information to say what's most efficient based on the index specs alone (we should trust MongoDB's internal query optimizer for that :)

As for text indexes: right now requireIndexes works only in QueryBuilder and before doctrine/mongodb#162 is done there is now way to check it, right?

That issue is just to add query builder helper methods, but the $text operator is usable today since the builder does let users set their own operators via some general method (Expr::operator() I believe).

Regarding the geo-spatial and text index support, I would suggest removing allowLessEfficientIndexes and then I'll take another look and attempt to write some tests for those. I don't think we'll have too much trouble supporting those with the simpler code path.

@jmikola jmikola modified the milestone: 1.0.0-BETA11 Jun 5, 2014
@jmikola
Copy link
Member

jmikola commented Jun 5, 2014

Note: the index intersection provided in MongoDB 2.6+ is another thing to consider (also covered in this blog post). Two single-field indexes might now be used together to handle a single query, essentially fulfilling the role of a compound index. That would be another argument for relaxing the index checks in this PR.

@malarzm
Copy link
Member Author

malarzm commented Jun 5, 2014

Yup I have to agree it'll simplify everything now. One question though:

Two single-field indexes might now be used together to handle a single query

According to this section of doc Mongo can intersect single-field index with Compound Indexes prefixes and I guess it's also capable of intersecting two Compound Indexes? If so then maintaining `allowLessEfficientIndexes`` would be a nightmare and I'm glad they are about to be removed :D

Btw this feature is now added to TODO list

@jmikola
Copy link
Member

jmikola commented Jun 5, 2014

My understanding is that compound indexes can be intersected, but obviously with respect to their prefixes (field ordering). Geo-spatial and text indexes currently don't support intersection, but that is expected to change down the road.

@malarzm
Copy link
Member Author

malarzm commented Jun 5, 2014

Ok, thanks for confirmation, I'll sleep with this and will try to figure out a way to implement it

@malarzm
Copy link
Member Author

malarzm commented Jun 6, 2014

@jmikola @jwage I'm having troubles with utilizing ClassMetadataInfo::getIndexes() instead of asking Mongo directly:

  1. this method doesn't know about index on _id (already worked around with https://github.com/doctrine/mongodb-odm/pull/812/files#diff-7c7bab5269322a646c1a87fccfd6389aR112)
  2. this method doesn't return information about indexes on EmbeddedDocuments see this failing test

Do you have any thoughts on this?

@jmikola
Copy link
Member

jmikola commented Jun 18, 2014

@malarzm: I don't have time to look into this within the next week; I'll have to come back to it afterwards. At a quick glance, working around the _id index is trivial, but ClassMetadata may need to aggregate indexes of its embedded documents.

@malarzm
Copy link
Member Author

malarzm commented Jun 18, 2014

No problem, I hadn't much time either. I was thinking that it's done for
some reason and maybe you'd remember why it's like that. I'll try to look
into it next week

On Wed, Jun 18, 2014 at 10:56 PM, Jeremy Mikola notifications@github.com
wrote:

@malarzm https://github.com/malarzm: I don't have time to look into
this within the next week; I'll have to come back to it afterwards. At a
quick glance, working around the _id index is trivial, but ClassMetadata
may need to aggregate indexes of its embedded documents.


Reply to this email directly or view it on GitHub
#812 (comment).

@malarzm malarzm changed the title Compound Indexes [WIP] Compound Indexes Mar 17, 2015
@malarzm
Copy link
Member Author

malarzm commented Jul 30, 2016

For the moment being I'm more in favour of dropping requireIndexes feature in 2.0 so I will not be continuing my work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compound index order not respected
4 participants