-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query index improvements #6376
Query index improvements #6376
Conversation
This enables StringIndex::find_all_no_copy in more places which should speed up equals queries with many results for indexed Int, ObjectId, UUID and Mixed types. This also fixed a bug on indexed Mixed equality queries that would return case insensitive results.
optimize EqualIns for mixed properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor stuff... great work!
// The node having the search index can be removed from the query as we know that | ||
// all the objects will match this condition | ||
pn->m_children[best] = pn->m_children.back(); | ||
pn->m_children.pop_back(); | ||
for (auto key : keys) { | ||
auto obj = m_table->get_object(key); | ||
const size_t num_keys = keys->size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this loop really making a difference?
KeyColumn& refs = ret.m_key_values; | ||
|
||
// The node having the search index can be removed from the query as we know that | ||
// all the objects will match this condition | ||
pn->m_children[best] = pn->m_children.back(); | ||
pn->m_children.pop_back(); | ||
|
||
auto keys = node->index_based_keys(); | ||
for (auto key : keys) { | ||
const size_t num_keys = keys->size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK it seems it is a pattern, why do we need to do this rather than a simpler for(const auto& key : keys)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iteration pattern here isn't really part of the optimization. The change is because keys
was previously a std::vector<ObjKey>
and now it is a IndexEvaluator*
which doesn't have built in iterator support (I suppose I could add iterator support if you really wanted to keep the loop the same). The change in type allows us to dynamically fetch the keys directly from BPTree one at a time as needed. The previous way was to always evaluate all the results and return them in the vector. But that is entirely unused when we only have one query condition and are doing a count of results. So the optimization here is to fetch the matching object keys as needed because sometimes we don't need them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the iterator is a "nice to have" feature, but that's ok to leave it as it is, it is up to you.
@@ -2384,7 +2384,7 @@ TEST(Query_TwoColumnsNumeric) | |||
size_t num_expected_matches = num_rows; | |||
if ((lhs_type == type_Mixed) != (rhs_type == type_Mixed)) { | |||
// Only one prop is mixed | |||
num_expected_matches = 6; | |||
num_expected_matches = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This magic numbers change should have some explanation probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to this test. The number of matches changes in this test because I also changed TestValueGenerator::convert_for_test<Mixed>
to return an ObjectId instead of an int in one case.
@@ -22,6 +22,8 @@ | |||
#include <realm.hpp> | |||
#include <realm/column_type_traits.hpp> | |||
|
|||
#include <list> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No so sure this include is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually required, there is a use of std::list
later on. It was working accidentally in the other tests because another include brings in std::list, but the benchmarks don't do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but it does not show up in the diff, it tricked me.
@@ -479,7 +479,7 @@ struct LikeIns : public HackClass { | |||
{ | |||
if (m1.is_null() && m2.is_null()) | |||
return true; | |||
if (Mixed::types_are_comparable(m1, m2)) { | |||
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, since you are repeating this over several ifs
, you could have added this check inside Mixed::types_are_comparable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that both of the values could be numerics, so Mixed::types_are_comparable
would actually return true, but trying to convert to a string will fail. We don't want to change Mixed::types_are_comparable
because that is used in other places for other non-string values.
CHANGELOG.md
Outdated
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?) | ||
* None. | ||
* Fixed a crash when querying a mixed property with a string operator (contains/like/beginswith/endswith) or with case insensitivity. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed) | ||
* Querying for equality of a string on a mixed property was returning case insensitive matches. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate which matches wasn't working? Was it not taking casing into consideration? So it would return true for "A"="a" where it was supposed to return false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct. I have added an example to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
one sync test failure on evergreen is unrelated to these changes |
StringIndex::find_all_no_copy()
which avoids copying out all the results into a temporary buffer of ObjKeys. Taking advantage of that provides a good speed up when onlyQuery::count()
is used and the results aren't actually needed. It also reduces memory use which might be noticeable for queries that return an enormous number of results.Bugs discovered while making changes:
☑️ ToDos