Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query index improvements #6376

Merged
merged 10 commits into from
Mar 23, 2023
Merged
12 changes: 8 additions & 4 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
# NEXT RELEASE

### Enhancements
* <New feature description> (PR [#????](https://github.com/realm/realm-core/pull/????))
* None.
* Performance improvement for the following queries ([6376](https://github.com/realm/realm-core/issues/6376)):
* Reduced constant factor and memory use in Query::count() on simple equality queries for string/int/UUID/ObjectID when using an index. This is due to using `StringIndex::find_all_no_copy` instead of copying out all the result object keys into a buffer.
bmunkholm marked this conversation as resolved.
Show resolved Hide resolved
* Significant improvement on Timestamp equality queries when using an index.
bmunkholm marked this conversation as resolved.
Show resolved Hide resolved
* Moderate improvement on Bool equality queries when using an index.
* Moderate improvement on Mixed case insensitive equality queries.

### Fixed
* <How do the end-user experience this issue? what was the impact?> ([#????](https://github.com/realm/realm-core/issues/????), since v?.?.?)
* None.
* Fixed a crash when querying a mixed property with a string operator (contains/like/beginswith/endswith) or with case insensitivity. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)
* Querying for equality of a string on a mixed property was returning case insensitive matches. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate which matches wasn't working? Was it not taking casing into consideration? So it would return true for "A"="a" where it was supposed to return false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. I have added an example to clarify.

* Adding an index to a Mixed property on a non-empty table would crash with an assertion. ([6376](https://github.com/realm/realm-core/issues/6376) since introduction of Mixed)

### Breaking changes
* None.
Expand Down
5 changes: 5 additions & 0 deletions src/realm/index_string.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,11 @@ size_t StringIndex::count(T value) const
return m_array->index_string_count(Mixed(value), m_target_column);
}

inline Allocator& StringIndex::get_alloc() const noexcept
{
return m_array->get_alloc();
}

inline void StringIndex::destroy() noexcept
{
return m_array->destroy_deep();
Expand Down
21 changes: 14 additions & 7 deletions src/realm/query.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -992,12 +992,14 @@ void Query::aggregate(QueryStateBase& st, ColKey column_key) const
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();
for (auto key : keys) {
auto obj = m_table->get_object(key);
const size_t num_keys = keys->size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this loop really making a difference?

for (size_t i = 0; i < num_keys; ++i) {
auto obj = m_table->get_object(keys->get(i));
if (pn->m_children.empty() || eval_object(obj)) {
st.m_key_offset = obj.get_key().value;
st.match(realm::npos, obj.get<T>(column_key));
Expand Down Expand Up @@ -1312,15 +1314,18 @@ void Query::do_find_all(TableView& ret, size_t limit) const
auto best = find_best_node(pn);
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
KeyColumn& refs = ret.m_key_values;

// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();

auto keys = node->index_based_keys();
for (auto key : keys) {
const size_t num_keys = keys->size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK it seems it is a pattern, why do we need to do this rather than a simpler for(const auto& key : keys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration pattern here isn't really part of the optimization. The change is because keys was previously a std::vector<ObjKey> and now it is a IndexEvaluator* which doesn't have built in iterator support (I suppose I could add iterator support if you really wanted to keep the loop the same). The change in type allows us to dynamically fetch the keys directly from BPTree one at a time as needed. The previous way was to always evaluate all the results and return them in the vector. But that is entirely unused when we only have one query condition and are doing a count of results. So the optimization here is to fetch the matching object keys as needed because sometimes we don't need them at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the iterator is a "nice to have" feature, but that's ok to leave it as it is, it is up to you.

for (size_t i = 0; i < num_keys; ++i) {
ObjKey key = keys->get(i);
if (limit == 0)
break;
if (pn->m_children.empty()) {
Expand Down Expand Up @@ -1433,13 +1438,15 @@ size_t Query::do_count(size_t limit) const
auto node = pn->m_children[best];
if (node->has_search_index()) {
auto keys = node->index_based_keys();
REALM_ASSERT(keys);
if (pn->m_children.size() > 1) {
// The node having the search index can be removed from the query as we know that
// all the objects will match this condition
pn->m_children[best] = pn->m_children.back();
pn->m_children.pop_back();
for (auto key : keys) {
auto obj = m_table->get_object(key);
const size_t num_keys = keys->size();
for (size_t i = 0; i < num_keys; ++i) {
auto obj = m_table->get_object(keys->get(i));
if (eval_object(obj)) {
++cnt;
if (cnt == limit)
Expand All @@ -1449,7 +1456,7 @@ size_t Query::do_count(size_t limit) const
}
else {
// The node having the search index is the only node
auto sz = keys.size();
auto sz = keys->size();
cnt = std::min(limit, sz);
}
}
Expand Down
30 changes: 20 additions & 10 deletions src/realm/query_conditions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ struct Contains : public HackClass {
{
if (m1.is_null())
return !m2.is_null();
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -133,7 +133,7 @@ struct Like : public HackClass {
{
if (m1.is_null() && m2.is_null())
return true;
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -186,7 +186,7 @@ struct BeginsWith : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return b2.begins_with(b1);
Expand Down Expand Up @@ -232,7 +232,7 @@ struct EndsWith : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -395,7 +395,7 @@ struct ContainsIns : public HackClass {
{
if (m1.is_null())
return !m2.is_null();
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -479,7 +479,7 @@ struct LikeIns : public HackClass {
{
if (m1.is_null() && m2.is_null())
return true;
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, since you are repeating this over several ifs, you could have added this check inside Mixed::types_are_comparable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that both of the values could be numerics, so Mixed::types_are_comparable would actually return true, but trying to convert to a string will fail. We don't want to change Mixed::types_are_comparable because that is used in other places for other non-string values.

BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -544,7 +544,7 @@ struct BeginsWithIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -610,7 +610,7 @@ struct EndsWithIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary) && Mixed::types_are_comparable(m1, m2)) {
BinaryData b1 = m1.get_binary();
BinaryData b2 = m2.get_binary();
return operator()(b1, b2, false, false);
Expand Down Expand Up @@ -675,8 +675,18 @@ struct EqualIns : public HackClass {

bool operator()(const QueryValue& m1, const QueryValue& m2) const
{
return (m1.is_null() && m2.is_null()) ||
(Mixed::types_are_comparable(m1, m2) && operator()(m1.get_binary(), m2.get_binary(), false, false));
if (m1.is_null() && m2.is_null()) {
return true;
}
else if (Mixed::types_are_comparable(m1, m2)) {
if (m1.is_type(type_String, type_Binary)) {
return operator()(m1.get_binary(), m2.get_binary(), false, false);
}
else {
return m1 == m2;
}
}
return false;
}

template <class A, class B>
Expand Down
Loading