Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Don't deserialize full document Proto for Query execution #561

Merged
merged 10 commits into from
Jun 27, 2019

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Jun 21, 2019

The PR adds a lazy-deserialization option to Documents. This allows Query filter evaluation without parsing the entire document Proto.

Speedup of a Query that returns 2 documents out of a collection with 1700 documents:

  • With master: 3644ms, 3591ms, 3577ms, 3616ms, 3603ms (avg 3606ms)
  • With PR: 3012ms, 2881ms, 2894ms, 2929ms, 2888ms (avg 2920ms)

These numbers, as always, are from my trusted Nexus 5X.

@schmidt-sebastian
Copy link
Contributor Author

Note: This pulls in firebase/firebase-js-sdk#1903 to fix the unit tests since we now always assert that no Document proto has a zero-version, regardless of source.

@schmidt-sebastian schmidt-sebastian changed the title Don't deserialize full document Proto for Query execution Performance: Don't deserialize full document Proto for Query execution Jun 22, 2019
}
value = value.getMapValue().getFieldsMap().get(path.getSegment(i));
}
return value != null ? serializer.decodeValue(value) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our OrderBy implementation does not implement a Schwartzian transform so we'll be performing these decodes for each comparison. For a large collection this could potentially eat into the gains you're observing here. I wonder if a cache of FieldPath to FieldValue would hurt here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a cache - but this has to be a ConcurrentHashMap since this function can be called from user land. Please do shout if that is not actually an issue :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.

Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the impact of this caching will be hard to quantify. I thought about adding this initially, but was only won over by your comments earlier. We do re-run getField() excessively during the computation of our Views.

I changed the code to lazy-init the field value cache. I do want to change the comparator to compare on the Protos directly, at which point we can remove the cache again.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this version a LOT more, but I'm concerned about the differences in equality/hashCode, especially because this ends up having an effect on external users.

}
value = value.getMapValue().getFieldsMap().get(path.getSegment(i));
}
return value != null ? serializer.decodeValue(value) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can call this, albeit indirectly. I don't really care if we cached these values for that purpose. I wish there were a way to avoid instantiating a new ConcurrentHashMap per Document.

Were you able to see the effect of the cache in a query that had an orderby? If not, this probably isn't justified?

}
private @Nullable final com.google.firestore.v1.Document proto;
private @Nullable final Function<Value, FieldValue> converter;
private ObjectValue objectValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nullable, right? (in the case of the second constructor it stays null initially).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Done.

@@ -45,12 +45,13 @@ public void testEquals() {
assertNotEquals(base, differentData);
assertNotEquals(base, fromCache);

// Note: `base` and `differentData` have the same hash code since we no longer take document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be right. equals and hashCode have to be consistent or all kinds of hell breaks loose when you use these in hash sets or as keys in a map. If two objects are equal for equality purposes, the contract for hashCode requires that the hashCodes be equal too.

We can't just ignore data in equality. There are a few cases to consider:

  • When version=SnapshotVersion.MIN the documents are just made-up, so data must play a roll
  • When DocumentState isn't synced, the documents are potentially dirty even if they have a version

However if we have a non-MIN version and the DocumentState is SYNCED then we could ignore the data for equality and hashing purposes. This would preserve what you're after here which is avoiding rehydrating the data in the common case of loading a value from the remote document cache.

What do you think of implementing both equality and hashCode according to the principle above? I think that would give most (if not all) the benefit without compromising the contract of equals/hashCode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. This is a good comment, and I 100% agree with you, but I don't think I broke the contract here in the way that you stated. I removed data from hash code (I am still using document key and version). This might cause more hash collisions, but the invariant that two equal objects have the same hash code should remain intact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.

An alternative phrasing that might be less confusing is:

Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular base and differentData have a hash collision because we don't use data in the hashCode.

You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode(). That way someone won't see that the value is unused in the future and then accidentally "fix" it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea about the additional comment. I added it to Document.hashCode().

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jun 25, 2019
Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -45,12 +45,13 @@ public void testEquals() {
assertNotEquals(base, differentData);
assertNotEquals(base, fromCache);

// Note: `base` and `differentData` have the same hash code since we no longer take document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Sorry about misreading that. I didn't realize that this was about an expected collision.

An alternative phrasing that might be less confusing is:

Note: the assertions below that hash codes of different values are not equal is not something that can be guaranteed. In particular base and differentData have a hash collision because we don't use data in the hashCode.

You might also want to call attention to this in an implementation comment in DocumentSnapshot.hashCode(). That way someone won't see that the value is unused in the future and then accidentally "fix" it.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jun 27, 2019
@schmidt-sebastian schmidt-sebastian merged commit c11a7a4 into master Jun 27, 2019
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/querywithoutproto branch July 3, 2019 15:46
@firebase firebase locked and limited conversation to collaborators Oct 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants