Servers must provide empty blobs that weren't uploaded #131

mostynb · 2020-04-16T05:56:29Z

Some clients refer to empty blobs without previously uploading them, as a performance optimization. Servers must support this.

EdSchouten

In my opinion this is the worst outcome that could have come out of this discussion. It adds unnecessary complexity to a system that would otherwise have been completely regular. To me it is also not evident how this leads to any meaningful increase in performance. Any client with some level of smartness would only download the empty blob once, due to it also using CAS-based storage locally. Because CAS blobs are (generally) only uploaded after calling FindMissingBlobs(), it would also save just one upload. It also doesn't reduce the total number of FindMissingBlobs() calls, as it is extremely unlikely that FindMissingBlobs() is called on just the empty blob.

Smarter ways to solve this would either have been to teach Bazel to upload empty blobs, or to simply disallow Digest messages to exist with size_bytes <= 0. References to empty objects could use null Digests. The latter would solve the problem in a completely obvious, non-ambiguous way, while also reducing the size of, most notably, ActionResult.

That said, I suspect that I am in a minority here, so let's just get this over with.

build/bazel/remote/execution/v2/remote_execution.proto

mostynb · 2020-04-16T07:24:04Z

In my opinion this is the worst outcome that could have come out of this discussion. It adds unnecessary complexity to a system that would otherwise have been completely regular.

Note that I'm documenting existing behaviour for compatibility reasons, not advocating for the design.

Smarter ways to solve this would either have been to teach Bazel to upload empty blobs, or to simply disallow Digest messages to exist with size_bytes <= 0. References to empty objects could use null Digests. The latter would solve the problem in a completely obvious, non-ambiguous way, while also reducing the size of, most notably, ActionResult.

Agreed- I initially tried this approach with bazel-remote based on the spec, but had to resort to this behaviour in order to support bazel. Something to improve for REAPIv3 (since we can't do it without breaking existing clients)?

Some clients refer to empty blobs without previously uploading them, as a performance optimization. Servers must support this. Perhaps we can improve on this for REAPIv3, eg require that empty blobs be represented by null Digests.

scele · 2020-04-17T10:02:14Z

In my opinion this is the worst outcome that could have come out of this discussion. -- I suspect that I am in a minority here

FWIW, I agree with @EdSchouten. Seems backwards to change the spec to be more complex, just because one client (bazel) is not conforming to the existing spec. Why not fix this on bazel side instead?

mostynb · 2020-04-17T14:39:26Z

FWIW, I agree with @EdSchouten. Seems backwards to change the spec to be more complex, just because one client (bazel) is not conforming to the existing spec. Why not fix this on bazel side instead?

IIUC there are many bazel projects still using older versions of bazel, so even if we could convince the bazel team to accept such a change we would probably suffer from incompatibility problems indefinitely.

EricBurnett · 2020-04-17T14:53:05Z

In my opinion this is the worst outcome that could have come out of this discussion. -- I suspect that I am in a minority here

FWIW, I agree with @EdSchouten. Seems backwards to change the spec to be more complex, just because one client (bazel) is not conforming to the existing spec. Why not fix this on bazel side instead?

A few reasons.

The complexity server-side is quite small - upon receipt of a upload, download, or FindMissing request for the empty blob (by hash+size), short circuit. This is probably not even a net increase in complexity, in that it simplifies lower layers and makes it easier to add sanity error-checking to know that they'll never need to deal with 0-byte payloads, any other request observed with a 0-size blob is a client error, etc.
The converse is not true - the client-side complexity is ugly. To do null-blob optimizations clients can simply ignore downloading the empty blob, but have to track global state to be able to ensure that the empty blob is uploaded and present whenever referenced. And this includes potentially "touching" it or downloading it anyways if some action is required to ensure its lifetime is extended remotely, etc.
There are more clients than servers (and that will only grow), so there's extra bias to push complexity server-side.
As noted, for anyone who deals with Bazel (or any other clients making this assumption) it's a practical necessity today.
And finally, this optimization actually matters. I haven't checked stats recently, but from when we implemented this same optimization early on in our bot, we observed some stupid fraction of blob references were for the empty blob. It's weirdly prevalent. And so to not special case it puts more read/write load on the server unnecessarily, risks hotspotting a row in the backing DB because it's the single hottest blob in existence, etc etc.

illicitonion · 2020-04-17T14:56:27Z

In my opinion this is the worst outcome that could have come out of this discussion. -- I suspect that I am in a minority here

FWIW, I agree with @EdSchouten. Seems backwards to change the spec to be more complex, just because one client (bazel) is not conforming to the existing spec. Why not fix this on bazel side instead?

A few reasons.

The complexity server-side is quite small - upon receipt of a upload, download, or FindMissing request for the empty blob (by hash+size), short circuit. This is probably not even a net increase in complexity, in that it simplifies lower layers and makes it easier to add sanity error-checking to know that they'll never need to deal with 0-byte payloads, any other request observed with a 0-size blob is a client error, etc.

The converse is not true - the client-side complexity is ugly. To do null-blob optimizations clients can simply ignore downloading the empty blob, but have to track global state to be able to ensure that the empty blob is uploaded and present whenever referenced. And this includes potentially "touching" it or downloading it anyways if some action is required to ensure its lifetime is extended remotely, etc.

There are more clients than servers (and that will only grow), so there's extra bias to push complexity server-side.

As noted, for anyone who deals with Bazel (or any other clients making this assumption) it's a practical necessity today.

And finally, this optimization actually matters. I haven't checked stats recently, but from when we implemented this same optimization early on in our bot, we observed some stupid fraction of blob references were for the empty blob. It's weirdly prevalent. And so to not special case it puts more read/write load on the server unnecessarily, risks hotspotting a row in the backing DB because it's the single hottest blob in existence, etc etc.

Similarly, we independently implemented this optimisation in another client (Pants) and another server (Scoot) because we found it both simplified implementation, and had noticeable performance effects.

ulfjack · 2020-04-17T15:28:33Z

We had a release blocker because of this special case (thanks to the person who started this thread for us finding it ;-), and it also seems to increase server-side complexity rather than decrease. A server can still special-case this even if it isn't in the spec, so I think the primary question is whether it improves the client or not.

I don't immediately see the client-side value - the client already has to keep track of all the other things, and adding one more doesn't seem to make a big difference.

ulfjack · 2020-04-17T15:29:52Z

The reason we didn't see it before is that it only caused an issue for us if the client attempts to run an action consuming an empty file as an input on a fresh cluster. The empty output is so common that it's basically guaranteed to be there after a few actions.

EdSchouten · 2020-04-17T15:32:51Z

And finally, this optimization actually matters. I haven't checked stats recently, but from when we implemented this same optimization early on in our bot, we observed some stupid fraction of blob references were for the empty blob. It's weirdly prevalent. And so to not special case it puts more read/write load on the server unnecessarily, risks hotspotting a row in the backing DB because it's the single hottest blob in existence, etc etc.

What I've observed is that this optimization is just a drop in the ocean compared to what may actually be done to improve performance. Right now Bazel makes absolutely no attempt to cache knowledge of which blobs exist on the server. FindMissingBlobs() calls tend to be huge, relative to what is actually meaningful to request at that point in time.

For a workload of mine I have observed that if Bazel were to just cache blob existence info for a single minute (which seems very fair), I can obtain a >99.8% hit rate for blobs that ended up existing. Such an optimization has a lot more impact than distinguishing between the empty blob.

EricBurnett · 2020-04-17T16:06:42Z

...and it also seems to increase server-side complexity rather than decrease...

To put numbers to this, I checked and our special-casing of the empty blob is approximately 20 LoC in our server in total, not all of which are strictly required (we do IsEmpty() checks at various layers, sometimes redundantly). These are pretty boring lines of code, e.g. for ByteStream.Read the entire special-case looks like if digest.IsEmpty(dg) { return nil }

I can't quantify the savings we get from knowing that our storage should never legitimately see 0-byte blobs, so let's just ignore that and say that our anecdotal evidence is it's at worst +20 LoC for us.

For a workload of mine I have observed that if Bazel were to just cache blob existence info for a single minute (which seems very fair), I can obtain a >99.8% hit rate for blobs that ended up existing. Such an optimization has a lot more impact than distinguishing between the empty blob.

Certainly bazel is super dumb about blob existence checks today, and so the presence or absence of the empty blob may not be material there. (Noting the fact that bazel exists in the wild with this assumption already, so this point is also somewhat moot.)

Bot side it's a little different though, in that most actions output 2-3 blobs, one of which is usually the empty blob, and so having to upload it is in the realm of +50-100% of blob uploads. Which might be negligible (if all blobs fit in a batch upload, it's a few bytes in an existing RPC); and might not be (e.g. if there's one other file and it's >4MB, we'd be making an additional RPC purely for uploading the empty blob). Though I should also note that we don't require a REAPI spec change to be able to unblock our bots on this, so this is only an argument for why it's useful, not why it's required.

peterebden · 2020-04-17T19:34:36Z

I agree with @scele and @EdSchouten ; if this is supposed to be a generic API, it should be driven by what makes for a clean API rather than "what does Bazel do".

I don't strongly disagree in that this isn't hard to implement on the server side, but as an optimisation it seems clearly better to be implemented on the client side, where a client conforming to the current API could simply optimise away some requests. Why isn't a SHOULD on the client side enough?

sstriker · 2020-04-17T20:17:11Z

Personally I'm fine with including the MUST language for the server side for v2. I think we can revisit in v3. I am not convinced we would end up in a different place.

EdSchouten · 2020-04-17T20:43:37Z

I think we can revisit in v3. I am not convinced we would end up in a different place.

Wouldn't my proposal to use null digests for empty objects even be worth considering for a v3?

sstriker · 2020-04-17T21:13:55Z

@EdSchouten I think it certainly would be consider. Personally, I am just thinking that that may be more complexity for clients and servers than the current requirement for a server to always have the empty blob available.

EdSchouten · 2020-04-17T21:29:34Z

@EdSchouten I think it certainly would be consider. Personally, I am just thinking that that may be more complexity for clients and servers than the current requirement for a server to always have the empty blob available.

You’d be amazed, but that’s often not the case. For certain classes of programming languages, null Protobuf messages are mapped directly to null pointers, option types with value ‘None’, etc. In those programming languages you already need to do null pointer comparisons or pattern matching whenever you access a Digest that’s stored in a message, as it could be null right now. These comparisons could be used to skip loading empty blobs.

This is what I meant with that using null Protobuf messages for this is more obvious and not ambiguous.

EricBurnett · 2020-04-18T03:43:22Z

I don't feel terribly strongly on this issue and will be fine either way, but I also cast my vote for including the MUST language, on the grounds that it's quite simple (and generally required) server-side, and materially simplifies client-side empty-blob optimizations. @EdSchouten <https://github.com/EdSchouten> I'm not sure I understand the connection to using null Digests to represent the empty blob? Seems somewhat orthogonal to me, in that it's mostly a question of whether Fetch or FindMissingBlobs can ever fail on the empty blob, regardless of how we describe "the empty blob". I think I'm missing some context here? At any rate, I think your proposal to change how we describe the empty blob is worth considering for V3, but I had thought it independent of this discussion.

…

On Fri, Apr 17, 2020 at 5:29 PM Ed Schouten ***@***.***> wrote: @EdSchouten <https://github.com/EdSchouten> I think it certainly would be consider. Personally, I am just thinking that that may be *more* complexity for clients and servers than the current requirement for a server to always have the empty blob available. You’d be amazed, but that’s often not the case. For certain classes of programming languages, null Protobuf messages are mapped directly to null pointers, option types with value ‘None’, etc. In those programming languages you already need to do null pointer comparisons or pattern matching whenever you access a Digest that’s stored in a message, as it could be null right now. These comparisons could be used to skip loading empty blobs. This is what I meant with that using null Protobuf messages for this obvious and not ambiguous. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#131 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABREW3B22DBVJ65BFS5KQTRNDC4ZANCNFSM4MJHDL6A> .

mostynb · 2020-04-19T22:35:59Z

I think we have a slightly better chance of reducing interoperability problems if we say "MUST"- so this gets my vote. But I'm also OK with "SHOULD".

Re improving this for REAPIv3, should we move that discussion somewhere else? (If so, where? Are there any other v3 changes already being discussed?)

EricBurnett · 2020-04-20T14:10:44Z

Re improving this for REAPIv3, should we move that discussion somewhere

else? (If so, where? Are there any other v3 changes already being discussed?) There are a few other desired changes pushed to V3, yes. @bergsieker has started collecting them into https://github.com/bazelbuild/remote-apis/milestone/1 , and I just filed #132 to collect another that has been discussed but doesn't have its own bug yet. I would suggest filing a separate Issue for the V3 portion of the discussion and we can get it tagged appropriately.

…

On Sun, Apr 19, 2020 at 6:36 PM mostynb ***@***.***> wrote: I think we have a slightly better chance of reducing interoperability problems if we say "MUST"- so this gets my vote. But I'm also OK with "SHOULD". Re improving this for REAPIv3, should we move that discussion somewhere else? (If so, where? Are there any other v3 changes already being discussed?) — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#131 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABREWZ4VG3ZYIWM57OTFV3RNN4FXANCNFSM4MJHDL6A> .

This was made the default in bazelbuild/continuous-integration#957.

mostynb requested review from agoulti, bergsieker, buchgr and ola-rozenfeld as code owners April 16, 2020 05:56

googlebot added the cla: yes Pull requests whose authors are covered by a CLA with Google. label Apr 16, 2020

mostynb mentioned this pull request Apr 16, 2020

Bazel uploads AC entries that reference non-existent (zero-size) CAS blob bazelbuild/bazel#11063

Closed

EdSchouten requested changes Apr 16, 2020

View reviewed changes

build/bazel/remote/execution/v2/remote_execution.proto Outdated Show resolved Hide resolved

Servers must provide empty blobs that weren't uploaded

26c67c2

Some clients refer to empty blobs without previously uploading them, as a performance optimization. Servers must support this. Perhaps we can improve on this for REAPIv3, eg require that empty blobs be represented by null Digests.

mostynb force-pushed the empty_blob_handling branch from af403fc to 26c67c2 Compare April 16, 2020 07:55

mostynb changed the title ~~Servers should provide empty blobs that weren't uploaded~~ Servers must provide empty blobs that weren't uploaded Apr 16, 2020

sstriker approved these changes Apr 16, 2020

View reviewed changes

bergsieker approved these changes Apr 17, 2020

View reviewed changes

EdSchouten mentioned this pull request Apr 20, 2020

V3 idea: No longer allow Digest.size_bytes <= 0 #134

Open

sstriker merged commit 26c67c2 into bazelbuild:master May 2, 2020

mostynb deleted the empty_blob_handling branch May 3, 2020 15:09

santigl pushed a commit to santigl/remote-apis that referenced this pull request Aug 26, 2020

Remove explicit "set -e" from presubmit.yml (bazelbuild#131)

7476b11

This was made the default in bazelbuild/continuous-integration#957.

ulfjack mentioned this pull request Sep 16, 2020

Duplicate file uploads over gRPC bazelbuild/bazel#12113

Closed

peterebden mentioned this pull request Sep 21, 2020

Optimise out uploading the empty blob in a couple of places bazelbuild/remote-apis-sdks#184

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Servers must provide empty blobs that weren't uploaded #131

Servers must provide empty blobs that weren't uploaded #131

mostynb commented Apr 16, 2020 •

edited

Loading

EdSchouten left a comment

mostynb commented Apr 16, 2020

scele commented Apr 17, 2020

mostynb commented Apr 17, 2020

EricBurnett commented Apr 17, 2020

illicitonion commented Apr 17, 2020

ulfjack commented Apr 17, 2020

ulfjack commented Apr 17, 2020 •

edited

Loading

EdSchouten commented Apr 17, 2020 •

edited

Loading

EricBurnett commented Apr 17, 2020

peterebden commented Apr 17, 2020

sstriker commented Apr 17, 2020

EdSchouten commented Apr 17, 2020

sstriker commented Apr 17, 2020

EdSchouten commented Apr 17, 2020 •

edited

Loading

EricBurnett commented Apr 18, 2020 via email

mostynb commented Apr 19, 2020

EricBurnett commented Apr 20, 2020 via email

Servers must provide empty blobs that weren't uploaded #131

Servers must provide empty blobs that weren't uploaded #131

Conversation

mostynb commented Apr 16, 2020 • edited Loading

EdSchouten left a comment

Choose a reason for hiding this comment

mostynb commented Apr 16, 2020

scele commented Apr 17, 2020

mostynb commented Apr 17, 2020

EricBurnett commented Apr 17, 2020

illicitonion commented Apr 17, 2020

ulfjack commented Apr 17, 2020

ulfjack commented Apr 17, 2020 • edited Loading

EdSchouten commented Apr 17, 2020 • edited Loading

EricBurnett commented Apr 17, 2020

peterebden commented Apr 17, 2020

sstriker commented Apr 17, 2020

EdSchouten commented Apr 17, 2020

sstriker commented Apr 17, 2020

EdSchouten commented Apr 17, 2020 • edited Loading

EricBurnett commented Apr 18, 2020 via email

mostynb commented Apr 19, 2020

EricBurnett commented Apr 20, 2020 via email

mostynb commented Apr 16, 2020 •

edited

Loading

ulfjack commented Apr 17, 2020 •

edited

Loading

EdSchouten commented Apr 17, 2020 •

edited

Loading

EdSchouten commented Apr 17, 2020 •

edited

Loading