Offline blob uploads Phase 2 - Support "offline" blob uploads with detached container #6128
Can someone expand the scenarios here? Is it just detached? What about serialize? Same session only? Different sessions?
Hi @anthony-murphy, thanks for looking at this. Here's more information about this scenario (along with some thinking about what a solution might look like); hopefully it will address some of the questions. The scenario we'd like to support is as follows: the host allows editing a detached container and relies on ExternalContainer.serialize(), along with some other storage option of its choice (e.g. IndexedDB), for persistence. At some point in the future, the host may choose to attach the container to a driver, assumed here to be odsp-driver. The following issues currently exist with blobs:
On point (2): when we attach later to odsp-driver, we'd like to upload all the blobs to SharePoint and then replace the old handles in the ops with the new SharePoint handles, but there's some question around the best way to do that. Here's a sketch of a solution that I've proposed elsewhere: at some point when the host creates the detached container, it provides an instance of an interface (maybe IDetachedBlobStore) with the following methods:
If the container is in a detached state, then the Fluid runtime uses storeBlob() and retrieveBlobForHandle() as listed above to implement runtime.uploadBlob() and IFluidHandle.get(). When the container is attached, listBlobHandles() is used to upload all the blobs to SharePoint. At this point, the runtime has a correspondence map from the old, local handles to the new SharePoint handles. So the question at this point is how to map old handles in ops to the new handles. Here's one idea I've floated: the runtime somehow attaches the map to the container data in the driver, either as an op or using the container quorum feature; the outcome is that any client attached to the container can get the map. We assume that every DDS that stores handles in ops will deserialize them to an IFluidHandle via IFluidSerializer.parse(). The idea is that under the hood, if the container is attached, IFluidSerializer.parse() checks whether the handle is a local handle and uses its knowledge of the mapping to translate it to a SharePoint-meaningful IFluidHandle. Anyway, thanks again for looking at this, and please let me know if I can clarify the scenario further. The approaches above are just to help sketch out what we're hoping for (i.e. as an API consumer); I'm definitely assuming your thinking on how to go about doing it is likely better than mine here. Thanks again!
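For concreteness, here's a rough sketch of what such an interface might look like. The method names come from the description above, but the signatures are entirely hypothetical and not an existing Fluid API:

```typescript
// Hypothetical sketch only - not an existing Fluid Framework API.
// The host would supply an implementation (e.g. backed by IndexedDB) when it
// creates the detached container.
export interface IDetachedBlobStore {
    // Persist a blob locally and return a local handle/id for it; used to back
    // runtime.uploadBlob() while the container is detached.
    storeBlob(content: ArrayBufferLike): Promise<string>;

    // Resolve a previously returned local handle back to its content; used to
    // back IFluidHandle.get() while detached.
    retrieveBlobForHandle(localHandle: string): Promise<ArrayBufferLike>;

    // Enumerate all local handles so that, at attach time, every blob can be
    // uploaded to SharePoint and mapped to a new service handle.
    listBlobHandles(): Promise<string[]>;
}

// At attach time the runtime would build a map from local handles to service
// handles and consult it inside IFluidSerializer.parse() when deserializing
// handles found in ops.
export type LocalToServiceHandleMap = Map<string, string>;
```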
Also--another couple of quick comments:
might also be addressed by allowing a container to be detached from one driver and reattached to another one; however, I believe the problem of moving the blobs will still be essentially the same as that described here, using ExternalContainer.serialize(). Thanks again so much.
There are multiple directions we can go here; it would be great to first understand the constraints. A number of questions:
As for the workflow and possible designs, here are some thoughts:
The work on the runtime side would include:
I'm sure I'm missing some key things here, but it's worth writing it out as is - I'll keep adding to this write-up.
@coclauso, @DLehenbauer, I've updated my long post above. It would be great for you to take a look and comment, especially on the perf aspect of it - I think it would be the most challenging part of the whole effort. We could run some experiments with workloads approximating what users would experience RE uploading that much data and that many blobs in one go, to get a sense of whether the current approach will work or not. The number of round trips can likely be addressed in a reasonable amount of time by adding a multi-blob upload API or leveraging the summary blob upload path. But there is really no solution for bandwidth here, and we can't leverage multiple parallel uploads (as SPO can't handle it), so we are limited to what a single connection can give us. Note that the existing file flow already serializes all attachment uploads, so you can experience it in the existing product by creating a scenario that hits many uploads at once. If the whole process and perf look reasonable, we could start breaking it into individual work items.
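To make the perf concern concrete, here's a rough back-of-envelope model of the serialized upload flow described above; the numbers in the example are illustrative assumptions, not measurements:

```typescript
// Each blob costs one round trip plus its transfer time over a single
// connection, since uploads are serialized and SPO can't absorb parallel ones.
function estimateAttachUploadSeconds(
    blobSizesBytes: number[],
    uplinkBytesPerSec: number, // assumed uplink, e.g. ~1 MB/s
    roundTripSec: number,      // assumed per-request latency, e.g. ~0.1 s
): number {
    const totalBytes = blobSizesBytes.reduce((sum, size) => sum + size, 0);
    return totalBytes / uplinkBytesPerSec + blobSizesBytes.length * roundTripSec;
}

// Example: 50 blobs of 1 MB each over a 1 MB/s uplink with 100 ms round trips
// comes out to roughly 55 seconds - dominated by bandwidth, which a multi-blob
// upload API would not help with.
console.log(estimateAttachUploadSeconds(new Array(50).fill(1_000_000), 1_000_000, 0.1));
```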
Hi @vladsud, thanks for your response. I'll try to go through this and add my own comments, for whatever light they can shed. Please point out anything I don't address well. It's also possible that @DLehenbauer can reply to some portions.
I’ll start a separate communication for this with folks who can provide more details.
I could imagine something like .[extension].partial or .[extension].incomplete until it's fully uploaded. On our side, we're planning to delete the local data once it's been transferred to SharePoint. We should make sure that if the app crashes mid-upload, the local data isn't deleted, and the next time the app is launched, the upload retries.
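A minimal sketch of that ordering, assuming hypothetical host-side helpers (none of these are Fluid APIs):

```typescript
// Local data is deleted only after the transfer to SharePoint is confirmed, so
// a crash mid-upload simply means the transfer is retried on the next launch.
async function resumePendingTransfers(
    listPendingDocuments: () => Promise<string[]>,
    uploadToSharePoint: (id: string) => Promise<void>,
    deleteLocalCopy: (id: string) => Promise<void>,
): Promise<void> {
    for (const id of await listPendingDocuments()) {
        await uploadToSharePoint(id); // re-runs on every launch until it succeeds
        await deleteLocalCopy(id);    // never reached if the upload (or the app) fails
    }
}
```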
This might be partially addressed by a comment that follows. In our application, we reserve blobbing for large chunks of data. I believe they're binary (i.e. we don't base64 encode), but I could double-check.
Most of this seems fine; I'd comment on this, though:
Right now, we're using blobs to refer to data that should be stored as a SharePoint attachment, and our application has heuristic code to only blob large chunks of data, not small chunks. My suggestion here would be, for simplicity, not to consider the case of a "small blob" at the runtime level and to leave it to the API consumer to ensure that only large blocks of data are blobbed.
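As an illustration of that kind of heuristic, here's a hypothetical version of it; the threshold and the return shape are assumptions for illustration only:

```typescript
// Only large payloads go through runtime.uploadBlob(); small data stays inline
// in ops / DDS state.
const BLOB_SIZE_THRESHOLD = 64 * 1024; // assumed cutoff; tune per application

async function storePayload(
    runtime: { uploadBlob(content: ArrayBufferLike): Promise<unknown> },
    payload: Uint8Array,
): Promise<{ kind: "inline"; data: Uint8Array } | { kind: "blob"; handle: unknown }> {
    if (payload.byteLength < BLOB_SIZE_THRESHOLD) {
        return { kind: "inline", data: payload };
    }
    return { kind: "blob", handle: await runtime.uploadBlob(payload) };
}
```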
I think the question here is: while the document is being transferred to SharePoint, is it acceptable for some app functionality to block until the transfer is complete? If we decide it is, then for simplicity of the API consumer, I'm wondering if the best way to expose this to the app would be to have the async runtime.uploadBlob() function block until the attachment process is complete.
If allowing blob creation during the attachment process, rather than blocking, isn't too much harder, it might be worth doing. It could provide a smoother user experience during transfer. Let me know if there's anything else I can provide that can help. Thanks.
Also, on this:
It's presently the case that attaching a container to SharePoint takes a number of seconds (approx. 5 or so); generally apps allow the user to edit the document seamlessly during this time, so the experience is still good. If this fundamental SPO bandwidth bottleneck were just manifested as a longer attachment waiting time, say a few seconds more, then I'd imagine the existing mechanisms that mask this from the user would be fine here. This issue relates back to these earlier comments:
It could be that the right approach here is to allow the app to create additional blobs during attachment.
Thanks! So here is the modified plan: Pre-work:
Upload process:
Detached serialization / rehydration is an easy and small part of it (see pre-work); however, size should be considered (i.e. is it OK to serialize all the content of a container into a single string?). Even if it works, it would likely be super-inefficient (too many memory copies, base64 encoding). The future plan likely needs to involve something more complicated, where serialization is done through some interface that supports working with individual binary blobs (this should be considered, but handled outside of this item as a future improvement; see the sketch below). We should likely fork this item into two:
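Regarding the blob-aware serialization interface mentioned above, here's a rough sketch of what it might look like; the names and shapes are hypothetical, not a committed design:

```typescript
// Instead of one giant base64-encoded string, the snapshot keeps attachment
// blobs as raw binary, keyed by local id, so they can be stored and uploaded
// individually.
export interface IDetachedContainerSnapshot {
    // Tree/metadata portion of the snapshot; small enough to keep as a string.
    snapshotTree: string;
    // Attachment blobs kept as binary, avoiding base64 round trips.
    blobs: ReadonlyMap<string, ArrayBufferLike>;
}

export interface IDetachedContainerSerializer {
    serialize(): Promise<IDetachedContainerSnapshot>;
    rehydrate(snapshot: IDetachedContainerSnapshot): Promise<void>;
}
```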
It sounds like we're deciding to defer implementing this suggestion from earlier:
A comment I would make is that if this isn't too difficult, it might be worth doing earlier. The reason is that if a product ships at any point with blobs in the serialized snapshot, then whatever functionality is required to deserialize them from this source has to be maintained indefinitely into the future, so there could be some value in doing this in the first version.
Agreed, though I'd point out we are already sort of in that bucket (if anybody uses the string-based functionality, then their drafts would either be lost, or we would need to build an adapter to convert them to this new format). That said, I'd modify it to have only 4 methods (blob-related):
No listing, no handles.
@danielroney - FYI: Breakdown of work items below. Phase 1: Support blobs while detached
Phase 2: Upload blobs on attachment
Open issues:
This item will track Phase 2 of this effort, consisting of:
--original issue description--
Our recommended path for offline document creation using a detached container does not currently work with SharedTree. This is because the Fluid runtime currently does not support uploading blobs while in a detached state. This is problematic for the SharedTree DDS, which uses blobs both at the public API layer and internally while summarizing.
One potential solution is for the runtime to permit blob uploads while detached, temporarily storing them in memory or persistent storage. During attachment, these blobs would be uploaded to the service prior to sending the initial summary, and each IFluidHandle patched with the resulting 'absoluteUri'. (For correctness, absoluteUri should be made opaque.)
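A minimal sketch of that flow, with hypothetical types standing in for the real runtime plumbing:

```typescript
// Blobs created while detached are parked in memory; at attach time they are
// uploaded and the handles handed out earlier are patched with the storage ids
// returned by the service, before the initial summary is submitted.
interface IPendingBlob {
    content: ArrayBufferLike;
    resolveStorageId(id: string): void; // patches the handle given out while detached
}

async function uploadDetachedBlobsOnAttach(
    pendingBlobs: IPendingBlob[],
    createBlobInStorage: (content: ArrayBufferLike) => Promise<{ id: string }>, // e.g. driver blob creation
): Promise<void> {
    for (const blob of pendingBlobs) {
        const { id } = await createBlobInStorage(blob.content);
        blob.resolveStorageId(id); // handles now resolve against service storage, not local memory
    }
    // ...after this, the initial summary can be generated and sent.
}
```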