Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move versioning and trashbin functionalities to actual interfaces on the storage level. #12274

Closed
4 tasks
DeepDiver1975 opened this issue Nov 18, 2014 · 32 comments
Closed
4 tasks

Comments

@DeepDiver1975
Copy link
Member

DeepDiver1975 commented Nov 18, 2014

As of today we are facing storage backends (e.g. SWIFT) which come with their own versioning/trash functionality.

We need to have the possibility to respect these functions and give developers the proper handle to implement/integrate these mechanisms.

Proposed Steps:

  • Introduce new interface VersionHandler which is attached to the Storage
  • Move actual local filesystem versioning/trashbin code to an local file system implementation
  • Versioning and trash app will continue to exist as the front end part working on the storage backend interfaces
  • Expose and respect backend capabilities on storage level

@schiesbn @icewind1991 @PVince81 @karlitschek please share your comments and ideas - THX

@karlitschek
Copy link
Contributor

This is the right way to move forward. So we can have different behaviors depending on the storage backend. 👍

@PVince81
Copy link
Contributor

Yes, makes sense.

Should the versioning system be interchangeable or always specific to a storage ?
For example:

  1. Use SWIFT storage with ClassicVersionHandler (the current one that stores in "data")
  2. Use SWIFT storage with HiddenFolderOnTheStorageVersionHandler
  3. Use SWIFT storage with AnotherVersionHandlerThatRemembersEverything (another ficticious one)

This could provide more flexibility and choice about how to store the version, but might also make everything too complex to maintain.
So the best approach might be to only have a single version handler implementation per storage.

@PVince81
Copy link
Contributor

So this means we'll finally have the opportunity to try moving the trash+versions inside the external storage itself, inside a ".trash" and ".versions" hidden folders. This also means a lot of the specialized encryption code required for trash/version can be thrown away, and we can simple use the "View" and let the encryption code take care of the keys in all cases.

@DeepDiver1975
Copy link
Member Author

I would tie the versioning impl to the storage. In case multiple implementations are possible this might be set up using some configuration.

But lets keep this simple for now.

Regarding encryption: we have to consider making encryption a storage wrapper - see icewinds proposal regarding sync performance improvements which is also necessary for engagements in the area of chunking.

What about a small storage hack week soon?

@PVince81
Copy link
Contributor

From what I understand, moving encryption to a storage wrapper isn't something trivial and might need a lot of thinking and changes. But maybe experimenting with it first (POC) is the way to go.

@schiesbn didn't seem to be convinced though.

Storage wrappers have another drawback that might need addressing: the order of execution is dependent on the order in which the wrappers are registered.

@schiessle
Copy link
Contributor

@schiesbn didn't seem to be convinced though.

Just to make this clear, I'm not against moving encryption from the proxies to the storage wrapper. I just think that it is not something which should be done quickly and we should first think about all possible implications. I don't want to throw out matured and quite stable code that easy.

As for moving the trash and versioning to the storage. Generally I like the idea. But also in this case we should first think about all possible implications. For example how will the trash bin retrieve the deleted files in the future? Right now we just have to look into one folder, find all files and can navigate in sub folders, etc. If the deleted files are distributed across the storages we would need to collect them from all storages and create a virtual centralized view for it. Also we need to think about how additional data is handled, e.g. versions and encryption keys also need to be moved to the trash and restored again. I don't like the idea that the storage implementation should handle all this stuff. All doable, but it is not a small task and needs some thoughts first. Just saying this in order to have a feeling about the dimension of this changes.

I'm still not sure if this centralized view on the trash bin is the best idea. Ideally we would keep the deleted files where they are and just hide them. This way we wouldn't need to move around anything. Neither the files nor versions, encryption keys, etc. The trash bin button wouldn't lead to a centralized view of all deleted files but would be a toggle to show/hide the deleted files in the current folder. I think this would be the much more promising future for the trash bin from a ownCloud user point of view.

@LukasReschke
Copy link
Member

I'm still not sure if this centralized view on the trash bin is the best idea. Ideally we would keep the deleted files where they are and just hide them.

I think that might lead to confusion when using external storage such as a mounted windows home directory.

@schiessle
Copy link
Contributor

I think that might lead to confusion when using external storage such as a mounted windows home directory.

Yes, that's a common argument. On the other hand people also open their windows shares and other network drives regularly with their desktop file manager. These desktop file manager also put metadata into this directories (Thumbnails, some create a .trash folder, afaik MacOS creates .DS files,...). People don't care about it, so why should they care about ownCloud specific files?

At the end it boils down to the question how well we want to integrate this storages into ownCloud. Imho the best integration is achieved if we use the storage how it fits best for the ownCloud user experience. Other do the same, as shown with the file manager example.

@PVince81
Copy link
Contributor

Just a little clarification related to "leaving version handling to the storage implementation". I think it's not 100% accurate. I'd say it's up to the storage implementation to choose what implementation to use. Core can still provide the VersionHandler instances (CentralVersionHandler that stores to data dir, StorageVersionHandler that stores directly on storage, etc) but it's up to the storages' getVersionHandler() to return the appropriate and compatible implementation. No need to rewrite new version handlers for every storage.

@schiessle
Copy link
Contributor

No need to rewrite new version handlers for every storage.

That's not what I said. But what's happens if you delete a file on a storage with his own trash handler. How will we handle the encryption keys for example? Either the storage handler needs to handle the encryption keys (as said, I wouldn't like the idea) or we create a hook and let the encryption app handle it. But what does the second solution mean? The encryption app would move the keys to a generic place and need to store additional meta information where the real file is. You would also need this mapping if you want to show previews in the trash bin for example. We would increase the dependency on the database for this metadata. Honestly I'm quite happy that currently we go in the opposite direction, reducing the dependency on the database and and make the key<->file mapping easier. Also what would happen if the storage disappears? The deleted files would be gone but all the meta data and the keys would still be there.

As said, this is all solvable. But please don't underestimate (1) the amount of work, (2) the risk to replace stable and mature code with not so stable and not so mature code and (3) the thoughts we need to put into it before we move into one direction.

@PVince81
Copy link
Contributor

@schiesbn I think we can decouple encryption from versions/trashbin as discussed yesterday. Version handlers should simply use the View to handle versions/deleted files and encryption stuff is handled transparently. We might be able to implement this without having to use storage wrappers.

First step would be to come up with a list of steps / checklist of what we need to be careful with, then address all the points.

@icewind1991
Copy link
Contributor

If encryption were to use file ids to store it's keys instead of paths there would be no need to encryption to do any work for handling encryption keys while moving files in and out the trashbin.

@PVince81
Copy link
Contributor

Since OC7 we have trashbin + versions info in the oc_filecache, so in theory it should be possible to switch to using file ids.

The idea we discussed yesterday was to have the encryption app work at the "data/$user" level instead of "data/$user/files". It means that "files_trashbin" and "files_versions" would be considered like regular folders when working with files. So moving a file to the trashbin from "data/user/files/afile.txt" to "data//user/files_trashbin/afile.txt.d123456" wouldn't be any different than moving a file from "data/user/files/afile.txt" to "data/user/files/subdir/afile.txt". No special handling needed for keys.

@DeepDiver1975
Copy link
Member Author

Since OC7 we have trashbin + versions info in the oc_filecache, so in theory it should be possible to switch to using file ids.

One thing we shall keep in mind is that there are actually systems which no longer use our file cache table because they implement their own cache classes to retrieve any metadata information from the storage itself. In such an implementation versions, trashbin, sharing and storage itself (maybe encryption as well ???) are actually (re)implemented for exactly this storage.

At the end we should not make any assumption about the internal operations of these modules.

In such an environment ownCloud is basically only visualizing the data and taking care about syncronization.

So maybe we have to think about our implementations as of today (sharing, versions, trashbin and encryption) to be the implementation of these interfaces for the regular and local posix based filesystem.

@schiessle
Copy link
Contributor

If encryption were to use file ids to store it's keys instead of paths there would be no need to encryption to do any work for handling encryption keys while moving files in and out the trashbin.

If files are only moved yes, at the moment we copy them so the file ID in the trash bin is not the same as the file ID of the original file. Also what happens if a admin enforced a file cache re-scan? Will the file IDs change? Also in case of bugs it is really hard to debug it and to find the cause of the problem if the keys are only named by the file id. Any bug e.g. if the file ID changes because of a bug in the file scanner or a key gets attached to the wrong file ID would be really hard or even impossible to resolve which would lead to data lose. I'm not sure if it is a good idea to drop human readable key names in favor of file ids even if I see the obvious advantages.

@PVince81
Copy link
Contributor

In the long run we need to find a way to make file ids stable. There are other features that already rely on it like sharing and in the future metadata/favorites/tags.

@schiessle
Copy link
Contributor

In the long run we need to find a way to make file ids stable. There are other features that already rely on it like sharing and in the future metadata/favorites/tags.

Sure, but bugs will always exists. Losing a share because of a bug is one thing, data lose is a different thing. I would like to maintain as few dependencies as possible for file encryption. Ideally you can copy your data folder to a different computer and run a local PHP script to decrypt your files or copy it to a different ownCloud and it works. Both will not be the case if we depend on file IDs.

Think about the encryption bugs you debugged lately and imagine that instead of human readable paths and filenames all keys would be in one folder with names like: 1223243.key, 343647.user1.shareKey, 436743.key,.... This would be a nightmare.

@PVince81
Copy link
Contributor

Note: @MTRichards has mentionned this requirement for external storage as well in this ticket so I added it there as bullet point: #12216 (it gathers multiple requirements/improvements for external storage)

@PVince81
Copy link
Contributor

@schiesbn mind posting your WIP PR here where you're doing the decoupling of the encryption app ?

@schiessle
Copy link
Contributor

@schiesbn mind posting your WIP PR here where you're doing the decoupling of the encryption app?

There is no PR. That's already how it works today. Today all you need for encryption is stored on file system level and you can easily see which key belongs to which file. You can also copy the complete data folder to a new ownCloud no matter of encryption was enabled or not and it will continue to work on the new ownCloud. This wouldn't be the case if we would depend on the file cache ID.

The same way we decoupled the trash bin from the database from OC6 to OC7.

@schiessle
Copy link
Contributor

Another point just come to my mind. If every storage can implement his own trash bin and versioning and use the functionality provided by the storage back-end. What would this mean for example for the expire function for versions? Would we apply our implementation to all versions? What if the storage back-end also expire versions in some way? Would we use the expire method of the back-end, ours, both or can we chose at all? Wouldn't this lead to a quite inconsistent behavior and become highly confusing for the user? The user shouldn't care about the back-end and experience a consistent behavior across all folder.

@MTRichards
Copy link
Contributor

This is an interesting point. And yes, you are correct. The external storage provider would replace our versioning and our trashbin for those files and folders that it owns. This would mean that if you erase a file in sharepoint, and it is in the sharepoint trash it would also appear in the ownCloud trash. In theory anyway. Then when ownCloud deleted trash, it would follow the external storage provider trash rules for those files and folders impacted there.

BUT, to be clear, I am not asking this to be done for SharePoint right now. I am just making sure that we have a path to this sort of capability over time.
@karlitschek

@PVince81 PVince81 mentioned this issue Nov 28, 2014
22 tasks
@PVince81
Copy link
Contributor

PVince81 commented Dec 8, 2014

We seemed to agree today that we'd rather have better integration with the remote storages (ex: Sharepoint, Dropbox) for things regarding trash and versions instead of trying to have consistent behavior between storages.

The Storage interface will be extended:

  • Storage::getVersionHandler()
  • Storage::getTrashbinHandler() (@icewind1991 suggested to merge it with the version handler somehow)
  • Storage::getPreviewHandler() (to manage previews and where to store them)
  • Storage::getChunkHandler()

As a next step we can identify how these interfaces will look like.
The chunk handler is being worked on here already: #12160

@DeepDiver1975 DeepDiver1975 added this to the backlog milestone Mar 21, 2015
@PVince81
Copy link
Contributor

@labkode as discussed

@PVince81
Copy link
Contributor

For trashbin:

The way I see it is that by default every storage would have a hidden ".trashbin" folder (external storage, received shares, etc).

@PVince81
Copy link
Contributor

Hmm, also before we can decide on a "trashbin Webdav API" for public access, we need to sort out whether we still want this "per storage" trashbin because it would influence the format of the endpoint as it might need an extra "storage id" argument.

@labkode
Copy link

labkode commented Sep 14, 2016

@PVince81 the trashbin functionality should be delegated to the storage as an interface to implement, as you mentioned, one could use the default implementation in its own storage.
I would also like to see a version functionality at the same level.

My ideal storage would be the one that can handle itself data, metadata, versions and trashbin without relying on hooks from a top level app like it is the case with the current files_trashbin and files_versions app.

What do you think?

@PVince81
Copy link
Contributor

@labkode my concern at the moment is mostly about the UI and the current app.
Is the "Deleted files" view in the web UI a view of the trashbin from all storages ? Or do we get rid of this view and display deleted files differently ?

Also what about external storages like SMB, do we store files in a hidden folder ".trash" there ? If not, we'd need a fallback like "use the default trashbin in the data folder" instead.

@PVince81
Copy link
Contributor

Regarding versions, there are even more concerns.
If versions are implemented differently per storage, differently not only where the versions are stored but also their behavior. For example the Dropbox storage versions could be based on Dropbox' versions API. The trouble with different behaviors is that it will break the user experience, because versions would behave differently between storages.

This would be even more complex when moving files between storages, because moving files also carry their versions. It means that versions would need to be converted from the local format to the Dropbox format for example. This is too complex.

So maybe we should restrict the spec to "allow storages to store trashed files and versions in a different location" instead of "allow storages to do whatever they want with trashed files and versions" ? Basically having a more specific use case.

Would be good to get more information about the intended use case for this extensibility.
Is it about:

  • different behavior for trash/versions for each storage ? (IMO no go due to above reasons)
  • a different storage location/format for trash/versions but with same behavior (ex: retention) ? (better)
  • ... something else ?

@PVince81
Copy link
Contributor

  • or is it simply about getting rid of hooks/storage wrappers, basically tech debt ?

@PVince81
Copy link
Contributor

Another possibility is to make it possible to have a custom trash/versions handler that can be specified in config.php, basically an alternative backend. The frontend would stay the same. The backend would not be storage dependent, but if needed the backend itself could decide to behave differently based on storage implementations.

Or simply write a new trashbin/versions app that implement the same APIs if the frontend is not important ?

@PVince81
Copy link
Contributor

The spec might have changed a bit, can you update ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants