Checksums #21997

rullzer · 2016-01-28T21:44:27Z

First step toward checksums in file transfers.

The current modular approach allows for implementations of different checksum providers. The example app will just store them in the database and return them if requested. However, other apps could be written that actually validate the data etc.

Extra column added to the filecache table.
Updated methods to handle extra column
dav stuff to get and return the checksum.

rullzer · 2016-01-28T21:47:07Z

Very early WIP. But feedback is appreciated: @PVince81 @icewind1991 @DeepDiver1975

now time for sleeeep

PVince81 · 2016-01-29T09:24:43Z

Does it really have to be a separate app? I thought this would be core functionality.

For the DAV stuff, just add code to the existing files plugin with new attributes.

karlitschek · 2016-01-29T09:49:37Z

i think having it as an app that is enabled by default is not a problem. code modularity is good as long as it doesn't introduce a performance penalty

rullzer · 2016-01-29T10:03:12Z

@PVince81 well I was thinking that it made more sense to have it as app.

On some installations people might want to just store and return the checksums.
On others people might want to really verify the checksums.
And there might even be installations out there that have checksums in the FS layer by default so just return/fetch that.

I think having it as an app gives us more modularity and easier extensibility.

rullzer · 2016-01-29T21:18:07Z

Ok all the basics are there now.

To test, upload a file via WebDAV and set the OC-Checksum header:

curl -u user:pass -X PUT http://server/remote.php/webdav/file -T file -H "OC-Checksum: MD5:mymd5checksum"

Now when you do a PROPFIND and request the <oc:checksums> property on the file, it gives you back:

...
<oc:checksums>
  <oc:checksum type="MD5">mymd5checksum</oc:checksum>
</oc:checksums>
...

And when you get the file you get an additional OC-Checksum header.
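For anyone scripting against this, the header value shown above can be split into its type and checksum parts. This is only a minimal Python sketch, assuming the `TYPE:CHECKSUM` form from the curl example. The function name `parse_oc_checksum` and the upper-casing of the type are made up for illustration and are not part of this PR:

```python
def parse_oc_checksum(value):
    """Split an OC-Checksum header value such as 'MD5:abc123' into (type, checksum).

    Hypothetical helper for illustration; not code from this PR.
    """
    # Split on the first colon only, so a checksum containing ':' stays intact.
    checksum_type, sep, checksum = value.strip().partition(":")
    if not sep or not checksum_type or not checksum:
        raise ValueError("expected TYPE:CHECKSUM, got %r" % value)
    # Upper-casing the type is an assumption made here for easy comparison.
    return checksum_type.upper(), checksum


if __name__ == "__main__":
    print(parse_oc_checksum("MD5:mymd5checksum"))
```

A client could run the same split on the OC-Checksum header it gets back on download and compare the result with what it sent.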

guruz · 2016-02-01T06:10:18Z

@evert Is there a standard property in WebDAV that can be used for (different kinds of) content checksums?

rullzer · 2016-02-01T10:14:43Z

I'd like some more feedback on this. Is everybody happy with the approach here? @PVince81 @DeepDiver1975 @karlitschek ?

PVince81 · 2016-02-01T10:20:24Z

apps/dav/lib/connector/sabre/file.php

+						//TODO: Implement
+					}
+				}
+			}


Move this to a separate function. There are two code paths for WebDAV PUT: this one is non-chunking and the other one goes into createFileChunked.

aaah yes...

PVince81 · 2016-02-01T10:32:02Z

I'd like to know what @icewind1991 thinks

PVince81 · 2016-02-01T10:37:15Z

lib/private/files/fileinfo.php

 	 */
-	public function __construct($path, $storage, $internalPath, $data, $mount, $owner= null) {
+	public function __construct($path, $storage, $internalPath, $data, $mount, $owner= null, $checksumManager = null) {


I'm a bit worried about FileInfo (and the Node classes) becoming too fat.

Adding checksum functions on FileInfo and Node is only useful if we want to expose the checksum directly to apps, for convenience. If we don't, apps can still use the checksum manager from \OC::$server->getCheckSumManager(). For a first version I'd suggest to not expose this here and have the Sabre File node use the checksum manager directly.

Then if in the future there is a request for having the checksum available on the node API / FileInfo, then we can always add it back.

Something to discuss: @icewind1991 @rullzer

DeepDiver1975 · 2016-02-01T10:40:43Z

Why is there a new table for storing checksums?
We discussed adding this as a column of its own to the file cache table.

karlitschek · 2016-02-01T10:46:36Z

yes. please store this in the file cache table for performance reasons.
I'm not sure if the manager/provider approach is something we should do today. the current checksum flow is that it is only created in the client so using a storage backend is not useful.
something for later perhaps

DeepDiver1975 · 2016-02-03T08:04:37Z

rebased - I want all smashbox tests green on this one

rullzer · 2016-02-03T08:19:35Z

request for integration tests in owncloud/QA#113
Since I think that is by far the best way to ensure this works.

PVince81 · 2016-02-03T10:17:56Z

@DeepDiver1975 all green apparently!

dragotin · 2016-02-04T08:54:32Z

I don't think the implementation fulfills the requirement as it only can store one checksum type. We agreed that we want to be able to store multiple checksums for the same file. I would highly suggest to implement that now rather than shipping the "simple" version now and be limited by that for long, and/or keep on discussing to enhance that.

PVince81 · 2016-02-04T09:21:25Z

@dragotin can you link to the discussion? It seems @karlitschek might have overlooked it or there was a misunderstanding about the agreement.

rullzer · 2016-02-04T09:35:21Z

@dragotin I'm sorry if I confused you with the question about multiple checksums recently in IRC. But after that (as you can read in this ticket) the decision was made to follow the original feature description in #18716

Or is the description as it is in #18716 not complete?

dragotin · 2016-02-04T09:42:13Z

Well, in #18716 we talk about multiple checksums support. To me, that implies that we also store multiple checksums, or get them from the underlying storage (btw, is that supported?) if needed.

I just think that it is not a good idea to go "the simple way for now" with that. No offense.

dragotin · 2016-02-04T09:45:55Z

In addition, this requires a database structure change. We do one now anyway. Why not do it right now, instead of running into the need of another db structure change later?

If we now define that storing one checksum type is enough, well, that is a difference from what was discussed before, but we can live with that as well.

karlitschek · 2016-02-04T10:41:37Z

hmmm. in my understanding the requirements description was only about one type of checksum. i don't understand why we would switch to a different algorithm later here. so i suggest to keep it simple for 9.0
in the future we MIGHT support server generated checksums where we have to support different types but this needs to be planned properly later and decided if we want this at all. so let's please stick to the documented requirement for 9.0 and with one checksum

DeepDiver1975 · 2016-02-04T10:46:09Z

The checksum as stored in the db holds the prefix with the type e.g. md5:xxxxx - so it is quite possible to store various types of checksums if this is necessary.

What is not possible is that we store two checksums for the same file - but this is anyhow not what we need/want. It should be enough to ping-pong one checksum per file.
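To make the prefixed form concrete, here is a rough Python sketch of how a client might build such a value before sending it. It assumes the `type:hexdigest` layout described above. The helper name `prefixed_checksum` is hypothetical, and nothing here is server code from this PR:

```python
import hashlib


def prefixed_checksum(data, algorithm="md5"):
    """Return a checksum string of the form 'algorithm:hexdigest'.

    Hypothetical client-side helper; the server in this PR only stores
    and returns whatever string the client sends.
    """
    # hashlib.new() accepts any algorithm name the local build supports,
    # so the same single column can hold md5, sha1, ... values.
    digest = hashlib.new(algorithm, data).hexdigest()
    return "%s:%s" % (algorithm, digest)


if __name__ == "__main__":
    print(prefixed_checksum(b""))          # md5 of empty input
    print(prefixed_checksum(b"", "sha1"))  # sha1 of empty input
```

Because the type travels inside the stored string, switching algorithms would not need another schema change, though still only one value fits per file.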

MTRichards · 2016-02-04T15:37:44Z

Ack! We had the requirement of this:

The current modular approach allows for implementations of different checksum providers. The example app will just store them in the database and return them if requested. However other apps could be written that actually validate the data etc.

But then just now saw this:

i don't understand why we would switch to a different algorithm later here.

As discussed previously, we want to be able to support different backends in the fullness of time, handing off checksums to the backend storage for full end-to-end checksums, right? I thought the customer was specific there, and we were on a path to that. Perhaps only one is supported in use at a time, but storage backends all have their own checksum algorithms.

So, the real question: Is this just logical step one to the requirement?

As I read through this, it doesn't sound that it is:

this requires a database structure change

rullzer · 2016-02-04T15:41:17Z

I do still have my old PR in a local branch

karlitschek · 2016-02-04T18:22:58Z

O.K. Discussed with @MTRichards Let's stick with the current plan for 9.0 as described in the ticket #18716 by me.
For 9.1 we could implement something more advanced by supporting different hashing algorithms or maybe making it possible to leverage the storage. But there is a big MAYBE because we have to keep performance and different hashing and storage backends in mind.

But for now please let's stick to the discussed simple behavior.

PVince81 · 2016-02-05T08:57:33Z

I think the current simple approach (with this PR) already allows custom storages to read checksums from the filesystem while scanning and then store them into the cache. So that's already a good thing.

rullzer · 2016-02-08T06:57:14Z

After thinking some more this weekend, I have a simple extension here that might make our lives much easier in the future if we want to support multiple checksums eventually (and I think we do, since different checksums have different uses; we even provide 2 on our download page).

The header can already handle that pretty well, since the values there would need to be comma separated anyway. This works for both the upload and the download.

However for the propfind it might make sense to convert this into:

<oc:checksums>
  <oc:checksum>TYPE:CHECKSUM</oc:checksum>
</oc:checksums>

This way we at least do not lock our future selves into the single-checksum form that is there now:

<oc:checksum>TYPE:CHECKSUM</oc:checksum>

Because extending that later will be a mess anyway.
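To show why the wrapper element helps clients, here is a small Python sketch that reads the proposed structure into a dict, whether it contains one checksum or several. It assumes the `oc` prefix is bound to `http://owncloud.org/ns`. The function name `parse_checksums` and the sample values `aaa`/`bbb` are placeholders for illustration only:

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI that the "oc" prefix is bound to.
OC_NS = "http://owncloud.org/ns"


def parse_checksums(fragment):
    """Map checksum type -> value from an <oc:checksums> fragment.

    Hypothetical client-side helper; not code from this PR.
    """
    root = ET.fromstring(fragment)
    result = {}
    # Each <oc:checksum> child carries one TYPE:CHECKSUM string.
    for node in root.iter("{%s}checksum" % OC_NS):
        checksum_type, _, value = (node.text or "").strip().partition(":")
        result[checksum_type] = value
    return result


# Placeholder data: "aaa" and "bbb" are not real digests.
sample = (
    '<oc:checksums xmlns:oc="http://owncloud.org/ns">'
    "<oc:checksum>MD5:aaa</oc:checksum>"
    "<oc:checksum>SHA1:bbb</oc:checksum>"
    "</oc:checksums>"
)

if __name__ == "__main__":
    print(parse_checksums(sample))
```

A client written this way keeps working unchanged if the server later returns more than one `<oc:checksum>` child.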

dragotin · 2016-02-08T09:40:04Z

@rullzer yes, great catch.

karlitschek · 2016-02-08T16:44:47Z

agreed

bugsyb · 2016-02-09T07:21:31Z

Thanks for working on it.

Just a quick question: what is the expected behavior if some files are added locally and one runs the occ files scan command?
This would then need to calculate checksum locally, right? Unless it behaves already as a client and is covered by above conversation.

Just wanted to shed some light on a scenario I'm running into often.

rullzer · 2016-02-09T07:44:40Z

@bugsyb no, not yet. Currently we do not calculate/verify checksums on the server side.

DeepDiver1975 · 2016-02-09T07:51:47Z

no not yet.

@rullzer don't set wrong expectations please - there are no plans currently to do so.

rullzer added 2 - Developing p2-high Escalation, on top of current planning, release blocker labels Jan 28, 2016

rullzer added this to the 9.0-current milestone Jan 28, 2016

DeepDiver1975 added the in progress label Jan 28, 2016

rullzer force-pushed the checksums branch from 34f3962 to 615dd52 Compare January 28, 2016 21:46

rullzer force-pushed the checksums branch 4 times, most recently from 944698b to 29b2cec Compare January 29, 2016 21:00

rullzer mentioned this pull request Jan 29, 2016

File Transport Checksums Between Server and Desktop owncloud/client#3735

Closed

PVince81 reviewed Feb 1, 2016

rullzer mentioned this pull request Feb 3, 2016

Intergration tests for webdav checksums owncloud/QA#113

Closed

DeepDiver1975 added a commit that referenced this pull request Feb 3, 2016

Merge pull request #21997 from owncloud/checksums

621f54d

Checksums

DeepDiver1975 merged commit 621f54d into master Feb 3, 2016

DeepDiver1975 deleted the checksums branch February 3, 2016 10:36

DeepDiver1975 removed the in progress label Feb 3, 2016

rullzer mentioned this pull request Feb 3, 2016

Store client's file checksum as metadata and return it again in downloads and PROPFIND #18716

Closed

rullzer mentioned this pull request Feb 8, 2016

Make checksum propfind future proof #22199

Merged

butonic mentioned this pull request Mar 1, 2016

Adding a column to the filecache is super time consuming (depending on db) #22747

Closed

nickvergessen mentioned this pull request Sep 13, 2016

File Verification nextcloud/server#1381

Closed

lock bot locked as resolved and limited conversation to collaborators Aug 7, 2019
