Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksums #21997

Merged
merged 1 commit into from
Feb 3, 2016
Merged

Checksums #21997

merged 1 commit into from
Feb 3, 2016

Conversation

rullzer
Copy link
Contributor

@rullzer rullzer commented Jan 28, 2016

For #18716

First step at a checksums in file transfers.

The current modular approach allows for implementations of different checksum providers. The example app will just store them in the database and return them if requested. However other apps could be writen that actually validate the data etc.

  • Extra column added to the filecache table.
  • Updated methods to handle extra column
  • dav stuff to get and return the checksum.

@rullzer rullzer added 2 - Developing p2-high Escalation, on top of current planning, release blocker labels Jan 28, 2016
@rullzer rullzer added this to the 9.0-current milestone Jan 28, 2016
@rullzer
Copy link
Contributor Author

rullzer commented Jan 28, 2016

Very early WIP. But feedback is appricaited: @PVince81 @icewind1991 @DeepDiver1975

now time for sleeeep

@PVince81
Copy link
Contributor

Does it really have to be a separate app ? I thought this would be core functionality.

For the DAV stuff, just add code to the existing files plugin with new attributes.

@karlitschek
Copy link
Contributor

i think having it as an app that is enabled by default is not a problem. code modularity is good as long as it doesn't introduce a performance penalty

@rullzer
Copy link
Contributor Author

rullzer commented Jan 29, 2016

@PVince81 well I was thinking that it made more sense to have it as app.

  • On some installations people might want to just store and return the checksums.
  • On others people might want to really verify the checksums.
  • And there might even be installations out there that have checksums in the FS layer by default so just return/fetch that.

I think having it as an app gives us more modularity and easier extensability.

@rullzer
Copy link
Contributor Author

rullzer commented Jan 29, 2016

Ok all the basics are there now.

To test. Upload a file via webdav and set the OC-Checksum header:

curl -u user:pass -X PUT http://server/remote.php/webdav/file -T file -H "OC-Checksum: MD5:mymd5checksum"

Now when you do a propfind and requests the <oc:checksums> property on file.. It gives you back

...
<oc:checksums>
  <oc:checksum type="MD5">mymd5checksum</oc:checksum>
</oc:checksums>
...

And when you get the file you get an additional OC-Checksum header.

@guruz
Copy link
Contributor

guruz commented Feb 1, 2016

@evert Is there a standard property in WebDAV that can be used for (different kinds of) content checksums?

@rullzer
Copy link
Contributor Author

rullzer commented Feb 1, 2016

I'd like some more feedback on this. Is everybody happy with the approach here? @PVince81 @DeepDiver1975 @karlitschek ?

//TODO: Implement
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a separate function. There are two code paths for Webdav PUT: this one is non-chunking and the other one goes into createFileChunked

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaah yes...

@PVince81
Copy link
Contributor

PVince81 commented Feb 1, 2016

I'd like to know what @icewind1991 thinks

*/
public function __construct($path, $storage, $internalPath, $data, $mount, $owner= null) {
public function __construct($path, $storage, $internalPath, $data, $mount, $owner= null, $checksumManager = null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about FileInfo (and the Node classes) becoming too fat.

Adding checksum functions on FileInfo and Node is only useful if we want to expose the checksum directly to apps, for convenience. If we don't, apps can still use the checksum manager from \OC::$server->getCheckSumManager(). For a first version I'd suggest to not expose this here and have the Sabre File node use the checksum manager directly.

Then if in the future there is a request for having the checksum available on the node API / FileInfo, then we can always add it back.

Something to discuss: @icewind1991 @rullzer

@DeepDiver1975
Copy link
Member

Why is there a new table for storing checksums?
We discussed to add this as a column on it's own to the file cache table

@karlitschek
Copy link
Contributor

yes. please store this in the file cache table for performance reasons.
I'm not sure it the manager/provider approach is something we should do today. the current checksum flow is that it is only created in the client so using a storage backend is not useful.
something for later perhaps

@DeepDiver1975
Copy link
Member

rebased - I want all smashbox test green on this one

@rullzer
Copy link
Contributor Author

rullzer commented Feb 3, 2016

request for intergration tests in owncloud/QA#113
Since I think that is by far the best way to ensure this works.

@PVince81
Copy link
Contributor

PVince81 commented Feb 3, 2016

@DeepDiver1975 all green apparently!

@dragotin
Copy link
Contributor

dragotin commented Feb 4, 2016

I don't think the implementation fulfills the requirement as it only can store one checksum type. We agreed that we want to be able to store multiple checksums for the same file. I would highly suggest to implement that now rather than shipping the "simple" version now and be limited by that for long, and/or keep on discussing to enhance that.

@PVince81
Copy link
Contributor

PVince81 commented Feb 4, 2016

@dragotin can you link to the discussion ? It seems @karlitschek might have overlooked it or there was a misunderstanding about the agreement.

@rullzer
Copy link
Contributor Author

rullzer commented Feb 4, 2016

@dragotin I'm sorry if I confused you with the question about multiple checksums recently in IRC. But after that (as you can read in this ticket) the decision was made to follow the original feature description in #18716

Or is the description as it is in #18716 not complete?

@dragotin
Copy link
Contributor

dragotin commented Feb 4, 2016

Well, in #18716 we talk about multiple checksums support. That involves for me that we also store multiple checksums, or get them from the underlying storage (btw, is that supported?) if needed.

I just think that it is not a good idea to go "the simple way for now" way with that. No offense.

@dragotin
Copy link
Contributor

dragotin commented Feb 4, 2016

In addition, this requires a database structure change. We do one now anyway. Why not do it right now, instead of running into the need of another db structure change later?

If we now define that storing one checksum type is enough, well, that is a difference from what was discussed before, but we can live with that as well.

@karlitschek
Copy link
Contributor

hmmm. in my understanding the requirements description was only about one type of checksum. i don't understand why we would switch to a different algorithm later here. so i suggest to keep it simple for 9.0
im the future we MIGHT support server generated checksums where we have to support different types but this need to be planed properly later and decided if we want this at all. so let's please stick to the documented requirement for 9.0 and with one checksum

@DeepDiver1975
Copy link
Member

The checksum as stored in the db holds the prefix with the type e.g. md5:xxxxx - so it is quite possible to store various types of checksums if this is necessary.

What is not possible is that we store two checksums for the same file - but this is anyhow not what we need/want. Is should be enough to ping-pong one checksum per file.

@MTRichards
Copy link
Contributor

Ack! We had the requirement of this:

The current modular approach allows for implementations of different checksum providers. The example app will just store them in the database and return them if requested. However other apps could be written that actually validate the data etc.

But then just now saw this:

i don't understand why we would switch to a different algorithm later here.

As discussed previously, we want to be able to support different backends in the fullness of time, handing of checksums to the backend storage for full end to end checksums, right? I thought customer was specific there, and we were on a path to that. Perhaps it is only one supported in use at a time, but storage backends all have their own checksum algorithms.

So, the real question: Is this just logical step one to the requirement?

As I read through this, it doesn't sound that it is:

this requires a database structure change

@rullzer
Copy link
Contributor Author

rullzer commented Feb 4, 2016

I do still have my old PR in a local branch

@karlitschek
Copy link
Contributor

O.K. Discussed with @MTRichards Let's stick with the current plan for 9.0 as described in the ticket #18716 by me.
For 9.1 we could implement something more advanced by supporting different hashing algorithms or maybe making it possible to leverage the storage. But there is a big MAYBE because we have to keep performance and different hashing and storage backends in mind.

But for now please let's stick to the discussed simple behavior.

@PVince81
Copy link
Contributor

PVince81 commented Feb 5, 2016

I think the current simple approach (with this PR) already allows custom storages to read checksums from the filesystem while scanning and then store them into the cache. So that's already a good thing.

@rullzer
Copy link
Contributor Author

rullzer commented Feb 8, 2016

After thinking some more this weekend I have a simple extention here that might make our lives so much easier in the future. If we want to support multiple checksums eventually (and I think we do... since different checksums have different uses... we even provide 2 on our download page).

The header can already handle that pretty well. Since there we need to do comma separated anyway. This works for both the upload and the download.

However for the propfind it might make sense to convert this into:

<oc:checksums>
  <oc:checksum>TYPE:CHECKSUM</oc:checksum>
</oc:checksum>

This way we at least do not block our future self with a locked in on 1 checksum implementation there is now:

<oc:checksum>TYPE:CHECKSUM</oc:checksum>

Because extentending that will be a mess anyway.

@dragotin
Copy link
Contributor

dragotin commented Feb 8, 2016

@rullzer yes, great catch.

@karlitschek
Copy link
Contributor

agreed

@bugsyb
Copy link

bugsyb commented Feb 9, 2016

Thanks for working on it.

Just a quick question what is expected behavior if some files are added locally and one runs occ files scan command?
This would then need to calculate checksum locally, right? Unless it behaves already as a client and is covered by above conversation.

Just wantd to shed some light on scenario I'm running into often.

@rullzer
Copy link
Contributor Author

rullzer commented Feb 9, 2016

@bugsyb no not yet. Currently we do not calculate/verify checksums on the server side.

@DeepDiver1975
Copy link
Member

no not yet.

@rullzer don't set wrong expectations please - there are no plans currently to do so.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
4 - To release p2-high Escalation, on top of current planning, release blocker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants