Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial checksum support for ocis #1400

Merged
merged 12 commits into from
Jan 25, 2021
Merged

initial checksum support for ocis #1400

merged 12 commits into from
Jan 25, 2021

Conversation

butonic
Copy link
Contributor

@butonic butonic commented Jan 14, 2021

TL;dr

Fixes owncloud/ocis#1291

Long version

The ocis driver now calculates the sha1, md4 and adler32 checksums when finalizing uploads. While this adds significant io because every file that was written has to be read it ensures that the hashes we store in the metadata reflect what was wraitten to the storage. We may improve performance in two ways:

  1. run a go routine to calculate the hashas asynchronously
  2. for uploads that are finished within one request calculate the hash when writing the file (although this would miss the point of the checksum, but this is what we do in oc10)

For chunked uploads this already does the right thing: read the completed file from disk and calculate the hashes.

oc10 supports multiple checksums for a file, even if the oc10 db contains only one checksum column and the propfind returns sth like

        <oc:checksums>
          <oc:checksum>SHA1:941f6bb205d2e9f MD5:aeoaeoaeoaeoaeo ADLER32:iaeoioeao</oc:checksum>
        </oc:checksums>

that is a multi value tag badly encoded as a single string / xml tag value. 🤦 ... legacy reasons ... 😭

Anyway, the cs3 Resource info can only transport one Checksum. @ishank011 @labkode We should let it hold multiple checksums.

History:
introduced with owncloud/core#22199 ... here the idea still was:

<oc:checksums>
  <oc:checksum>TYPE1:CHECKSUM1</oc:checksum>
  <oc:checksum>TYPE2:CHECKSUM2</oc:checksum>
</oc:checksum>

no idea where it broke. tried the related oc10 issues:
owncloud/core#18716
owncloud/client#3735
owncloud/ios-legacy#719
owncloud/core#22711
owncloud/core#11811
owncloud-archive/documentation#2964

I only learned that the desktop client onld reads and sends the OC-Checksum header.

Will continue digging tomorrow...

@butonic butonic requested a review from labkode as a code owner January 14, 2021 15:56
@update-docs
Copy link

update-docs bot commented Jan 14, 2021

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@butonic butonic marked this pull request as draft January 14, 2021 15:56
@butonic
Copy link
Contributor Author

butonic commented Jan 14, 2021

todo:


runsh: Total unexpected failed scenarios throughout the test run:
--
983 | apiOcisSpecific/apiMain-checksums.feature:18
984 | apiOcisSpecific/apiMain-checksums.feature:19
985 | apiOcisSpecific/apiMain-checksums.feature:23
986 | apiOcisSpecific/apiMain-checksums.feature:61
987 | apiOcisSpecific/apiMain-checksums.feature:62


@butonic
Copy link
Contributor Author

butonic commented Jan 15, 2021

The original pr that should have mades checksums more future proof expects a comma separated list of checksums. see owncloud/core#21997 (comment) for the first mention of commas.

PR to fix this is owncloud core is: owncloud/core#38304 with an analysis of what went wrong ...

pkg/storage/fs/ocis/node.go Outdated Show resolved Hide resolved
pkg/storage/fs/ocis/ocis.go Outdated Show resolved Hide resolved
@butonic
Copy link
Contributor Author

butonic commented Jan 18, 2021

k, this fixes 22 checksum features. the remaining failures:


/drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:58 | 70s
-- | --
1264 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:67 | 70s
1265 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:119 | 70s
1266 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:129 | 70s
1267 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:138 | 70s
1268 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:147 | 70s
1269 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:158 | 70s
1270 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:174 | 70s
1271 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:192 | 70s
1272 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:258 | 70s
1273 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:312 | 70s
1274 | /drone/src/tmp/testrunner/tests/acceptance/features/apiMain/checksums.feature:324


are related to chunked upload or sharing and subject of a subsequent PR.

@labkode
Copy link
Member

labkode commented Jan 19, 2021

@butonic is this compatible with existing sync clients?

@butonic
Copy link
Contributor Author

butonic commented Jan 19, 2021

@labkode yes. I made sure that the rendering, as weird as it looks, matches what clients expect. Unfortunately, the desktop client would not properly handle correct xml like

<oc:checksums>
  <oc:checksum>TYPE1:CHECKSUM1</oc:checksum>
  <oc:checksum>TYPE2:CHECKSUM2</oc:checksum>
</oc:checksum>

so I implemented this bug compatible with oc10 🤷‍♂️ owncloud/core#38304 contains the fix as well as discussion on what iOS and the desktop would do.

@refs
Copy link
Member

refs commented Jan 19, 2021

@butonic while trying this out I get tons of errors:

grafik

even after deleting everything under /var/tmp/ocis

@refs
Copy link
Member

refs commented Jan 19, 2021

I also don't see the <oc:checksum> header on the response XML:

<?xml version="1.0" encoding="UTF-8"?>
<d:multistatus xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns" xmlns:s="http://sabredav.org/ns">
   <d:response>
      <d:href>/remote.php/webdav/</d:href>
      <d:propstat>
         <d:prop>
            <oc:permissions>DNVCKR</oc:permissions>
            <oc:favorite>0</oc:favorite>
            <oc:fileid>MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTU3OmZhOGU1MjYyLTZhNjMtNDUzNS1hNzY4LTk0NzI1NDU3MzRiNw==</oc:fileid>
            <oc:owner-id>admin</oc:owner-id>
            <oc:owner-display-name>Admin</oc:owner-display-name>
            <oc:size>96</oc:size>
            <d:getlastmodified>Tue, 19 Jan 2021 16:23:43 +0000</d:getlastmodified>
            <d:getetag>"41aead0722284a3a739c9138996acd23"</d:getetag>
            <d:resourcetype>
               <d:collection />
            </d:resourcetype>
         </d:prop>
         <d:status>HTTP/1.1 200 OK</d:status>
      </d:propstat>
      <d:propstat>
         <d:prop>
            <oc:share-types />
            <oc:privatelink />
            <d:getcontentlength />
         </d:prop>
         <d:status>HTTP/1.1 404 Not Found</d:status>
      </d:propstat>
   </d:response>
   <d:response>
      <d:href>/remote.php/webdav/ownCloud_Admin_Training_Handout.pdf</d:href>
      <d:propstat>
         <d:prop>
            <oc:permissions>DNVWR</oc:permissions>
            <oc:favorite>0</oc:favorite>
            <oc:fileid>MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTU3OmFiZmViNTAzLTNjMzctNDRkMS05ZjJiLWM2ZWJhMWQ4MWIyNg==</oc:fileid>
            <oc:owner-id>admin</oc:owner-id>
            <oc:owner-display-name>Admin</oc:owner-display-name>
            <d:getcontentlength>395672</d:getcontentlength>
            <oc:size>395672</oc:size>
            <d:getlastmodified>Tue, 19 Jan 2021 16:23:43 +0000</d:getlastmodified>
            <d:getetag>"4958951fc0a36dd05d6e200849f54f94"</d:getetag>
            <d:resourcetype />
         </d:prop>
         <d:status>HTTP/1.1 200 OK</d:status>
      </d:propstat>
      <d:propstat>
         <d:prop>
            <oc:share-types />
            <oc:privatelink />
         </d:prop>
         <d:status>HTTP/1.1 404 Not Found</d:status>
      </d:propstat>
   </d:response>
</d:multistatus>

@butonic
Copy link
Contributor Author

butonic commented Jan 19, 2021

those are for existing files without checksums ;-)
...
or might be folders 🤔

@refs
Copy link
Member

refs commented Jan 19, 2021

those are for existing files without checksums ;-)
...
or might be folders 🤔

Hm, they also popup when I delete everything on /var/tmp/ocis/*

@refs
Copy link
Member

refs commented Jan 19, 2021

so correct me if I'm wrong but after an upload the uploaded file should have a checksum and this should be included in the PROPFIND request, is that right?

@labkode
Copy link
Member

labkode commented Jan 20, 2021

@refs https://github.com/cernbox/smashbox/blob/master/protocol/protocol.md#checksumming-extensions

@butonic
Copy link
Contributor Author

butonic commented Jan 20, 2021

@refs the errors are caused by the accounts in the metadata storage ... changing log level for not existing checksums to debug. leaving not found or read errors as error. rebasing

Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
@butonic
Copy link
Contributor Author

butonic commented Jan 20, 2021

so correct me if I'm wrong but after an upload the uploaded file should have a checksum and this should be included in the PROPFIND request, is that right?

@refs only if it was requested in the PROPFIND:

curl 'https://demo.owncloud.com/remote.php/webdav/ownCloud Manual.pdf'   -X 'PROPFIND' -d $'<?xml version="1.0"?>
> <d:propfind  xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
>   <d:prop>
>     <oc:checksums />
>   </d:prop>
> </d:propfind>' -k -u demo:demo | xmllint -format -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   593  100   451  100   142   1518    478 --:--:-- --:--:-- --:--:--  1996
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns">
  <d:response>
    <d:href>/remote.php/webdav/ownCloud%20Manual.pdf</d:href>
    <d:propstat>
      <d:prop>
        <oc:checksums>
          <oc:checksum>SHA1:d4c9b536a6d082dcef9b96a9956986bd3472196f MD5:7dfd959493b7680b5a9e6dae7e2912f8 ADLER32:f7d46dfc</oc:checksum>
        </oc:checksums>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

<oc:checksums /> is the magic key

@refs
Copy link
Member

refs commented Jan 21, 2021

so correct me if I'm wrong but after an upload, the uploaded file should have a checksum and this should be included in the PROPFIND request, is that right?

@refs only if it was requested in the PROPFIND:

curl 'https://demo.owncloud.com/remote.php/webdav/ownCloud Manual.pdf'   -X 'PROPFIND' -d $'<?xml version="1.0"?>
> <d:propfind  xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
>   <d:prop>
>     <oc:checksums />
>   </d:prop>
> </d:propfind>' -k -u demo:demo | xmllint -format -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   593  100   451  100   142   1518    478 --:--:-- --:--:-- --:--:--  1996
<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns" xmlns:oc="http://owncloud.org/ns">
  <d:response>
    <d:href>/remote.php/webdav/ownCloud%20Manual.pdf</d:href>
    <d:propstat>
      <d:prop>
        <oc:checksums>
          <oc:checksum>SHA1:d4c9b536a6d082dcef9b96a9956986bd3472196f MD5:7dfd959493b7680b5a9e6dae7e2912f8 ADLER32:f7d46dfc</oc:checksum>
        </oc:checksums>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>

<oc:checksums /> is the magic key

oooooh I see. Shows I'm still fresh flesh with the WebDAV spec. Thanks for explaining.

refs
refs previously approved these changes Jan 21, 2021
Copy link
Contributor

@ishank011 ishank011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@butonic mostly looks good. A few minor comments. And totally agree, we should return multiple checksums in ResourceInfo and not rely on storing those in the opaque map.

internal/http/services/owncloud/ocdav/head.go Outdated Show resolved Hide resolved
internal/http/services/owncloud/ocdav/propfind.go Outdated Show resolved Hide resolved
}

// compare if they match the sent checksum
// TODO the tus checksum extension would do this on every chunk, but I currently don't see an easy way to pass in the requested checksum. for now we do it in FinishUpload which is also called for chunked uploads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the client send the checksum only with the last chunk or with all of those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would send the checksum for a PATCH request. without the chunking extension that is the complete file. with the chunking extension it would be the checksum of the chunk.

Copy link
Contributor

@ishank011 ishank011 Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when FinishUpload is called, binPath would have the complete file, right? In that case, the last chunk request would need to have the checksum of the whole file, otherwise, the asserts would fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata["checksum"] is set in InitiateUpload(), which will store it in the info file. So the checksum (if one was sent) is available in FinishUpload(). Even if it was not requested, we still need to store it in the metadata, which is why we have to split checking them from persisting them. The latter of which has to happen always.

pkg/storage/fs/ocis/upload.go Show resolved Hide resolved
Signed-off-by: Jörn Friedrich Dreyer <jfd@butonic.de>
@ishank011 ishank011 merged commit 0c10b33 into cs3org:master Jan 25, 2021
@phil-davis
Copy link
Contributor

🚀 a list of expected-failures were removed - great stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checksum feature (OCIS storage driver)
5 participants