Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta-sync upload fixes #6382

Merged
merged 5 commits into from
Mar 16, 2018
Merged

Delta-sync upload fixes #6382

merged 5 commits into from
Mar 16, 2018

Conversation

ckamm
Copy link
Contributor

@ckamm ckamm commented Mar 5, 2018

I still need to add a bunch of unit tests but manual upload tests now pass much more reliably for me.

@mrow4a Possibly rerun your tests with this branch to verify.

@ahmedammar

On non block-aligned appends to local files overlapping upload ranges
would be computed.
@ckamm ckamm added this to the 2.5.0 milestone Mar 5, 2018
@ckamm ckamm self-assigned this Mar 5, 2018
@ckamm ckamm requested a review from ogoffart March 5, 2018 10:52
@ogoffart
Copy link
Contributor

ogoffart commented Mar 9, 2018

TestChunkingNG::testResume is failling but i don't know if it is a regression.

Otherwise, just merge it.

@ckamm
Copy link
Contributor Author

ckamm commented Mar 9, 2018

@ogoffart It's a regression I'll have to investigate.

@mrow4a
Copy link
Contributor

mrow4a commented Mar 9, 2018

are you sure it is not related with assert fix?

_rangesToUpload and _sent were adjusted when the PUT job started instead
of at the end. Since the condition for whether to attempt the final MOVE
was _rangesToUpload.isEmpty() this lead to the MOVE being done too early
in some cases.
If existing chunks on the server needed to be deleted, the whole job
could get stuck if the zsync metadata upload was running when the last
delete job finished.

The new code checks whether all delete jobs have finished instead of
just checking for running jobs.
If the last upload finished before the zsync metadata upload was done
we'd run into an assert. This fixes the assert and moves it to a better
place.
No matter whether it's aligned to zsync blocksize or not.
@ckamm
Copy link
Contributor Author

ckamm commented Mar 9, 2018

Fixed. The issue was that I adjusted _sent to only be adjusted after PUT jobs finish instead of at the start - but I forgot to change the progress reporting to match. All tests pass for me now.

@mrow4a
Copy link
Contributor

mrow4a commented Mar 11, 2018

I still have problems with delta sync when I manualy change the file - no idea what is wrong.

openssl rand 33000000 -base64 -out test.txt
less test.txt 
nano test.txt - change some line or characters inside

delta-test-1

@ahmedammar

I upload a file:
delta-test-2

Then I modify one character inside the file and this happens:
delta-test--3

After the highlighned file I get error, and delete of zsync starts, and in next run the file in normaly uploaded.

@pmaier1

@mrow4a
Copy link
Contributor

mrow4a commented Mar 11, 2018

Tried bin/smash -t 2 lib/test_basicSync.py - this test checks handling of the conflicts in ownCloud - unfortunetelly client freezes and goes into some strange loop. The last log I can see is

03-11 16:12:22:271 [ info sync.networkjob ]:	OCC::PUTFileJob created for "http://localhost/delta" + "" "OCC::PropagateUploadFileNG"
03-11 16:12:22:315 [ info sync.networkjob.mkcol ]:	MKCOL of QUrl("http://test_basicSync@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""
03-11 16:12:22:344 [ info sync.accessmanager ]:	3 "" "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/0000000000000000" has X-Request-ID "f126b119-d4f3-4943-bb4c-20d61fbfd298"
03-11 16:12:22:344 [ info sync.networkjob ]:	OCC::PUTFileJob created for "http://localhost/delta" + "" "OCC::PropagateUploadFileNG"
03-11 16:12:23:253 [ info sync.networkjob.put ]:	PUT of "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3397897757/0000000010000000" FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) "" QVariant(int, 201) QVariant(QString, "Created")
03-11 16:12:23:253 [ info sync.propagator.upload ]:	Chunked upload of 40000000 bytes took 983 ms, desired is 60000 ms, expected good chunk size is 2441505595 bytes and nudged next chunk size to  100000000 bytes
03-11 16:12:23:254 [ info sync.accessmanager ]:	6 "MOVE" "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3397897757/.file.zsync" has X-Request-ID "f5f70287-6a34-4273-aefd-b4201339ba6a"
03-11 16:12:23:254 [ info sync.networkjob ]:	OCC::MoveJob created for "http://localhost/delta" + "" "OCC::PropagateUploadFileNG"
03-11 16:12:23:603 [ info sync.networkjob.put ]:	PUT of "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/0000000000000000" FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) "" QVariant(int, 201) QVariant(QString, "Created")
03-11 16:12:23:603 [ info sync.propagator.upload ]:	Chunked upload of 50000000 bytes took 1260 ms, desired is 60000 ms, expected good chunk size is 2380952380 bytes and nudged next chunk size to  100000000 bytes
03-11 16:12:23:608 [ info sync.accessmanager ]:	3 "" "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/.zsync" has X-Request-ID "17483a5b-2281-44ee-ba97-5782b95a210b"
03-11 16:12:23:609 [ info sync.networkjob ]:	OCC::PUTFileJob created for "http://localhost/delta" + "" "OCC::PropagateUploadFileNG"
03-11 16:12:23:973 [ info sync.networkjob.put ]:	PUT of "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/.zsync" FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) "" QVariant(int, 201) QVariant(QString, "Created")
03-11 16:12:23:973 [ info sync.accessmanager ]:	6 "MOVE" "http://test_basicSync:test@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/.file.zsync" has X-Request-ID "0c2fb1fe-8f10-445d-8389-45a2e24e2693"
03-11 16:12:23:974 [ info sync.networkjob ]:	OCC::MoveJob created for "http://localhost/delta" + "" "OCC::PropagateUploadFileNG"
03-11 16:12:25:670 [ info sync.networkjob.move ]:	MOVE of QUrl("http://test_basicSync@localhost/delta/remote.php/dav/uploads/test_basicSync/3397897757/.file.zsync") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""
03-11 16:12:25:670 [ info sync.database ]:	Updating file record for path: "TEST_FILE_ADDED_LOSER.dat" inode: 8594251986 modtime: 1520781138 type: 0 etag: "d30eefdffb8aa5f724f19bd8c3b835eb" fileId: "00000333ocosxbmlf31g" remotePerm: "" fileSize: 50000000 checksum: "SHA1:27dca84b8b251f2bba072dc779605ce4428c68b9"
03-11 16:12:25:671 [ info sync.propagator ]:	Completed propagation of "TEST_FILE_ADDED_LOSER.dat" by OCC::PropagateUploadFileNG(0x7f82dac5ffc0) with status 4
03-11 16:12:26:678 [ info sync.networkjob.move ]:	MOVE of QUrl("http://test_basicSync@localhost/delta/remote.php/dav/uploads/test_basicSync/3953323096/.file.zsync") FINISHED WITH STATUS QNetworkReply::NetworkError(NoError) ""
03-11 16:12:26:678 [ info sync.database ]:	Updating file record for path: "TEST_FILE_MODIFIED_LOSER.dat" inode: 8594251912 modtime: 1520781138 type: 0 etag: "28f62f3012dfdb22397372fd466cb8e5" fileId: "00000290ocosxbmlf31g" remotePerm: "WDNVRz" fileSize: 50000000 checksum: "SHA1:27dca84b8b251f2bba072dc779605ce4428c68b9"
03-11 16:12:26:678 [ info sync.propagator ]:	Completed propagation of "TEST_FILE_MODIFIED_LOSER.dat" by OCC::PropagateUploadFileNG(0x7f82dd036e10) with status 4
03-11 16:17:24:067 [ warning sync.networkjob ]:	Network job timeout QUrl("/TEST_FILE_MODIFIED_BOTH.dat")
03-11 16:17:24:067 [ warning sync.networkjob.get ]:	Timeout QUrl("/TEST_FILE_MODIFIED_BOTH.dat")
03-11 16:17:24:067 [ warning sync.networkjob ]:	Network job timeout QUrl("/TEST_FILE_ADDED_BOTH.dat")
03-11 16:17:24:067 [ warning sync.networkjob.get ]:	Timeout QUrl("/TEST_FILE_ADDED_BOTH.dat")

This is also the last thing before long hang:
delta-test-4

On the server I get

{"reqId":"18977641-1982-42cc-88a0-633b6788f57d","level":2,"time":"2018-03-11T15:11:44+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2630130657\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2630130657\/.file.zsync\" : File with name \/\/2630130657 could not be located"}
{"reqId":"9ebb5469-bc09-41de-af58-b9b871d9b46c","level":2,"time":"2018-03-11T15:11:44+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/3716006871\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/3716006871\/.file.zsync\" : File with name \/\/3716006871 could not be located"}
{"reqId":"1305fb2f-f6e1-4e76-82b9-03dacd1c8cb0","level":2,"time":"2018-03-11T15:11:44+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/600759151\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/600759151\/.file.zsync\" : File with name \/\/600759151 could not be located"}
{"reqId":"846aba1d-48d1-48f4-9d84-e05cb6e822a9","level":2,"time":"2018-03-11T15:11:49+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2471593967\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2471593967\/.file.zsync\" : File with name \/\/2471593967 could not be located"}
{"reqId":"7079bb6f-9e8b-4911-a629-7cd4c30f884d","level":2,"time":"2018-03-11T15:11:49+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/3564440858\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/3564440858\/.file.zsync\" : File with name \/\/3564440858 could not be located"}
{"reqId":"83528ad3-8e8c-4987-bf5f-4da5eb1e915c","level":2,"time":"2018-03-11T15:11:49+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2324440087\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2324440087\/.file.zsync\" : File with name \/\/2324440087 could not be located"}
{"reqId":"b3cfaba9-5f58-4cf0-a31c-73994c9c70f4","level":2,"time":"2018-03-11T15:11:52+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2436604134\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2436604134\/.file.zsync\" : File with name \/\/2436604134 could not be located"}
{"reqId":"f1415c27-6d8a-4e2d-a7cf-496e2638e851","level":2,"time":"2018-03-11T15:12:06+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"DELETE","url":"\/delta\/remote.php\/dav\/files\/test_basicSync\/TEST_FILE_DELETED_WINNER.dat","message":"Could not get node for path: \"files\/test_basicSync\/TEST_FILE_DELETED_WINNER.dat\" : File with name TEST_FILE_DELETED_WINNER.dat could not be located"}
{"reqId":"8d0f7b36-9806-4e9e-b69a-4a2d0f6795e1","level":2,"time":"2018-03-11T15:12:06+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"DELETE","url":"\/delta\/remote.php\/dav\/files\/test_basicSync\/TEST_FILE_DELETED_BOTH.dat","message":"Could not get node for path: \"files\/test_basicSync\/TEST_FILE_DELETED_BOTH.dat\" : File with name TEST_FILE_DELETED_BOTH.dat could not be located"}
{"reqId":"784080d9-9b47-455b-a6d7-e0b524876099","level":2,"time":"2018-03-11T15:12:11+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/3696272866\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/3696272866\/.file.zsync\" : File with name \/\/3696272866 could not be located"}
{"reqId":"b2db533a-551c-4848-9014-21c078d1bc30","level":2,"time":"2018-03-11T15:12:11+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/899128082\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/899128082\/.file.zsync\" : File with name \/\/899128082 could not be located"}
{"reqId":"f11c373a-29b2-4a5e-b3ac-9829ab63d046","level":2,"time":"2018-03-11T15:12:13+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2540450659\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2540450659\/.file.zsync\" : File with name \/\/2540450659 could not be located"}
{"reqId":"768aaec9-8405-4a0b-8cec-d3277122233e","level":2,"time":"2018-03-11T15:12:16+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/2937705167\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/2937705167\/.file.zsync\" : File with name \/\/2937705167 could not be located"}
{"reqId":"36c8306a-263b-4ab5-a44b-388da6948803","level":2,"time":"2018-03-11T15:12:20+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"DELETE","url":"\/delta\/remote.php\/dav\/files\/test_basicSync\/TEST_FILE_DELETED_LOSER.dat","message":"Could not get node for path: \"files\/test_basicSync\/TEST_FILE_DELETED_LOSER.dat\" : File with name TEST_FILE_DELETED_LOSER.dat could not be located"}
{"reqId":"f5f70287-6a34-4273-aefd-b4201339ba6a","level":2,"time":"2018-03-11T15:12:25+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/3397897757\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/3397897757\/.file.zsync\" : File with name \/\/3397897757 could not be located"}
{"reqId":"0c2fb1fe-8f10-445d-8389-45a2e24e2693","level":2,"time":"2018-03-11T15:12:26+00:00","remoteAddr":"::1","user":"test_basicSync","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_basicSync\/3953323096\/.file.zsync","message":"Could not get node for path: \"uploads\/test_basicSync\/3953323096\/.file.zsync\" : File with name \/\/3953323096 could not be located"}

@mrow4a
Copy link
Contributor

mrow4a commented Mar 11, 2018

Tried bin/smash -t 1 lib/test_deltamove.py and bin/smash -t 0 lib/test_deltamove.py which were most important for me - as this passes, I am happy enough - conflict resolution is next step to look.

I also confirmed, that delta worked correctly - and did delta

test-delta-5

@mrow4a
Copy link
Contributor

mrow4a commented Mar 11, 2018

Additionaly, please note that if I move the file - all zsync metadata gets removed (?) and GET will fetch whole file.

Server side errors for test with moving file:

{"reqId":"39c4edc2-0414-479f-8d4c-8a7c7b7ea492","level":2,"time":"2018-03-11T15:31:55+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/3620441526\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/3620441526\/.file.zsync\" : File with name \/\/3620441526 could not be located"}
{"reqId":"dba1e777-1e20-4c2b-bcdb-761e66b0f326","level":2,"time":"2018-03-11T15:32:05+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/2615858117\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/2615858117\/.file.zsync\" : File with name \/\/2615858117 could not be located"}
{"reqId":"0bd2dadb-70ed-49b4-b651-dc8be831c95d","level":2,"time":"2018-03-11T15:32:06+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/3063368404\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/3063368404\/.file.zsync\" : File with name \/\/3063368404 could not be located"}
{"reqId":"b4192e96-8f9f-446a-9406-c0e04e50b284","level":2,"time":"2018-03-11T15:32:15+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/2884876309\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/2884876309\/.file.zsync\" : File with name \/\/2884876309 could not be located"}
{"reqId":"0fc1eb81-be08-4b78-9265-7bd0f3bc5f64","level":2,"time":"2018-03-11T15:32:15+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/3919753160\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/3919753160\/.file.zsync\" : File with name \/\/3919753160 could not be located"}
{"reqId":"537eebd3-c187-4ecf-b526-ddaaf18638d2","level":2,"time":"2018-03-11T15:32:28+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/1771044864\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/1771044864\/.file.zsync\" : File with name \/\/1771044864 could not be located"}
{"reqId":"41fb56ae-2f5c-4c84-8e9e-a43b5a45c62f","level":2,"time":"2018-03-11T15:32:28+00:00","remoteAddr":"::1","user":"test_deltamove","app":"dav","method":"MOVE","url":"\/delta\/remote.php\/dav\/uploads\/test_deltamove\/1036536264\/.file.zsync","message":"Could not get node for path: \"uploads\/test_deltamove\/1036536264\/.file.zsync\" : File with name \/\/1036536264 could not be located"}

@ckamm
Copy link
Contributor Author

ckamm commented Mar 12, 2018

@mrow4a On #6382 (comment) - I've tried your specific test case but can't reproduce the "Unable to parse zsync file." error. Can you try again and check stderr? zsync_parse() puts its error details into stderr...

-t 2 lib/test_basicSync passes for me. Can you upload the log file that looks odd from your smashdir? Probably makes sense to address that surprising "can't parse" error first though.

@mrow4a
Copy link
Contributor

mrow4a commented Mar 12, 2018

Are you sure you run cmd with --deltasync flag?

@ckamm
Copy link
Contributor Author

ckamm commented Mar 12, 2018

For the basic test yes (so we do indeed need more information about that parse error), but not for the smashbox test. With the flag I can reproduce errors on test_basicSync -t 2.

@ckamm
Copy link
Contributor Author

ckamm commented Mar 12, 2018

That took some digging! Here's why zsync gets stuck with the particular test data:

  • zsync compares rolling checksums at each possible file offset and does an expensive checksum only if the rolling checksums match
  • but for files composed entirely of '1' or '2' and the 1MB blocksize the rolling checksum is identical at each offset, so it runs the expensive checksum for each possible block - which takes forever
  • the rolling checksum is identical for each block here because in the update equation a core part of it is a 2-byte value updated with c * blocksize. But since blocksize & 0xFFFF == 0 that just does nothing.
  • this is a parameter-validation bug in zsync: it's command line help suggests 4kb and 8kb as potential block sizes, the 1mb size we use is probably untested

We either need smaller block sizes or to adjust the rolling checksum. Just increasing its width might already suffice.

@ogoffart I think that fix is independent of this PR.

@ahmedammar

@mrow4a
Copy link
Contributor

mrow4a commented Mar 12, 2018

👍

@ckamm
Copy link
Contributor Author

ckamm commented Mar 15, 2018

A simpler restatement of the problem:

For files where every byte has value d and the block size is s the rolling checksum components will be (d * s) mod 2^16 and (d * s * (s + 1) / 2) mod 2^16. If s = 2^n for n >= 16 then both checksum components will be 0 for any value of d.

I can actually reproduce this particular problem with rsync!

mkdir d1
mkdir d2
mkdir sync
dd if=/dev/zero  bs=1024 count=2048 | tr "\000" '1' > d1/foo
dd if=/dev/zero  bs=1024 count=2048 | tr "\000" '2' > d2/foo

rsync --stats --no-whole-file --block-size=131072 d1/* sync # initial sync
rsync --stats --no-whole-file --block-size=131072 d2/* sync # hangs

Note that 131072 is 2^17 and the largest block size that rsync allows. Probably for good reason? But since it's larger than 2^16 it already breaks with this particular test data.

@ckamm ckamm merged commit 74f46d4 into delta-sync Mar 16, 2018
@TheOneRing TheOneRing deleted the delta-upload-fixes branch December 2, 2019 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants