Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resuming a paused sync that was uploading a bunch of files and were moved in the client is duplicating one file in both locations #5949

Closed
SamuAlfageme opened this issue Aug 10, 2017 · 16 comments
Assignees
Labels
ReadyToTest QA, please validate the fix/enhancement type:bug
Milestone

Comments

@SamuAlfageme
Copy link
Contributor

SamuAlfageme commented Aug 10, 2017

Expected behavior

The reconcile phase should match the server and local contents properly and move the last file the client uploaded to the server.

Actual behavior

That last file is being downloaded from the server:

screenshot_2017-08-10_10_14_56

I think it has to do with pausing the sync either:

  • ...before receiving the upload ACK reply from the server with the ETAG (we should wait for it in this case)
  • ...and not storing the last ETAG properly on the sync journal

Steps to reproduce

  1. Create a bunch of large files (I did generate_n_files.sh 130 file_ 15M: 130 files , 15MB size with the file_ prefix)
  2. Pause the sync; you'll be prompted with the 'sync will be terminated' confirmation dialog
  3. Move all the files to a new location (mkdir filezz && mv file_* filezz)
  4. Resume the sync

screenshot_2017-08-10_10_22_07

Server configuration

  • Operating system: ubuntu
  • Web server: Apache/2.4.18 (Ubuntu)
  • Database: 5.7.17-0ubuntu0.16.04.1 - (Ubuntu)
  • PHP version: 7.0.15-0ubuntu0.16.04.4
  • ownCloud version: 10.0.2

Client configuration

Client version: 2.3.3rc1

Logs

  1. Client logfile: I pasted just the related entries (i.e. grep file_114 in this case) - https://gist.github.com/SamuAlfageme/b9e6a34e7fcd3963c7bb6a353995668f

I also have the recorded mitmproxy session recorded, will upload in a minute (after removing sensitive info)

cc/ @ogoffart @guruz @mrow4a

@ogoffart
Copy link
Contributor

Here is what happens:
The PUT is cancelled as we already sent data, but we did not receive the answer yet with the new etag, so we don't write it in the DB.

The solution could be to change what "pause sync" is doing. We should not blindly abort all requests. We should wait for the PUT to finish (possibly with a lower timeout)

@SamuAlfageme SamuAlfageme added this to the 2.4.0 milestone Aug 10, 2017
@SamuAlfageme SamuAlfageme mentioned this issue Aug 10, 2017
84 tasks
@mrow4a
Copy link
Contributor

mrow4a commented Aug 10, 2017

@ogoffart 👍

@guruz
Copy link
Contributor

guruz commented Aug 10, 2017

We should not blindly abort all requests. We should wait for the PUT to finish (possibly with a lower timeout)

Or abort depending on if all data had been sent already. This should be doable since the upload speed limiting gives us this much control over the data flow. So only do a hard abort() if not all data was sent.

@SamuAlfageme
Copy link
Contributor Author

Something that crosses my mind: since the same would happen on client crash/being killed is there a solution we could apply both for it and pause? e.g. a 'handshake' when starting a sync to check if last operation(s) before the crash were completed on the server.

@guruz
Copy link
Contributor

guruz commented Aug 10, 2017

@SamuAlfageme that's quite hard when moves are involved i think :-/

In other cases, we can/could do a lot based on filesize/mtime/filehash

@ogoffart
Copy link
Contributor

when the client crash or is killed, or if the connection drops right at the right moment, there is no way to know if the file was uploaded or not. So we recover in the next sync. But if the file is moved or edited in the next sync, we have a conflict. This is something we have to live with. Only when pausing/aborting the sync, we can do something about it.

@mrow4a
Copy link
Contributor

mrow4a commented Aug 10, 2017

@ogoffart you can assign me to it if you want, it does not seem difficult.

@mrow4a
Copy link
Contributor

mrow4a commented Aug 10, 2017

I see 2 use cases for timeout of e.g. 5 sec:

  • small PUT
  • new chunking MOVE

The others, it might not make sense because of filesize or because protocol is obsolete (old chunking)

@guruz
Copy link
Contributor

guruz commented Aug 10, 2017

@mrow4a What do you mean? This would be about abort() only. I think what I had proposed in #5949 (comment) is enough..

@mrow4a
Copy link
Contributor

mrow4a commented Aug 15, 2017

PR ready #5953

mrow4a added a commit that referenced this issue Sep 2, 2017
Dont abort final chunk immedietally

Use sync and async aborts
mrow4a added a commit that referenced this issue Sep 5, 2017
Dont abort final chunk immedietally

Use sync and async aborts
mrow4a added a commit that referenced this issue Sep 11, 2017
Dont abort final chunk immedietally

Use sync and async aborts
ckamm pushed a commit that referenced this issue Sep 22, 2017
Dont abort final chunk immedietally

Use sync and async aborts
@ckamm
Copy link
Contributor

ckamm commented Sep 22, 2017

Wouldn't this be a situation where the local and remote checksums are identical, therefore skipping the download and just populating the etag on the next sync? Aka: is there really an issue here?

@mrow4a
Copy link
Contributor

mrow4a commented Sep 22, 2017

I am not sure @ckamm how is that related to aborting, which just waits a bit until the job is finished, doing no harm to nothing.

@ckamm
Copy link
Contributor

ckamm commented Sep 22, 2017

I made a testcase for this, will add here later.

@ckamm
Copy link
Contributor

ckamm commented Sep 25, 2017

@mrow4a I think it's the other way around. We're discussing the introduction of complex aborting logic in response to this issue. If the issue doesn't exist, why introduce aborting?

It's possible that my test case doesn't go far enough. But I'd like to know whether you or @ogoffart managed to reproduce the problem with master.

@guruz
Copy link
Contributor

guruz commented Sep 25, 2017

@ckamm I think this issue is you move the files locally that were PUTted in the aborted upload.
And @mrow4a 's patch tries to reduce the likelyhood of

@ckamm
Copy link
Contributor

ckamm commented Sep 25, 2017

@guruz @mrow4a Oops, I forgot the move-file-away part in my test case. No wonder I didn't see anything.

I'll update it and expect everything will make more sense to me then :)

ckamm pushed a commit that referenced this issue Sep 25, 2017
Dont abort final chunk immedietally

Use sync and async aborts
ckamm added a commit that referenced this issue Sep 25, 2017
ckamm pushed a commit that referenced this issue Oct 6, 2017
Dont abort final chunk immedietally

Use sync and async aborts
ckamm added a commit that referenced this issue Oct 6, 2017
ckamm pushed a commit that referenced this issue Oct 17, 2017
Dont abort final chunk immedietally

Use sync and async aborts
ckamm added a commit that referenced this issue Oct 17, 2017
@ckamm ckamm added the ReadyToTest QA, please validate the fix/enhancement label Oct 17, 2017
@guruz guruz closed this as completed Feb 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ReadyToTest QA, please validate the fix/enhancement type:bug
Projects
None yet
Development

No branches or pull requests

5 participants