
Owncloud deletes files & tries to sync again (with little success) #6677

Closed
Phlos opened this issue Jul 30, 2018 · 69 comments
Labels
ReadyToTest QA, please validate the fix/enhancement type:bug

Comments

@Phlos

Phlos commented Jul 30, 2018

I have a 250 GB storage allowance through a university system which I use to sync my files between several computers that I work on. The university system supplies a tailored client for Windows, but for Linux I use ownCloud. I noticed in the last few days that ownCloud is deleting files from a local computer and then trying to redownload them. This is an issue I have had several times in the past and it somehow keeps popping up. [edit: that it started happening again is possibly related to the fact that I updated to 2.4.2 on 19 July.] I have only experienced this on my Linux machines, never with Windows (although it should be noted that I sync only a rather small subset of my stuff to the Windows OS).

Expected behaviour

I have the OwnCloud client on my laptop. It syncs my files. If I am offline for a while, it syncs everything as soon as I'm back online.

Actual behaviour

I have the OwnCloud client on my laptop. It deletes files from the local machine that existed both on the local machine and on the server, then tries to redownload them.

The redownload sync is not without problems:

  • At some point it stops downloading and loses the connection, and then sits there stuck -- as far as I can tell, until I forcefully do something about it. The connection issue is not solved by clicking log out and log in alone (as far as I can see); I have to restart ownCloud, and then it connects.
  • On each retry, it tells me for a different set of directories that "a new directory > 500 MB has been discovered" and asks me to approve the download (500 MB is the prompt limit in my client).
  • This is especially problematic because in general (also when OC is not deleting files) it takes ages to initialise OC after a restart -- it takes absolutely forever to work through all the directories and files to check whether any changes have taken place.
  • ownCloud has, up until now, not noticed the new folders that I created while offline.

Consequently, I wait for hours to get a sync that runs for maybe 30 minutes, after which it stops working, I restart everything, and so on.

Steps to reproduce

Since this is not a very straightforward issue, I'm afraid there are no straightforward steps either. I did the following, though:

  1. Have owncloud on my laptop (Ubuntu 16.04; currently I have OC 2.4.2)
  2. Work on it without internet for a while.
  3. Connect back to the internet
  4. Discover the file & directory deletion mayhem.

Server configuration

Unfortunately, I do not know what specs the university system has.

Client configuration

Client version: 2.4.2

Operating system: Ubuntu 16.04

OS language: English (United States)

Qt version used by client package (Linux only, see also Settings dialog):
Not sure, but the settings dialog says "Built from Git revision d6e975 on Jul 19 10:54:56 using Qt 5.6.2 OpenSSL 1.0.2g 1 Mar 2016"

Client package (From ownCloud or distro) (Linux only): I'm not sure what is meant by this question.

Installation path of client: /opt/ownCloud/

Logs

I had no logs up until now -- using the instructions below I have now started logging what is going on. I'll leave my laptop at work and connected overnight, and come back tomorrow to see what the dawn brings...

Other comments:

This issue might be related to #3102, #6282 and #6322, but since my issue seems to be subtly different (and since I don't understand half of those conversations), I hesitantly file this as a new issue. Apologies if this is not the way forward.

@ogoffart
Contributor

It seems that the ownCloud client does not see that the files are on the server and thus believes they were removed. This may happen if a bug on the server makes the files disappear (for example, because some external storage is temporarily unavailable, and no files are reported to the client instead of an error).
And what client are you using on Windows? And it does not have this problem?

The blocked connection is another weird problem, as we normally have timeouts to avoid this kind of issue.

Please provide the log, as it may contain information about both issues.

@Phlos
Author

Phlos commented Jul 31, 2018

Hi @ogoffart - thanks for the answer. As to your questions:

  • Client on Windows: binary from https://www.surfdrive.nl/ (by the looks of it it's an owncloud adaptation, but that's as much as I know)
  • I don't remember ever experiencing any issues there, but as I said, I only have a small subset of my data synced on that OS.
  • as to the log -- by now it is 14 GB. Oops. I will try to send it to you via a filesender.

Other info

  • I currently have three systems that are syncing my data: an Ubuntu 16.04 laptop (the problematic system), a Win 10 laptop (actually the same, dualboot -- no problems), and a workstation at my work (Ubuntu 18.04 with owncloud 2.4.1+dfsg-1 -- also no problems).
  • Observation: since restarting owncloud the last time, no syncing whatsoever has taken place.
  • Other observation: In the log I grepped "ERROR" and the most peculiar result is the following, with errors at almost-exactly-one-hour intervals since I started logging (which was yesterday at 18:54):
07-30 19:35:49:960 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 20:35:52:161 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 21:35:55:285 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-30 22:35:57:566 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-30 23:35:57:202 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 00:35:59:614 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 01:36:03:443 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 02:36:03:175 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 03:36:03:032 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 04:36:04:578 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 05:36:08:303 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 06:36:08:371 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Host requires authentication"
07-31 07:36:08:096 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 08:36:09:183 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
07-31 09:36:10:146 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"

@ogoffart
Contributor

ogoffart commented Aug 1, 2018

I only downloaded the first 3.8 GB before it aborted.

Every hour, the sync is aborting because of a 401 error. This is most likely caused by the expiration of the session cookie. And since it appears that the discovery phase takes more than one hour, the sync never progresses.

I recommend that you uncheck the new folders that have too many sub-folders, so the discovery phase will be quicker and the sync will proceed.

In the log, there is no evidence of files being removed, so the logs can't answer this question.

My suspicion is that the server failed to show some files. Following that, the client thought they were removed on the server, so it removed the files locally.
Then the files re-appeared, but there are so many directories that the client can't finish the discovery phase without timing out.

What is strange is that the surfdrive client should have had exactly the same behavior.

Also, there should normally not be a 401, as we authenticate every request.

@guruz guruz added this to the 2.5.0 milestone Aug 1, 2018
@Phlos
Author

Phlos commented Aug 1, 2018

Hi @ogoffart - thanks for your analysis. A couple of points:

  • as to files being deleted - I guess this took place before I started logging...
  • as to your suggestion, the thing is: there are no "new folders" -- everything was already there, only it was subsequently deleted. In order to uncheck the folders that are causing trouble, I would first have to know which these folders are -- which I don't.
  • as to the surfdrive client not showing the same behaviour: this may be because on Windows I only sync some directories with documents and no deep trees -- most significantly not my raw data, and it is the raw data that has so many subdirectories and files.

Is there any way to make the discovery phase less painful (i.e. less slow)? This is, in general, an issue that might be worth looking into. I am often waiting super long for a sync to show up.

Two more thoughts:

  • the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.
  • I don't think I mentioned this quite clearly, but the trouble only started after I ran a software update (including an update of OC) on 19 July (I have now added this to the initial report). While it is an issue that I've had before, I don't remember it from the past months. Could it be an issue that only touches certain versions of OC, or cases where the tree was out of sync before an update, then the update takes place, then the sync? (I'm just speculating here, but this might well have been the case.)

@ogoffart
Contributor

ogoffart commented Aug 1, 2018

as to your suggestion, the thing is: there are no "new folders" -- everything was already there, only it was subsequently deleted. In order to uncheck the folders that are causing trouble, I would first have to know which these folders are -- which I don't.

They got removed, now they are considered new :-)

Is there any way to make the discovery phase less painful (i.e. less slow)? This is, in general, an issue that might be worth looking into. I am often waiting super long for a sync to show up.

Yes, we are working on that. But it won't be there before 2.6 or later.

the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.

That's interesting. There should normally be no differences.

Have you selected the option to ask for permission to download new folders over a given size? Can you try to disable it? Maybe this slows down the sync a bit, as it needs to query the size.

@Phlos
Author

Phlos commented Aug 1, 2018

the discovery phase only seems to be causing trouble if the data are already (partially) on the machine. Every time I have connected a new system, it would have no trouble finding the whole directory tree and subsequently downloading it.

That's interesting. There should normally be no differences.

It could also be that every time I connected a system, it just so happened to be a version that wasn't causing any trouble...? Too little data to tell, really, could be either.

As to your suggestions, I will try again tonight when I get home.

@guruz
Contributor

guruz commented Aug 1, 2018

FYI @tomneedham from ownCloud will get in contact with the SURF guys to try to track this down from the server side.
Meanwhile, feel free to investigate more with @ogoffart

@Phlos
Author

Phlos commented Aug 1, 2018

Ah @guruz @tomneedham good to know, I was planning to drop the SURFers a line as well, but then I'll leave that to you.

@michaelstingl
Contributor

@Phlos You can ping them as well. Point them to this issue…

@Phlos
Author

Phlos commented Aug 1, 2018

@michaelstingl I e-mailed my institute which seems to be the only way for mere mortals to get in touch with the SURFdrive ppl. Let's see.... ;)

@T0mWz

T0mWz commented Aug 2, 2018

Hi @Phlos, strange problem. You have been experiencing this issue since the latest 2.4.2 client version, if I understand you correctly? Thank you for reporting this issue; we will check on our side what is going on.

@Phlos
Author

Phlos commented Aug 2, 2018

Hi @T0mWz Yes, although the problem has happened before -- not in the past couple of months, though. As said above, something with the server connection seems to be going wrong (at hourly intervals)...

@Phlos
Author

Phlos commented Aug 3, 2018

Update:

  • After unchecking the "new" folders and clicking Apply, it froze for a while, but then started looking.
  • I unchecked the max folder size permission option.
  • Same behaviour as previously. Every hour, the process terminates, interestingly now around :38 (which seems to be roughly a continuation of the times I saw on 31 July)... @T0mWz could that be the server?
  • Around every ERROR line, it looks like this:
08-03 00:38:45:587 [ info sync.csync.updater ]: file: work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4, instruction: INSTRUCTION_NEW <<=
08-03 00:38:45:587 [ info sync.accessmanager ]: 6 "PROPFIND" "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4" has X-Request-ID "318d9941-feae-421f-b2b0-a192b2dfa603"
08-03 00:38:45:587 [ info sync.networkjob ]:    OCC::LsColJob created for "https://surfdrive.surf.nl/files" + "/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4" "OCC::DiscoverySingleDirectoryJob"
08-03 00:38:45:642 [ warning sync.credentials.http ]:   Stop request: Authentication failed for  "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4"
08-03 00:38:45:642 [ warning sync.networkjob ]: QNetworkReply::NetworkError(OperationCanceledError) "Operation canceled" QVariant(Invalid)
08-03 00:38:45:642 [ info gui.account.state ]:  Invalid credentials for "https://surfdrive.surf.nl/files" asking user
08-03 00:38:45:642 [ info gui.account.state ]:  AccountState state change:  "Connected" -> "Asking Credentials"
08-03 00:38:45:678 [ info gui.folder.manager ]: Account "[deleted by Phlos]@soliscom.uu.nl@surfdrive.surf.nl" disconnected or paused, terminating or descheduling sync folders
08-03 00:38:45:678 [ info gui.folder ]: folder  "1"  Terminating!
08-03 00:38:45:678 [ info gui.account.manager ]:        Saving account "https://surfdrive.surf.nl/files"
08-03 00:38:45:678 [ info gui.account.manager ]:        Saving  1  unknown certs.
08-03 00:38:45:687 [ info gui.account.manager ]:        Saving cookies. "/home/Phlos/.local/share/data//ownCloud//cookies0.db"
08-03 00:38:45:687 [ info gui.account.manager ]:        Saved account settings, status: 0
08-03 00:38:45:687 [ info sync.accessmanager ]: 4 "" "https://surfdrive.surf.nl/files/index.php/apps/oauth2/api/v1/token" has X-Request-ID "bb42e3d7-715b-4c2d-954e-a56ad4a900f2"
08-03 00:38:45:687 [ info sync.networkjob ]:    OCC::SimpleNetworkJob created for "https://surfdrive.surf.nl/files" + "" "OCC::Account"
08-03 00:38:45:687 [ info sync.networkjob.lscol ]:      LSCOL of QUrl("https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4") FINISHED WITH STATUS QNetworkReply::NetworkError(OperationCanceledError) "Operation canceled"
08-03 00:38:45:687 [ warning sync.discovery ]:  LSCOL job error "Operation canceled" 0 QNetworkReply::NetworkError(OperationCanceledError)
08-03 00:38:45:687 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-03 00:38:45:687 [ info sync.engine ]:        Update phase was aborted by user!
08-03 00:38:49:047 [ info sync.database ]:      Closing DB "/home/Phlos/SURFdrive/._sync_8249cb86cd45.db"
08-03 00:38:50:107 [ info sync.engine ]:        CSync run took  3603659 ms
08-03 00:38:50:108 [ info gui.folder ]: Client version 2.4.2 (build 9883)  Qt 5.6.2  SSL  OpenSSL 1.0.2g  1 Mar 2016
08-03 00:38:50:108 [ info gui.folder ]: SyncEngine finished without problem.
08-03 00:38:50:109 [ info gui.folder ]: Folder sync result:  4
08-03 00:38:50:109 [ info gui.application ]:    Sync state changed for folder  "https://surfdrive.surf.nl/files/remote.php/dav/files/[deleted by Phlos]@soliscom.uu.nl/" :  "Success"

Within my directory tree, only directories within my bigger directory "SYNTHETICS" were deleted, while a similarly large directory "DATA" was untouched. Hypothesising upon the cause of the deletion, could it be that if the connection breaks during the discovery phase, everything that has not yet been discovered is somehow branded as "doesn't exist on the server", upon which it is deleted?

@T0mWz

T0mWz commented Aug 3, 2018

Hi All,

I checked the logs on our side to see what's happening here.

At this time there were two clients online: this Linux 2.4.2 client and another 2.4.1 Linux client, which did not do much.
You can see below that around that time a 401 error occurs for that specific folder. I will check @Phlos's account to see if there is a mismatch here between the OC metadata and the data on disk. On disk it looks good, but a double check is always a good idea.

<IP-HERE> - - [03/Aug/2018:00:38:43 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_4 HTTP/1.1" 207 339496 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:358666 response_size:339496 bytes_received:1078 bytes_sent:340626 W2OH84rL4pSvj14aHf699AAAABI
<IP-HERE> - - [03/Aug/2018:00:38:44 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_final HTTP/1.1" 207 661856 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:440227 response_size:661856 bytes_received:1082 bytes_sent:663299 W2OH9IrL4pSvj14aHf699QAAABI
<IP-HERE> - - [03/Aug/2018:00:38:44 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_19_final_init_rho HTTP/1.1" 207 669155 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:430606 response_size:669155 bytes_received:1091 bytes_sent:670606 W2OH9IrL4pSvj14aHf699gAAABI
<IP-HERE> - - [03/Aug/2018:00:38:45 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/work/EMed/EMed_full/SYNTHETICS/event_77/ITERATION_1_4 HTTP/1.1" 401 476 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:30911 response_size:476 bytes_received:1077 bytes_sent:1052 W2OH9YrL4pSvj14aHf699wAAABI
<IP-HERE> - - [03/Aug/2018:00:38:50 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:108387 response_size:501 bytes_received:795 bytes_sent:1284 W2OH@mz3SSYPEGckPL3W2wAAAD8
<IP-HERE> - - [03/Aug/2018:00:38:50 +0200] "GET /files/remote.php/dav/avatars/<USER-EPPN>/128.png HTTP/1.1" 404 181 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:96575 response_size:181 bytes_received:615 bytes_sent:667 W2OH@mz3SSYPEGckPL3W3gAAAD8
<IP-HERE> - - [03/Aug/2018:00:39:01 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:115889 response_size:501 bytes_received:795 bytes_sent:1284 W2OIBWgOj45xXDZgu-KULAAAADg
<IP-HERE> - - [03/Aug/2018:00:39:33 +0200] "PROPFIND /files/remote.php/dav/files/<USER-EPPN>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:113601 response_size:501 bytes_received:795 bytes_sent:1284 W2OIJWO22vbteyYulfWu5gAAABs

After this 401 error, the client logs in again around 00:38:50.

@Phlos
Author

Phlos commented Aug 3, 2018

Hi @T0mWz, should I do or check something? I'm not quite sure from what you write. You write:

around that time a 401 error occurs for that specific folder

but it is important to note that it is not the specific folder, but the specific time (every hour) that is causing the trouble. Around the 31st of July it was at every ~[hour]:36 (see my message above); now on the 3rd of August it is at every ~[hour]:39. It seems to shift by 0-5 s every time. Whether it is SYNTHETICS/event_77 or any other folder seems to be of no importance. To me, this suggests that there is not something wrong with the data, but with the connection. The question is: why is it happening, and can we do something about it?

PS: if you want to insert multi-line output, you can put it between triple backticks (```) above and below.

@T0mWz

T0mWz commented Aug 3, 2018

Hi @Phlos ,

Today I made a comparison of your real data and the metadata. ownCloud works on the basis of a database containing the metadata of your files, and this does not seem to be completely in sync.

I have started a command to bring them back in sync, but because you have a lot of files, this will take a while. Tomorrow I will check whether everything is back in sync; from then on, we will have to see if your client behaves normally again.

@T0mWz

T0mWz commented Aug 4, 2018

Hi @Phlos ,

The metadata of your files is back in sync. Would you be able to check whether the problems are solved for you?

@Phlos
Author

Phlos commented Aug 6, 2018

@T0mWz No, unfortunately the problem persists :( Hourly errors as always.

@ogoffart also unchecking loads of directories didn't bring any solution.

Now, if I grep for "ERROR" in the log, I get some new error type ("Premature end of document") and the hourly schedule is a bit distorted every time this happens. See below.

08-05 01:58:26:190 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 02:58:30:700 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 03:58:31:047 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 04:58:33:472 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 05:55:02:799 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 05:55:02:801 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 05:58:46:414 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 06:13:10:797 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 06:13:10:798 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 06:14:46:798 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user"
08-05 06:41:58:800 [ warning sync.networkjob.lscol ]:   ERROR "Premature end of document." ""
08-05 06:41:58:802 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 08:03:56:880 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 09:03:57:821 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 10:03:58:520 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 11:04:01:048 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 12:04:01:179 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 13:04:02:385 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 14:04:02:243 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"
08-05 15:04:03:498 [ warning sync.engine ]:     ERROR during  csync_update :  "Aborted by the user Aborted by the user"

@ogoffart
Contributor

ogoffart commented Aug 6, 2018

"Aborted by the user" is caused by the 401 error. (When we get a 401 we abort the request. We are not supposed to get 401 unless the password is wrong, so we'll let the UI retry a connection in order to show the password dialog)

"Premature end of document." Means that the PROPFIND reply was not a valid XML document, likely the server truncated the reply.

@T0mWz

T0mWz commented Aug 6, 2018

Sorry @Phlos, you're right. By coincidence I saw the same file/folder twice where the 401 error occurred.

@ogoffart, it seems that the client receives a 401 error when the OAuth token has expired, after which it does a POST to refresh the OAuth token. See the Apache log below:

<IP-user> - - [06/Aug/2018:08:04:18 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_58/ITERATION_19_final HTTP/1.1" 207 183548 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:1928315 response_size:183548 bytes_received:1111 bytes_sent:184526 W2fk4iTKOKp6PEB@gKkymgAAABU
<IP-user> - - [06/Aug/2018:08:04:20 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/work/EMed/EMed_full/ADJOINT_SOURCES_AND_WINDOWS/ADJOINT_SOURCES/event_58/ITERATION_19_final_init_rho HTTP/1.1" 401 476 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:65873 response_size:476 bytes_received:1120 bytes_sent:1052 W2fk5CTKOKp6PEB@gKkymwAAABU
<IP-user> - <TOKEN> [06/Aug/2018:09:04:22 +0200] "POST /files/index.php/apps/oauth2/api/v1/token HTTP/1.1" 200 343 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:111611 response_size:343 bytes_received:665 bytes_sent:1352 W2fy9p1fV5OlBGjdBHgpQAAAACU
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/status.php HTTP/1.1" 200 162 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:43304 response_size:162 bytes_received:576 bytes_sent:812 W2fy96H@ebmAEXQDGXoeyAAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:190154 response_size:501 bytes_received:799 bytes_sent:1284 W2fy96H@ebmAEXQDGXoeyQAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/ocs/v1.php/cloud/capabilities?format=json HTTP/1.1" 200 1030 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:120993 response_size:1030 bytes_received:629 bytes_sent:1664 W2fy96H@ebmAEXQDGXoeygAAABg
<IP-user> - - [06/Aug/2018:09:04:23 +0200] "GET /files/ocs/v1.php/config?format=json HTTP/1.1" 200 164 "-" "Mozilla/5.0 (Linux) mirall/2.4.2 (build 9883)" microsecs:135195 response_size:164 bytes_received:617 bytes_sent:797 W2fy92dCXu8zmEmaujv1PAAAABU

@Phlos, I see that you also have a sync client version 2.4.1 running at another location. Does that client version also show this behavior?

@Phlos
Author

Phlos commented Aug 6, 2018

The 2.4.1 client (which is installed on my main workstation at work) has no issues.

@ogoffart
Contributor

ogoffart commented Aug 6, 2018

So the problem is that the initial discovery is taking more than one hour, which is longer than the OAuth token expiration.

And the sync client can't cope with that, as it has to cancel the sync and refresh the OAuth token.

In theory we should refresh the OAuth token without restarting the sync. But this is a bit tricky with the current client architecture.

The new discovery algorithm (2.6) will also allow the propagation and the discovery to run at the same time, which will let the next sync restart where we left off, without needing to spend another hour re-discovering.
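As a rough illustration of the "refresh the token without restarting the sync" idea: a standard OAuth2 refresh-token request (RFC 6749) against the token endpoint visible in the Apache log above. This is only a sketch under those assumptions, not the ownCloud client's actual code; the credential values are placeholders.

```cpp
// Sketch of a background OAuth2 token refresh (RFC 6749, grant_type=refresh_token).
// Not the ownCloud client's implementation; endpoint taken from the Apache log,
// credential values are placeholders.
#include <QCoreApplication>
#include <QDebug>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>
#include <QUrlQuery>

int main(int argc, char **argv)
{
    QCoreApplication app(argc, argv);
    QNetworkAccessManager nam;

    QNetworkRequest req(QUrl("https://surfdrive.surf.nl/files/index.php/apps/oauth2/api/v1/token"));
    req.setHeader(QNetworkRequest::ContentTypeHeader, "application/x-www-form-urlencoded");

    QUrlQuery form;
    form.addQueryItem("grant_type", "refresh_token");
    form.addQueryItem("refresh_token", "<stored refresh token>"); // placeholder
    form.addQueryItem("client_id", "<client id>");                // placeholder
    form.addQueryItem("client_secret", "<client secret>");        // placeholder

    QNetworkReply *reply = nam.post(req, form.toString(QUrl::FullyEncoded).toUtf8());
    QObject::connect(reply, &QNetworkReply::finished, [&]() {
        // On success the JSON body carries a fresh access_token; in principle
        // running discovery jobs could switch to it instead of aborting the sync.
        qDebug() << reply->readAll();
        app.quit();
    });
    return app.exec();
}
```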

@ogoffart also unchecking loads of directories didn't bring any solution.

I was hoping it would.
Maybe you need to uncheck more. There must be a really big directory in your tree that could be unchecked.

The 2.4.1 client (which is installed on my main workstation at work) has no issues.

There is almost no difference from 2.4.2.
It could be that the 2.4.1 configuration is not using OAuth because it was configured with a password before?
It could be that it started syncing when there were fewer directories on the server, so that the discovery did not take more than one hour.

@Phlos
Author

Phlos commented Aug 7, 2018

Hi @ogoffart --

Maybe you need to uncheck more. There must be a really big directory in your tree that could be unchecked.

I will try that and update you with the results... I have now unchecked a 51 GB dir; if that doesn't do the trick, I don't know what will :-P -- let's hope the deletion is not futile!

There is almost no difference from 2.4.2.

Hm, interesting.

It could be that the 2.4.1 configuration is not using OAuth because it was configured with a password before?

What do you mean by this? I'm not sure what OAuth is.

It could be that it started syncing when there were fewer directories on the server, so that the discovery did not take more than one hour.

There were definitely always the same number of directories. The only significant difference is that the laptop (the problematic machine with 2.4.2) is only used intermittently (i.e. there may be several weeks where I don't use it a lot), whereas the workstation is only turned off briefly every now and then, and is therefore kept mostly in sync. Also, the laptop was probably not in sync at the time its client was updated to 2.4.2. Could this be related?

@T0mWz

T0mWz commented Aug 7, 2018

Hi @ogoffart ,

Thanks for your reply.

For SURFdrive, we have configured multiple back-ends. ownCloud sync clients from 2.4.0 onwards authenticate through OAuth against SURFdrive. Older clients still use the Shibboleth single sign-on method to log in.

That is why I would expect the same behavior with the 2.4.1 client. @Phlos, on your other system with the 2.4.1 client, do you also synchronize all files or just a selection of folders/files?
@ogoffart, could the local sync-client database still play a role?

@Phlos
Author

Phlos commented Aug 7, 2018

@T0mWz

on your other system with the 2.4.1 client, do you also synchronize all files or just a selection of folders/files?

My workstation (with OC 2.4.1) syncs everything.

@ckamm
Contributor

ckamm commented Aug 20, 2018

Note: at 08-18 16:25:5 a sync failed with the error "Free space on disk is less than 48 MB". Maybe this somehow influenced the db, even though the sync finished successfully several times afterwards?

@Phlos
Author

Phlos commented Aug 20, 2018

@ckamm I doubt it, because the disk space issue was only relevant on that day. All the other times files got deleted, there was plenty of space. (On 08-18, I had temporarily rsynced a rather large subdirectory to my homedir, i.e. where it was safe from deletion ;) )

@tomneedham

Interesting: before the 05:26 deletion, there are two PROPFINDs from the server:

<CLIENT IP> - - [17/Aug/2018:05:25:23 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 501 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:129403 response_size:501 bytes_received:800 bytes_sent:1284 W3ZAI1HyHwQdPnjhy9YCiwAAACE
<CLIENT IP> - - [17/Aug/2018:05:25:34 +0200] "PROPFIND /files/remote.php/dav/files/<USERNAME>/ HTTP/1.1" 207 14829 "-" "Mozilla/5.0 (Linux) mirall/2.4.3 (build 10035)" microsecs:396194 response_size:14829 bytes_received:1049 bytes_sent:15640 W3ZALtbL2ItCwCIiNDF4bQAAAAc

All the previous PROPFINDs have the same size, but the last one before the deletion has a considerably larger response: 14829 vs 501 bytes. This could lead us down the thought path that an incorrect server response is triggering the client to delete the files. mitmproxy with the request/response bodies is what we need next -- but how do we capture just this, at the right moment?

@T0mWz

T0mWz commented Sep 3, 2018

I was still thinking about this issue, and was wondering if there were any changes on your side, @Phlos.
I will try to make a change on our side for some webservers, to see whether the strange 401 PROPFIND response can be fixed.

@owncloud guys: currently we redirect 401 responses to another page for unauthorized user logins.
I will remove this 401 redirect for the OAuth2 sync-client webservers, to see whether that has an impact on the strange PROPFIND result of this specific issue.
But I would expect the client to handle this properly and not show such strange behavior with the PROPFIND / local remove action.

@Phlos, would you perhaps be able to try again to see if it makes a difference? Many thanks for your help!

@Phlos
Author

Phlos commented Sep 12, 2018

@T0mWz sorry for the late reaction! I haven't had any more deleterious experiences recently. I strongly pruned my data, though, and only have subsets of my directory tree synced to my laptop, not the whole thing. At least for now, it seems to work, but I'm holding my breath continuously... ;)

Could there be a relation with directory size / number of files / depth/breadth of tree?

@Phlos
Author

Phlos commented Sep 18, 2018

Something weird just happened again as I was looking at it. Unfortunately, no logging took place, but it was the directory work/EMed/plots that got deleted -- seemingly on the server side, because it still existed on my laptop. Now it is uploading the directory from my local laptop dir (where it was retained). VERY strange.....

@T0mWz can you see anything in the server logs around 15:19:16?

@ckamm
Contributor

ckamm commented Sep 19, 2018

@Phlos About logging: if you're using 2.5.0, definitely:

  • open the client settings
  • hit F12, the log window should pop up
  • enable "Permanently save logs"
  • hover over the checkbox and take note of the directory logs will be stored in

That way detailed logs will be written all the time and you'll have a couple of hours to make a copy of them when something strange-looking happens. You only need to do this once - it's a new feature to make it easier to have logs for rare, hard-to-reproduce issues like this.

@ogoffart
Contributor

I opened #6814 to track the OAuth2 and timeout issue.

(So far no progress has been made for a month regarding the reason why the directories got deleted.)

@Phlos
Author

Phlos commented Oct 22, 2018

Update: after purging my tree, for a while the problem didn't show up. However, it had been growing again since, and last week (of course just before I wanted to take the laptop along with me on a trip) it decided to do the big ol' delete again.

Potentially having a big tree not only results in the one hour discovery cutoff (which inhibits re-download, as per #6814 ) but also somehow results in deletion. Related?

@ckamm I tried the F12 option, but unfortunately the ownCloud log dir defaults to /tmp/, which is on my root partition, which is already extremely full -- I will have to start it manually. Maybe something to change, i.e. let the user decide on the log dir location?

I'll now go and tune down my selective sync in hopes of building up the tree again...

@T0mWz

T0mWz commented Oct 23, 2018

Hi @Phlos ,

Painful to hear. :(

During the upgrade at the beginning of October, we upgraded to the latest OC version and the latest OAuth version, assuming that this could possibly help in combination with the latest OC sync client. However, this reset the AccessToken lifetime back to the default of 1 hour.

I will restore this to a longer period, so hopefully the sync client has some more breathing room to complete a sync. However, this cannot be infinite; I would like to start with an increase to 2 hours.

@ogoffart, I think it's a hard subject. But is there a way that a client can better deal with an expired OAuth AccessToken?

@michaelstingl
Contributor

But is there a way that a client can better deal with an expired OAuth AccessToken?

@T0mWz Solution in #6814 and #6819 . You could test with the latest daily builds from https://download.owncloud.com/desktop/daily/?C=M;O=D

@Phlos
Author

Phlos commented Mar 1, 2019

It's happened again. Near-complete deletion of my largest directory w/ most elaborate tree.

I was working on my laptop yesterday after not having used it for about a week. It then proceeded to download most of the stuff in this one directory while I was working in another. This other directory keeps syncing fine (have in between worked on another machine on the same files). [this is an adapted message because I thought that the deletion had taken place today]
[Original message:] ~~I was working on the laptop yesterday early afternoon, all fine - hibernated, continued working on a desktop, all fine - started working again on the laptop, the directory where I was working was behaving normally, but I just noticed that it had deleted everything in this other directory. Could hibernation have anything to do with it?~~

I've got logs, if anyone's interested...
[screenshot: owncloud_deletion]

@ckamm
Contributor

ckamm commented Mar 1, 2019

@Phlos Logs are very interesting! mail at ckamm de.

@ckamm
Contributor

ckamm commented Mar 4, 2019

The issue I see in the log is similar to what was observed before:

// local discovery example (thousands of entries)
02-28 09:11:12:885 [ info sync.csync.updater ]: Database entry found, compare: 1467030292 <-> 1467030292, etag:  <-> 3e981534e5b53cfc77ece8d17fa05013, inode: 2778272 <-> 2778272, size: 4096 <-> 4096, perms: 0 <-> 9f, type: 0 <-> 0, checksum:  <-> SHA1:, ignore: 0
02-28 09:11:12:885 [ info sync.csync.updater ]: file: A/B/C/file, instruction: INSTRUCTION_NONE 

// remote discovery doesn't find a single one
02-28 09:11:17:404 [ info sync.csync.updater ]: Reading from database: A/B
02-28 09:11:17:404 [ info sync.csync.updater ]: file: A/B, instruction: INSTRUCTION_NONE <<=
02-28 09:11:26:396 [ info sync.csync.updater ]: 0 entries read below path A/B from db.

I'll continue to look for potential causes.

@ckamm
Contributor

ckamm commented Mar 4, 2019

The issue I see is that SqlQuery::next() doesn't distinguish between end-of-data and errors. It's possible that it returns false due to an error, and the function that reads from the db assumes there simply isn't any data. I'll fix that.
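To make that distinction concrete, here is a minimal sketch using the plain sqlite3 C API rather than the client's SqlQuery wrapper; the table and column names are made up for illustration. SQLITE_DONE is the only return value that really means "no more rows"; anything else after the SQLITE_ROW loop is an error that must be propagated instead of being treated as an empty result.

```cpp
// Sketch only: illustrates the fix's core idea with the raw sqlite3 API.
// Table/column names are hypothetical, not the client's real schema.
#include <sqlite3.h>
#include <cstdio>

// Returns 0 on success, -1 on database error.
static int readFilesBelowPath(sqlite3 *db, const char *pathPrefix)
{
    sqlite3_stmt *stmt = nullptr;
    const char *sql = "SELECT path FROM metadata WHERE path LIKE ? || '/%'";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK)
        return -1;
    sqlite3_bind_text(stmt, 1, pathPrefix, -1, SQLITE_TRANSIENT);

    int rc;
    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        std::printf("found: %s\n",
                    reinterpret_cast<const char *>(sqlite3_column_text(stmt, 0)));
    }
    sqlite3_finalize(stmt);

    if (rc != SQLITE_DONE) {
        // Previously this case looked identical to "0 entries read below path",
        // and the remote discovery would go on to schedule local deletes.
        std::fprintf(stderr, "sqlite error %d while reading below %s\n", rc, pathPrefix);
        return -1;   // propagate the error so the sync run fails instead
    }
    return 0;
}
```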

@ckamm ckamm modified the milestones: 2.6.0, 2.5.4 Mar 4, 2019
ckamm added a commit that referenced this issue Mar 4, 2019
This could fix a problem where the client incorrectly decides to delete
local data.

Previously any sqlite3_step() return value that wasn't SQLITE_ROW would
be interpreted as "there's no more data here". Thus an sqlite error at a
bad time could cause the remote discovery to fail to read an unchanged
subtree from the database. These files would then be deleted locally.

With this change sqlite errors from sqlite3_step are detected and
logged. For the particular case of SyncJournalDb::getFilesBelowPath()
the error will now be propagated and the sync run will fail instead of
performing spurious deletes.

Note that many other database functions still don't distinguish
not-found from error cases. Most of them won't have as severe effects on
affected sync runs though.
ckamm added a commit that referenced this issue Mar 4, 2019
ckamm added a commit that referenced this issue Mar 5, 2019
@ckamm ckamm added ReadyToTest QA, please validate the fix/enhancement and removed PR available labels Mar 5, 2019
@guruz
Contributor

guruz commented Mar 10, 2019

Ouch!
Thanks a lot @ckamm for fixing.
@Phlos @T0mWz thanks for the logs and patience..

@jnweiger
Contributor

jnweiger commented Mar 14, 2019

not tested with 2.5.4

It appears hard to reproduce a setup locally that would trigger the issue. Not tested here.
@Phlos can probably only retest after we have a released branded 2.5.4 client.
-> suggest 'not tested' is acceptable for proceeding with 2.5.4rc1

Okayish

@ckamm ckamm closed this as completed Mar 15, 2019
@zinalili

zinalili commented May 3, 2019

It just happened to me on Windows with 2.5.4. A folder was still syncing; I renamed it, and all the other folders got deleted -- only this one was left. Hopping onto the beta channel now, hoping for the better.
