Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix removing remote shares when the remote server is unreachable #25332

Conversation

danxuliu
Copy link
Member

@danxuliu danxuliu commented Jan 26, 2021

Fixes #14797

To delete a file through DAV the file info of the file is first got, but if its storage is not available an exception is thrown and the deletion is aborted. If a file is a received federated share and the remote server is down the storage is not available, and therefore it is not possible to delete the file.

A received remote share can be deleted using the RemoteController, even if the remote instance is down. However, that OCS endpoint does not seem to be called from any UI element. Moreover, in order to use that endpoint the id of the remote share is needed, and in some cases it is not currently possible to list the remote shares when one of the remote instances is down (because in those cases getFileInfo throws an uncaught exception).

I have added integration tests for the issue, but I do not know how to fix it in the storage, so I would need someone else to take over. Sorry and thanks a lot! @rullzer @icewind1991

Pending:

  • Check if kesselb's patch is the right approach to fix the bug
    • If it is, check if the return code when deleting a file from an unreachable remote server should be 204 or 404
      • If it should be 204, adjust the fix
      • If it should be 404, adjust the test
    • If it is not, fix the bug with a different approach

How to test

  • In server A, open the Files app
  • Share a file with user1 at server B
  • In server B, login as user1
  • Open the Files app and accept the remote share
  • Kill server A
  • In server B, try to delete the received file using the three dots menu

Expected result

The file is deleted

Actual result

The file can not be deleted

@faily-bot
Copy link

faily-bot bot commented Jan 26, 2021

🤖 beep boop beep 🤖

Here are the logs for the failed build:

Status of 1696: failure

integration-federation_features

  • build/integration/federation_features/federated.feature:326
Show full log
[Tue Jan 26 11:09:11 2021] 127.0.0.1:45212 [200]: /ocs/v1.php/apps/testing/api/v1/app/files_sharing/incoming_server2server_group_share_enabled
[Tue Jan 26 11:09:12 2021] 127.0.0.1:45218 [200]: /ocs/v1.php/apps/testing/api/v1/app/files_sharing/outgoing_server2server_group_share_enabled
  Scenario: Delete federate share from another server no longer reachable                                  # /drone/src/build/integration/federation_features/federated.feature:326
    Given Using server "LOCAL"                                                                             # FederationContext::usingServer()
[Tue Jan 26 11:09:12 2021] 127.0.0.1:45222 [404]: /ocs/v2.php/cloud/users/user0
[Tue Jan 26 11:09:13 2021] 127.0.0.1:45226 [200]: /ocs/v1.php/cloud/users
[Tue Jan 26 11:09:13 2021] 127.0.0.1:45230 [200]: /ocs/v1.php/cloud/users/user0
[Tue Jan 26 11:09:14 2021] 127.0.0.1:45236 [200]: /ocs/v2.php/cloud/users/user0
    And user "user0" exists                                                                                # FederationContext::assureUserExists()
    Given Using server "REMOTE"                                                                            # FederationContext::usingServer()
[Tue Jan 26 11:09:14 2021] 127.0.0.1:49694 [404]: /ocs/v2.php/cloud/users/user1
[Tue Jan 26 11:09:15 2021] 127.0.0.1:49696 [200]: /ocs/v1.php/cloud/users
[Tue Jan 26 11:09:15 2021] 127.0.0.1:49700 [200]: /ocs/v1.php/cloud/users/user1
[Tue Jan 26 11:09:16 2021] 127.0.0.1:49706 [200]: /ocs/v2.php/cloud/users/user1
    And user "user1" exists                                                                                # FederationContext::assureUserExists()
[Tue Jan 26 11:09:16 2021] 127.0.0.1:49710 [201]: /remote.php/webdav/textfile0.txt
    And User "user1" copies file "/textfile0.txt" to "/remote-share.txt"                                   # FederationContext::userCopiesFileTo()
[Tue Jan 26 11:09:16 2021] 127.0.0.1:45262 [200]: /ocm-provider/
[Tue Jan 26 11:09:16 2021] 127.0.0.1:45264 [201]: /index.php/ocm/shares
[Tue Jan 26 11:09:16 2021] 127.0.0.1:49712 [200]: /ocs/v1.php/apps/files_sharing/api/v1/shares
    And User "user1" from server "REMOTE" shares "/remote-share.txt" with user "user0" from server "LOCAL" # FederationContext::federateSharing()
    And Using server "LOCAL"                                                                               # FederationContext::usingServer()
[Tue Jan 26 11:09:17 2021] 127.0.0.1:45266 [200]: /ocs/v1.php/apps/files_sharing/api/v1/remote_shares/pending
[Tue Jan 26 11:09:17 2021] 127.0.0.1:49728 [200]: //ocm-provider/
[Tue Jan 26 11:09:17 2021] 127.0.0.1:49730 [201]: /index.php/ocm/notifications
[Tue Jan 26 11:09:17 2021] 127.0.0.1:49732 [200]: //ocs-provider/
[Tue Jan 26 11:09:17 2021] 127.0.0.1:49734 [200]: /ocs/v2.php/cloud/shares/16/accept?format=json
[Tue Jan 26 11:09:17 2021] 127.0.0.1:45268 [200]: /ocs/v1.php/apps/files_sharing/api/v1/remote_shares/pending/16
    And User "user0" from server "LOCAL" accepts last pending share                                        # FederationContext::acceptLastPendingShare()
    And remote server is stopped                                                                           # FederationContext::remoteServerIsStopped()
[Tue Jan 26 11:09:18 2021] 127.0.0.1:45282 [404]: /remote.php/webdav/remote-share.txt
    When User "user0" deletes file "/remote-share.txt"                                                     # FederationContext::userDeletesFile()
      Client error: `DELETE http://localhost:8080/remote.php/webdav/remote-share.txt` resulted in a `404 Not Found` response:
      <?xml version="1.0" encoding="utf-8"?>
      <d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
        <s:exception>Sabre\DA (truncated...)
       (GuzzleHttp\Exception\ClientException)
    Then the HTTP status code should be "204"                                                              # FederationContext::theHTTPStatusCodeShouldBe()
[Tue Jan 26 11:09:18 2021] 127.0.0.1:49752 [200]: //ocm-provider/
[Tue Jan 26 11:09:19 2021] 127.0.0.1:49754 [201]: /index.php/ocm/notifications
[Tue Jan 26 11:09:19 2021] 127.0.0.1:49758 [200]: //ocs-provider/
[Tue Jan 26 11:09:19 2021] 127.0.0.1:49760 [200]: /ocs/v2.php/cloud/shares/16/decline?format=json
[Tue Jan 26 11:09:19 2021] 127.0.0.1:45292 [200]: /ocs/v1.php/cloud/users/user0
[Tue Jan 26 11:09:19 2021] 127.0.0.1:45308 [404]: /ocs/v2.php/cloud/users/user0
[Tue Jan 26 11:09:20 2021] 127.0.0.1:49766 [200]: /ocs/v1.php/cloud/users/user1
[Tue Jan 26 11:09:20 2021] 127.0.0.1:49776 [404]: /ocs/v2.php/cloud/users/user1
[Tue Jan 26 11:09:21 2021] 127.0.0.1:45326 [200]: /ocs/v1.php/apps/testing/api/v1/app/files_sharing/incoming_server2server_group_share_enabled
[Tue Jan 26 11:09:21 2021] 127.0.0.1:45328 [200]: /ocs/v1.php/apps/testing/api/v1/app/files_sharing/outgoing_server2server_group_share_enabled
[Tue Jan 26 11:09:21 2021] 127.0.0.1:45330 [200]: /ocs/v1.php/cloud/apps/testing
[Tue Jan 26 11:09:21 2021] 127.0.0.1:45336 [200]: /ocs/v1.php/cloud/apps?filter=enabled

This was referenced Jan 27, 2021
@rullzer rullzer modified the milestones: Nextcloud 21, Nextcloud 22 Feb 2, 2021
@rullzer
Copy link
Member

rullzer commented Feb 2, 2021

Master is Nextcloud 22 now.
If this should go into 21 it should be backported.

@danxuliu
Copy link
Member Author

danxuliu commented Feb 2, 2021

/backport to stable21

@danxuliu
Copy link
Member Author

danxuliu commented Feb 2, 2021

/backport to stable20

@danxuliu
Copy link
Member Author

danxuliu commented Feb 2, 2021

/backport to stable19

@danxuliu danxuliu force-pushed the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch from e8cd226 to cfee9a2 Compare April 8, 2021 20:42
@danxuliu
Copy link
Member Author

danxuliu commented Apr 8, 2021

Help needed

I have rebased on latest master and added a few more test scenarios.

I tested the change suggested by @kesselb for getFileInfo and it works (although $data = null; needs to be added in the catch, otherwise $data is undefined).

However, although it is possible to delete the file using the Files app the last integration test does not pass even with that change. The problem is that it expects a 204 state when deleting the file, but it gets 404 instead. I guess that it makes sense that the file can not be found if the remote server is unreachable, but I do not know if that is just a side effect of the fix or if it is the right return code 🤷

In any case, I do not know either if there is a better way to fix this or if that fix is the proper one. Hence the help needed :-)

This was referenced May 20, 2021
@blizzz blizzz mentioned this pull request Jun 2, 2021
57 tasks
@skjnldsv skjnldsv modified the milestones: Nextcloud 22, Nextcloud 23 Jun 2, 2021
@danxuliu danxuliu force-pushed the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch from cfee9a2 to 877f75f Compare July 2, 2021 09:26
@danxuliu danxuliu force-pushed the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch from 877f75f to 38d6ff1 Compare September 13, 2021 16:48
@PVince81 PVince81 self-assigned this Sep 17, 2021
@PVince81 PVince81 force-pushed the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch from 38d6ff1 to ef106f4 Compare September 18, 2021 09:48
@PVince81
Copy link
Member

the tests reported as failing in the previous iteration were:

/drone/src/build/integration/federation_features/federated.feature:343
/drone/src/build/integration/federation_features/federated.feature:486

I've rebased this branch locally and ran all the new integration tests, they all passed for me.

Now pushed the rebased branch, let's see if they are now all green also on CI...

@PVince81
Copy link
Member

so, the tests pass for me locally even on stable22 (with a local backport)

this is going to be tricky as it seems to be related to the env on which it is running
I'm guessing that the "server is down" detection might be triggering different behavior/exceptions depending on the setup which will make it harder to debug without relying on CI.

@PVince81
Copy link
Member

the exception from CI:

    And remote server is stopped                                                                           # FederationContext::remoteServerIsStopped()
    When As an "user0"                                                                                     # FederationContext::asAn()
[Sat Sep 18 09:55:29 2021] {"Exception":"OCP\\Files\\StorageNotAvailableException","Message":"Sabre\\HTTP\\ClientException: Failed to connect to localhost port 8180: Connection refused","Code":1,"Trace":[{"file":"\/drone\/src\/lib\/private\/Files\/Storage\/DAV.php","line":289,"function":"convertException","class":"OC\\Files\\Storage\\DAV","type":"->","args":[{"__class__":"Sabre\\HTTP\\ClientException"},""]},{"file":"\/drone\/src\/lib\/private\/Files\/Storage\/DAV.php","line":785,"function":"propfind","class":"OC\\Files\\Storage\\DAV","type":"->","args":[""]},{"file":"\/drone\/src\/apps\/files_sharing\/lib\/External\/Storage.php","line":176,"function":"hasUpdated","class":"OC\\Files\\Storage\\DAV","type":"->","args":["",1631958927]},{"file":"\/drone\/src\/lib\/private\/Files\/Storage\/Wrapper\/Wrapper.php","line":381,"function":"hasUpdated","class":"OCA\\Files_Sharing\\External\\Storage","type":"->","args":["",1631958927]},{"file":"\/drone\/src\/lib\/private\/Files\/Storage\/Wrapper\/Availability.php","line":384,"function":"hasUpdated","class":"OC\\Files\\Storage\\Wrapper\\Wrapper","type":"->","args":["",1631958927]},{"file":"\/drone\/src\/lib\/private\/Files\/Storage\/Wrapper\/Wrapper.php","line":381,"function":"hasUpdated","class":"OC\\Files\\Storage\\Wrapper\\Availability","type":"->","args":["",1631958927]},{"file":"\/drone\/src\/lib\/private\/Files\/Cache\/Watcher.php","line":127,"function":"hasUpdated","class":"OC\\Files\\Storage\\Wrapper\\Wrapper","type":"->","args":["",1631958927]},{"file":"\/drone\/src\/lib\/private\/Files\/View.php","line":1351,"function":"needsUpdate","class":"OC\\Files\\Cache\\Watcher","type":"->","args":["",{"__class__":"OC\\Files\\Cache\\CacheEntry"}]},{"file":"\/drone\/src\/lib\/private\/Files\/View.php","line":1393,"function":"getCacheEntry","class":"OC\\Files\\View","type":"->","args":[{"__class__":"OCA\\Files_Trashbin\\Storage","cache":null,"scanner":null,"watcher":null,"propagator":null,"updater":null},"","\/remote-share.txt"]},{"file":"\/drone\/src\/apps\/files_sharing\/lib\/Controller\/RemoteController.php","line":118,"function":"getFileInfo","class":"OC\\Files\\View","type":"->","args":["\/user0\/files\/remote-share.txt"]},{"function":"extendShareInfo","class":"OCA\\Files_Sharing\\Controller\\RemoteController","type":"::","args":[{"id":"19","share_type":"0","parent":"-1","remote":"http:\/\/localhost:8180\/","remote_id":"19","0":"And 6 more entries, set log level to debug to see all entries"}]},{"file":"\/drone\/src\/apps\/files_sharing\/lib\/Controller\/RemoteController.php","line":142,"function":"array_map","args":["self::extendShareInfo",[{"id":"19","share_type":"0","parent":"-1","remote":"http:\/\/localhost:8180\/","remote_id":"19","0":"And 6 more entries, set log level to debug to see all entries"}]]},{"file":"\/drone\/src\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":217,"function":"getShares","class":"OCA\\Files_Sharing\\Controller\\RemoteController","type":"->","args":[]},{"file":"\/drone\/src\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":126,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\Files_Sharing\\Controller\\RemoteController"},"getShares"]},{"file":"\/drone\/src\/lib\/private\/AppFramework\/App.php","line":157,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\Files_Sharing\\Controller\\RemoteController"},"getShares"]},{"file":"\/drone\/src\/lib\/private\/Route\/Router.php","line":301,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\Files_Sharing\\Controller\\RemoteController","getShares",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"_route":"ocs.files_sharing.Remote.getShares"}]},{"file":"\/drone\/src\/ocs\/v1.php","line":62,"function":"match","class":"OC\\Route\\Router","type":"->","args":["\/ocsapp\/apps\/files_sharing\/api\/v1\/remote_shares"]}],"File":"\/drone\/src\/lib\/private\/Files\/Storage\/DAV.php","Line":864,"Hint":"Storage is temporarily not available","CustomMessage":"--"}
[Sat Sep 18 09:55:29 2021] 127.0.0.1:48998 [200]: /ocs/v1.php/apps/files_sharing/api/v1/remote_shares

we see this happening as soon as user0 logs in.

Seeing the exception reminded me of the "availability" wrapper which buffers availability checks. Basically, if it sees that a remote was not available it stores that information for about 10 minutes. So the next checks will use the cached result for that time.

And maybe depending on the environment and test order and speed, the availability caching might behave differently, which can explain why a failure can sometimes happen in different parts of the code.

@skjnldsv skjnldsv requested review from nickvergessen and a team October 25, 2021 15:31
@PVince81
Copy link
Member

@icewind1991 any objection for returning false for unavailable storages in hasUpdated 64b8438 ?

got confirmation in chat that this is fine

@skjnldsv skjnldsv mentioned this pull request Nov 1, 2021
19 tasks
@skjnldsv
Copy link
Member

skjnldsv commented Nov 1, 2021

got confirmation in chat that this is fine

Let's review and merge then! 🚀

@skjnldsv skjnldsv added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Nov 1, 2021
@blizzz blizzz mentioned this pull request Nov 3, 2021
18 tasks
danxuliu and others added 5 commits November 5, 2021 09:48
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
The federated server needs to be stopped during the tests, so it is now
stopped in the FederationContext for each scenario instead of just once
in the run.sh script.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
Technically, saying that a storage has no updates when it's not
available is correct.

This makes it possible to retrieve the cache entry for the mount point
and also to list and remove unavailable federated shares.

Signed-off-by: Vincent Petry <vincent@nextcloud.com>
Signed-off-by: Vincent Petry <vincent@nextcloud.com>
@PVince81 PVince81 force-pushed the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch from 40c5f68 to c687592 Compare November 5, 2021 08:49
@PVince81
Copy link
Member

PVince81 commented Nov 5, 2021

rebased to make this more up to date

@PVince81
Copy link
Member

PVince81 commented Nov 5, 2021

Hmmm, I've compared the behavior on master with this PR and now when a storage is not available, it is still possible to explore its folder tree. But any write operation will fail with StorageNotAvailableException.
Which means that the "gatekeeping" was mostly done by the hasUpdated method

This feels like a big improvement because it makes it possible to still view the cache tree instead of feeling like being locked out.

Signed-off-by: Vincent Petry <vincent@nextcloud.com>

Co-authored-by: Carl Schwan <carl@carlschwan.eu>
@PVince81 PVince81 merged commit 6d5f10e into master Nov 5, 2021
@PVince81 PVince81 deleted the fix-removing-remote-shares-when-the-remote-server-is-unreachable branch November 5, 2021 13:53
@backportbot-nextcloud
Copy link

The backport to stable20 failed. Please do this backport manually.

@backportbot-nextcloud

This comment has been minimized.

@solracsf
Copy link
Member

solracsf commented Nov 5, 2021

/backport to stable22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot delete/unshare shared file with me when remote server is down
6 participants