ocis UI tests failing after account cache in ocis-proxy is removed #219

Closed
kulmann opened this issue Sep 8, 2020 · 23 comments · Fixed by owncloud/ocis#648

@kulmann

kulmann commented Sep 8, 2020

Web ui tests are failing on CI on this PR:
owncloud/ocis#525

Example drone run with the failures:
https://cloud.drone.io/owncloud/ocis/1505/6/7

Timeline and things I noticed:

  • at first the ocis PR was only pulling the new ocis-proxy version that removes the accounts cache from the account_uuid middleware, i.e. this PR from ocis-proxy: Get rid of cache in account_uuid middleware ocis-proxy#100
  • in this early state, CI was already failing
  • the first failing test in CI is tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:95
  • when running that test standalone, it succeeds. If you run the full test suite CI fails.
  • after some more digging, I found out that when a request to ocis-accounts is coming through grpc (which is the case for ocis-ocs), the handler was not provided with a client for the ocis-settings roleService. Because of that, setting the default user role on newly created accounts didn't work. I fixed that with this PR Init role service in grpc server ocis-accounts#114 and pulled it into this ocis PR as well.
  • CI is still failing with the exact same behaviour: the login times out in the same test scenario as stated above, resulting in 70 scenarios (36 failed, 34 passed). Always.
  • one assumption is that it has to do with OCS, failing to recreate user1 after deleting it. If that's true, the account cache in ocis-proxy would have hidden that, because, well, the account was cached in ocis-proxy. The cache doesn't exist anymore.
  • I can't see anything wrong with the test before the failing one (i.e. tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:65). Thought it might be leaving the server in an invalid state for that user1 but can't find anything wrong.

@individual-it @phil-davis @dpakach

Solution

  • We decided to add a config flag for the autoprovisioning of users. Autoprovisioning is disabled by default.
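
A minimal Python sketch of what such a switch could look like. This is only a model of the idea, not the proxy's Go code: `AccountResolver`, `autoprovision` and the dict-based store are hypothetical names, and the real flag name is not given in this issue.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AccountResolver:
    # Hypothetical flag: off by default, so unknown users are not created implicitly.
    autoprovision: bool = False
    # Hypothetical in-memory stand-in for the accounts service.
    accounts: dict = field(default_factory=dict)

    def resolve(self, claims):
        """Return the account matching the claims, or None if it is unknown."""
        name = claims["preferred_username"]
        for account in self.accounts.values():
            if account["preferredName"] == name:
                return account
        if not self.autoprovision:
            # With the flag off, a missing account stays missing.
            return None
        # With the flag on, a new account is created from the claims alone.
        account = {"id": str(uuid.uuid4()), "preferredName": name}
        self.accounts[account["id"]] = account
        return account
```

With the default (`autoprovision=False`), a request for a deleted user is simply not resolved instead of silently creating a second account.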

Long Term

  • A caching solution in the proxy needs to be designed to cache and invalidate access tokens.

@individual-it
Member

🤔

2020-09-08T15:09:47Z DBG director found path=/ocs/v2.php/cloud/users/user1 policy=reva prefix=/ocs/v[12].php/cloud/user routeType=regex service=proxy
2020-09-08T15:09:47Z ERR could not load account error="{\"id\":\"com.owncloud.api.accounts\",\"code\":404,\"detail\":\"could not read account: open /var/tmp/ocis-accounts/accounts/user1: no such file or directory\",\"status\":\"Not Found\"}" id=user1 service=accounts
2020-09-08T15:09:47Z ERR could not delete user error="{\"id\":\"com.owncloud.api.accounts\",\"code\":404,\"detail\":\"could not read account: open /var/tmp/ocis-accounts/accounts/user1: no such file or directory\",\"status\":\"Not Found\"}" service=ocs userid=user1

@kulmann
Author

kulmann commented Sep 8, 2020

I noticed that as well - and it was already happening in other PRs to ocis, see e.g. here: https://cloud.drone.io/owncloud/ocis/1495/6/6

I don't know if it is related to the issues I'm having now - it happens on the preparation step that deletes the user before recreating it. I guess in other PRs this was not crashing CI because the accounts cache in ocis-proxy was already caching the account, so failing to create it didn't blow up.

@individual-it

@phil-davis

phil-davis commented Sep 8, 2020

deleteFilesFolders.feature:65 deletes all files and folders for the user. Is there some strange bug that when all the files/folders are deleted, code somewhere actually deletes the user's root folder? And then there is some trouble deleting the user cleanly?

(e.g. if I have topfolder/subfolder/files.* and I want to "batch" delete files.* then a "shortcut" is to do a single delete of subfolder)

Just trying to think what is interestingly unique about the previous scenario that might cause trouble for delete-create of user1.

@individual-it
Member

there is also this known issue: #180 not sure if that is the same issue here

@individual-it
Member

probably not, here the issue is already at the login level

@kulmann
Author

kulmann commented Sep 8, 2020

I noticed that after running the full test suite and reaching the failing state, subsequent runs of the single scenario in line 95 (quoted above) also fail. Deleting /var/tmp/ocis-accounts and restarting ocis lets the test pass again. I need to repeat this to check that it was not a coincidence.

@individual-it
Member

something strange is going on there.

  1. after login an error page is briefly shown, but the login works (after deleting all folders and a fresh start of ocis)
  2. when deleting all files, the UI shows errors
  3. after a random number of runs of tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:65 the login does not work anymore. Check this video: https://streamable.com/ntwewv The first time, it happened right after the first run; the next time, I had to run the tests multiple times until the issue occurred
  4. tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:65 always shows an error after deletion and so the test fails even if the login works

@kulmann
Author

kulmann commented Sep 9, 2020

Thank you for reproducing it and even making a video!

Fun stuff: when I run tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:65 isolated and on a fresh instance, even after running it 10 times I don't see the error - neither the strange ignored error after login, nor the error on the kopano login form. For me it's only when I run more tests or the full test suite. 🤷

@kulmann
Author

kulmann commented Sep 9, 2020

Now checking what happens on CI when we skip that test on ocis.
https://cloud.drone.io/owncloud/ocis/1522

@kulmann
Author

kulmann commented Sep 9, 2020

@individual-it @phil-davis that one scenario is indeed blowing up the subsequent scenarios. Skipping that test in ocis CI passes: https://cloud.drone.io/owncloud/ocis/1522/6/7

@kulmann kulmann transferred this issue from owncloud/ocis Sep 10, 2020
@kulmann kulmann added QA-team bug Something isn't working labels Sep 10, 2020
@exalate-issue-sync exalate-issue-sync bot changed the title tests failing after account cache in ocis-proxy is removed ocis UI tests failing after account cache in ocis-proxy is removed Sep 10, 2020
@exalate-issue-sync

Benedikt Kulmann commented: This behaviour was unintentionally reproduced on a QA instance by @dtoledo. There is no information about the steps, just that the QA instance had exactly the same symptoms.

@phil-davis

The test was skipped by Phoenix PR owncloud/web#4051 and OCIS PR owncloud/ocis#540

Now the underlying cause needs to be found, fixed, and the test scenario enabled again.

@kulmann
Author

kulmann commented Sep 14, 2020

@exalate-issue-sync

Benedikt Kulmann commented: The Phoenix UI Tests 1 pipeline fails in the following scenarios:
tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:12
tests/acceptance/features/webUIFiles/breadcrumb.feature:53
tests/acceptance/features/webUIFiles/breadcrumb.feature:53
tests/acceptance/features/webUIDeleteFilesFolders/deleteFilesFolders.feature:12
tests/acceptance/features/webUIFiles/breadcrumb.feature:53
tests/acceptance/features/webUIFiles/breadcrumb.feature:53
tests/acceptance/features/webUIFiles/breadcrumb.feature:53

@exalate-issue-sync

Benedikt Kulmann commented: server debug output gives a hint that user1@example.org somehow gets created or is loaded from disk without a password:

2020-09-14T14:58:05+02:00 DBG found account AccountEnabled=true CreatedDateTime=null DeletedDateTime=null Description= DisplayName= GidNumber=0 Id=30525cd5-b8b4-4068-a72f-cf562f9670b0 Identities=null IsResourceAccount=false Mail=user1@example.org MemberOf=null OnPremisesDistinguishedName= OnPremisesLastSyncDateTime=null OnPremisesSamAccountName=user1 OnPremisesSecurityIdentifier= OnPremisesSyncEnabled=false OnPremisesUserPrincipalName= PreferredName=user1 UidNumber=0 service=accounts
2020-09-14T14:58:05+02:00 DBG **no password profile** AccountEnabled=true CreatedDateTime=null DeletedDateTime=null Description= DisplayName= GidNumber=0 Id=30525cd5-b8b4-4068-a72f-cf562f9670b0 Identities=null IsResourceAccount=false Mail=user1@example.org MemberOf=null OnPremisesDistinguishedName= OnPremisesLastSyncDateTime=null OnPremisesSamAccountName=user1 OnPremisesSecurityIdentifier= OnPremisesSyncEnabled=false OnPremisesUserPrincipalName= PreferredName=user1 UidNumber=0 service=accounts
2020-09-14T14:58:05+02:00 ERR Login failed binddn=cn=user1,ou=users,dc=example,dc=org service=glauth src={"IP":"::1","Port":58546,"Zone":""} username=user1

@exalate-issue-sync

Benedikt Kulmann commented: looking at /var/tmp/ocis-accounts/accounts I see multiple files for user1, only one with filename user1, the others having uuids:

{
  "id": "30525cd5-b8b4-4068-a72f-cf562f9670b0",
  "accountEnabled": true,
  "creationType": "LocalAccount",
  "preferredName": "user1",
  "mail": "user1@example.org",
  "onPremisesSamAccountName": "user1"
}
{
  "id": "35167839-33a9-4792-85c5-14cbfd377659",
  "accountEnabled": true,
  "creationType": "LocalAccount",
  "preferredName": "user1",
  "mail": "user1@example.org",
  "onPremisesSamAccountName": "user1"
}
{
  "id": "3f3dd336-18a6-4548-bec4-77f666f3948d",
  "accountEnabled": true,
  "creationType": "LocalAccount",
  "preferredName": "user1",
  "mail": "user1@example.org",
  "onPremisesSamAccountName": "user1"
}
{
  "id": "fe2252ef-f21d-4ace-a58c-bfb9e6b83d7d",
  "accountEnabled": true,
  "creationType": "LocalAccount",
  "preferredName": "user1",
  "mail": "user1@example.org",
  "onPremisesSamAccountName": "user1"
}
{
  "id": "user1",
  "accountEnabled": true,
  "preferredName": "user1",
  "mail": "user1@example.org",
  "passwordProfile": {
    "password": "$6$r.mDBtA3/VzWPNKp$Xxlhuk3f9WPy8G7iuQbh6CQEqHxN7KntNDQQdU.dD/k1i7BmS91r6.c4yc1X96aZQqN5tFwz2BYeVQ3K4pgu3/"
  },
  "onPremisesSamAccountName": "user1"
}
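
A small helper to spot this situation on disk could look like the following sketch. It assumes that every file in the accounts directory holds exactly one JSON document with the fields shown above; `find_duplicate_accounts` is a hypothetical name, not part of ocis.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_duplicate_accounts(accounts_dir):
    """Group account files by preferredName and return the names that occur more than once."""
    by_name = defaultdict(list)
    for path in sorted(Path(accounts_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            account = json.load(handle)
        by_name[account.get("preferredName")].append(
            # Record whether the file carries a passwordProfile, since the
            # uuid-named duplicates above do not.
            {"file": path.name, "has_password": "passwordProfile" in account}
        )
    return {name: files for name, files in by_name.items() if len(files) > 1}
```

Calling `find_duplicate_accounts("/var/tmp/ocis-accounts/accounts")` on an affected instance should list user1 with one entry per file, which makes it easy to see which of the files lack a password.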

@exalate-issue-sync

Benedikt Kulmann commented: The duplicate accounts are a separate error that crept into the debugging. I created a PR that prevents creating duplicate accounts in the future. This might help debugging as well. owncloud/ocis-accounts#123

@phil-davis

I added this to the "OCIS Server QA/CI Automation" project so that it keeps being noticed by QA. Next step is to get the "no duplicate accounts" code merged... Then we can see if there is more to be done.

@exalate-issue-sync

Benedikt Kulmann commented: The no duplicate accounts code refers to owncloud/ocis#587 which is stale because of drone being broken.

@individual-it
Member

how to reproduce it manually:

  1. start ocis
  2. create a user called user1: curl -k -XPOST https://localhost:9200/ocs/v2.php/cloud/users?format=json -uadmin:admin -d"username=user1&email=user1%40example.org&userid=user1&password=1234"
  3. login as user1 to phoenix
  4. start an action with a lot of requests that will take a while e.g. delete all files or upload a lot of files
  5. while the action is in process delete the user: curl -k -XDELETE https://localhost:9200/ocs/v2.php/cloud/users/user1?format=json -uadmin:admin;
  6. check the content of /var/tmp/ocis-accounts/accounts/ => a new account file (with a uuid as file name) has been created
  7. create user user1 again and try to login
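
The two OCS calls from steps 2 and 5 can also be scripted, which helps when the delete has to be fired while the UI action is still running. The sketch below only mirrors the curl commands above (same URL, admin:admin credentials, form fields); the function names are made up for illustration, and `send` skips certificate verification like `curl -k`.

```python
import base64
import ssl
import urllib.parse
import urllib.request

BASE_URL = "https://localhost:9200/ocs/v2.php/cloud/users"


def _request(method, url, data=None, user="admin", password="admin"):
    """Build a request with HTTP basic auth, like curl -uadmin:admin."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, data=data, method=method, headers={"Authorization": f"Basic {token}"}
    )


def create_user_request(username, password):
    """Step 2: POST the same form fields as the curl command."""
    form = urllib.parse.urlencode(
        {
            "username": username,
            "email": f"{username}@example.org",
            "userid": username,
            "password": password,
        }
    ).encode()
    return _request("POST", f"{BASE_URL}?format=json", data=form)


def delete_user_request(username):
    """Step 5: DELETE the user while the long-running UI action is in progress."""
    return _request("DELETE", f"{BASE_URL}/{username}?format=json")


def send(request):
    """Send a request without verifying the local self-signed certificate (curl -k)."""
    insecure = ssl.create_default_context()
    insecure.check_hostname = False
    insecure.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=insecure) as response:
        return response.status, response.read().decode()
```

Usage against a running instance would be `send(create_user_request("user1", "1234"))`, then starting the UI action, then `send(delete_user_request("user1"))`.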

@kulmann
Author

kulmann commented Oct 2, 2020

Aha, so the user gets auto-provisioned again from the IDP claims. That can only happen if, in the proxy:

  1. the oidc middleware successfully finds the user claims (which come from accounts, through glauth!!!)
  2. the user gets deleted (your step 5)
  3. the account_uuid middleware doesn't find the user and auto-provisions it, because the claims from oidc are in the context.
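
The three steps can be modelled in a few lines of Python. This is a simplified illustration with a plain dict as accounts store and made-up function signatures, not the actual Go middleware code in ocis-proxy.

```python
import uuid

# Hypothetical in-memory stand-in for the accounts store; not the ocis-accounts API.
accounts = {
    "user1": {
        "id": "user1",
        "preferredName": "user1",
        "passwordProfile": {"password": "<hash>"},
    }
}


def oidc_middleware(username):
    """Step 1: resolve the claims while the account still exists and put them in the context."""
    account = accounts[username]
    return {"claims": {"preferred_username": account["preferredName"]}}


def account_uuid_middleware(context):
    """Step 3: look the account up by claim and auto-provision it when it is missing."""
    name = context["claims"]["preferred_username"]
    for account in accounts.values():
        if account["preferredName"] == name:
            return account
    # The claims carry no password, so the new account has no passwordProfile.
    provisioned = {"id": str(uuid.uuid4()), "preferredName": name}
    accounts[provisioned["id"]] = provisioned
    return provisioned


context = oidc_middleware("user1")        # step 1: claims found
del accounts["user1"]                     # step 2: user deleted while the request is in flight
ghost = account_uuid_middleware(context)  # step 3: re-provisioned under a new uuid
```

After the last line the store contains a single user1 account with a uuid as id and no password profile, which matches the uuid-named files and the "no password profile" log line seen earlier in this thread.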

@exalate-issue-sync

Benedikt Kulmann commented: This is now also the explanation why it popped up after we removed the account cache in the proxy. Previously there was a cache hit for the user that was deleted in the meantime. Now it gets re-provisioned.
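
A tiny model of the difference, assuming a cache that is filled on first lookup and not invalidated on delete (how the removed cache actually expired entries is not described in this issue):

```python
# Hypothetical stand-ins for the accounts store and the former proxy cache.
accounts = {"user1": {"id": "user1", "preferredName": "user1"}}
cache = {}


def lookup_with_cache(name):
    """Old behaviour: answer from the cache once an account has been seen."""
    if name not in cache:
        cache[name] = accounts.get(name)
    return cache[name]


def lookup_without_cache(name):
    """New behaviour: always ask the accounts store."""
    return accounts.get(name)


lookup_with_cache("user1")             # warms the cache while the account exists
del accounts["user1"]                  # the test setup deletes the user
stale = lookup_with_cache("user1")     # still returns the deleted account
fresh = lookup_without_cache("user1")  # returns None, which is where re-provisioning kicked in
```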

@exalate-issue-sync

Michael Barz commented: Fixed by owncloud/ocis#648
