Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup stern #845

Merged
merged 19 commits into from
Sep 17, 2019
Merged

Cleanup stern #845

merged 19 commits into from
Sep 17, 2019

Conversation

fisx
Copy link
Contributor

@fisx fisx commented Sep 5, 2019

This should not change behavior in any observable way except for better error messages. But it does add more type information to the code that should make mistakes like this one harder in this particular place. We should do that more in general!.

Fixes https://github.com/wearezeta/backend-issues/issues/877 (mostly with ed28786)

jschaul
jschaul previously approved these changes Sep 9, 2019
@jschaul jschaul dismissed their stale review September 9, 2019 17:31

new thoughts

@jschaul
Copy link
Member

jschaul commented Sep 9, 2019

Actually, as also mentioned on the related issue, the reason why this was not a type error in the first place is because booleans were used. However, booleans are not a great way to reflect the semantics of what is happening, as it's more complicated than yes/no. (is the feature generally available on this installation? Is the feature turn on or off by default for new teams? Can the feature be globally overridden via configuration, overriding per-team settings?)
So instead of adding Tagged everywhere - let's use non-boolean types instead?

This helps customer support to understand errors they run into without
consulting the server logs.  Stern is an internal tool, so it's ok to
be a little more generous with information on internal errors.
This may have prevented the bug fixed in
#844.
This will even show in the UI, not just in the code.
@fisx fisx self-assigned this Sep 11, 2019
This reverts commit 88cfe14.

This is ok: we can still keep backoffice from being able to
enable legalhold by turning it off permanently in the options
file in galley.
@fisx fisx force-pushed the fisx/repair-stern branch from eb68632 to 1371976 Compare September 11, 2019 14:01
@fisx
Copy link
Contributor Author

fisx commented Sep 11, 2019

I'm not sure I'm happier with the custom booleans than I was with the Tagged stuff, but one upside is that it now goes all the way up into the UI:

2019-09-11-154547_2048x1152_scrot

@fisx
Copy link
Contributor Author

fisx commented Sep 11, 2019

(updated description)

Right (LegalHoldTeamConfig LegalHoldDisabled) -> pure False
Right (LegalHoldTeamConfig LegalHoldEnabled) -> pure True
Right (LegalHoldTeamConfig LegalHoldDisabled) -> pure SetLegalHoldDisabled
Right (LegalHoldTeamConfig LegalHoldEnabled) -> pure SetLegalHoldEnabled
Left errmsg -> throwE (Error status502 "bad-upstream" ("bad response; error message: " <> pack errmsg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you get a 4xx error upstream, that should not lead to a 502 here. This would trigger alerts. What do you get when legalhold is permanently-disabled on this endpoint? a 200? Or a 4xx? Is that case properly handled? A nicer message should be shown that explains what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error internal to the backend service family in the sense that the galley end-point always responds with a body that can be parsed as LegalHoldTeamConfig. If it doesn't, this constitutes an invalid response from an upstream server. 502 is the correct status code for that.

If you read this, why do you expect galley to legitimately return anything I'm not handling? And if you suspect that might be the case and want to be extra-thorough (which is great, since you're trying to find mistakes I made), why don't you go and check?

I think it is reasonable to expect the reader of the code to assume the author did the right thing. What you are asking feels like the author should explicitly state for everything he does that he actually thought about it. That's annoying for both author and reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I made two comments on this PR, and the comment that there is a possible 403 that is not correctly handled is actually related to the features/sso, not features/legalhold, see comment below. The comment here is not applicable, as this endpoint seems to indeed always return a 200, even if the teamId passed in the request doesn't exist. That is another bug, though. (if galley were to properly refuse setting a feature flag for non-existent teams, galley would return a 4xx, and that 4xx would need to be handled in stern here, though - but that is a topic for a future PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to expect the reader of the code to assume the author did the right thing. What you are asking feels like the author should explicitly state for everything he does that he actually thought about it. That's annoying for both author and reader.

I admit I was lazy and asked a question before making sure if there is actually a problem. Sorry for that.

Right (SSOTeamConfig SSODisabled) -> pure False
Right (SSOTeamConfig SSOEnabled) -> pure True
Right (SSOTeamConfig SSOEnabled) -> pure SetSSOEnabled
Right (SSOTeamConfig SSODisabled) -> pure SetSSODisabled
Left errmsg -> throwE (Error status502 "bad-upstream" ("bad response; error message: " <> pack errmsg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - can there be any "expected" responses other than 2xx? If so, please handle them and don't convert them to 502.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an open issue whereby backoffice returns a 500. This comment indicates that this is due to an unhandled 403.That not being handled is in the code path I'm indicating here.

In the current version of deployed stern, this is indeed the case:

curl -XPUT localhost:8080/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d 'false' -H "Content-Type: application/json"

leads to a 500. This 500 is due to, as already @tiago-loureiro pointed out, due to an unhandled 403.

As you can see, in the logs, a 403 from galley is returned:

2019-09-11T16:20:28.11020 |srv=stern|1:I,7:request,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,18:Setting SSO status,
2019-09-11T16:20:29.11107 |srv=stern|1:E,7:request,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,6:remote,1:=,6:galley,4:path,1:=,58:/i/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso,10:Request-Id,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,12:Content-Type,1:=,16:application/json,1745:HttpExceptionRequest Request {   host                 = "127.0.0.1"   port                 = 8005   secure               = False   requestHeaders       = [("Content-Type","application/json"),("Request-Id","stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345")]   path                 = "/i/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso"   queryString          = ""   method               = "PUT"   proxy                = Nothing   rawBody              = False   redirectCount        = 10   responseTimeout      = ResponseTimeoutDefault   requestVersion       = HTTP/1.1 }  (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 11 Sep 2019 16:20:27 GMT"),("Server","Warp/3.2.25"),("Content-Type","application/json")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) "{\"code\":403,\"message\":\"The SSO feature flag is disabled by default.  It can only be enabled once for any team, never disabled.\\n\\nRationale: there are two services in the backend that need to keep their status in sync: galley (teams),\\nand spar (SSO).  Galley keeps track of team features.  If galley creates an idp, the feature flag is\\nchecked.  For authentication, spar avoids this expensive check and assumes that the idp can only have\\nbeen created if the SSO is enabled.  This assumption does not hold any more if the switch is turned off\\nagain, so we do not support this.\\n\\nIt is definitely feasible to change this.  If you have a use case, please contact customer support, or\\nopen an issue on https://github.com/wireapp/wire-server.\",\"label\":\"not-implemented\"}"),
2019-09-11T16:20:29.11136 |srv=stern|1:D,4:code,1:=,3:500,5:label,1:=,8:io-error,7:request,1:=,3:N/A,11:"I/O Error",

On the code on develop, this is converted by the expect2xx to a 500, which is then shown on backoffice.

With the changes in this PR, you removed the expect2xx, and instead convert any responsebody that can't be parsed as a SSOStatus to a 502. Now a 502 will be shown on the backoffice.

The underlying issue, namely that stern doesn't handle the case where galley returns a 403, remains. This is what I meant to point out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry :)

This vaguely makes sense. I'll look into it once more, and come back with a better answer...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just checked out this branch and ran

curl -XPUT localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v

and I get a 200. Now I'm confused, as I don't understand why this suddenly works now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, okay, there's a 200, but the behaviour is unexpected still:

curl -XPUT localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v
curl  localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso
"SetSSOEnabled"

So it returns 200 but it hasn't applied the change.

# initially
curl  localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSODisabled"

# enable
curl -XPUT localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso -d '"SetSSOEnabled"' -H "Content-Type: application/json" -v
                 
# isEnabled, yes
curl  localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSOEnabled"

# now disable
curl -XPUT localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v
                  
# now query again
curl  localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSOEnabled"

Hm, perhaps disabling an already-enabled sso should not return a 200 then, since it's not taken into account?

@fisx fisx requested a review from jschaul September 11, 2019 16:00
@@ -407,11 +404,10 @@ setSSOStatus tid status = do
. paths ["/i/teams", toByteString' tid, "features", "sso"]
. lbytes (encode $ toRequestBody status)
. contentJson
. expect2xx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is the problematic line: galley can throwM disableSsoNotImplemented here, which is now simply ignored with the removal of expect2xx and fails silently, then stern returns 200 even though the status could not be set.

R4m4n and others added 6 commits September 12, 2019 13:50
* Documentation changes for Twilio configurations and TURN set up.
Conflicts:
	services/galley/src/Galley/API/Teams.hs
This reverts commit ed28786.

(it was a bad idea, i think, need to be more sophisticated.)
@fisx
Copy link
Contributor Author

fisx commented Sep 16, 2019

After some testing, I can testify to stern and galley behaving as follows:

Both LH and SSO disabled by default in galley.yaml, I can enable LH, disable it again, and always get the expected setting. When attempting to disable SSO, one gets this error:

{
  "code": 500,
  "message": "RPCException {remote = \"galley\", path = \"/i/teams/b628d64e-93b7-4f92-aa25-16a52ca54c7b/features/sso\", headers = [(\"Content-Type\",\"application/json\"),(\"Request-Id\",\"stern-6a3dbc5e-9c91-4264-be50-37f39dd0a43d\")], cause = HttpExceptionRequest Request {\n  host                 = \"zb1\"\n  port                 = 8085\n  secure               = False\n  requestHeaders       = [(\"Content-Type\",\"application/json\"),(\"Request-Id\",\"stern-6a3dbc5e-9c91-4264-be50-37f39dd0a43d\")]\n  path                 = \"/i/teams/b628d64e-93b7-4f92-aa25-16a52ca54c7b/features/sso\"\n  queryString          = \"\"\n  method               = \"PUT\"\n  proxy                = Nothing\n  rawBody              = False\n  redirectCount        = 10\n  responseTimeout      = ResponseTimeoutDefault\n  requestVersion       = HTTP/1.1\n}\n (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = \"Forbidden\"}, responseVersion = HTTP/1.1, responseHeaders = [(\"Transfer-Encoding\",\"chunked\"),(\"Date\",\"Mon, 16 Sep 2019 15:13:36 GMT\"),(\"Server\",\"Warp/3.2.25\"),(\"Content-Type\",\"application/json\")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) \"{\\\"code\\\":403,\\\"message\\\":\\\"The SSO feature flag is disabled by default.  It can only be enabled once for any team, never disabled.\\\\n\\\\nRationale: there are two services in the backend that need to keep their status in sync: galley (teams),\\\\nand spar (SSO).  Galley keeps track of team features.  If galley creates an idp, the feature flag is\\\\nchecked.  For authentication, spar avoids this expensive check and assumes that the idp can only have\\\\nbeen created if the SSO is enabled.  This assumption does not hold any more if the switch is turned off\\\\nagain, so we do not support this.\\\\n\\\\nIt is definitely feasible to change this.  If you have a use case, please contact customer support, or\\\\nopen an issue on https://github.com/wireapp/wire-server.\\\",\\\"label\\\":\\\"not-implemented\\\"}\")}",
  "label": "io-error"
}

Introduced in 196ac6d. This is not pretty, but arguably helpful (if you are wiling to scroll to the very right of the output). I'm not sure how to improve on this, and am tempted to leave it as is. This should be the behavior for all non-2xx status codes from galley: whatever galley says is exposed to customer support, who can show it to some developer if it doesn't make sense to them.

Enabling SSO also works as expected:

2019-09-16-171947_2048x1152_scrot
2019-09-16-171955_2048x1152_scrot

I think this is fine. It's trivial enough if you look at the overall changes now. @jschaul could you take a (hopefully) last look?

@jschaul
Copy link
Member

jschaul commented Sep 16, 2019

The diff github shows is odd, something might have gone wrong with the merging of develop, not sure, I see all the diffs of the last few commits to develop. Instead of merging develop, rebase makes for cleaner history usually. Anyhow.

  • This PR improves upon the types, which it set out to do (good! I like that part)
  • This PR does not fix issue wearezeta/backend-issues#877 (stated in the description above), as 500s keep being thrown. 500 means a bug in the code, indicate "something is wrong with our servers!" to anyone looking at metrics, and has the potential to create alerts. Some backoffice actions are currently not supported - like disabling SSO - and galley throws a 403 here. Why do you insist to convert that into a 500? I'd suggest a case statusCode response and on a 403, give back a 403 with a nice message saying "You can't do this, not implemented".

@jschaul
Copy link
Member

jschaul commented Sep 16, 2019

(if legalhold is disabled-permanently, galley will also throw an error that then gets converted to a 500 when stern tries to enable it. Also here, catching that, and giving a nice error message, and a non-5xx error message, would be appreciated).

@fisx
Copy link
Contributor Author

fisx commented Sep 16, 2019

The diff github shows is odd, something might have gone wrong with the merging of develop, not sure, I see all the diffs of the last few commits to develop. Instead of merging develop, rebase makes for cleaner history usually. Anyhow

I think that's another bug in github. When you look at git diff -r develop..fisx/repair-sterm, you should see the correct change set. (Or am I missing something?)

Yes to your other suggestions, will fix. Sorry, I can't get excited about status codes, but I'm trying! :)

@fisx fisx requested a review from jschaul September 17, 2019 08:36
(oops: it's been in bilge all along, just not in the exports.)
Copy link
Member

@jschaul jschaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@fisx fisx merged commit ad5cad7 into develop Sep 17, 2019
@fisx fisx deleted the fisx/repair-stern branch September 17, 2019 09:41
@arianvp arianvp mentioned this pull request Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants