-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup stern #845
Cleanup stern #845
Conversation
Actually, as also mentioned on the related issue, the reason why this was not a type error in the first place is because booleans were used. However, booleans are not a great way to reflect the semantics of what is happening, as it's more complicated than yes/no. (is the feature generally available on this installation? Is the feature turn on or off by default for new teams? Can the feature be globally overridden via configuration, overriding per-team settings?) |
This helps customer support to understand errors they run into without consulting the server logs. Stern is an internal tool, so it's ok to be a little more generous with information on internal errors.
This may have prevented the bug fixed in #844.
This will even show in the UI, not just in the code.
eb68632
to
1371976
Compare
(updated description) |
Right (LegalHoldTeamConfig LegalHoldDisabled) -> pure False | ||
Right (LegalHoldTeamConfig LegalHoldEnabled) -> pure True | ||
Right (LegalHoldTeamConfig LegalHoldDisabled) -> pure SetLegalHoldDisabled | ||
Right (LegalHoldTeamConfig LegalHoldEnabled) -> pure SetLegalHoldEnabled | ||
Left errmsg -> throwE (Error status502 "bad-upstream" ("bad response; error message: " <> pack errmsg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you get a 4xx error upstream, that should not lead to a 502 here. This would trigger alerts. What do you get when legalhold is permanently-disabled
on this endpoint? a 200? Or a 4xx? Is that case properly handled? A nicer message should be shown that explains what is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an error internal to the backend service family in the sense that the galley end-point always responds with a body that can be parsed as LegalHoldTeamConfig
. If it doesn't, this constitutes an invalid response from an upstream server. 502 is the correct status code for that.
If you read this, why do you expect galley to legitimately return anything I'm not handling? And if you suspect that might be the case and want to be extra-thorough (which is great, since you're trying to find mistakes I made), why don't you go and check?
I think it is reasonable to expect the reader of the code to assume the author did the right thing. What you are asking feels like the author should explicitly state for everything he does that he actually thought about it. That's annoying for both author and reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I made two comments on this PR, and the comment that there is a possible 403 that is not correctly handled is actually related to the features/sso
, not features/legalhold
, see comment below. The comment here is not applicable, as this endpoint seems to indeed always return a 200, even if the teamId passed in the request doesn't exist. That is another bug, though. (if galley were to properly refuse setting a feature flag for non-existent teams, galley would return a 4xx, and that 4xx would need to be handled in stern here, though - but that is a topic for a future PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is reasonable to expect the reader of the code to assume the author did the right thing. What you are asking feels like the author should explicitly state for everything he does that he actually thought about it. That's annoying for both author and reader.
I admit I was lazy and asked a question before making sure if there is actually a problem. Sorry for that.
Right (SSOTeamConfig SSODisabled) -> pure False | ||
Right (SSOTeamConfig SSOEnabled) -> pure True | ||
Right (SSOTeamConfig SSOEnabled) -> pure SetSSOEnabled | ||
Right (SSOTeamConfig SSODisabled) -> pure SetSSODisabled | ||
Left errmsg -> throwE (Error status502 "bad-upstream" ("bad response; error message: " <> pack errmsg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - can there be any "expected" responses other than 2xx? If so, please handle them and don't convert them to 502.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an open issue whereby backoffice returns a 500. This comment indicates that this is due to an unhandled 403.That not being handled is in the code path I'm indicating here.
In the current version of deployed stern, this is indeed the case:
curl -XPUT localhost:8080/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d 'false' -H "Content-Type: application/json"
leads to a 500. This 500 is due to, as already @tiago-loureiro pointed out, due to an unhandled 403.
As you can see, in the logs, a 403
from galley
is returned:
2019-09-11T16:20:28.11020 |srv=stern|1:I,7:request,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,18:Setting SSO status,
2019-09-11T16:20:29.11107 |srv=stern|1:E,7:request,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,6:remote,1:=,6:galley,4:path,1:=,58:/i/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso,10:Request-Id,1:=,42:stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345,12:Content-Type,1:=,16:application/json,1745:HttpExceptionRequest Request { host = "127.0.0.1" port = 8005 secure = False requestHeaders = [("Content-Type","application/json"),("Request-Id","stern-d42ada9d-6a59-4f9d-a87a-cbfcf2819345")] path = "/i/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso" queryString = "" method = "PUT" proxy = Nothing rawBody = False redirectCount = 10 responseTimeout = ResponseTimeoutDefault requestVersion = HTTP/1.1 } (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Wed, 11 Sep 2019 16:20:27 GMT"),("Server","Warp/3.2.25"),("Content-Type","application/json")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) "{\"code\":403,\"message\":\"The SSO feature flag is disabled by default. It can only be enabled once for any team, never disabled.\\n\\nRationale: there are two services in the backend that need to keep their status in sync: galley (teams),\\nand spar (SSO). Galley keeps track of team features. If galley creates an idp, the feature flag is\\nchecked. For authentication, spar avoids this expensive check and assumes that the idp can only have\\nbeen created if the SSO is enabled. This assumption does not hold any more if the switch is turned off\\nagain, so we do not support this.\\n\\nIt is definitely feasible to change this. If you have a use case, please contact customer support, or\\nopen an issue on https://github.com/wireapp/wire-server.\",\"label\":\"not-implemented\"}"),
2019-09-11T16:20:29.11136 |srv=stern|1:D,4:code,1:=,3:500,5:label,1:=,8:io-error,7:request,1:=,3:N/A,11:"I/O Error",
On the code on develop, this is converted by the expect2xx
to a 500, which is then shown on backoffice.
With the changes in this PR, you removed the expect2xx
, and instead convert any responsebody that can't be parsed as a SSOStatus to a 502. Now a 502 will be shown on the backoffice.
The underlying issue, namely that stern doesn't handle the case where galley returns a 403, remains. This is what I meant to point out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry :)
This vaguely makes sense. I'll look into it once more, and come back with a better answer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just checked out this branch and ran
curl -XPUT localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v
and I get a 200. Now I'm confused, as I don't understand why this suddenly works now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, okay, there's a 200, but the behaviour is unexpected still:
curl -XPUT localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v
curl localhost:8091/teams/224de122-d4ae-11e9-8e59-00163e5e6c23/features/sso
"SetSSOEnabled"
So it returns 200 but it hasn't applied the change.
# initially
curl localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSODisabled"
# enable
curl -XPUT localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso -d '"SetSSOEnabled"' -H "Content-Type: application/json" -v
# isEnabled, yes
curl localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSOEnabled"
# now disable
curl -XPUT localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso -d '"SetSSODisabled"' -H "Content-Type: application/json" -v
# now query again
curl localhost:8091/teams/9707aad6-d4b6-11e9-9f9f-00163e5e6c23/features/sso
"SetSSOEnabled"
Hm, perhaps disabling an already-enabled sso should not return a 200 then, since it's not taken into account?
@@ -407,11 +404,10 @@ setSSOStatus tid status = do | |||
. paths ["/i/teams", toByteString' tid, "features", "sso"] | |||
. lbytes (encode $ toRequestBody status) | |||
. contentJson | |||
. expect2xx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is the problematic line: galley can throwM disableSsoNotImplemented
here, which is now simply ignored with the removal of expect2xx
and fails silently, then stern returns 200 even though the status could not be set.
* Documentation changes for Twilio configurations and TURN set up.
Conflicts: services/galley/src/Galley/API/Teams.hs
This reverts commit ed28786. (it was a bad idea, i think, need to be more sophisticated.)
After some testing, I can testify to stern and galley behaving as follows: Both LH and SSO disabled by default in galley.yaml, I can enable LH, disable it again, and always get the expected setting. When attempting to disable SSO, one gets this error: {
"code": 500,
"message": "RPCException {remote = \"galley\", path = \"/i/teams/b628d64e-93b7-4f92-aa25-16a52ca54c7b/features/sso\", headers = [(\"Content-Type\",\"application/json\"),(\"Request-Id\",\"stern-6a3dbc5e-9c91-4264-be50-37f39dd0a43d\")], cause = HttpExceptionRequest Request {\n host = \"zb1\"\n port = 8085\n secure = False\n requestHeaders = [(\"Content-Type\",\"application/json\"),(\"Request-Id\",\"stern-6a3dbc5e-9c91-4264-be50-37f39dd0a43d\")]\n path = \"/i/teams/b628d64e-93b7-4f92-aa25-16a52ca54c7b/features/sso\"\n queryString = \"\"\n method = \"PUT\"\n proxy = Nothing\n rawBody = False\n redirectCount = 10\n responseTimeout = ResponseTimeoutDefault\n requestVersion = HTTP/1.1\n}\n (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = \"Forbidden\"}, responseVersion = HTTP/1.1, responseHeaders = [(\"Transfer-Encoding\",\"chunked\"),(\"Date\",\"Mon, 16 Sep 2019 15:13:36 GMT\"),(\"Server\",\"Warp/3.2.25\"),(\"Content-Type\",\"application/json\")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) \"{\\\"code\\\":403,\\\"message\\\":\\\"The SSO feature flag is disabled by default. It can only be enabled once for any team, never disabled.\\\\n\\\\nRationale: there are two services in the backend that need to keep their status in sync: galley (teams),\\\\nand spar (SSO). Galley keeps track of team features. If galley creates an idp, the feature flag is\\\\nchecked. For authentication, spar avoids this expensive check and assumes that the idp can only have\\\\nbeen created if the SSO is enabled. This assumption does not hold any more if the switch is turned off\\\\nagain, so we do not support this.\\\\n\\\\nIt is definitely feasible to change this. If you have a use case, please contact customer support, or\\\\nopen an issue on https://github.com/wireapp/wire-server.\\\",\\\"label\\\":\\\"not-implemented\\\"}\")}",
"label": "io-error"
} Introduced in 196ac6d. This is not pretty, but arguably helpful (if you are wiling to scroll to the very right of the output). I'm not sure how to improve on this, and am tempted to leave it as is. This should be the behavior for all non-2xx status codes from galley: whatever galley says is exposed to customer support, who can show it to some developer if it doesn't make sense to them. Enabling SSO also works as expected: I think this is fine. It's trivial enough if you look at the overall changes now. @jschaul could you take a (hopefully) last look? |
The diff github shows is odd, something might have gone wrong with the merging of develop, not sure, I see all the diffs of the last few commits to develop. Instead of merging develop, rebase makes for cleaner history usually. Anyhow.
|
(if legalhold is disabled-permanently, galley will also throw an error that then gets converted to a 500 when stern tries to enable it. Also here, catching that, and giving a nice error message, and a non-5xx error message, would be appreciated). |
I think that's another bug in github. When you look at Yes to your other suggestions, will fix. Sorry, I can't get excited about status codes, but I'm trying! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This should not change behavior in any observable way except for better error messages. But it does add more type information to the code that should make mistakes like this one harder in this particular place. We should do that more in general!.
Fixes https://github.com/wearezeta/backend-issues/issues/877 (mostly with ed28786)