Kibana authentication troubleshooting guide #83914
Comments
Okay, I described all the cases I could remember so far. I'll get back to this issue in a few weeks so that everyone has time to share any other ideas/issues. |
Thanks for putting this together! I agree with a lot of what you said here, and I don't see any glaring omissions.
Would it be possible to use the new
As much as I'd love to deprecate this, I worry that we will end up having to support this in some capacity. |
Yeah, it should allow you to pick any provider.
Right, my suspicion is that many users upgrade Kibana and just keep their legacy authc config and hence don't leverage Login Selector by default. And right now our Telemetry cannot tell us whether it's the case or users explicitly disabled Login Selector. In 8.0.0 when we drop legacy config completely we'll be able to see how many users explicitly disable it. |
We are currently encountering this issue. We recently integrated Azure AD OIDC realms for authentication in Elasticsearch and Kibana. Our end users who use Kibana are frequently logged out after somewhere between 5 minutes and 1 hour. They then need to log in again, providing their email ID, password and passcode. Even if we disable or comment out the session settings in the kibana.yml file, users are still logged out. We have escalated this issue to Elastic engineers.
If you've escalated this issue to our support team, we'll look into it soon. If you don't have access to our support, then please post this question at our Discuss forum. There are many more users like you there who can help and who have probably already solved the problem you have. A GitHub issue isn't the right place to debug issues like that. Having said that, I'm almost sure that you have multiple Kibana instances connected to the same cluster that have different security configurations, or something along these lines: https://www.elastic.co/guide/en/kibana/current/production.html#load-balancing-kibana
The Kibana authentication subsystem includes quite a bit of functionality these days, and it's not always easy to troubleshoot problems or misconfiguration in this area. This issue is intended to gather the most common issues our users are experiencing with our authentication layer, along with ideas on how we can help them troubleshoot these issues.
We can tackle this from two different angles:
Provide a proper troubleshooting guide similar to the Common SAML issues guide the Elasticsearch team created.
Try to detect possible configuration issues in the code and log appropriate warnings.
Most frequent issues
Inconsistent (autogenerated) xpack.security.encryptionKey in Kibana HA setup
This is by far the most common source of confusion. If one instance of Kibana cannot decrypt a cookie that was created by another instance, the cookie will be cleared.
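As a rough illustration of how this misconfiguration could be spotted from outside Kibana, the sketch below compares the xpack.security.encryptionKey value across the kibana.yml files of several instances. It is not part of Kibana: the instance names are hypothetical, and the parser is a deliberately naive assumption that only understands flat `name: value` lines, not nested YAML.

```python
from typing import Dict, List, Optional

KEY = "xpack.security.encryptionKey"


def read_setting(kibana_yml: str, name: str) -> Optional[str]:
    """Return the value of a flat `name: value` line, or None if absent.

    Naive on purpose: nested YAML and inline comments are not handled.
    """
    for raw in kibana_yml.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        if line.startswith(name + ":"):
            return line[len(name) + 1:].strip().strip("\"'")
    return None


def check_encryption_keys(configs: Dict[str, str]) -> List[str]:
    """`configs` maps a (hypothetical) instance name to its kibana.yml text."""
    problems = []
    keys = {name: read_setting(text, KEY) for name, text in configs.items()}
    for name, key in keys.items():
        if not key:
            # Without an explicit key, Kibana generates a random one.
            problems.append(f"{name}: {KEY} is not set")
    if len({k for k in keys.values() if k}) > 1:
        problems.append(f"{KEY} differs between instances")
    return problems
```

An empty result means every instance sets the same explicit key; any other result points at instances that would be unable to decrypt each other's cookies.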
What we already do:
Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command.
[debug][server][Kibana][cookie-session-storage][http] Error: Unauthorized
What we can do:
Inconsistent session and authentication settings in Kibana HA setup
Every instance of Kibana schedules a regular session cleanup job to remove sessions that weren't explicitly invalidated. There are a number of criteria we use to determine that a session can be safely removed, but the most notable are:
That means that if multiple Kibana instances that rely on the same .kibana-x index have different session or provider settings, then a cleanup job scheduled by one Kibana instance may invalidate sessions created by another instance. By default, a cleanup job is run on startup and every hour after that, so users may experience sporadic logouts that may be hard to debug.
What we already do:
Cleaned up 5 invalid or expired session values.
that we can potentially correlate with user logouts. But it's too vague and may not indicate any problem.
What we can do:
Multi-tenancy using the same host name, but different ports
Per RFC6265, cookies for a given host are shared across all the ports on that host, even though the usual "same-origin policy" used by web browsers isolates content retrieved via different ports. That means that if you have multiple Kibana tenants (Kibana instances that use different .kibana-x indices) that are using the same host name, but different ports, then the session cookies will be shared between them.
This will lead to sporadic logouts if both tenants are opened in the same browsing context (same browser window): if one tenant receives a session cookie that references a session that lives in another tenant, the cookie will be treated as invalid and Kibana will clear it.
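To make the difference between the two scoping rules concrete, here is a minimal sketch that contrasts an origin comparison (scheme, host, port) with host-only cookie matching. The URLs and the `Tenant` type are hypothetical, and the cookie check is a simplification that ignores the Domain, Path and Secure attributes. The cookie name used for each tenant corresponds to the xpack.security.cookieName setting; `sid` is assumed here as the name both tenants share when nothing is configured.

```python
from typing import NamedTuple, Optional, Tuple
from urllib.parse import urlsplit


class Tenant(NamedTuple):
    url: str                  # hypothetical tenant URL
    cookie_name: str = "sid"  # assumed shared default cookie name


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """The same-origin policy compares scheme, host and port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def cookies_shared(url_a: str, url_b: str) -> bool:
    """Cookie matching (simplified) looks at the host only, never the port."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname


def session_cookie_collides(a: Tenant, b: Tenant) -> bool:
    """Two tenants clobber each other's session cookie when the browser
    sends the same cookie to both and both read the same cookie name."""
    return cookies_shared(a.url, b.url) and a.cookie_name == b.cookie_name


tenant_a = Tenant("https://kibana.example.com:5601")
tenant_b = Tenant("https://kibana.example.com:5602")
```

With these values the two tenants are different origins but still collide on the session cookie; giving one of them a distinct `cookie_name` removes the collision.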
The most correct solution is to never host different applications on the same hostname, because of the cookie leak. If that's not possible, then the workaround is to configure a different session cookie name for every tenant with the xpack.security.cookieName setting.
What we already do:
Inconsistent (autogenerated) xpack.security.encryptionKey in Kibana HA setup. Otherwise, we detect this case and log the following debug message:
Session value is not available in the index, session cookie will be invalidated.
What we can do:
Multiple authentication providers without Login Selector
It's still possible to use multiple authentication providers even if Login Selector is disabled. The support is very limited though, and we generally discourage our users from that setup. The main reason why we still support this is BWC. There is nothing we can do here, so I'll just outline a few notable things about this setup:
order will try to authenticate them
basic or token right now) is configured in addition to other providers (e.g. saml or oidc), but the order is higher than that of another provider it's still possible to use it to log in even though it's not used automatically. To do this one should log out and go to the /login route directly.
order instead.
What we can do:
Kibana session settings vs access/refresh token expiration
Many of the Kibana authentication providers use Elasticsearch access/refresh tokens under the hood: SAML, OpenID Connect, PKI, Kerberos and Token. These tokens also have their own expiration settings, which are separate from Kibana's own session expiration settings:
If Kibana's session idle timeout is higher than the expiration time of the underlying access token, Kibana will automatically refresh the access token once the user becomes active again. But if an admin disables the Kibana session idle timeout or lifespan, or sets either of them higher than 24 hours, and the user isn't active during this period, then the underlying refresh token expires and the access token can no longer be refreshed. Such a setup effectively limits Kibana session timeouts to 24 hours.
For example, if Kibana is configured to work with one of the token-based authentication providers, and an admin wants to disable the idle timeout, they would do something like this:
But in reality, because of the hard-coded 24-hour lifetime of the refresh token, the idle timeout will be approximately equal to only 24 hours.
It's even more problematic for PKI authentication, since Elasticsearch doesn't provide a refresh token in this case at all, effectively limiting the idle timeout for the PKI authentication provider to the lifetime of the access token (max 1 hour).
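The interaction described above boils down to taking a minimum. The following is a small sketch of that arithmetic, not Kibana code: the function name, the provider labels and the use of `None` for a disabled timeout are all assumptions made for illustration, and the result is approximate in the same sense as the text.

```python
from typing import Optional

REFRESH_TOKEN_LIFETIME_H = 24.0    # hard-coded refresh token lifetime
MAX_ACCESS_TOKEN_LIFETIME_H = 1.0  # upper bound for the access token

# Providers that get both an access and a refresh token.
REFRESHABLE = {"saml", "oidc", "kerberos", "token"}


def effective_idle_timeout_hours(
    provider: str,
    kibana_idle_timeout_h: Optional[float],
    access_token_lifetime_h: float = MAX_ACCESS_TOKEN_LIFETIME_H,
) -> Optional[float]:
    """Approximate idle timeout a user actually experiences.

    `kibana_idle_timeout_h=None` models a disabled Kibana idle timeout.
    Returns None when nothing limits the idle time.
    """
    if provider in REFRESHABLE:
        cap = REFRESH_TOKEN_LIFETIME_H
    elif provider == "pki":
        cap = access_token_lifetime_h  # no refresh token at all
    else:
        return kibana_idle_timeout_h   # not token based, no extra cap
    if kibana_idle_timeout_h is None:
        return cap
    return min(kibana_idle_timeout_h, cap)
```

For instance, a SAML setup with the idle timeout disabled still ends up with roughly 24 hours, while a PKI setup is bounded by the access token lifetime regardless of the Kibana setting.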
What we already do:
What we can do:
Misconfigured role mappings
It's more of an Elasticsearch issue, but it's usually Kibana where the user ends up stuck, so we can try to help debug this.
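One way an admin might inspect the outcome of role mapping is to look at the response of the /internal/security/me endpoint that this issue suggests documenting. The helper below is only a sketch: it assumes the response body is a JSON user object with `username` and `roles` fields, and it works on a response that has already been fetched (for example with a browser or curl) rather than calling Kibana itself.

```python
import json
from typing import List


def applied_roles(me_response: str) -> List[str]:
    """Return the sorted role names from a /internal/security/me response body."""
    user = json.loads(me_response)
    return sorted(user.get("roles", []))


def describe_user(me_response: str) -> str:
    """One-line summary an admin can compare against the expected role mappings."""
    user = json.loads(me_response)
    name = user.get("username", "<unknown>")
    roles = applied_roles(me_response)
    if not roles:
        return f"{name}: no roles applied; check the role mappings in Elasticsearch"
    return f"{name}: {', '.join(roles)}"
```

An empty role list for a user who can authenticate is a typical sign that no role mapping matched.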
What we already do:
You do not have permission to access the requested page screen that is already a good enough solution.
What we can do:
/internal/security/me endpoint in a troubleshooting guide so that admins can see exactly which roles were applied to a particular user.
Misconfigured refresh interval for the security-related indices
Security-related indices (and many other system indices) are very sensitive to refresh intervals higher than 1s, as most update operations are issued with refresh=wait_for in order to keep concurrent edits consistent.
Changing default refresh intervals for the security-related indices is highly discouraged. Typical causes of this are match-all index templates that apply some common settings or mappings to all indices, or a user mistakenly setting a common refresh interval on ALL indices.
Note: This should happen less frequently once #134900 merges (target 8.4.0).
This can lead to significant delays and failures during request authentication. To make sure the security-related indices have proper refresh intervals, you can check the settings file in the Elastic support diagnostics bundle:
@elastic/kibana-security I'll be gradually filling this issue with info I remember, but please feel free to comment here or edit the issue description to include issues you know about that I missed.
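For the refresh interval check on the settings file mentioned above, a script along these lines could flag the problematic indices. It is a sketch under several assumptions: the file is taken to have the shape of an Elasticsearch get-settings response (index name, then `settings.index.refresh_interval`), the index name prefixes are illustrative guesses rather than an authoritative list, and only the ms, s, m, h and d time units are parsed.

```python
import json
import re
from typing import Dict, Tuple

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}

# Assumed prefixes of security-related indices; adjust for your cluster.
SECURITY_PREFIXES: Tuple[str, ...] = (".security", ".kibana_security_session")


def interval_seconds(value: str) -> float:
    """Convert '1s', '500ms', '5m' to seconds; '-1' (refresh disabled) is infinity."""
    value = value.strip()
    if value == "-1":
        return float("inf")
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported refresh interval: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def slow_security_indices(
    settings_json: str, prefixes: Tuple[str, ...] = SECURITY_PREFIXES
) -> Dict[str, str]:
    """Map each security-related index whose explicit refresh interval
    exceeds 1s to that interval. Indices without an explicit value keep
    the default and are not reported."""
    flagged = {}
    for index, body in json.loads(settings_json).items():
        if not index.startswith(prefixes):
            continue
        interval = body.get("settings", {}).get("index", {}).get("refresh_interval")
        if interval is not None and interval_seconds(interval) > 1.0:
            flagged[index] = interval
    return flagged
```

Any index returned by `slow_security_indices` is a candidate for the delays and authentication failures described in this section.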