A Spring Boot application checking the health of various components of the ForgeRock system.
- idam-health-checker
To trigger Travis CI to build and publish the idam-health-checker, tag a commit in Git.
http://<fqdn>:9292/admin/health
Status |
---|
UP |
DOWN |
UNKNOWN |
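A monitoring script can read the aggregate status from the endpoint's JSON response. The sketch below is a minimal illustration, not part of idam-health-checker: the class and method names are hypothetical, and it assumes the aggregate `status` field appears before any per-probe details in the response body, as is usual for Spring Boot Actuator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of idam-health-checker.
final class HealthStatusReader {
    // Matches a JSON "status" field such as "status":"UP".
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns the first status found in the body, or UNKNOWN when none is present.
    // Assumption: the aggregate status precedes any nested per-probe statuses.
    static String readStatus(String healthJson) {
        Matcher matcher = STATUS.matcher(healthJson);
        return matcher.find() ? matcher.group(1) : "UNKNOWN";
    }
}
```

For example, `HealthStatusReader.readStatus("{\"status\":\"DOWN\"}")` yields `DOWN`.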
idam-health-checker runs via supervisord on ForgeRock Virtual Machines. The ini file can be found at `/etc/supervisord.d/healthcheck.ini`. Images created by Packer and Ansible will configure the health checker to use the `live` Spring Profile in addition to the other system-specific profiles. This profile uses logback with only the Application Insights Appender, `aiAppender`. See `src/main/resources/logback-spring.xml` for more.
To view console output, it's advisable to set the log levels to DEBUG and the Spring Profile to `insightconsole`.
# Update /etc/supervisord.d/healthcheck.ini
sudo sed -i'.bak' \
-e 's/WARN/DEBUG/g' \
-e 's/,live/,insightconsole/' /etc/supervisord.d/healthcheck.ini
sudo systemctl restart supervisord
# Attach to process and parse logs
sudo strace -p$(pgrep -f supervisord) -s1000 -e write 2>&1 \
| sed -ur 's/^.+\"(.+)\\n\".+$/\1/;s/\\n//g;s/\[?\\[0-9]{2}\[[0-9]?;?[[0-9]{2}m\]?//g'
# To restore the healthcheck.ini
sudo mv /etc/supervisord.d/healthcheck.ini /etc/supervisord.d/healthcheck.ini.debug
sudo mv /etc/supervisord.d/healthcheck.ini.bak /etc/supervisord.d/healthcheck.ini
sudo systemctl restart supervisord
Application Insights: idam-idam-${environment}
HealthProbe Names for customDimensions:
- AmIsAliveHealthProbe
- AmPasswordGrantHealthProbe
- FileFreshnessProbe
- ReplicationCommandProbe
- IdmPingHealthProbe
- LdapReplicationHealthProbe
- UserStoreAuthenticationHealthProbe
Example for AM isAlive Health Probe.
traces
| where cloud_RoleName contains "health"
| where customDimensions contains "AmIsAliveHealthProbe"
You can enable the details in the healthchecker with `-Dmanagement.endpoint.health.show-details="ALWAYS"`.
Details can include information on why something is DOWN and the current healthchecker's version in its JSON output.
External Infrastructure Dependencies
- KeyVault secrets
- openidm-username
- used for idm ldap connectivity check
- openidm-password
- used for idm ldap connectivity check
- test-owner-username
- test-owner-password
- web-admin-client-secret
- BINDPASSWD
- adminUID
- adminPassword
- appinsights-instrumentationkey
- Managed Identity
Secret Vault Support: Azure KeyVault
Status begins as `UNKNOWN`.
The `refresh` function is scheduled with `checkInterval`.
The initial probe is triggered.
When the probe result is `true`, the status will be set to `UP`.
When the probe result is `false`, the status will be set to `DOWN`.
Note: Status changes can be ignored by providing `HealthProbeFailureHandling.IGNORE` to the `ScheduledHealthProbeIndicator` initialisation. To change the status you must use `HealthProbeFailureHandling.MARK_AS_DOWN`.
If the current probe status is `UP` or `DOWN` and the current datetime is after the `statusDateTime` + `freshnessInterval`, then the probe has expired.
The probe is automatically expired if the status is `UNKNOWN`.
If the current probe status is UP and the probe has not expired, no action is taken.
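The status rules above can be modelled roughly as follows. This is an illustrative sketch, not the actual `ScheduledHealthProbeIndicator`: the class `ProbeStatusSketch` and its method names are hypothetical, and it assumes that an ignored failure leaves both the status and `statusDateTime` untouched.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Illustrative model only; the real ScheduledHealthProbeIndicator may differ.
final class ProbeStatusSketch {
    enum Status { UNKNOWN, UP, DOWN }
    enum FailureHandling { IGNORE, MARK_AS_DOWN }

    private final FailureHandling failureHandling;
    private final Duration freshnessInterval;
    private Status status = Status.UNKNOWN;   // status begins as UNKNOWN
    private LocalDateTime statusDateTime;     // set when the status is recorded

    ProbeStatusSketch(FailureHandling failureHandling, Duration freshnessInterval) {
        this.failureHandling = failureHandling;
        this.freshnessInterval = freshnessInterval;
    }

    // Records one probe result observed at the given time.
    void record(boolean probeResult, LocalDateTime now) {
        if (probeResult) {
            status = Status.UP;
            statusDateTime = now;
        } else if (failureHandling == FailureHandling.MARK_AS_DOWN) {
            status = Status.DOWN;
            statusDateTime = now;
        }
        // Assumption: with IGNORE, a failed probe changes nothing.
    }

    // UNKNOWN is always expired; otherwise expiry is statusDateTime + freshnessInterval.
    boolean isExpired(LocalDateTime now) {
        return status == Status.UNKNOWN
                || now.isAfter(statusDateTime.plus(freshnessInterval));
    }

    Status status() {
        return status;
    }
}
```

A scheduler would call `isExpired` on each `checkInterval` tick and only run the probe again when the result is `true` or the status is not `UP`.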
Example Logs
ScheduledHealthProbeIndicator change of status and status ignored log messages.
2019-12-06 17:29:16,375 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.probe.ScheduledHealthProbeIndicator: <PROBENAME ie. UserStoreAuthenticationHealthProbe>: Status changing from UNKNOWN to UP
2019-12-06 17:29:16,375 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.probe.ScheduledHealthProbeIndicator: <PROBENAME>: Status changing from UP to DOWN
2019-12-06 17:29:16,375 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.probe.ScheduledHealthProbeIndicator: <PROBENAME>: Status changing from DOWN to UP
2019-12-06 17:29:16,375 WARN ProbeScheduler3 uk.gov.hmcts.reform.idam.health.probe.ScheduledHealthProbeIndicator: <PROBENAME>: DOWN state ignored
A description of the health check probes for ForgeRock AM.
Health check description of isAlive status for ForgeRock AM. This probe is enabled when the Spring Profile, `am`, is active.
External Service Dependencies
- DS Userstore
- DS Config/Tokenstore.
- IDM
Probe Actions
- Request `isAlive.jsp`.
- Assert the response contains `Server is ALIVE.`.
Probe Configuration
Spring Configuration Property: `am.root` & `am.healthprobe.isAlive`
- Response does not contain `Server is ALIVE.`.
- Java exception.
Example Logs
{"LoggerName":"uk.gov.hmcts.reform.idam.health.am.AmIsAliveHealthProbe","ThreadName":"ProbeScheduler2","LoggingLevel":"ERROR","SourceType":"LOGBack","TimeStamp":"Wed, 11 Dec 2019 14:55:58 GMT","message":"AM IsAlive: response did not contain expected value"}
Health check description of password grant probe for ForgeRock AM, which asserts that the password grant returns an access token. This probe is enabled when `am` is active; however, it will ignore status changes due to `HealthProbeFailureHandling.IGNORE`.
External Service Dependencies
- DS Userstore
- DS Config/Tokenstore.
- IDM
External Configuration Dependencies
- AgentProperties `name` and `secret`, which can be found in Spring Properties `web.admin.client`.
- DS Config/Tokenstore.
- IDM
Probe Actions
- WIP
Probe Configuration
Spring Configuration Property: `am.root`, `am.healthprobe.passwordGrant` & `am.healthprobe.identity`.
- WIP
Example Logs
WIP
A description of the health check probes for ForgeRock IDM.
Health check description of anonymous ping status for ForgeRock IDM. This probe is enabled when the Spring Profile, `am`, is active.
External Service Dependencies
- DS Userstore
- DS Config/Tokenstore.
- IDM
Probe Actions
- Request `isAlive.jsp`.
- Assert the response contains `Server is ALIVE.`.
- Request `/openidm/config/provisioner.openicf/ldap?_fields=enabled`
- Assert the response contains `enabled: true`, assuring IDM is connected to LDAP
Probe Configuration
Spring Configuration Property: `am.root` & `am.healthprobe.isAlive`
- Response does not contain `Server is ALIVE.`.
- Java exception.
Example Logs
{"LoggerName":"uk.gov.hmcts.reform.idam.health.am.AmIsAliveHealthProbe","ThreadName":"ProbeScheduler2","LoggingLevel":"ERROR","SourceType":"LOGBack","TimeStamp":"Wed, 11 Dec 2019 14:55:58 GMT","message":"AM IsAlive: response did not contain expected value"}
A description of the health check probes for ForgeRock DS.
Health check description for LDAP replication using LDAP attributes query for ForgeRock DS. This probe is enabled when `userstore` or `tokenstore` is active and `single` is inactive.
External Config Dependencies
cn=Directory Manager
cn=Replication,cn=monitor
- KeyVault secrets
- BINDPASSWD
Probe Actions
- Query LDAP Replication Monitor for List of Attributes
- Parse and Categorise LDAP Responses into Types
- Check Typed Responses for Replay Errors
- Check Typed Responses for Missing Changes
- Check Typed Responses for Delays
Probe Configuration
Spring Configuration Property: ldap
Attributes
LDAP Attribute | Log Mapping | Description |
---|---|---|
status | status | |
pending-updates | pending | |
missing-changes | missing | |
approximate-delay | delay | |
sent-updates | sent | |
received-updates | received | |
replayed-updates | replayed | |
Record Types
Record Type | Description |
---|---|
LOCAL_DS | Record describes the local Directory Server |
LOCAL_RS | Record describes the local Replication Server |
LOCAL_RS_CONN_DS | Record describes the local Replication Server and connected Directory Server |
REMOTE_CONN_RS | Record describes the local Replication Server and connected Replication Server |
REMOTE_CONN_RS_CONN_DS | Record describes the local Replication Server, connected Replication Server and connected Directory Server |
UNKNOWN | Record does not match any of the above |
- Failed replay check
  - If the `received-updates` minus the `replayed-updates` is less than or equal to the missing updates threshold.
- Missing changes
  - If the `missing-changes` are greater than the missing updates threshold.
- Failed delay check
  - If the `approximate-delay` is less than or equal to the approximate delay threshold.
Example Logs
2019-12-09 14:08:26,443 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.command.ReplicationCommandProbe: Configuring with command /opt/opendj/bin/dsreplication status -X --adminUID %s --adminPassword %s --port 4444 -s -n and password value from properties
2019-12-09 14:08:26,539 INFO ProbeScheduler2 uk.gov.hmcts.reform.idam.health.ldap.LdapReplicationHealthProbe: LDAP Replication: LOCAL_DS okay, ds:forgerock-ds-userstore.service.core-compute-idam-perftest.internal:50096,status:Normal,pending:0,missing:-1,sent:31206,received:30015,replayed:30015,delay:-1
2019-12-09 14:08:26,540 INFO ProbeScheduler2 uk.gov.hmcts.reform.idam.health.ldap.LdapReplicationHealthProbe: LDAP Replication: LOCAL_RS okay, rs:forgerock-ds-userstore-idam-perftest000002.service.core-compute-idam-perftest.internal:8989,status:null,pending:-1,missing:4696949,sent:-1,received:-1,replayed:-1,delay:-1
2019-12-09 14:08:26,541 INFO ProbeScheduler2 uk.gov.hmcts.reform.idam.health.ldap.LdapReplicationHealthProbe: LDAP Replication: LOCAL_RS_CONN_DS okay, rs:forgerock-ds-userstore-idam-perftest000002.service.core-compute-idam-perftest.internal:8989,connected-ds:forgerock-ds-userstore.service.core-compute-idam-perftest.internal:50096,status:null,pending:-1,missing:0,sent:30015,received:31206,replayed:-1,delay:0
2019-12-09 14:08:26,542 INFO ProbeScheduler2 uk.gov.hmcts.reform.idam.health.ldap.LdapReplicationHealthProbe: LDAP Replication: REMOTE_CONN_RS okay, rs:forgerock-ds-userstore-idam-perftest000002.service.core-compute-idam-perftest.internal:8989,connected-rs:forgerock-ds-userstore-idam-perftest000001.service.core-compute-idam-perftest.internal:8989,status:null,pending:-1,missing:4696949,sent:31205,received:30012,replayed:-1,delay:-1
2019-12-09 14:08:26,544 INFO ProbeScheduler2 uk.gov.hmcts.reform.idam.health.ldap.LdapReplicationHealthProbe: LDAP Replication: REMOTE_CONN_RS_CONN_DS okay, rs:forgerock-ds-userstore-idam-perftest000002.service.core-compute-idam-perftest.internal:8989,connected-rs:forgerock-ds-userstore-idam-perftest000001.service.core-compute-idam-perftest.internal:8989,connected-ds:forgerock-ds-userstore.service.core-compute-idam-perftest.internal:33920,status:null,pending:-1,missing:0,sent:-1,received:-1,replayed:-1,delay:0
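When analysing these log lines, the comma-separated `name:value` tail can be split into fields. The helper below is a hypothetical sketch for log analysis, not code from the health checker. It splits each pair on its first colon so that host values containing a port, such as `ds:host:50096`, stay intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical log-analysis helper; not part of idam-health-checker.
final class ReplicationLogFields {
    // Splits "status:Normal,pending:0,..." into ordered name/value pairs.
    static Map<String, String> parse(String fields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : fields.split(",")) {
            // Split on the first colon only, so "host:port" values survive.
            int colon = pair.indexOf(':');
            if (colon > 0) {
                result.put(pair.substring(0, colon), pair.substring(colon + 1));
            }
        }
        return result;
    }
}
```

Passing the tail of the `LOCAL_DS` line above gives a map in which `missing` is `-1` and `received` is `30015`.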
Health check description for ForgeRock DS replication through the `dsreplication` command line tool. This probe is enabled when `userstore`, `tokenstore` or `replication` is active and `single` is inactive.
External Config Dependencies
- KeyVault secrets
- adminUID
- adminPassword
ForgeRock 5.5 Example Command Response
Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
--------------------------:-------------------------------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
dc=reform,dc=hmcts,dc=net : forgerock-ds-userstore-idam-perftest000001:4444 : 498270 : true : 27069 : 13539 : 8989 : 0 : : true
dc=reform,dc=hmcts,dc=net : forgerock-ds-userstore-idam-perftest000002:4444 : 498270 : true : 10519 : 5416 : 8989 : 0 : : true
dc=reform,dc=hmcts,dc=net : forgerock-ds-userstore-idam-perftest000004:4444 : 498270 : true : 15309 : 16214 : 8989 : 0 : : true
[1] The port used to communicate between the servers whose contents are being
replicated.
[2] The number of changes that are still missing on this server (and that have
been applied to at least one of the other servers).
[3] Age of oldest missing change: the date on which the oldest change that has
not arrived on this server was generated.
[4] Whether the replication communication through the replication port is
encrypted or not.
ForgeRock 6.5 Example Command Response
Suffix DN : Server : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : Delay (ms) : Security (2)
--------------------------:--------------------------------------------------------------------------------------:---------:---------------------:-------:-------:-------------:------------:-------------
dc=reform,dc=hmcts,dc=net : forgerock-ds-tokenstore-idam-saat000002.service.core-compute-idam-saat.internal:4444 : 827 : true : 2615 : 21251 : 8989 : 0 : true
dc=reform,dc=hmcts,dc=net : forgerock-ds-tokenstore-idam-saat000004.service.core-compute-idam-saat.internal:4444 : 827 : true : 10324 : 21828 : 8989 : 0 : true
dc=reform,dc=hmcts,dc=net : forgerock-ds-tokenstore-idam-saat000005.service.core-compute-idam-saat.internal:4444 : 827 : true : 3056 : 6463 : 8989 : 0 : true
[1] The port used to communicate between the servers whose contents are being
replicated.
[2] Whether the replication communication through the replication port is
encrypted or not.
Probe Actions
- Get the replication status by running the specified command.
- Identify the current host and other replication servers in the replication information.
- Verify host replication by checking the missing changes are not greater than the missing updates threshold.
- Find the max entries count from the command output stream. Compare this to the current number of entries on the current host.
- Handle command execution errors & Java exceptions.
Probe Configuration
Spring Configuration Property Object: replication.healthprobe
Setting the Missing Updates Threshold
replication.healthprobe.command.missing-updates-threshold
- Missing Changes are Greater than the Threshold
- Compare Entries between Replication Servers
  - If the host entries count is less than the max or less than the (max entries - `entryDifferenceThreshold`)
- Replication Command Errors
- Java Exception
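The entries comparison can be sketched against the example command output shown earlier. This is a rough illustration with hypothetical class and method names, not the probe's actual parser. It reads only the Server and Entries columns, and it takes one reading of the rule: the host fails when its entries count is below the maximum minus `entryDifferenceThreshold`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the real ReplicationCommandProbe may parse differently.
final class ReplicationStatusSketch {
    // One data row of the status table; only the columns used here are kept.
    static final class Row {
        final String server;
        final int entries;

        Row(String server, int entries) {
            this.server = server;
            this.entries = entries;
        }
    }

    // Keeps lines whose third column is numeric, skipping header, rule and footnotes.
    static List<Row> parse(String output) {
        List<Row> rows = new ArrayList<>();
        for (String line : output.split("\\R")) {
            // Columns are separated by " : "; the ":4444" in a server name has no spaces.
            String[] cols = line.trim().split("\\s+:\\s+");
            if (cols.length >= 3 && cols[2].matches("\\d+")) {
                rows.add(new Row(cols[1], Integer.parseInt(cols[2])));
            }
        }
        return rows;
    }

    // True when the host's entries are within the threshold of the maximum seen.
    static boolean entriesWithinThreshold(List<Row> rows, String host, int entryDifferenceThreshold) {
        int max = 0;
        int hostEntries = -1;
        for (Row row : rows) {
            max = Math.max(max, row.entries);
            if (row.server.startsWith(host)) {
                hostEntries = row.entries;
            }
        }
        return hostEntries >= 0 && hostEntries >= max - entryDifferenceThreshold;
    }
}
```

An unknown host is treated as a failure here, which is a design choice of the sketch rather than documented behaviour.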
Example Logs
2019-12-09 14:08:29,677 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.command.ReplicationCommandProbe: ReplicationCommand: Host replication info: ReplicationInfo(suffix=dc=reform,dc=hmcts,dc=net, hostName=forgerock-ds-userstore-idam-perftest000002.service.core-compute-idam-perftest.internal:4444, entries=498269, replicationEnabled=true, dsID=10519, rsId=5416, rsPort=8989, missingChanges=0, ageOfMissingChanges=null, securityEnabled=true)
Replicated Host ReplicationInfo
2019-12-09 14:08:29,679 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.command.ReplicationCommandProbe: ReplicationCommand: Replicated host: ReplicationInfo(suffix=dc=reform,dc=hmcts,dc=net, hostName=forgerock-ds-userstore-idam-perftest000001.service.core-compute-idam-perftest.internal:4444, entries=498266, replicationEnabled=true, dsID=27069, rsId=13539, rsPort=8989, missingChanges=0, ageOfMissingChanges=null, securityEnabled=true)
Health check description for ForgeRock DS User Store using LDAP queries. This probe is enabled when `userstore` is active.
External Config Dependencies
cn=Directory Manager
- KeyVault
- test-owner-username
- test-owner-password
Probe Actions
Probe Configuration
Spring Property Configuration Object: userstore.healthprobe
Spring Profile | ScheduledHealthProbeIndicator Configuration |
---|---|
single | HealthProbeFailureHandling.IGNORE |
userstore | HealthProbeFailureHandling.MARK_AS_DOWN |
- Test User Does Not Exist
  - Confirm KeyVault secret `test-owner-username`.
- Failed Authentication with Test User
  - Confirm KeyVault secret `test-owner-password`.
  - Update the password from IDM if required.
    Note: You may need to set the ds-userstore healthcheck profile to `optimist,live` to make the LDAP MODIFY successful.
- Exceptions
  - Connection.
  - etc.
Example Logs
UserStoreAuthenticationHealthProbe was successful.
2019-12-09 11:30:25,926 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.userstore.UserStoreAuthenticationHealthProbe: UserStore Auth: success
UserStoreAuthenticationHealthProbe LDAP query for `test-owner-username` returned empty. The user does not exist.
2019-12-09 11:30:25,926 WARN ProbeScheduler3 uk.gov.hmcts.reform.idam.health.userstore.UserStoreAuthenticationHealthProbe: UserStore Auth: test user does not exist
UserStoreAuthenticationHealthProbe LDAP authentication failed for credentials `test-owner-username` & `test-owner-password`.
2019-12-06 17:29:16,375 ERROR ProbeScheduler3 uk.gov.hmcts.reform.idam.health.userstore.UserStoreAuthenticationHealthProbe: UserStore Auth: authentication failed for filter (uid=idam@test.localhost)
UserStoreAuthenticationHealthProbe encountered a communication exception. Connection to LDAPS was refused.
2019-12-06 17:29:12,854 ERROR ProbeScheduler3 uk.gov.hmcts.reform.idam.health.userstore.UserStoreAuthenticationHealthProbe: UserStore Auth: forgerock-ds-userstore.service.core-compute-idam-saat.internal:1639; nested exception is javax.naming.CommunicationException: forgerock-ds-userstore.service.core-compute-idam-saat.internal:1639 [Root exception is java.net.ConnectException: Connection refused (Connection refused)] [CommunicationException]
Health check description for ForgeRock DS Token Store using LDAP queries. This probe is enabled when `tokenstore` is active.
External Config Dependencies
cn=schema providers,cn=config
- KeyVault
- test-owner-username
- test-owner-password
Probe Actions
- Query LDAP for any object in `cn=schema providers,cn=config`.
Probe Configuration
Spring Property Configuration Object: tokenstore.healthprobe
Spring Profile | ScheduledHealthProbeIndicator Configuration |
---|---|
single | HealthProbeFailureHandling.IGNORE |
tokenstore | HealthProbeFailureHandling.MARK_AS_DOWN |
- Empty Response for LDAP Search
- Exception
Example Logs
2019-12-09 10:30:04,541 INFO ProbeScheduler3 uk.gov.hmcts.reform.idam.health.tokenstore.TokenStoreSearchHealthProbe: TokenStore Search: success
2019-12-09 10:30:04,541 ERROR ProbeScheduler3 uk.gov.hmcts.reform.idam.health.tokenstore.TokenStoreSearchHealthProbe: TokenStore Search: response is empty
Health check description for ForgeRock DS backups. This probe is enabled when `backup` is active.
External Config Dependencies
Probe Actions
Probe Configuration
Example Logs