Investigate and Address Time Drift on pdc-describe-prod3 #5512

Open
1 of 3 tasks
kayiwa opened this issue Nov 5, 2024 · 0 comments

kayiwa commented Nov 5, 2024

Level of urgency

  • High
  • Moderate
  • Low

Description:

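Check_MK has raised a CRITICAL alert on the Systemd Timesyncd Time service for the pdc-describe-prod3 server. The measured offset is small (559 microseconds); the threshold that was breached is jitter, at 648 milliseconds against a critical level of 500 milliseconds. This indicates unstable time synchronization, which could affect system functions and applications that rely on accurate time.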

Alert Details:

  • Host: pdc-describe-prod3 (IP: 128.112.202.144)
  • Service: Systemd Timesyncd Time
  • State: CRITICAL
  • Additional Info:
    • Offset: 559 microseconds
    • Time since last sync: 55 seconds
    • Time since last NTPMessage: 55 seconds
    • Stratum: 2.00
    • Jitter: 648 milliseconds (warn/crit at 200 milliseconds/500 milliseconds) (!!).
    • Synchronized on 128.112.129.7
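
To make the CRITICAL state easier to read, here is a minimal sketch of how the jitter value compares against the warn/crit levels quoted in the alert. The `classify` helper is hypothetical, and the "greater than or equal" comparison is an assumption about how the check evaluates thresholds, not a statement of Check_MK's actual logic.

```python
def classify(value_ms, warn_ms, crit_ms):
    """Return a state name for a metric where higher values are worse.

    Assumption: a value at or above a level triggers that level.
    """
    if value_ms >= crit_ms:
        return "CRIT"
    if value_ms >= warn_ms:
        return "WARN"
    return "OK"


# Values copied from the alert details above.
jitter_ms = 648
warn_ms, crit_ms = 200, 500

print(classify(jitter_ms, warn_ms, crit_ms))  # prints "CRIT"
```

The alert does not list levels for the offset, so the 559 microsecond offset is left out of the comparison here.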

Tasks:

  1. Investigate the Cause:
    • Check the system logs on pdc-describe-prod3 for any errors related to time synchronization (timesyncd, ntp).
    • Verify network connectivity between pdc-describe-prod3 and the NTP server (128.112.129.7).
    • Examine the configuration of timesyncd (/etc/systemd/timesyncd.conf) to ensure it's correctly configured to use the specified NTP server.
    • Investigate potential resource issues (CPU, memory, network) on pdc-describe-prod3 that might be interfering with time synchronization.
  2. Address the Issue:
    • Based on the investigation, take appropriate corrective action. This may involve:
      • Restarting the timesyncd service.
      • Adjusting the timesyncd configuration.
      • Addressing network connectivity problems.
      • Resolving resource constraints on the server.
      • Switching to a different NTP server or pool if the current one is unreliable.
  3. Monitor and Verify:
    • After taking corrective action, monitor the server to ensure that time synchronization is stable and the alert does not reoccur.
    • Verify the accuracy of the system time using the timedatectl timesync-status command.
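
For the verification step, the following is a rough sketch of how the text printed by `timedatectl timesync-status` could be parsed so that jitter can be compared against the alert's critical level. The helper names are hypothetical, the sample text is built from the values in the alert rather than captured from the server, and the hostname in it is a placeholder. The parser assumes single-unit values such as `648ms` or `559us`; compound values such as `34min 8s` are not handled.

```python
import re

# Multipliers to milliseconds for the single-unit values handled here.
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def parse_duration_ms(text):
    """Convert a single-unit value such as '648ms' or '559us' to milliseconds."""
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)(us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def parse_timesync_status(output):
    """Split 'Key: value' lines into a dict, using the first colon as separator."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample only: built from the alert values, NOT captured from
# pdc-describe-prod3. The hostname in parentheses is a placeholder.
SAMPLE = """\
       Server: 128.112.129.7 (ntp.example.invalid)
      Stratum: 2
       Offset: +559us
       Jitter: 648ms
"""

status = parse_timesync_status(SAMPLE)
jitter_ms = parse_duration_ms(status["Jitter"])
print(f"jitter {jitter_ms:.0f} ms, at or above crit (500 ms): {jitter_ms >= 500}")
```

On the server itself, the sample could be replaced with live output, for example `subprocess.run(["timedatectl", "timesync-status"], capture_output=True, text=True).stdout`.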

Impact:

Inaccurate time can lead to various problems, including:

  • Authentication issues (Kerberos, SSL certificates)
  • Log file inconsistencies
  • Scheduled task failures
  • Application errors
  • Data integrity issues