Implement option to output information as OpenMetrics time series
This new `-o` option makes needrestart output information in a format
that can be scraped by Prometheus or any other daemon that ingests the
OpenMetrics format.

The -l, -w and -k options can be used in combination with -o in order to
choose what information gets exported.

Note that the combination of options -ol needs root access in order to
correctly determine which services use outdated libraries.

The kernel and microcode statuses are output as StateSet type metrics
since each of them has several possible states. This way users can
track the state with finer granularity and, for example, decide to ignore
the "unknown" microcode state or the "version_upgrade" (e.g. non
ABI-compatible upgrade) kernel state.
For kernel and microcode, there is also one Info type metric each that
reports the currently running version vs. the expected newer version.

(Closes: #291)
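
The StateSet layout described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the code from this commit: the helper names `escape_label` and `render_stateset` and the shortened help text are assumptions made for the example. It emits one sample per state, with value 1 for the active state and 0 for the others, and repeats the metric family name as the label name, as the StateSet type requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape a label value for the OpenMetrics text
# format (backslash, double quote and newline get a backslash escape).
sub escape_label {
    my ($value) = @_;
    $value =~ s/\\/\\\\/g;
    $value =~ s/"/\\"/g;
    $value =~ s/\n/\\n/g;
    return $value;
}

# Hypothetical helper: render one StateSet metric family as a list of
# lines. Exactly one state gets the value 1 when $current matches one
# of the entries in @$states; all other states get 0.
sub render_stateset {
    my ($name, $help, $states, $current) = @_;
    my @lines = ("# TYPE $name stateset", "# HELP $name $help");
    for my $state (@$states) {
        my $value = ($state eq $current) ? 1 : 0;
        push @lines, sprintf('%s{%s="%s"} %d',
            $name, $name, escape_label($state), $value);
    }
    return @lines;
}

# Example input (assumed): the microcode check reported "obsolete".
my @out = render_stateset(
    'needrestart_ucode_status',
    'status of the CPU microcode',
    [qw(current obsolete unknown)],
    'obsolete',
);
print "$_\n" for @out, '# EOF';
```

With this shape, a consumer that wants to ignore a state such as "unknown" can simply leave the sample carrying that label value out of its alerting expression.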
Gabriel Filion committed Jul 31, 2024
1 parent 94795e5 commit 07fb744
Showing 2 changed files with 71 additions and 10 deletions.
5 changes: 4 additions & 1 deletion man/needrestart.1
@@ -6,7 +6,7 @@ needrestart checks which daemons need to be restarted after library upgrades.
.SH USAGE
Usage:
.IP
-needrestart [\-(v|q)] [\-n] [\-c <cfg>] [\-r <mode>] [\-f <fe>] [\-u <ui>] [\-(b|p)] [\-kl]
+needrestart [\-(v|q)] [\-n] [\-c <cfg>] [\-r <mode>] [\-f <fe>] [\-u <ui>] [\-(b|p|o)] [\-kl]
.TP
\fB\-v\fR
be more verbose
@@ -49,6 +49,9 @@ enable batch mode
\fB\-p\fR
nagios plugin mode: makes output and exit codes nagios compatible
.TP
\fB\-o\fR
OpenMetrics output mode: output information in a format that OpenMetrics-compatible services can scrape. Implies batch mode. Combine with any of `-l`, `-k` and `-w` to choose whether the metrics expose outdated libraries, kernel status and microcode status, respectively. Listing system-wide outdated libraries requires running needrestart as root. With `-l`, a gauge-type metric exposes the number of services that need a restart because they use outdated libraries. With `-k` or `-w`, a StateSet-type metric indicates the current status, and an additional Info-type metric reports the running vs. expected version of the kernel or microcode.
.TP
\fB\-f\fR <fe>
override debconf(7) frontend, sets the DEBIAN_FRONTEND environment variable to <fe>
.TP
76 changes: 67 additions & 9 deletions needrestart
@@ -84,7 +84,7 @@ sub HELP_MESSAGE {
print <<USG;
Usage:
-needrestart [-vn] [-c <cfg>] [-r <mode>] [-f <fe>] [-u <ui>] [-bkl]
+needrestart [-vn] [-c <cfg>] [-r <mode>] [-f <fe>] [-u <ui>] [-(b|p|o)] [-klw]
-v be more verbose
-q be quiet
@@ -99,6 +99,7 @@ Usage:
a (a)utomatically restart
-b enable batch mode
-p enable nagios plugin mode
-o enable OpenMetrics output mode, implies batch mode, cannot be used simultaneously with -p
-f <fe> override debconf frontend (DEBIAN_FRONTEND, debconf(7))
-t <seconds> tolerate interpreter process start times within this value
-u <ui> use preferred UI package (-u ? shows available packages)
@@ -183,11 +184,12 @@ our $opt_f;
our $opt_k;
our $opt_l;
our $opt_p;
our $opt_o;
our $opt_q;
our $opt_t;
our $opt_u;
our $opt_w;
-unless(getopts('c:vr:nm:bf:klpqt:u:w')) {
+unless(getopts('c:vr:nm:bf:klpoqt:u:w')) {
HELP_MESSAGE;
exit 1;
}
@@ -240,7 +242,8 @@ $opt_r = 'l' if($opt_m eq 'e');
$opt_t = $nrconf{tolerance} unless(defined($opt_t));

$nrconf{defno}++ if($opt_n);
-$opt_b++ if($opt_p);
+die "Options -p and -o cannot be defined simultaneously\n" if ($opt_p && $opt_o);
+$opt_b++ if($opt_p || $opt_o);

needrestart_interp_configure({
perl => {
@@ -260,6 +263,12 @@ if($uid) {
}

print STDERR "$LOGPREF running in user mode\n" if($nrconf{verbosity} > 1);

# we need to run as root in order to list system-wide outdated libraries
if ($opt_o && $opt_l) {
print STDERR "$LOGPREF OpenMetrics output needs root access to list processes with outdated libraries\n";
exit 1;
}
}
else {
print STDERR "$LOGPREF running in root mode\n" if($nrconf{verbosity} > 1);
@@ -456,7 +465,13 @@ my %nagios = (
uret => 3,
uperf => q(U),
);
-print "NEEDRESTART-VER: $NeedRestart::VERSION\n" if($opt_b && !$opt_p);
+print "NEEDRESTART-VER: $NeedRestart::VERSION\n" if($opt_b && !$opt_p && !$opt_o);

my %ometric_kernel_values = (
kresult => q(unknown),
krunning => q(unknown),
kexpected => q(unknown),
);

my %restart;
my %sessions;
@@ -866,12 +881,12 @@ if(defined($opt_k)) {

if(defined($kresult)) {
if($opt_b) {
-unless($opt_p) {
+unless($opt_p || $opt_o) {
print "NEEDRESTART-KCUR: $kvars{KVERSION}\n";
print "NEEDRESTART-KEXP: $kvars{EVERSION}\n" if(defined($kvars{EVERSION}));
print "NEEDRESTART-KSTA: $kresult\n";
}
-else {
+elsif ($opt_p) {
$nagios{kstr} = $kvars{KVERSION};
if($kresult == NRK_VERUPGRADE) {
$nagios{kstr} .= "!=$kvars{EVERSION}";
@@ -894,6 +909,11 @@ if(defined($opt_k)) {
$nagios{kstr} .= " (!!)";
}
}
elsif ($opt_o) {
$ometric_kernel_values{kresult} = $kresult;
$ometric_kernel_values{krunning} = $kvars{KVERSION};
$ometric_kernel_values{kexpected} = $kvars{EVERSION};
}
}
else {
if($kresult == NRK_NOUPGRADE) {
@@ -941,7 +961,7 @@ if(defined($opt_k)) {

if($opt_w) {
if($opt_b) {
-unless($opt_p) {
+unless($opt_p || $opt_o) {
print "NEEDRESTART-UCSTA: $ucode_result\n";
if($ucode_result != NRM_UNKNOWN) {
print "NEEDRESTART-UCCUR: $ucode_vars{CURRENT}\n";
@@ -1030,7 +1050,7 @@ if(defined($opt_l) && !$uid) {
local $nrconf{systemctl_combine} = 1 unless($opt_r eq 'l');

if($opt_b) {
-print "NEEDRESTART-SVC: $rc\n" unless($opt_p);
+print "NEEDRESTART-SVC: $rc\n" unless($opt_p || $opt_o);
next;
}

@@ -1205,7 +1225,7 @@ if(defined($opt_l) && !$uid) {

foreach my $cont (sort { lc($a) cmp lc($b) } keys %conts) {
if($opt_b) {
-print "NEEDRESTART-CONT: $cont\n" unless($opt_p);
+print "NEEDRESTART-CONT: $cont\n" unless($opt_p || $opt_o);
next;
}

@@ -1397,6 +1417,44 @@ if($opt_p) {

exit $ret;
}
if ($opt_o) {
print "# TYPE needrestart_build info\n";
print "# HELP needrestart_build information about needrestart's runtime build\n";
print "needrestart_build_info{version=\"$NeedRestart::VERSION\",perl_version=\"$^V\"} 1\n";

if ($opt_k) {
my @ometric_kernel_status = map { $_ == $ometric_kernel_values{kresult} ? 1 : 0 } (NRK_NOUPGRADE, NRK_ABIUPGRADE, NRK_VERUPGRADE);
print "# TYPE needrestart_kernel_status stateset\n";
print "# HELP needrestart_kernel_status status of kernel as reported by needrestart\n";
print "needrestart_kernel_status{needrestart_kernel_status=\"current\"} $ometric_kernel_status[0]\n";
print "needrestart_kernel_status{needrestart_kernel_status=\"abi_upgrade\"} $ometric_kernel_status[1]\n";
print "needrestart_kernel_status{needrestart_kernel_status=\"version_upgrade\"} $ometric_kernel_status[2]\n";
print "# TYPE needrestart_kernel info\n";
print "# HELP needrestart_kernel version information for currently running and most up to date kernels\n";
print "needrestart_kernel_info{running=\"$ometric_kernel_values{krunning}\",expected=\"$ometric_kernel_values{kexpected}\"} 1\n";
}
if ($opt_w) {
my $ometric_ucode_current = $ucode_result != NRM_UNKNOWN ? $ucode_vars{CURRENT} : "unknown";
my $ometric_ucode_expected = $ucode_result != NRM_UNKNOWN ? $ucode_vars{AVAIL} : "unknown";
my @ometric_ucode_status = map { $_ == $ucode_result ? 1 : 0 } (NRM_CURRENT, NRM_OBSOLETE, NRM_UNKNOWN);
print "# TYPE needrestart_ucode_status stateset\n";
print "# HELP needrestart_ucode_status status of the host's CPU microcode as reported by needrestart\n";
print "needrestart_ucode_status{needrestart_ucode_status=\"current\"} $ometric_ucode_status[0]\n";
print "needrestart_ucode_status{needrestart_ucode_status=\"obsolete\"} $ometric_ucode_status[1]\n";
print "needrestart_ucode_status{needrestart_ucode_status=\"unknown\"} $ometric_ucode_status[2]\n";
print "# TYPE needrestart_ucode info\n";
print "# HELP needrestart_ucode version information for currently used and available microcode\n";
print "needrestart_ucode_info{running=\"$ometric_ucode_current\",expected=\"$ometric_ucode_expected\"} 1\n";
}
if ($opt_l) {
my $ometric_num_services = scalar keys %restart;
print "# TYPE needrestart_outdated_services gauge\n";
print "# HELP needrestart_outdated_services number of services requiring a restart\n";
print "needrestart_outdated_services $ometric_num_services\n";
}
print "# EOF\n";
exit 0;
}

if ($opt_b and scalar %sessions) {
for my $sess (@sessions_list) {