Add default metrics #24

Merged: 7 commits, Nov 24, 2023

Conversation

ToMe25
Contributor

@ToMe25 ToMe25 commented Sep 22, 2023

This PR adds the default collectors to the metrics endpoint.
These collectors are added to the default registry by default, and collect some data about the current process.
However, since this exporter uses a custom registry, it didn't use them until now.

These collectors can be disabled with --web.disable-exporter-metrics, as in the node_exporter.
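
As a rough illustration of the flag described above, here is a minimal sketch using only the Go standard library. The function name `exporterMetricsEnabled` is hypothetical, and the exporter's real flag parsing may use a different library. With client_golang, the guarded step would typically register `collectors.NewGoCollector()` and `collectors.NewProcessCollector(collectors.ProcessCollectorOpts{})` on the custom registry; that part is only indicated in a comment here.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// exporterMetricsEnabled parses the given arguments and reports whether the
// default Go and process collectors should be registered.
// The function name is an assumption made for this sketch.
func exporterMetricsEnabled(args []string) (bool, error) {
	fs := flag.NewFlagSet("rpi_exporter", flag.ContinueOnError)
	disable := fs.Bool("web.disable-exporter-metrics", false,
		"Exclude metrics about the exporter itself.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return !*disable, nil
}

func main() {
	enabled, err := exporterMetricsEnabled(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if enabled {
		// With client_golang, the Go and process collectors would be
		// registered on the exporter's custom registry at this point.
		fmt.Println("registering default collectors")
	} else {
		fmt.Println("default collectors disabled")
	}
}
```

The default is "enabled", so existing setups gain the extra metrics unless the flag is passed explicitly.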

I would have liked to solve rpi_exporter.go:128-145 somewhat more elegantly, but I couldn't find a way to do that.

This PR is based on #23.
Since they both required the same base changes, and I didn't want to implement those twice, all commits from #23 are also included in this PR.
I hope this means I won't have to rebase this if #23 gets merged, but maybe I will have to, idk.

@ToMe25
Contributor Author

ToMe25 commented Nov 22, 2023

By now my Node Exporter PR (prometheus/node_exporter#2808), which contains the same changes as rpi_exporter.go:128-145, has been merged (and released), so I think that solution is "good enough".

@lukasmalkmus
Owner

Can you rebase on main master, pls? :)

@lukasmalkmus lukasmalkmus self-assigned this Nov 24, 2023
@lukasmalkmus lukasmalkmus self-requested a review November 24, 2023 10:56
@ToMe25
Contributor Author

ToMe25 commented Nov 24, 2023

Whoops, looks like GitHub's web tool merges rather than rebasing.
I will rebase the original commits and force-push later today, since I don't have access to the device I used for this RN.

@lukasmalkmus
Owner

No stress. If the diff view is cleaned up, that's fine as well.

Looks like this PR also updated a dependency? Prometheus SDK apparently. At some point, I have to give this repo a makeover that utilises Go modules... 😅

Owner

@lukasmalkmus lukasmalkmus left a comment

LGTM

@ToMe25
Contributor Author

ToMe25 commented Nov 24, 2023

I don't think this PR updates a dependency, it does however add one.
The collectors part of the prometheus client lib is now a required dependency.

@lukasmalkmus lukasmalkmus merged commit 781baa6 into lukasmalkmus:master Nov 24, 2023
@lukasmalkmus
Owner

lukasmalkmus commented Nov 24, 2023

@ToMe25 Do you mind giving the main branch a run? If you're happy, I'll stage a release. Unfortunately, that is a rather manual process that involves the Prometheus tooling (promu), which I rarely use, so I need to figure that out first.

Really time to port this to Go modules and GitHub actions...

@lukasmalkmus
Owner

I don't think this PR updates a dependency, it does however add one. The collectors part of the prometheus client lib is now a required dependency.

Oops, yeah. Totally fine, tho.

@ToMe25
Contributor Author

ToMe25 commented Nov 24, 2023

@ToMe25 Do you mind giving the main branch a run? If you're happy, I'll stage a release. Unfortunately, that is a rather manual process that involves the Prometheus tooling (promu), which I rarely use, so I need to figure that out first.

Really time to port this to Go modules and GitHub actions...

I assume you mean the master branch?
Sure, I'll do some testing, give me a few minutes.
Anything specific I should check, or just the basic functionality?

@ToMe25
Contributor Author

ToMe25 commented Nov 24, 2023

@lukasmalkmus I did some testing, and I didn't find any problems.

It took me this long in part because I'm an idiot and tried to run an x86 build on my RPi.

Here are a few things I feel I should mention, which I didn't mention so far because I thought you would test it yourself anyway:

  1. Unless they are disabled using the command-line argument, the collector metrics are always active.
    That means the collect[]=cpu page is almost as long as the default metrics page.
  2. It's pretty easy to test this program on x86, as long as you either disable the gpu collector or ignore that it fails.
    The cpu collector works on my device, and should work on most x86 devices with a mainline Linux kernel.
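
For readers unfamiliar with the collect[] filter mentioned in point 1, the following is a minimal sketch of how such a filter can be read from a request URL, assuming the exporter follows the node_exporter convention of repeated `collect[]` query parameters. The helper name `collectFilters` is hypothetical. As noted above, a filter of this kind only narrows the rpi collectors; the Go and process metrics stay on the page unless they are disabled with the flag.

```go
package main

import (
	"fmt"
	"net/url"
)

// collectFilters returns the collector names requested through repeated
// collect[] query parameters, e.g. /metrics?collect[]=cpu&collect[]=gpu.
// An empty result means that no filter was given.
func collectFilters(rawURL string) ([]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	return u.Query()["collect[]"], nil
}

func main() {
	filters, err := collectFilters("/metrics?collect[]=cpu")
	if err != nil {
		fmt.Println("invalid URL:", err)
		return
	}
	fmt.Println("requested collectors:", filters)
}
```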

Here is an example metrics page from an RPi3:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000120572
go_gc_duration_seconds{quantile="0.25"} 0.001532132
go_gc_duration_seconds{quantile="0.5"} 0.002596921
go_gc_duration_seconds{quantile="0.75"} 0.004134053
go_gc_duration_seconds{quantile="1"} 0.015673036
go_gc_duration_seconds_sum 0.691850555
go_gc_duration_seconds_count 222
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 34
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.11"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.109264e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.85228316e+09
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 4609
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 9.947407e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.004279033624621717
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.17912e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.109264e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.2454272e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.629056e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 4677
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 2.2265856e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 2.6083328e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7008280817057803e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 9.952084e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 142720
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 310080
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 978223
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 3.2768e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 3.2768e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 3.484776e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 25
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 120.65
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 21
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.6560128e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.7008272138e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.3732096e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 5
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 4869
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP rpi_cpu_frequency_hertz CPU Frequency in hertz (Hz).
# TYPE rpi_cpu_frequency_hertz gauge
rpi_cpu_frequency_hertz{cpu="0"} 800000
rpi_cpu_frequency_hertz{cpu="1"} 800000
rpi_cpu_frequency_hertz{cpu="2"} 800000
rpi_cpu_frequency_hertz{cpu="3"} 800000
# HELP rpi_cpu_temperature_celsius CPU temperature in degrees celsius (°C).
# TYPE rpi_cpu_temperature_celsius gauge
rpi_cpu_temperature_celsius 50.464
# HELP rpi_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which rpi_exporter was built.
# TYPE rpi_exporter_build_info gauge
rpi_exporter_build_info{branch="master",goversion="go1.20.11",revision="781baa69ab31dbfd5a490167ec6afd0f48b6ded0",version="0.8.0"} 1
# HELP rpi_gpu_frequency_hertz GPU frequency in hertz (Hz).
# TYPE rpi_gpu_frequency_hertz gauge
rpi_gpu_frequency_hertz{component="core"} 3e+08
rpi_gpu_frequency_hertz{component="h264"} 0
rpi_gpu_frequency_hertz{component="v3d"} 3e+08
# HELP rpi_gpu_temperature_celsius GPU temperature in degrees celsius (°C).
# TYPE rpi_gpu_temperature_celsius gauge
rpi_gpu_temperature_celsius 50.5
# HELP rpi_scrape_collector_duration_seconds rpi_exporter: Duration of a collector scrape.
# TYPE rpi_scrape_collector_duration_seconds gauge
rpi_scrape_collector_duration_seconds{collector="cpu"} 0.003638949
rpi_scrape_collector_duration_seconds{collector="gpu"} 0.045345613
rpi_scrape_collector_duration_seconds{collector="textfile"} 4.0104e-05
# HELP rpi_scrape_collector_success rpi_exporter: Whether a collector succeeded.
# TYPE rpi_scrape_collector_success gauge
rpi_scrape_collector_success{collector="cpu"} 1
rpi_scrape_collector_success{collector="gpu"} 1
rpi_scrape_collector_success{collector="textfile"} 1

@ToMe25
Contributor Author

ToMe25 commented Jan 16, 2024

@lukasmalkmus

Hi.

Since some time has passed since this PR, I was just wondering: Is there an ETA for the 0.9.0 release?

@lukasmalkmus
Owner

@ToMe25 Apologies, but maintaining this hasn't been the greatest experience recently. If I ever have some time and motivation, I'll refactor this and establish proper CI with GitHub Actions so we can have automatic releases and I don't have to fight with promu, which I only use for this exporter. That should speed up the process and consume less of my time.

That being said, I managed to cut a pre-release. Do you mind giving it a test run before I cut a real release? https://github.com/lukasmalkmus/rpi_exporter/releases/tag/v0.9.0-rc.0.

/cc @carlosedp Looks like we'll release v0.9.0 soon. Are you able to publish that to DockerHub, once it's released?

@ToMe25
Contributor Author

ToMe25 commented Jan 19, 2024

@lukasmalkmus If you want I can make a PR to set up GitHub CI for that for you.
Just let me know if that would help :)

If you have any special promu config/args you use for the official builds please let me know so I can integrate them in the CI.
Otherwise I will just make it run GOARM=X GOARCH=arm make build for the different ARM versions.

Also I would make it create a build for every push to the master branch, but not automatically add them to a release, unless you wish for something different.
I would do it this way because automatically creating a release when pushing a tag would mean having to auto-generate the release description/changelog.

If you would like anything different from my suggestion, please let me know :)

@lukasmalkmus
Owner

I'm actually thinking of pulling out all the Promu and/or Prometheus specific pieces. I have another exporter here, that got an overhaul last year: https://github.com/lukasmalkmus/tankerkoenig_exporter.

The Makefile and CI there are way more sane if you wanna give it a look. I guess I'm opting for something like this.

@ToMe25
Contributor Author

ToMe25 commented Jan 19, 2024

@lukasmalkmus That's probably the better solution, but I don't have the know-how to help with that.

I will test the RC build tomorrow and let you know if I find any issues.

@ToMe25
Contributor Author

ToMe25 commented Jan 20, 2024

@lukasmalkmus I tested everything I could think of with one of the builds, and tested some basic things with the other three.

I did all of these tests on my RPi 3b with Raspbian lite 64 bit.

After this testing there are three things I think are worth mentioning, though I don't think any of them are that much of an issue.

  1. The h264 frequency was always shown as 0 in my tests.
    Running vcgencmd measure_clock h264 returns frequency(28)=0, which means h264 is a recognized component, but has a clock rate of 0.
    This could be because my RPi has no attached display and is not en-/de-coding any video at the moment.
    Either way, I don't think this is an issue with the exporter.
  2. The textfile exporter metrics are prefixed with node_, rather than rpi_. Is this intended?
  3. If a web.telemetry-path or web.healthcheck-path is not prefixed with a slash, it does not work at all.
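
To make the first observation concrete: the quoted output `frequency(28)=0` carries the clock rate after the equals sign, so a reported 0 is a valid reading rather than a parse failure. The sketch below shows one way to parse a line of that shape. The function name `parseMeasureClock` is hypothetical, and this is not necessarily how the exporter's gpu collector does it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMeasureClock extracts the clock rate in Hz from a line shaped like
// "frequency(28)=0", the format quoted above for vcgencmd measure_clock.
func parseMeasureClock(line string) (float64, error) {
	_, value, ok := strings.Cut(strings.TrimSpace(line), "=")
	if !ok {
		return 0, fmt.Errorf("unexpected measure_clock output: %q", line)
	}
	return strconv.ParseFloat(value, 64)
}

func main() {
	hz, err := parseMeasureClock("frequency(28)=0")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("h264 clock in Hz:", hz)
}
```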

I'll make a separate issue about the third of those, but I don't see it as anything urgent.
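
The thread does not say how the third point will be addressed. One possible approach, shown here only as a sketch, is to normalize the configured path before registering the handler, since HTTP request paths always start with a slash. The helper name `ensureLeadingSlash` is an assumption and does not come from the exporter's code.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureLeadingSlash returns path unchanged if it already starts with "/",
// and prepends a slash otherwise. An empty path becomes "/".
func ensureLeadingSlash(path string) string {
	if strings.HasPrefix(path, "/") {
		return path
	}
	return "/" + path
}

func main() {
	// Values as they might be passed to web.telemetry-path.
	for _, p := range []string{"metrics", "/metrics"} {
		fmt.Println(ensureLeadingSlash(p))
	}
}
```

An alternative design would be to reject such values at startup with a clear error message instead of silently rewriting them.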

@carlosedp

/cc @carlosedp Looks like we'll release v0.9.0 soon. Are you able to publish that to DockerHub, once it's released?

Hi Lukas, I can take a look at it once it's released... It's been a long time and I don't remember very well how I used to release it :)

@lukasmalkmus
Owner

@ToMe25 Awesome. Thanks for testing. Will look through your issues/PRs by the end of the week. Also gonna look into modernising the CI/CD for this repo.

@carlosedp That would be great! On the other hand... If I give CI/CD an overhaul anyhow, we could probably release to GHCR and DockerHub simultaneously. If you share the Dockerfile with me, we could probably set something up that releases under your DockerHub username. We'd probably need to store an API token in the GitHub Actions secrets. Lemme get back to you once I make some progress on this, if that's fine with you :)
