Telemetry Job (NGF) #1382

Closed
mpstefan opened this issue Dec 14, 2023 · 0 comments · Fixed by #1448

Comments

mpstefan (Collaborator) commented Dec 14, 2023

As a maintainer of NGF
I want a telemetry job that collects data every 24 hours
So that I can eventually send collected telemetry data to a centralized collector.

Acceptance Criteria

  • Each time data would be collected, the telemetry job writes a set of dummy data to the debug log.
  • The telemetry job attempts to collect data every 24 hours, starting at the initial deployment of NGF.
  • The collection period can be configured by a developer for testing.
  • Only the leader pod runs the job.
mpstefan added the area/telemetry (Issues related to collected telemetry data) label Dec 14, 2023
mpstefan added this to the v1.2.0 milestone Dec 14, 2023
mpstefan added the size/small (Estimated to be completed within ~2 days) and refined (Requirements are refined and the issue is ready to be implemented.) labels Dec 14, 2023
pleshakov mentioned this issue Jan 5, 2024
pleshakov added a commit to pleshakov/nginx-gateway-fabric that referenced this issue Jan 10, 2024
Problem:

We want a telemetry job that reports product telemetry every 24h.
For now, the telemetry data is empty and the report is sent to the
debug log.

Solution:

- Refactor leader election to use controller-runtime manager
capabilities. This simplifies the existing code and makes it easier to
add a telemetry Job.
- Add a telemetry Job that periodically reports empty telemetry to
the debug log.
- Make the period configurable at build time via TELEMETRY_REPORT_PERIOD
Makefile variable.

Note: the leader elector refactoring changes the behavior of the NGF
process when leadership is lost:
Before: the Manager would shut down, waiting for the runnables to exit.
After: the Manager doesn't wait. This is similar to the NGF process panicking.
This should be OK, as NGF container will restart and recover any
potentially broken state (update not fully populated statuses, restore
correct NGINX configuration).

Testing:
- Unit tests
- Manual testing:
  - Ensure leader election works as expected - both leader and
    non-leader pods run successfully.
  - Ensure the NGF container exits when it stops being the leader.
  - Ensure an upgrade from Release 1.1.0 is successful for leader
    election - the leader gets elected among the new pods.
  - Ensure the telemetry Job reports telemetry multiple times, using
    a small value of TELEMETRY_REPORT_PERIOD.

CLOSES nginx#1382
pleshakov added a commit that referenced this issue Jan 10, 2024
CLOSES #1382

Co-authored-by: Saylor Berman <s.berman@f5.com>
miledxz added a commit to miledxz/nginx-gateway-fabric that referenced this issue Jan 14, 2025
CLOSES nginx#1382

Co-authored-by: Saylor Berman <s.berman@f5.com>