Telemetry Job (NGF) #1382

mpstefan · 2023-12-14T16:33:18Z

As a maintainer of NGF
I want a telemetry job that collects data every 24 hours
So that I can eventually send collected telemetry data to a centralized collector.

Acceptance Criteria

Each time data would be collected, the telemetry job writes a set of dummy data to the debug log.
The telemetry job attempts to collect data every 24 hours, starting at initial deployment of NGF.
The time period at which data is collected can be configured by a developer for testing.
Only the leader pod runs the job.

Problem: We want a telemetry job that reports product telemetry every 24 hours. For now, the telemetry data is empty and the report is sent to the debug log.

Solution:
- Refactor leader election to use controller-runtime manager capabilities. This simplifies the existing code and makes it easier to add a telemetry Job.
- Add a telemetry Job that periodically reports empty telemetry to the debug log.
- Make the period configurable at build time via the TELEMETRY_REPORT_PERIOD Makefile variable.

Note: the leader elector refactoring changes the behavior of the NGF process when leadership is lost:
- Before: the Manager would shut down, waiting for the runnables to exit.
- After: the Manager doesn't wait. This is similar to the NGF process panicking.
This should be OK, as the NGF container will restart and recover any potentially broken state (update statuses that were not fully populated, restore the correct NGINX configuration).

Testing:
- Unit tests
- Manual testing:
  - Ensure leader election works as expected: both leader and non-leader pods run successfully.
  - Ensure the NGF container exits when it stops being the leader.
  - Ensure an upgrade from Release 1.1.0 is successful for leader election: the leader gets elected among the new pods.
  - Ensure the telemetry Job reports telemetry multiple times, using a small value of TELEMETRY_REPORT_PERIOD.

CLOSES nginx#1382

Co-authored-by: Saylor Berman <s.berman@f5.com>

mpstefan mentioned this issue Dec 14, 2023

Adoption Telemetry #793

Closed

mpstefan added the area/telemetry (Issues related to collected telemetry data) label Dec 14, 2023

mpstefan added this to the v1.2.0 milestone Dec 14, 2023

mpstefan added the size/small (Estimated to be completed within ~2 days) and refined (Requirements are refined and the issue is ready to be implemented) labels Dec 14, 2023

ja20222 assigned pleshakov Dec 14, 2023

This was referenced Dec 14, 2023

Add telemetry job - option 1 #1391

Closed

Add telemetry job - option 2 #1392

Closed

kate-osborn mentioned this issue Dec 19, 2023

Collect Cluster Identifier (NGF) #1303

Closed

pleshakov mentioned this issue Jan 5, 2024

Add telemetry job #1448

Merged

6 tasks

pleshakov closed this as completed in #1448 Jan 10, 2024


mpstefan commented Dec 14, 2023 • edited by pleshakov
