Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement periodic gathering as a job in tech preview #764

Merged
merged 15 commits into from
May 26, 2023

Conversation

tremes
Copy link
Contributor

@tremes tremes commented Apr 5, 2023

This changes the periodic data gathering to run as a Kubernetes jobs. This includes using and implementing a new datagather.insights.openshift.io custom resource. A new NewGatherAndUpload command was added to pkg/cmd/start/start.go, but it's really implemented in pkg/controller/gather_commands.go (which is not really new, but renamed from gather_job.go.

The Kubernetes job is created in the pkg/controller/periodic/job.go and is responsible for the following:

  • run the data gathering
  • recording the data as an archive - this is the change in pkg/recorder/diskrecorder/diskrecorder.go - we are only interested in the last archive (simply because there won't be more)
  • uploading the data - these are the changes in pkg/insights/insightsuploader/insightsuploader.go and pkg/insights/insightsclient/requests.go, because we need to know and use the insightsRequestID

Job is executed periodically by the new method GatherJob() in pkg/controller/periodic/periodic.go. Each job also requires a new DataGather CR (created in pkg/controller/periodic/periodic.go by method createNewDataGatherCR(ctx context.Context, disabledGatherers []string, dataPolicy insightsv1alpha1.DataPolicy)). This custom resource reflects the same values, which user can configure in insightsdatagather.config.openshift.io resource) and this is the responsibility of the createDataGatherAttributeValues() ([]string, insightsv1alpha1.DataPolicy) method in pkg/controller/periodic/periodic.go.

Finally the gatherers status needs to "copied" from the corresponding DataGather resource to the insightsoperator.operator.openshift.io resource - this is done in copyDataGatherStatusToOperatorStatus(ctx context.Context, dgName string) (*v1.InsightsOperator, error) in pkg/controller/periodic/periodic.go.

Categories

  • Bugfix
  • Data Enhancement
  • Feature
  • Backporting
  • Others (CI, Infrastructure, Documentation)

Sample Archive

No new data

Documentation

No docs update yet

Unit Tests

  • pkg/controller/periodic/job_test.go - a new test
  • pkg/controller/status/gatherer_status_test.go - a new test
  • pkg/controller/periodic/periodic_test.go - updated and extended
  • pkg/gather/gather_test.go - updated

Privacy

Yes. There are no sensitive data in the newly collected information.

Changelog

Breaking Changes

Yes/No

References

https://issues.redhat.com/browse/???
https://access.redhat.com/solutions/???

@openshift-ci openshift-ci bot requested review from ncaak and rluders April 5, 2023 10:41
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2023
Reason: "AsExpected",
LastTransitionTime: metav1.Now(),
},
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This status updating is temporary and will be addressed in https://issues.redhat.com/browse/CCXDEV-10590

@tremes tremes changed the title Implement periodic gathering as a job in tech preview WIP Implement periodic gathering as a job in tech preview Apr 5, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 5, 2023
@tremes tremes changed the title WIP Implement periodic gathering as a job in tech preview Implement periodic gathering as a job in tech preview Apr 6, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2023
@rluders
Copy link
Contributor

rluders commented Apr 18, 2023

Checked the code and tested it.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2023
@openshift-ci
Copy link

openshift-ci bot commented Apr 18, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rluders, tremes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ikerreyes
Copy link
Contributor

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 21, 2023
},
Spec: batchv1.JobSpec{
// backoff limit is 0 - we dont' want to restart the gathering immediately in case of failure
BackoffLimit: new(int32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not actually a problem, but I think that we may also want to add TTL... I was testing locally, and at least on my cluster, the completed jobs started to stack up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this https://github.com/kubernetes/api/blob/v0.27.1/batch/v1/types.go#L302 ? Yes we keep the finished jobs on purpose and that's why there's this https://github.com/openshift/insights-operator/pull/764/files#diff-dec12f0cf28a91a837b46e755b46c081c9014121cdb770b2cd8ac916af29c8deR393 pruning method. Jobs older than 24h are removed....so I think there's currently no reason to set the TTL attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why, but I tried to run the job twice in less than 30 min and it gave a message saying that it couldn't create the job because it already existed. I think that I should try to reproduce it again. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User is not supposed to create or run the job manually. Creating the periodic-gather jobs is the operator responsibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I tested it again, and it is working as expected. I think that the cluster that I was playing with was broken.

@rluders
Copy link
Contributor

rluders commented May 15, 2023

I want to clarify one thing about the jobs first.
/remove-lgtm

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 15, 2023
@rluders
Copy link
Contributor

rluders commented May 15, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 17, 2023
@tremes
Copy link
Contributor Author

tremes commented May 22, 2023

/retest

@DennisPeriquet
Copy link
Contributor

/lgtm
/label jira/valid-bug

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 26, 2023
@DennisPeriquet
Copy link
Contributor

manually tagging this PR for "Feature Merge Queue"

@DennisPeriquet
Copy link
Contributor

/override ci/prow/e2e-metal-ipi-ovn-ipv6
/override ci/prow/insights-operator-e2e-tests

@openshift-ci
Copy link

openshift-ci bot commented May 26, 2023

@DennisPeriquet: Overrode contexts on behalf of DennisPeriquet: ci/prow/e2e-metal-ipi-ovn-ipv6, ci/prow/insights-operator-e2e-tests

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-ipv6
/override ci/prow/insights-operator-e2e-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DennisPeriquet
Copy link
Contributor

/override ci/prow/e2e
/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link

openshift-ci bot commented May 26, 2023

@DennisPeriquet: Overrode contexts on behalf of DennisPeriquet: ci/prow/e2e, ci/prow/e2e-agnostic-upgrade

In response to this:

/override ci/prow/e2e
/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit bae9698 into openshift:master May 26, 2023
@openshift-ci
Copy link

openshift-ci bot commented May 26, 2023

@tremes: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@stbenjam
Copy link
Member

This is causing all techpreview jobs to fail. https://issues.redhat.com/browse/OCPBUGS-14270

stbenjam added a commit to stbenjam/insights-operator that referenced this pull request May 30, 2023
@tremes
Copy link
Contributor Author

tremes commented May 30, 2023

yeah I just found out that. Sorry. Let me revert it

tremes added a commit to tremes/insights-operator that referenced this pull request May 30, 2023
@tremes
Copy link
Contributor Author

tremes commented May 30, 2023

Revert PR is #785

tremes added a commit to tremes/insights-operator that referenced this pull request May 31, 2023
* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase
tremes added a commit to tremes/insights-operator that referenced this pull request Jun 2, 2023
* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase
tremes added a commit to tremes/insights-operator that referenced this pull request Jun 13, 2023
* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase
openshift-merge-robot pushed a commit that referenced this pull request Jun 21, 2023
* Implement periodic gathering as a job in tech preview (#764)

* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase

* add resource requests to the new job

* rebase

* pass linting

* rebase
@tremes tremes deleted the gather-job-tp branch July 10, 2023 10:14
JoaoFula pushed a commit to JoaoFula/insights-operator that referenced this pull request Jan 23, 2024
* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase
JoaoFula pushed a commit to JoaoFula/insights-operator that referenced this pull request Jan 23, 2024
JoaoFula pushed a commit to JoaoFula/insights-operator that referenced this pull request Jan 23, 2024
* Implement periodic gathering as a job in tech preview (openshift#764)

* Run gathering as separate Pod in tech preview

* Move downloading of reports to operator part & propagate Insights Request ID

* Minor updates

* DataGather CR - very first  commit

* create a new DataGather CR and prune periodically

* read and apply gatherers config from the new CR

* Fix anonymizer

* do not duplicate gatherer status creation

* extract the job responsibility and fix contexts

* Copy Gatherer status & tests

* diskrecorder_test - narrow down the testing archive path

* Fix error reporting in periodic.go

* reorder manifest creation experiment

* status reporter must be always started for now

* rebase

* add resource requests to the new job

* rebase

* pass linting

* rebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants