Implement periodic gathering as a job in tech preview #764
Conversation
				Reason:             "AsExpected",
				LastTransitionTime: metav1.Now(),
			},
		}
This status updating is temporary and will be addressed in https://issues.redhat.com/browse/CCXDEV-10590
Checked the code and tested it.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rluders, tremes. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
/label qe-approved
		},
		Spec: batchv1.JobSpec{
			// backoff limit is 0 - we don't want to restart the gathering immediately in case of failure
			BackoffLimit: new(int32),
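As an aside on the `BackoffLimit: new(int32)` line: in the Kubernetes batch/v1 API this field is an `*int32`, and `new(int32)` is a common idiom for getting a pointer to the zero value. A minimal, self-contained sketch of the idiom (plain Go, no Kubernetes dependencies):

```go
package main

import "fmt"

func main() {
	// JobSpec.BackoffLimit in the Kubernetes batch/v1 API is an *int32.
	// new(int32) allocates an int32 initialized to its zero value and
	// returns a pointer to it, so this effectively sets the backoff limit
	// to 0: a failed pod is not retried, matching the comment in the diff.
	backoffLimit := new(int32)
	fmt.Println(*backoffLimit) // prints 0
}
```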
Not actually a problem, but I think that we may also want to add TTL... I was testing locally, and at least on my cluster, the completed jobs started to stack up.
Do you mean this https://github.com/kubernetes/api/blob/v0.27.1/batch/v1/types.go#L302? Yes, we keep the finished jobs on purpose, and that's why there's this https://github.com/openshift/insights-operator/pull/764/files#diff-dec12f0cf28a91a837b46e755b46c081c9014121cdb770b2cd8ac916af29c8deR393 pruning method. Jobs older than 24h are removed, so I think there's currently no reason to set the TTL attribute.
I don't know why, but I tried to run the job twice in less than 30 minutes and it gave a message saying that it couldn't create the job because it already existed. I think I should try to reproduce it again. 🤔
The user is not supposed to create or run the job manually. Creating the periodic-gather jobs is the operator's responsibility.
OK, I tested it again, and it is working as expected. I think that the cluster that I was playing with was broken.
I want to clarify one thing about the jobs first.
/lgtm
/retest
/lgtm
manually tagging this PR for "Feature Merge Queue"
/override ci/prow/e2e-metal-ipi-ovn-ipv6
@DennisPeriquet: Overrode contexts on behalf of DennisPeriquet: ci/prow/e2e-metal-ipi-ovn-ipv6, ci/prow/insights-operator-e2e-tests
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/override ci/prow/e2e
@DennisPeriquet: Overrode contexts on behalf of DennisPeriquet: ci/prow/e2e, ci/prow/e2e-agnostic-upgrade
@tremes: all tests passed! Full PR test history. Your PR dashboard.
This is causing all techpreview jobs to fail. https://issues.redhat.com/browse/OCPBUGS-14270
Yeah, I just found that out. Sorry. Let me revert it.
…preview (openshift#764)" This reverts commit bae9698.
Revert PR is #785.
* Run gathering as separate Pod in tech preview
* Move downloading of reports to operator part & propagate Insights Request ID
* Minor updates
* DataGather CR - very first commit
* create a new DataGather CR and prune periodically
* read and apply gatherers config from the new CR
* Fix anonymizer
* do not duplicate gatherer status creation
* extract the job responsibility and fix contexts
* Copy Gatherer status & tests
* diskrecorder_test - narrow down the testing archive path
* Fix error reporting in periodic.go
* reorder manifest creation experiment
* status reporter must be always started for now
* rebase
* Implement periodic gathering as a job in tech preview (#764)
* Run gathering as separate Pod in tech preview
* Move downloading of reports to operator part & propagate Insights Request ID
* Minor updates
* DataGather CR - very first commit
* create a new DataGather CR and prune periodically
* read and apply gatherers config from the new CR
* Fix anonymizer
* do not duplicate gatherer status creation
* extract the job responsibility and fix contexts
* Copy Gatherer status & tests
* diskrecorder_test - narrow down the testing archive path
* Fix error reporting in periodic.go
* reorder manifest creation experiment
* status reporter must be always started for now
* rebase
* add resource requests to the new job
* rebase
* pass linting
* rebase
This changes the periodic data gathering to run as a Kubernetes job. This includes using and implementing a new `datagather.insights.openshift.io` custom resource. A new `NewGatherAndUpload` command was added to `pkg/cmd/start/start.go`, but it is really implemented in `pkg/controller/gather_commands.go` (which is not really new, but renamed from `gather_job.go`).

The Kubernetes job is created in `pkg/controller/periodic/job.go` and is responsible for the following:

* `pkg/recorder/diskrecorder/diskrecorder.go` - we are only interested in the last archive (simply because there won't be more)
* `pkg/insights/insightsuploader/insightsuploader.go` and `pkg/insights/insightsclient/requests.go`, because we need to know and use the `insightsRequestID`

The job is executed periodically by the new method `GatherJob()` in `pkg/controller/periodic/periodic.go`. Each job also requires a new `DataGather` CR, created in `pkg/controller/periodic/periodic.go` by the method `createNewDataGatherCR(ctx context.Context, disabledGatherers []string, dataPolicy insightsv1alpha1.DataPolicy)`. This custom resource reflects the same values that a user can configure in the `insightsdatagather.config.openshift.io` resource; this is the responsibility of the `createDataGatherAttributeValues() ([]string, insightsv1alpha1.DataPolicy)` method in `pkg/controller/periodic/periodic.go`.

Finally, the gatherers status needs to be "copied" from the corresponding `DataGather` resource to the `insightsoperator.operator.openshift.io` resource - this is done in `copyDataGatherStatusToOperatorStatus(ctx context.Context, dgName string) (*v1.InsightsOperator, error)` in `pkg/controller/periodic/periodic.go`.
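To summarize the flow of one periodic run, here is a purely illustrative Go sketch. The three step functions are stand-ins named after the responsibilities described above (only `createNewDataGatherCR` and `copyDataGatherStatusToOperatorStatus` echo names from the PR; their signatures here are simplified); nothing in this sketch talks to a real cluster.

```go
package main

import "fmt"

// createNewDataGatherCR stands in for building a datagather.insights.openshift.io
// CR reflecting the insightsdatagather.config.openshift.io configuration.
// The returned name is a hypothetical example.
func createNewDataGatherCR() string {
	return "periodic-gathering-example"
}

// createGatheringJob stands in for pkg/controller/periodic/job.go creating a
// batch/v1 Job (with BackoffLimit 0) for the given DataGather resource.
func createGatheringJob(dataGatherName string) string {
	return "job-for-" + dataGatherName
}

// copyDataGatherStatusToOperatorStatus stands in for copying the gatherer
// status to the insightsoperator.operator.openshift.io resource.
func copyDataGatherStatusToOperatorStatus(dataGatherName string) {
	fmt.Println("copied status from", dataGatherName)
}

func main() {
	// One periodic run: CR first, then the Job, then the status copy.
	dg := createNewDataGatherCR()
	job := createGatheringJob(dg)
	fmt.Println("created", job)
	copyDataGatherStatusToOperatorStatus(dg)
}
```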
Categories
Sample Archive
No new data
Documentation
No docs update yet
Unit Tests
* `pkg/controller/periodic/job_test.go` - a new test
* `pkg/controller/status/gatherer_status_test.go` - a new test
* `pkg/controller/periodic/periodic_test.go` - updated and extended
* `pkg/gather/gather_test.go` - updated

Privacy
Yes. There are no sensitive data in the newly collected information.
Changelog
Breaking Changes
Yes/No
References
https://issues.redhat.com/browse/???
https://access.redhat.com/solutions/???