Implement a persistence agent for logs, and Garbage Collection for Kubernetes Resources. #844

Closed
vicaire opened this issue Feb 21, 2019 · 11 comments
Labels
area/backend area/persist-agent help wanted The community is welcome to contribute. kind/feature lifecycle/stale The issue / pull request is stale, any activities remove this label. priority/p1

Comments

@vicaire
Contributor

vicaire commented Feb 21, 2019

No description provided.

@IronPan
Member

IronPan commented Mar 1, 2019

related to #93

@IronPan
Member

IronPan commented Mar 1, 2019

FYI this bug is visible in two cases so far

  • an autoscaling cluster
  • redeploying a new cluster with an old PD

@animeshsingh
Contributor

In our deployment of Pipelines as well, logs get wiped every few weeks. What is needed is a configuration option to persist the logs to persistent storage, as well as support for archival.

@vicaire
Contributor Author

vicaire commented Mar 8, 2019

SG. This is open for contributions if you are interested. An object store would be great to persist the logs. If the implementation uses Minio Client, we would be able to persist the logs on GCS/S3/GCP.
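
A minimal sketch of what persisting a pod log to an object store could look like, assuming a MinIO-style client that exposes `put_object(bucket_name, object_name, data, length, content_type=...)`, as the MinIO Python SDK does. The helper names, the `logs/<workflow>/<pod>/<container>.log` key layout, and the bucket name are hypothetical and not taken from this project. The client is passed in so the same code can target any S3-compatible endpoint.

```python
import io


def log_object_name(workflow: str, pod: str, container: str = "main") -> str:
    """Build a predictable object key for one container's log (hypothetical layout)."""
    return f"logs/{workflow}/{pod}/{container}.log"


def persist_pod_log(client, bucket: str, workflow: str, pod: str,
                    log_text: str, container: str = "main") -> str:
    """Upload one log as a text object and return the object key.

    `client` is assumed to be a MinIO-style client; only `put_object` is used.
    """
    data = log_text.encode("utf-8")
    name = log_object_name(workflow, pod, container)
    # put_object expects a readable stream plus the exact byte length.
    client.put_object(bucket, name, io.BytesIO(data), len(data),
                      content_type="text/plain")
    return name
```

With a real client this would be called once per terminated pod, for example `persist_pod_log(client, "pipeline-logs", workflow_name, pod_name, text)`, before the workflow resources are deleted.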

@vicaire vicaire added priority/p0 help wanted The community is welcome to contribute. area/backend labels Mar 23, 2019
@vicaire
Contributor Author

vicaire commented Mar 23, 2019

Description of the solution:

  • We need to persist logs.
  • We can then automatically delete Argo workflow resources and scheduled workflow resources from the K8s etcd store once the workflows are terminated.
  • We need to verify that the owner fields are properly set on all the resources created by the Argo workflow (for instance, if an Argo workflow creates a TF-job, the TF-job should have the Argo workflow as its owner).
  • With owners set correctly, deleting a workflow will automatically delete all of its dependent pods and K8s resources.
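
The owner check in the list above can be sketched as follows. This is an illustration only: it works on plain manifest dictionaries rather than a live cluster, and the helper names are hypothetical. The `ownerReferences` fields (`apiVersion`, `kind`, `name`, `uid`, `controller`, `blockOwnerDeletion`) are the standard Kubernetes ones; the Kubernetes garbage collector removes dependents once their owner is deleted, provided owner and dependent live in the same namespace.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def owner_reference(owner: Manifest) -> Dict[str, Any]:
    """Build an ownerReferences entry pointing at `owner`."""
    meta = owner["metadata"]
    return {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": meta["name"],
        "uid": meta["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }


def set_owner(dependent: Manifest, owner: Manifest) -> Manifest:
    """Add `owner` to the dependent's ownerReferences if it is not already there."""
    refs = dependent.setdefault("metadata", {}).setdefault("ownerReferences", [])
    uid = owner["metadata"]["uid"]
    if not any(ref.get("uid") == uid for ref in refs):
        refs.append(owner_reference(owner))
    return dependent


def missing_owner(dependents: List[Manifest], owner: Manifest) -> List[str]:
    """Return names of dependents that do not list `owner` as an owner."""
    uid = owner["metadata"]["uid"]
    return [
        dep["metadata"]["name"]
        for dep in dependents
        if not any(ref.get("uid") == uid
                   for ref in dep["metadata"].get("ownerReferences", []))
    ]
```

Running `missing_owner` over everything a workflow created gives the list of resources that would be left behind when the workflow is garbage collected.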

@vicaire
Contributor Author

vicaire commented Mar 25, 2019

Short term workaround from @amygdala

To clean up pods:

  • Install the Argo CLI
  • Use the command: argo delete -n kubeflow --all
  • A downside, of course, is that you can no longer look at those pod logs in the UI.
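
One way to soften the downside noted above is to dump each pod's logs to local files before deleting the workflows. The sketch below is an assumption-laden illustration, not part of the workaround itself: the function name, the output layout, and the idea of passing the pod list in by hand are all hypothetical. It only shells out to `kubectl logs ... --all-containers` and to the `argo delete -n <namespace> --all` command quoted above.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def save_logs_then_delete(namespace: str, pods: List[str], out_dir: str,
                          run: Callable = subprocess.run) -> None:
    """Write each pod's logs to <out_dir>/<pod>.log, then delete all workflows.

    `run` defaults to subprocess.run and can be replaced for a dry run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pod in pods:
        result = run(
            ["kubectl", "logs", pod, "-n", namespace, "--all-containers"],
            capture_output=True, text=True, check=True,
        )
        (out / f"{pod}.log").write_text(result.stdout)
    # Only reached if every log was saved, since check=True raises on failure.
    run(["argo", "delete", "-n", namespace, "--all"], check=True)
```

The saved files are not visible in the UI, so this is only a stopgap until logs are persisted properly.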

@vicaire
Contributor Author

vicaire commented Mar 26, 2019

As @animeshsingh suggested in #940, we could use the latest Argo executor to save logs to persistent S3/GCS storage when the archiving flag is enabled.

@Snapple49

What is the status of this issue? Have there been any updates or contributions? Or is there any documentation on how to enable Argo archiving as suggested above?

@IronPan
Member

IronPan commented Nov 29, 2019

GC is implemented now. You can set the TTL as a persistence agent environment variable; see https://github.com/kubeflow/pipelines/pull/1802/files#diff-f4326ec4a2f4f6b219c2aab8887f6c85R21

Logs can be persisted with ARGO_ARCHIVE_LOGS (see #2081).
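
For readers unfamiliar with TTL-based garbage collection, the rule amounts to "delete a workflow once it has been finished for longer than the TTL". The sketch below illustrates that rule only; it is not the persistence agent's code, and the function names and the `ttl_seconds` parameter are hypothetical. It assumes the workflow's `status.finishedAt` is a UTC timestamp such as `2019-11-29T10:00:00Z`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_finished_at(value: str) -> datetime:
    """Parse a UTC timestamp of the form 2019-11-29T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_expired(finished_at: Optional[str], ttl_seconds: int, now: datetime) -> bool:
    """Return True if the workflow finished at least `ttl_seconds` before `now`.

    A workflow without a finish time is still running and is never expired.
    """
    if not finished_at:
        return False
    return now - parse_finished_at(finished_at) >= timedelta(seconds=ttl_seconds)
```

Passing `now` explicitly keeps the check deterministic, which makes it straightforward to test without touching the clock.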

@stale

stale bot commented Jun 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 25, 2020
@Bobgy
Contributor

Bobgy commented Jun 30, 2020

Both are supported

@Bobgy Bobgy closed this as completed Jun 30, 2020
Linchin pushed a commit to Linchin/pipelines that referenced this issue Apr 11, 2023
Merging for fast roll-back
6 participants