SD shortfin CI fails when apt is locked #407

Closed
renxida opened this issue Nov 1, 2024 · 5 comments · Fixed by #415

Comments

@renxida
Contributor

renxida commented Nov 1, 2024

The workflow should probably first check whether cmake and ninja are installed, and of the proper version, before attempting to install them. I don't think that check requires acquiring the apt lock.
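A minimal sketch of that check-before-install idea, assuming an Ubuntu runner where the `ninja` binary comes from the `ninja-build` package. The function names are hypothetical and not taken from the actual workflow; `version_at_least` relies on GNU `sort -V`.

```shell
# Print the names of any tools (given as arguments) that are not on PATH.
# This only inspects PATH, so it never touches the apt lock.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if version $1 is at least version $2 (uses GNU `sort -V`).
version_at_least() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Hypothetical wrapper, defined but not invoked here: only call apt when
# something is actually missing. apt-get is used instead of apt because
# apt warns that its CLI is not stable for scripts.
install_build_deps_if_needed() {
  if [ -z "$(missing_tools cmake ninja)" ]; then
    echo "cmake and ninja already present; skipping apt"
    return 0
  fi
  sudo apt-get update -y && sudo apt-get install -y cmake ninja-build
}
```

A version gate could be layered on top, for example `version_at_least "$(ninja --version)" 1.10`, where the minimum version shown is a placeholder rather than the project's real requirement.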

image

@ScottTodd
Member

Workflow history: https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci-sdxl.yaml
Sample logs: https://github.com/nod-ai/SHARK-Platform/actions/runs/11621536945/job/32366319347#step:2:13

Text:

Run sudo apt update -y

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 2271375 (apt)
E: Unable to lock directory /var/lib/apt/lists/

(screenshots aren't searchable or copy/paste-able ;P)

@monorimet
Contributor

Added a fix to #411, but it will still throw an error if cmake and ninja are not installed and the apt lock cannot be acquired. That may not be an issue for a while.

output logs:
https://github.com/nod-ai/SHARK-Platform/actions/runs/11637357380/job/32410398564?pr=411#step:2:1
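For the remaining case (tools missing and the lock held by another process), one common mitigation is to retry the apt step a few times rather than fail on the first attempt. The sketch below is a generic retry helper, not what #411 implements; the `retry` name and the attempt and delay values are assumptions.

```shell
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not run here): retry 5 10 sudo apt-get update -y
```

Retrying only papers over contention on the runner; it does not explain why another apt process holds the lock in the first place.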

@ScottTodd
Member

> Added a fix to #411

Please keep changes small and send them as individual PRs. We shouldn't need to wait for a larger patch to be reviewed and landed to fix a broken CI build.

Something is suspicious here about the runner environment. Basic dependency setup / installation shouldn't be a sticky source of failures. Might make sense to run these jobs in containers or limit what other code (if any) is running on these machines.

@monorimet
Contributor

> Added a fix to #411
>
> Please keep changes small and send them as individual PRs. We shouldn't need to wait for a larger patch to be reviewed and landed to fix a broken CI build.
>
> Something is suspicious here about the runner environment. Basic dependency setup / installation shouldn't be a sticky source of failures. Might make sense to run these jobs in containers or limit what other code (if any) is running on these machines.

Agree, will send up separate PR.

Containers are probably a bit heavy for this, but as long as we have env/machine setup flexibility validated within reason, it might be the most robust solution.

@ScottTodd
Member

I'm specifically wondering if apt update within a container is free from the hosting system's /var/lib/apt/lists/lock. We wouldn't need to use a container for packaging deps (we could)... just for isolation from other users on the runner machine.
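On the isolation question: a container has its own root filesystem, so its `/var/lib/apt/lists/lock` is a different file from the host's unless that host path is bind-mounted in. A rough sketch of running the apt step that way, assuming Docker is available on the runner; the `apt_in_container` name, the `DOCKER` override, and the image tag are hypothetical.

```shell
# Run apt inside a throwaway container. Its filesystem, and therefore its
# apt lock files, are separate from the host's unless bind-mounted.
# DOCKER can be overridden, e.g. DOCKER=echo to preview the command.
apt_in_container() {
  image=$1
  shift
  # The trailing `sh "$@"` passes the package names into the inner shell
  # as its positional parameters.
  "${DOCKER:-docker}" run --rm "$image" \
    sh -c 'apt-get update -y && apt-get install -y "$@"' sh "$@"
}

# Example (not run here): apt_in_container ubuntu:22.04 cmake ninja-build
```

Because `--rm` discards the container afterwards, a real job would run its build steps inside the same container rather than only the install; this sketch just shows that the apt step no longer contends with other users of the host.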
