
No space left on device blocks Provider upgrades #554

Closed
t0yv0 opened this issue Sep 8, 2023 · 8 comments
Labels
kind/engineering: Work that is not visible to an external user
p1: A bug severe enough to be the next item assigned to an engineer
resolution/fixed: This issue was fixed

Comments

@t0yv0
Member

t0yv0 commented Sep 8, 2023

Some recent changes in Go SDK generation pushed the builds over the limit of disk space.

build_sdk (go): https://github.com/pulumi/pulumi-azure/actions/runs/6116569557/job/16603257286
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.308.0/_diag/Worker_20230908-024953-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)

I haven't investigated deeply, but this could also be related to Go build dependency caching. If changes in dependencies invalidate the cache but we don't track the cache key accurately, the job can download the previous cache and then download the new packages again anyway during go build, which can push things over the line. A workaround to try here is tinkering with the cache keys to force a miss.
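One way to try the cache-key workaround is to key the module cache on a hash of go.sum plus a manually bumped salt, similar in spirit to the common `actions/cache` pattern of keying on `hashFiles('**/go.sum')`. This is a hypothetical sketch: `cacheKey`, the key layout, and the salt are illustrative and not what the Pulumi workflows actually use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a Go module cache key from the contents of go.sum.
// The salt is a hypothetical manual knob: bumping it (e.g. "v1" -> "v2")
// changes every key and forces a cache miss even if go.sum is unchanged.
func cacheKey(osName, salt string, goSum []byte) string {
	sum := sha256.Sum256(goSum)
	return fmt.Sprintf("%s-go-%s-%s", osName, salt, hex.EncodeToString(sum[:]))
}

func main() {
	goSum := []byte("example.com/mod v1.0.0 h1:abc=\n")
	// Same go.sum, different salt: the keys differ, so the old cache is skipped.
	fmt.Println(cacheKey("Linux", "v1", goSum))
	fmt.Println(cacheKey("Linux", "v2", goSum))
}
```

Keying on the go.sum hash means any dependency change produces a fresh key instead of restoring a stale cache that then gets topped up with new downloads.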

Alternatively, it could be excessively chatty logging that we need to compress.

@t0yv0 t0yv0 added this to the 0.94 milestone Sep 8, 2023
@mikhailshilkov mikhailshilkov added the kind/engineering Work that is not visible to an external user label Sep 8, 2023
@iwahbe
Member

iwahbe commented Sep 8, 2023

pulumi/pulumi-aws@0301de4 fixed the build error. This is more of a band-aid than a cure, but it will get us going again.

@t0yv0
Member Author

t0yv0 commented Sep 8, 2023

Yeah, this is what @aq17 and I landed on trying, based on @thomas11's idea earlier in the day. More power.

@t0yv0
Member Author

t0yv0 commented Sep 11, 2023

Unfortunately, the K8s repo now fails in the test(go) target with the same out-of-disk (OOD) error. This is on the path to my P1 fixes, so I'd like to take this and chase it down a bit deeper.

@t0yv0 t0yv0 added the p1 A bug severe enough to be the next item assigned to an engineer label Sep 11, 2023
@t0yv0
Member Author

t0yv0 commented Sep 12, 2023

We've leaned heavily into scheduling workloads on the pulumi-ubuntu-8core runner. This solution seems OK for now, but it may cause problems if the custom runner is out of capacity; in that case, the recommendation is to use GitHub runners with more disk space. I didn't have time to fully root-cause the issue, but I was able to measure the K8s disk draw in the Go test job:

free: 231G
$ install stuff
free: 226G (-5G)
$ go test -run COMPILEONLY
free: 218G (-8G)
$ go test
free: 213G (-5G)

In total this draws 18G (231G - 213G), which exceeds the 14G available on stock runners.
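The per-step deltas follow directly from the free-space readings above: 5G + 8G + 5G = 18G, which is 4G more than a stock runner offers. A small sketch that recomputes them; the `step` type and `diskDraw` helper are illustrative, and the numbers are only the ones recorded in this comment.

```go
package main

import "fmt"

// step records the free disk space (in GB) reported after a CI step.
type step struct {
	name   string
	freeGB int
}

// diskDraw prints the space consumed by each step and returns the
// total consumed between the first and last measurement.
func diskDraw(steps []step) int {
	for i := 1; i < len(steps); i++ {
		fmt.Printf("%-26s -%dG\n", steps[i].name, steps[i-1].freeGB-steps[i].freeGB)
	}
	return steps[0].freeGB - steps[len(steps)-1].freeGB
}

func main() {
	steps := []step{
		{"baseline", 231},
		{"install stuff", 226},
		{"go test -run COMPILEONLY", 218},
		{"go test", 213},
	}
	const stockRunnerGB = 14 // free disk on stock runners, per the comment above
	total := diskDraw(steps)
	fmt.Printf("total: %dG, fits on stock runner: %v\n", total, total <= stockRunnerGB)
}
```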

There are multiple reasons Go is so resource-hungry here, although I don't have exact data:
1. The Azure SDK, two AWS SDKs, and the GCP SDKs are pulled into the compilation unit via ProgramTest's (pulumi/pkg) spurious dependencies on Pulumi state backends.
2. When tests run, ProgramTest uses more disk space by creating project copies; there might be a cleanup opportunity that's being missed.

@t0yv0
Copy link
Member Author

t0yv0 commented Sep 12, 2023

I'll close this for now, as I'm not sure it still affects us with the workarounds in place.

@t0yv0 t0yv0 closed this as completed Sep 12, 2023
@pulumi-bot pulumi-bot reopened this Sep 12, 2023
@pulumi-bot

Cannot close issue:

  • does not have required labels: resolution/

Please fix these problems and try again.

@t0yv0 t0yv0 added the resolution/fixed This issue was fixed label Sep 12, 2023
@t0yv0 t0yv0 closed this as completed Sep 12, 2023
@iwahbe
Member

iwahbe commented Sep 12, 2023

It's worth noting that pulumi-aws fails when building the complete go SDK, before it reaches go integration tests.

@t0yv0
Member Author

t0yv0 commented Sep 12, 2023

Yes, most providers just fail to build, but K8s also fails to test. In both cases, I think what sinks the runner is the compilation burden of either the SDK or the tests, with all their transitive dependencies.


5 participants