want better runtime isolation of sled components from ENOSPC #7227

Open
davepacheco opened this issue Dec 10, 2024 · 1 comment

Comments

@davepacheco
Collaborator

See #7221. Currently, Crucible regions have both a quota and a reservation, while control plane zones have neither. As a result, control plane components can run out of disk space even when they are using very little themselves. This can happen if the control plane makes a mistake and puts too much on the same system, if anything on the system doesn't have a quota, or if the sum of quotas exceeds the total available space.

I think the easiest thing to do here would be to pick generous limits and use both a quota and reservation for both zone root filesystems and persistent data filesystems. This should avoid overprovisioning entirely. These values would probably have to be deployment-specific, since the values for production probably won't work for, say, a4x2.
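To make the "avoid overprovisioning" condition concrete, here is a minimal sketch of the check a deployment-specific set of limits would need to pass. It assumes quota and reservation behave like the ZFS properties of the same names (a quota caps usage, a reservation guarantees space). The names `DatasetLimit` and `check_plan`, the dataset names, and all sizes are hypothetical and are not taken from the Omicron code base.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class DatasetLimit:
    """Hypothetical per-filesystem limits, in bytes."""
    name: str
    quota: int        # upper bound on space the filesystem may use
    reservation: int  # space guaranteed to the filesystem


def check_plan(pool_capacity: int, limits: list[DatasetLimit]) -> list[str]:
    """Return a list of problems; an empty list means nothing is overprovisioned."""
    problems = []
    for d in limits:
        # The proposal pairs a quota with an equal reservation, so that
        # everything a filesystem is allowed to use is also guaranteed.
        if d.quota != d.reservation:
            problems.append(f"{d.name}: quota and reservation differ")
    total_reserved = sum(d.reservation for d in limits)
    if total_reserved > pool_capacity:
        problems.append(
            f"reservations total {total_reserved} bytes, "
            f"pool has only {pool_capacity}"
        )
    return problems


# Illustrative values only; real limits would be chosen per deployment.
plan = [
    DatasetLimit("zone-root", quota=10 * GIB, reservation=10 * GIB),
    DatasetLimit("zone-data", quota=50 * GIB, reservation=50 * GIB),
]
fits = check_plan(100 * GIB, plan)      # large pool: no problems
too_small = check_plan(40 * GIB, plan)  # small pool: reservations do not fit
```

The same plan passes on a large pool and fails on a small one, which is one way to see why the values would likely need to differ between production and a smaller environment.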

The downside of this is that it might lead to lower total disk utilization when the system is full. If that becomes a problem (it does not seem remotely like one of our major problems right now) we could explore smarter options that involve more work.

@davepacheco
Collaborator Author

Related: #1630.
