- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
This KEP is about making kubelet aware of a container runtime splitting the image filesystem.
Aware here means that garbage collection of images and containers, as well as disk usage reporting, all remain functional.
kubelet has two distinct filesystems: Node and Image. In typical deployments, users deploy Kubernetes with both the Node and Image filesystems on the same disk. There have been requests to separate this storage onto different disks. The most common request is to separate the writable layer from the read-only layer: kubelet and container data would be stored on one disk while images would have their own disk. This could be beneficial because images occupy a lot of disk space while the writable layer is typically much smaller.
Container IO can impact kubelet, so the ability to add more disks could improve kubelet performance.
However, it is currently not possible to place the image layers and the container writable layers on different disks.
In the current separate-disk implementation, containers and images must be stored on the same disk, so garbage collection in case of node pressure (really image disk pressure) would GC images and containers on the image filesystem.
If one separates the writable layer (containers) from the read-only layer (images), then garbage collection and statistics must account for this separation. Today this configuration could potentially break kubelet if the container runtime sets up storage this way.
One downside of the separate disk is that pod data can be written in multiple locations: the writable layer of a container would go on the image filesystem while volume storage would go to the root fs. There is another request to make the root and the image filesystems writable and read-only, respectively. This means that pod data can be written on one disk while the other disk remains read-only. Separating the writable layer and the read-only layer achieves this.
- kubelet should still work if images and containers are separated onto different disks
- Support the writable layer being on the same disk as kubelet
- Images can be on a separate filesystem
kubelet, images and containers all on separate disks.
This case is possible with this implementation, as ContainerFS will be set up to read file statistics from a separate filesystem. However, it is not in scope for Alpha.
If there is interest, this KEP could be extended to support this use case; the main area to add would be testing.
- Multiple nodes cannot share the same filesystem
- Separating kubelet data into different filesystems
- Multiple image and/or container filesystems
- This KEP will start support for this, but more work is needed to investigate cAdvisor/CRI Stats/Eviction to support it
As a user, I would like to have my node configured so that I have a writable filesystem and a read-only filesystem.
kubelet will write volume data, and the container runtime will write writable layers, to the writable filesystem, while the container runtime will write images to the read-only filesystem.
Separating the filesystems is not a common pattern in most Kubernetes deployments. We summarize the configurations that are possible today below.
sda0: [writable layer, emptyDir, logs, read-only layer, ephemeral storage]
This is the default configuration for Kubernetes. If the container runtime is not configured in any special way, then NodeFs and ImageFs are assumed to be the same.
If the node only has a NodeFs filesystem that meets eviction thresholds, the kubelet frees up disk space in the following order:
- Garbage collect dead pods and containers
- Delete unused images
The way that pods are ranked for eviction also changes based on the filesystem.
kubelet sorts pods based on their total disk usage (local volumes + logs & writable layer of all containers)
Node Pressure Eviction lists the possible options for how to reclaim resources based on filesystem configuration.
Ephemeral-Storage explains how ephemeral-storage tracking works with different filesystem configurations.
sda0: [emptyDir, logs, ephemeral storage] sda1: [writable layer, read-only layer]
If the node has a dedicated ImageFs filesystem for container runtimes to use, the kubelet does the following:
- If the node filesystem meets the eviction thresholds, the kubelet garbage collects dead pods and logs
- If the ImageFs filesystem meets the eviction thresholds, the kubelet deletes all unused images and containers
- If the ImageFs has disk pressure, kubelet marks the node as unhealthy and does not admit new pods until the image disk pressure is gone
In case of disk pressure on each filesystem, what is garbage collected/stored on the disk?
Node Filesystem:
- Logs
- Pods
- Ephemeral Storage
Image Filesystem:
- Images
- Containers
cAdvisor detects the different disks based on mount points. So if a user mounts a separate disk at /var/lib/containers, kubelet will consider the filesystem split.
Data written to a container's writable layer is stored on the image filesystem, while data written to volumes is stored on the node filesystem.
Since this split case has two different filesystems that can come under disk pressure, pods are ranked differently based on which filesystem is experiencing pressure, as the sketch after the list illustrates.
Node Pressure:
- Sorts pods based on local volumes + logs of all containers
Image Pressure:
- Sorts pods based on the writable-layer usage of all containers
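To make the ranking concrete, here is a minimal sketch; it is not kubelet's actual eviction code, and the types and helper names are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// podDiskStats is a hypothetical summary of a pod's disk usage.
type podDiskStats struct {
	name               string
	volumesAndLogs     uint64 // local volumes + logs of all containers (node pressure)
	writableLayerBytes uint64 // writable layer of all containers (image pressure)
}

// rankForImagePressure orders pods so the heaviest writable-layer
// consumers become eviction candidates first.
func rankForImagePressure(pods []podDiskStats) {
	sort.Slice(pods, func(i, j int) bool {
		return pods[i].writableLayerBytes > pods[j].writableLayerBytes
	})
}

func main() {
	pods := []podDiskStats{
		{name: "small", writableLayerBytes: 10 << 20},
		{name: "large", writableLayerBytes: 300 << 20},
	}
	rankForImagePressure(pods)
	fmt.Println(pods[0].name) // "large" is the first eviction candidate
}
```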
sda0: [writable layer, emptyDir, logs, ephemeral storage] sda1: [read-only layer]
A goal is to allow kubelet to use a separate disk for the read-only layer, with everything else stored on the same disk as kubelet.
In case of disk pressure on each filesystem, what is garbage collected/stored on the disk?
Node Filesystem:
- Pods
- Logs
- Containers
- Ephemeral Storage
Image Filesystem:
- Images
Node Filesystem should monitor storage for containers in addition to ephemeral storage.
We foresee interest in other use cases in the future, so we want to comment on what work would be required to support them.
One extension could be multiple filesystems for images and containers.
The API allows for a list of filesystem usages for images and containers, but no work has been done to support this in the container runtimes or in kubelet.
cAdvisor and Stats would need to be enhanced to allow for a configurable number of filesystems.
Currently, the eviction manager is hard-coded to a 1-to-1 relationship between a filesystem and an eviction signal.
The following cases could be configured, but we are not targeting them at the moment:
a. Node, writable layer and images on separate filesystems.
b. Node and images on the same filesystem while the writable layer is on a separate filesystem.
By splitting the filesystem we allow more cases than kubelet currently supports. To avoid bugs, we will validate the cases kubelet does not currently support and return an error.
The following case will be validated, and we will return an error if the container runtime is set up for it:
- More than one filesystem for images and containers
We will check whether the CRI implementation returns more than one filesystem and log a warning.
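A minimal sketch of this validation, assuming the Go types generated from the CRI proto shown below; the function name is hypothetical:

```go
package kubelet

import (
	"fmt"

	"k8s.io/klog/v2"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// validateImageFsInfo rejects responses kubelet cannot handle and warns
// when a runtime reports filesystems beyond the first element.
func validateImageFsInfo(resp *runtimeapi.ImageFsInfoResponse) error {
	if len(resp.ImageFilesystems) == 0 {
		return fmt.Errorf("CRI runtime returned no image filesystem info")
	}
	if len(resp.ImageFilesystems) > 1 || len(resp.ContainerFilesystems) > 1 {
		klog.Warning("multiple image or container filesystems reported; only the first is used")
	}
	return nil
}
```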
A major risk of this feature is increased evictions due to the addition of a new filesystem.
The eviction manager monitors the image filesystem, the node filesystem, and now the container filesystem for disk pressure.
Disk pressure can be based on inode or storage limits.
Once a disk exceeds the limits set by `EvictionSoft` or `EvictionHard`, that node will eventually be marked as having disk pressure.
Garbage collection of containers, images or pods will be kicked off (depending on which filesystem experiences disk pressure).
New workloads will not be accepted on that node until the disk pressure resolves, either through garbage collection removing enough data or through manual intervention.
A mitigation is to initially support only the case of the writable layer being on the node filesystem (ContainerFs same as NodeFs), so we are really only monitoring two filesystems for pressure.
We will switch to using `ImageFsInfo`, but this will be guarded by a feature gate.
CRI-O and containerd return a single element in this case, and kubelet does not assume that there are multiple values in this array. Regardless, we use an array in `ImageFsInfoResponse`.
// ImageService defines the public APIs for managing images.
service ImageService {
…
rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}
}
message ImageFsInfoResponse {
// Information of image filesystem(s).
repeated FilesystemUsage image_filesystems = 1;
+ // Information of container filesystem(s).
+ // This is an optional field if container and image
+ // storage are separated.
+ // Default will be to return this as empty.
+ repeated FilesystemUsage container_filesystems = 2;
}
The CRI implementation is expected to return a unique identifier for images and containers so that kubelet can ask the CRI whether the objects are stored on separate disks. In the dedicated-disk case for the container runtime, `image_filesystems` and `container_filesystems` will be set to the same value.
The CRI implementation can set this as needed. The image and container filesystems are both arrays, which provides some extensibility in case these are stored on multiple disks.
Container runtimes will need to implement `ImageFsInfo`. An Alpha-to-Beta graduation goal is an implementation of `crictl imagefsinfo` that allows for more detailed reports of the image fs info. See PR for an example.
The Stats Summary API has a field called `runtime`, and we will add a `ContainerFs` field to it.
// RuntimeStats are stats pertaining to the underlying container runtime.
type RuntimeStats struct {
// Stats about the underlying filesystem where container images are stored.
// This filesystem could be the same as the primary (root) filesystem.
// Usage here refers to the total number of bytes occupied by images on the filesystem.
// +optional
ImageFs *FsStats `json:"imageFs,omitempty"`
+ // Stats about the underlying filesystem where container's writeable layer is stored.
+ // This filesystem could be the same as the primary (root) filesystem or the ImageFs.
+ // Usage here refers to the total number of bytes occupied by the writeable layer on the filesystem.
+ // +optional
+ ContainerFs *FsStats `json:"containerFs,omitempty"`
}
In this KEP, ContainerFs can either be the same as ImageFs or NodeFs.
We will add a more detailed function for ImageFsStats in the Provider Interface:
type containerStatsProvider interface {
	...
	// ImageFsStats returns stats for the image filesystem (read-only
	// layer) and the container filesystem (writable layer); the two may
	// refer to the same underlying filesystem.
	ImageFsStats(ctx context.Context) (*statsapi.FsStats, *statsapi.FsStats, error)
}
If we have a single image filesystem, then ImageFs includes both the writable and read-only layers. In this case, `ImageFsStats` will return an identical object for ImageFs and ContainerFs.
In the case where the container runtime does not return a container filesystem, we will assume that image_filesystem=container_filesystem.
This allows kubelet to support container runtimes that have not yet implemented `ImageFsInfo`.
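A minimal sketch of this fallback (the helper name is hypothetical), using the Go types generated from the CRI proto above:

```go
package kubelet

import (
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// splitFilesystems returns the image and container filesystem usage,
// defaulting the container filesystem to the image filesystem when the
// runtime does not report one (containerfs == imagefs).
func splitFilesystems(resp *runtimeapi.ImageFsInfoResponse) (imageFs, containerFs *runtimeapi.FilesystemUsage) {
	if len(resp.ImageFilesystems) > 0 {
		imageFs = resp.ImageFilesystems[0]
	}
	containerFs = imageFs
	if len(resp.ContainerFilesystems) > 0 {
		containerFs = resp.ContainerFilesystems[0]
	}
	return imageFs, containerFs
}
```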
The CRI Stats Provider uses `ImageFsInfo` to get information about the filesystems, but the cAdvisor Stats Provider uses `ImageStats`, which lists the images and computes the overall size from that list.
This switch will be guarded by a feature gate.
CRI-O uses the cAdvisor Stats Provider.
cAdvisor has plugins for each container runtime under `containers`. The CRI-O plugin relies on the endpoints `info` and `container/{id}`: `info` is used to get information about the storage filesystem, and `container/{id}` gets information about the mount points. CRI-O will add a new field, `storage_image`, to tell us when the filesystem is split. This is used to gather file stats.
cAdvisor labels CRI-O images as `crio-images`, and that is assumed to be the mount point of the container. When splitting the filesystem, this ends up pointing to the writable layer of the container.
We propose a new label in cAdvisor: `crio-containers` will point to the writable layer, and `crio-images` will point to the read-only layer. In the case of no split filesystem, `crio-images` will be used for both layers.
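A sketch (hypothetical helper, not cAdvisor's actual code) of how a stats consumer could pick the label for CRI-O storage once the new label exists:

```go
package kubelet

const (
	crioImagesLabel     = "crio-images"     // read-only (image) layer
	crioContainersLabel = "crio-containers" // proposed: writable layer
)

// crioWritableLayerLabel returns the cAdvisor label that points at the
// container writable layer for a given CRI-O storage configuration.
func crioWritableLayerLabel(splitFilesystem bool) string {
	if splitFilesystem {
		return crioContainersLabel
	}
	// Without a split, crio-images covers both layers.
	return crioImagesLabel
}
```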
We have created a cAdvisor PR to suggest how cAdvisor can be enhanced to support a container filesystem.
Containerd uses the CRI Stats Provider.
The CRI Stats Provider calls `ImageFsInfo` and uses the `FsId` to get the filesystem information from cAdvisor. One could label the `FsId` for the writable layer, and this would be used to get the filesystem information for the container filesystem.
No changes should be necessary in cAdvisor for this provider.
A new signal will be added to the eviction manager to reflect the filesystem for the writable layer.
For the first release of this KEP, this will be either NodeFs or ImageFs.
With fully separate disks, this could be a separate filesystem.
// SignalContainerFsAvailable is amount of storage available on filesystem that container runtime uses for container writable layers.
SignalContainerFsAvailable Signal = "containerfs.available"
// SignalContainerFsInodesFree is amount of inodes available on filesystem that container runtime uses for container writable layers.
SignalContainerFsInodesFree Signal = "containerfs.inodesFree"
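A sketch of how the new signal could be resolved in this first release; the helper is hypothetical, but `SignalNodeFsAvailable` and `SignalImageFsAvailable` are the existing eviction signals:

```go
// containerFsSignalSource (hypothetical) resolves the new ContainerFs
// signal to whichever filesystem actually holds the writable layer,
// since ContainerFs is never an independent disk in this release.
func containerFsSignalSource(dedicatedImageFs bool) Signal {
	if dedicatedImageFs {
		// Dedicated runtime disk: the writable layer lives with images.
		return SignalImageFsAvailable
	}
	// Split image filesystem (or a single disk): the writable layer
	// lives on the node filesystem.
	return SignalNodeFsAvailable
}
```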
We do need to change garbage collection for the split-filesystem case.
(Split filesystem) Writable layer on root plus ImageFs for images:
- NodeFs monitors ephemeral storage, logs and the writable layer
- ImageFs monitors the read-only layer
The eviction manager decides the priority of eviction based on which filesystem is experiencing pressure.
If the node filesystem experiences pressure, ranking is done as local volumes + logs of all containers + writable layer of all containers.
If the image filesystem experiences pressure, ranking is done by storage of images.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
- (pkg/kubelet/eviction): Sep 11th 2023 - 69.9%
- (pkg/kubelet/stats): Sep 11th 2023 - 77.9%
- (pkg/kubelet/server/stats): Sep 11th 2023 - 55%
This KEP will enhance coverage in the eviction manager by covering the case where `dedicatedImageFs` is true.
There is currently little test coverage when a separate ImageFs is used. Issue-120061 has been created to help resolve this.
We will also provide test cases for rolling back the changes in the eviction manager.
We will add unit tests to cover using `ImageFsInfo`, and we will have testing around rolling back this feature.
We will add test cases for `ImageStats` covering positive and negative usage of the feature. In negative cases, we will assume containerfs=imagefs. In positive test cases, we will allow different configurations of the image filesystem.
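As an illustration, a table-driven unit test along these lines (hypothetical names, assuming the `splitFilesystems` sketch and CRI types from earlier plus the standard `testing` package) could cover both cases:

```go
func TestSplitFilesystemsFallback(t *testing.T) {
	imageFs := &runtimeapi.FilesystemUsage{}
	containerFs := &runtimeapi.FilesystemUsage{}
	cases := []struct {
		name     string
		resp     *runtimeapi.ImageFsInfoResponse
		wantSame bool
	}{
		{
			name: "negative: no container filesystem, assume containerfs=imagefs",
			resp: &runtimeapi.ImageFsInfoResponse{
				ImageFilesystems: []*runtimeapi.FilesystemUsage{imageFs},
			},
			wantSame: true,
		},
		{
			name: "positive: split filesystem reported",
			resp: &runtimeapi.ImageFsInfoResponse{
				ImageFilesystems:     []*runtimeapi.FilesystemUsage{imageFs},
				ContainerFilesystems: []*runtimeapi.FilesystemUsage{containerFs},
			},
			wantSame: false,
		},
	}
	for _, tc := range cases {
		img, ctr := splitFilesystems(tc.resp)
		if (img == ctr) != tc.wantSame {
			t.Errorf("%s: got identical=%v, want %v", tc.name, img == ctr, tc.wantSame)
		}
	}
}
```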
Typically these types of tests are done with e2e tests.
This code affects stats, eviction and the summary API.
There should be e2e tests for each of these components with a split disk.
However, there are a few complications with this goal:
1. E2E tests around eviction with a single disk are currently failing (CRI-O eviction and containerd eviction).
2. There is zero test coverage around a separate image filesystem. There is an issue to improve this at the unit test level.
Item 1 can be addressed by investigating the eviction tests and figuring out the root cause of these failures.
As part of this KEP, we should add testing around separate disks in upstream Kubernetes. Since this is already a supported use case in kubelet, there should be testing around it.
kubelet/CRI-O should be set up with configuration for a separate disk. Eviction and Summary e2e tests should be added for the separate-disk case, and tests for the split image filesystem should be added.
E2E test use cases to add:
- E2E tests for summary api with separate disk
- Separate Disk - ImageFs reports separate disk from root when disk is mounted
- Split Disk - Writable layer on Node, read-only layer on ImageFs
- E2E tests for eviction api with separate disk
- Replicate existing disk-pressure eviction e2e tests with a separate disk
E2E tests for separate disk:
- Presubmits - Added separate ImageFs
- Presubmits - Added conformance test for ImageFs
CRI API changes are vendored into containerd and CRI-O, so the CRI API must be released first.
- Using `ImageFsInfo` is guarded by a feature gate
- Implementation for split image filesystem in Kubernetes
- Eviction manager modifications in case of split filesystem
- Summary and Stats Provider implementations
- CRI API merged
- Unit tests
- E2E tests to cover separate image filesystem
- It is not possible to have e2e tests for split filesystem at this stage
Shortly after this release and new CRI package, projects that consume the CRI API can be updated to use the new API features.
- At least one CRI implementation supports split filesystem
- E2E tests supporting the CRI implementation with split image filesystem
- CRI tool changes for image fs
- Gather feedback on other potential use cases
- Always set `KubeletSeparateDiskGC` to true so `ImageFsInfo` is used instead of `ImageStats` in all cases
- Always set `KubeletSeparateDiskGC` to true so that the eviction manager will detect a split filesystem and handle it correctly
- More than one CRI implementation supports split filesystem
There are two cases where this feature could impact users.
Case 1: Turning the feature on with no split filesystem.
In this case, the main difference is that for clusters using the cAdvisor Stats Provider, we will switch to using `ImageFsInfo` to report the image filesystem statistics. Turning off this feature reverts to `ImageStats`.
Case 2: The feature is turned on and the container runtime is set up to split the filesystem. In this case, rolling back this feature is only supported if one also configures the container runtime to not split the filesystem.
Another important case to highlight is that some container runtimes may not support a split filesystem.
We will guard against a container runtime not returning a container filesystem in `ImageFsInfo`; in this case we assume that the image filesystem and the container filesystem are identical.
Since older versions of the container runtimes do not have the ability to split the filesystem, we don't foresee much issue with this. kubelet will not behave differently if the container and image filesystems are identical.
The initial release of this will be the CRI API and changes to kubelet.
We do not assume that container runtimes must implement this API, so we will assume a single filesystem for images.
Once the container runtimes implement this API and the feature gate is enabled, the feature becomes active.
If a container runtime is configured to split the image filesystem, there is no clean way to roll these changes back. Following best practices, we will guard this code with a feature gate.
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: KubeletSeparateDiskGC
  - Components depending on the feature gate: kubelet
- Will enabling / disabling the feature require downtime of the control plane? It depends: if the control plane components run under kubelet, then yes; if not, no.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? Yes. One needs to restart the container runtime on the node to turn on support for the split image filesystem.
Our recommendation to roll this change back:
- Configure your container runtime to not split the image filesystem.
- Restart the container runtime.
- Restart kubelet with feature flag off.
Yes, we will switch to using `ImageFsInfo` to compute disk stats rather than calling `ImageStats`.
The eviction manager will monitor the container filesystem if the image filesystem is split.
There are two possibilities for this feature:
- Container runtime is configured for split disk
- Container runtime is not configured for split disk
If the feature toggle is disabled in case 1, then turning off the feature tells the eviction manager that containerfs=imagefs.
Container garbage collection will try to delete the writable layer on the image filesystem, which may not be there.
kubelet will still run, but the container filesystem could grow unchecked and eventually cause disk pressure.
In case 2, rolling back this feature will be possible because we will use `ImageStats` to compute the filesystem usage.
Since the container runtime is configured to not split the disk, nothing would really change in this case.
Nothing, as long as the container runtime is set up to split the filesystem again.
Yes. Even though rollback is not supported, we will be switching to using `ImageFsInfo` for stats on the filesystem.
This will be guarded by a feature gate, and we will test negative and positive cases.
If the filesystem is not split, this rollout or rollback will be a no-op.
If the filesystem is split and you want to roll back the change that will require a change to the container runtime configuration.
If one does not want to change the container runtime configuration, node pressure could occur because garbage collection will not work: the container filesystem would grow unbounded, requiring users to clean up their disks to avoid disk pressure.
If a cluster is evicting many more pods than normal (`node_collector_evictions_total`), this could be caused by this feature.
The eviction manager monitors the image filesystem, the node filesystem and the container filesystem for disk pressure.
If any of these filesystems is experiencing pressure, pods will start being evicted and the eviction manager will trigger garbage collection.
The `node_collector_evictions_total` metric will inform operators that something is wrong, because pods will be evicted and, until the disk pressure resolves, new workloads cannot run.
Not yet.
We are testing with container runtimes not requiring this API implementation for initial alpha.
In future releases, we could test this.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
This feature will mostly be hidden from users, but if an operator wants to know, it is possible to use crictl.
`crictl imagefsinfo` can be used to determine whether the filesystems are split: it returns a JSON object of filesystem usage for the image filesystem and the container filesystem. If the image filesystem is not split, the image filesystem and container filesystem will have identical statistics.
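For illustration only, abbreviated output might look like the following; the field names follow the CRI `FilesystemUsage` message, but the mount points and values shown here are made up:

```console
$ crictl imagefsinfo
{
  "status": {
    "imageFilesystems": [
      {
        "timestamp": "1696500000000000000",
        "fsId": { "mountpoint": "/var/lib/containers/storage/overlay-images" },
        "usedBytes": { "value": "2845678" },
        "inodesUsed": { "value": "446" }
      }
    ],
    "containerFilesystems": [
      {
        "timestamp": "1696500000000000000",
        "fsId": { "mountpoint": "/var/lib/containers/storage/overlay-containers" },
        "usedBytes": { "value": "1024" },
        "inodesUsed": { "value": "2" }
      }
    ]
  }
}
```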
- Other (treat as last resort)
  - `crictl imagefsinfo` will give stats information about the different filesystems. A user could check the filesystem for containers and images.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
- Metrics
- Metric name: node_collector_evictions_total
- Components exposing the metric: kubelet
Are there any missing metrics that would be useful to have to improve observability of this feature?
No.
The container runtime needs to be able to split the writeable and the read-only layer.
No.
N/A
N/A
There is an additional field added to the CRI API in `ImageFsInfoResponse`.
- API type: protobuf array of FilesystemUsage
- Estimated increase in size: 24 bytes and a variable length string for the mount point
- Estimated amount of new objects: 1 element in the array
- API type: ContainerFilesystem in Summary Stats
- Estimated increase in size: 24 bytes plus a variable length string for the mount point
- Estimated amount of new objects: 1 ContainerFilesystem for Summary Stats
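For reference, each array element is the existing CRI `FilesystemUsage` message; its fixed-size numeric fields plus the variable-length mount point string are the basis of the estimates above:

```protobuf
message FilesystemUsage {
    // Timestamp in nanoseconds at which the information was collected. Must be > 0.
    int64 timestamp = 1;
    // The unique identifier of the filesystem.
    FilesystemIdentifier fs_id = 2;
    // UsedBytes represents the bytes used for images on the filesystem.
    UInt64Value used_bytes = 3;
    // InodesUsed represents the inodes used by the images.
    UInt64Value inodes_used = 4;
}
```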
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
Yes. We are adding a way to split the image filesystem, so additional disk space can be consumed on the new filesystem.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
Yes. We are adding a way to split the image filesystem, so additional inodes and disk space can be consumed on the new filesystem.
We will add a new eviction signal for ContainerFs to handle the case where the container filesystem has disk pressure.
The split disk means that we will need to monitor the image disk usage on the ImageFs and the writable layer on the rootfs.
This feature does not interact with the API server and/or etcd as it is isolated to kubelet.
- Pods do not start correctly
- Detection: The user notices that the desired pods are not starting correctly, and their status indicates an error or a failure related to image pull failures, which can then be traced to the Split Image Filesystem feature.
- Mitigations: The Split Image Filesystem feature can be disabled as a mitigation step. However, this is not without side effects: any container images downloaded before may have to be downloaded again. Thus, further investigation is recommended before a decision to disable this feature is made. The user should also ensure that, if the feature is disabled, enough disk space will be available at the location the ContainerFs filesystem currently points at. A restart of kubelet will be required if this feature is to be disabled.
- Diagnostics: Kubernetes cluster events and specific pod statuses report image pull failures that are related to problems with one of the filesystems (access permissions, storage volume issues, mount point issues, etc.), where none of the reported issues are related to disk space utilisation, which would otherwise trigger pod eviction. Reviewing CRI and kubelet service logs can help to determine the root cause. Additionally, reviewing operating system logs can be helpful and can be used to correlate events and any errors found in the service logs.
- Testing: A set of end-to-end tests aims to cover this scenario.
The operator should ensure that:
- The underlying node is currently not under high load due to high CPU utilisation, memory pressure or storage volume latency (with the focus on I/O wait times)
- There is sufficient disk space available on the filesystem or volume that is used for the image filesystem to use to store data
- There are a sufficient number of inodes free and available, especially if the filesystem does not support a dynamic inodes allocation, on the provisioned filesystem where the image filesystem will store data
- The volume, if backed by a local block device or network-attached storage, has been made available to the image filesystem to be used to store data
- The CRI, container runtimes and kubelet have access to the location on the filesystem or the volume (block device) where the image filesystem will be storing data
- The system user, if either CRI, container runtimes or kubelet have been configured to use a system user other than the privileged one such as root, has access to the filesystem location or volume where the image filesystem will store data
- The node components, such as the CRI, container runtimes and kubelet, are up and running, and service logs are free from errors that might otherwise impact or degrade any of the components mentioned earlier
- The CRI, container runtimes and kubelet service logs are free from error reports about the configured ContainerFs, ImageFs, and otherwise configured filesystem location or storage volumes
Additionally, the operator should also confirm that the necessary CRI and kubelet configuration has been deployed correctly and points to a correct path to a filesystem location where the image filesystem will be storing data.
While troubleshooting issues potentially related to the Split Image Filesystem feature, it's best to focus on the following areas:
- Current CPU and memory utilisation on the underlying node
- Storage volumes, disk space availability, and sufficient inodes capacity
- I/O wait times, read and write queue depths, and latency for the storage volumes
- Any expected mount points, whether bind mounts or otherwise
- Access permission issues
- SELinux, AppArmor, or POSIX ACLs set up
- The kernel message buffer (dmesg)
- Operating system logs
- Specific services logs, such as CRI, container runtimes and kubelet
- Kubernetes cluster events with a focus on evictions of pods from affected nodes
- Any relevant pods or workloads statuses
- Kubernetes cluster health with a focus on the Control Plane and any affected nodes
- Monitoring and alerting system or services, with a focus on recent and historic events (past 24 hours or so)
If the Kubernetes cluster sports an observability solution, it would be useful to look at the collected usage metrics so that any problems found could potentially be correlated to events and usage data from the last 24 hours or so.
For cloud-based deployments, it would be prudent to interrogate any available monitoring dashboards for the node and any specific storage volume and to ensure that there is enough IOPS capacity provisioned and available, that the correct storage type has been provisioned, and that metrics such as burst capacity for IOPS and throughput aren't negatively impacted, should the storage volume support such features.
- Initial Draft (September 12th 2023)
- KEP Merged (October 5th 2023)
- Alpha Milestone #1 PRs merged (October 31st 2023)
- Alpha Milestone #2 PRs merged (December 22nd 2023)
This could increase the number of ways kubelet can be configured and make troubleshooting more difficult.
In this case, we considered bypassing cAdvisor and having the CRI return node usage information entirely. This would require container runtimes to report disk usage/total stats in the `ImageFsInfo` endpoint.
We decided not to go this route: we intend to support only two filesystems, so we don't need a separate tracking filesystem. We already have node and image statistics, so we chose to use either node or image in this KEP.
If one wants to support the writable layer as an entirely separate disk, then extensions to cAdvisor or the CRI may be needed, as one will need information about the writable-layer disk.
In the internal API, kubelet directly uses the image filesystem array rather than the `ImageFsInfoResponse`.
To keep API changes minimal, we could have containerd/CRI-O add the container filesystems to the image filesystem array.
This would work, but it would require additions to the filesystem usage message, with a label distinguishing images from containers.
We decided not to go this route because there could be more use cases to add to ImageFsInfoResponse that would not fit in the array type.
E2E test configuration with separate disks is needed. It may be possible to use a tmpfs for this KEP.