
Feature Request: Relax volume constraint to support more volume types #4417

Open
dgerd opened this issue Jun 18, 2019 · 57 comments
Labels
area/API API objects and controllers kind/feature Well-understood/specified features, ready for coding. triage/accepted Issues which should be fixed (post-triage)

Comments

@dgerd

dgerd commented Jun 18, 2019

In what area(s)?

/area API

/area autoscale
/area build
/area monitoring
/area networking
/area test-and-release

Describe the feature

We currently only allow configMap and secret volumes to be mounted into the user container. This constraint is in place because volumes are a source of state and can severely limit scaling. This feature request is to relax the constraint and allow a larger set of volume types that work well with serverless functions to be mounted.
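For reference, a minimal sketch of what the current constraint permits (the image and names here are illustrative, not taken from any real deployment):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: allowed-volumes-example
spec:
  template:
    spec:
      containers:
        - image: docker.io/example/app        # illustrative image
          volumeMounts:
            - name: app-config
              mountPath: /etc/app-config
              readOnly: true
      volumes:
        - name: app-config                    # configMap and secret are the only sources allowed today
          configMap:
            name: app-config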

I do not have a particular set of volume types in mind, but #4130 may be a good example.

@dgerd dgerd added the kind/feature Well-understood/specified features, ready for coding. label Jun 18, 2019
@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jun 18, 2019
@dgerd dgerd added this to the Ice Box milestone Jun 20, 2019
@dgerd dgerd mentioned this issue Jul 1, 2019
@yu2003w
Contributor

yu2003w commented Jul 17, 2019

To support PVCs, how would data be handled when scaling apps up or down?
Will Knative take on this work?
Or should the app handle data migration gracefully when scaling up or down?

@weikinhuang

weikinhuang commented Aug 29, 2019

Would it be possible to add emptyDir as an allowed type? It would not hold state, should not present a scaling problem, and would provide read-write storage outside the Docker overlay filesystem. Also, combining emptyDir with medium: Memory would let users create larger tmpfs mounts where write-intensive operations can happen in memory rather than on disk, which is orders of magnitude faster and avoids wearing down SSDs on self-hosted instances. The default size of /dev/shm is only 64M, and one of the suggested workarounds for increasing it is actually to use an emptyDir with medium: Memory.
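A hedged sketch of the plain-Kubernetes pattern being described, assuming an illustrative image and size:

apiVersion: v1
kind: Pod
metadata:
  name: shm-example
spec:
  containers:
    - name: app
      image: docker.io/example/app            # illustrative image
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm                 # replaces the default 64M shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory                        # tmpfs backed by node memory
        sizeLimit: 1Gi                        # illustrative size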

I suppose that could be solved by writing a custom MutatingAdmissionWebhook.

@gustavovalverde

I have a very valid use case using Odoo, which saves all generated attachments on an NFS share when using multi-tenant deployments, and this is being used on an actual k8s deployment with Istio.

Knative can make things easier for us, but we can't drop the NFS (that's not even a source of state for us). There should be some way to accomplish this. If it's not an issue with k8s, it shouldn't be a constraint when using Knative. That NFS should not impact a Knative deployment at all.

@dgerd
Author

dgerd commented Dec 23, 2019

@gustavovalverde Thanks for sharing your use-case. This is something that is on the radar of the API Working Group, but we do not have someone actively working on this right now.

The "Binding" pattern as talked about in https://docs.google.com/document/d/1t5WVrj2KQZ2u5s0LvIUtfHnSonBv5Vcv8Gl2k5NXrCQ/edit#heading=h.lnql658xmg9p could be a potential workaround to inject these into the deployment that Knative creates while we work on getting this issue resolved. See https://github.com/mattmoor/bindings for examples.

cc @mattmoor

@gustavovalverde

@dgerd @mattmoor I'd really appreciate an example of how to use bindings for this use case. I'll test it and give feedback here so others with the same restriction can use this workaround.

@mattmoor
Member

@dgerd and I spent some time discussing this idea before the holidays began. I think he wanted to try to PoC it. If not, then he and I should probably write up a design doc to capture our thoughts/discussion.

@jeffgriffith

@mattmoor Do I read this correctly that I cannot use ReadWriteMany PVCs at all in a Knative Service? I have a simple uploader service that needs to deposit data in an Azure Files PVC volume. I understand the concern about statefulness, but I don't see this as different from inserting data into a database. The "persistence" isn't in the pod in either case. Thanks for any insight. --jg

@mattmoor
Member

I don't think we've figured out how to allow this in a way that doesn't have pointy edges that folks will stick themselves on. I totally agree that the filesystem is a useful abstraction for write-many capable devices.

@JRBANCEL
Contributor

Bumping this issue because it is something that most users I have met want to do.

I don't think we've figured out how to allow this in a way that doesn't have pointy edges that folks will stick themselves on.

True, but realistically we will most likely never be able to prevent users from shooting themselves in the foot. We have seen them bypass the limitations with webhooks to inject sidecars, use the downward API, and mount volumes anyway.

The binding pattern is really interesting but maybe too complicated for typical users who just want to have the Kn Pod Spec be 100% compatible with the k8s Pod Spec.

@mdemirhan
Contributor

As an example of what JR said above, both Datadog and New Relic use Unix domain sockets to collect metrics, and exposing that is going to be important to support customers using these systems. In the case of Datadog, the predominant way of using it is to deploy it as a DaemonSet to the cluster and have customers use UDS to send metrics to the agent local to the node. An alternative is to use the host IP within the user code to send the metrics to the DaemonSet, but in order to ensure that the metrics are sent to the host node and not a random node in the system, the user has to use the k8s downward API to feed the host IP to the revision, and that doesn't work either because we don't support the k8s downward API.

Would love to get everyone's opinion on two things:

  • Can we extend the current list and support hostPath? While this could potentially have pointy edges, the lack of it is going to be an adoption blocker for a large set of scenarios - especially ones that involve the use of DaemonSets (very common in logging & monitoring scenarios). See the sketch after this list.

  • Can we build an extension point here and allow vendors to extend this default list with vendor-specific additions? That way, Knative can still focus on a set of core scenarios, and vendors will be responsible for supporting & maintaining their additions to this list.
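To make the DaemonSet/UDS scenario above concrete, here is a hedged pod-spec sketch of the kind of hostPath mount being asked for (the socket path and env var follow Datadog's documented defaults, but treat them as illustrative):

  containers:
    - name: app
      image: docker.io/example/app              # illustrative image
      env:
        - name: DD_DOGSTATSD_SOCKET             # points the client library at the node-local agent
          value: /var/run/datadog/dsd.socket
      volumeMounts:
        - name: dsdsocket
          mountPath: /var/run/datadog
  volumes:
    - name: dsdsocket
      hostPath:
        path: /var/run/datadog                  # directory where the DaemonSet agent exposes its UDS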

@mattmoor
Member

True, but realistically we will most likely never be able to prevent users from shooting themselves in the foot. We have seen them bypass the limitations with webhooks to inject sidecars, use the downward API, and mount volumes anyway.

Yep, I agree. I think my prior comment is easily misinterpreted as "No, we need to solve this problem", but my intent was simply to convey that this isn't a slam dunk; there are downsides/gotchas that we'll have to be sure to document clearly.

The binding pattern is really interesting but maybe too complicated for typical users who just want to have the Kn Pod Spec be 100% compatible with the k8s Pod Spec.

The position I've been advocating for is actually to expand the surface of PodSpec that we allow, to enable the binding pattern to target Service (as the subject) vs. forcing folks to reach around and use it with our Deployments. Sure, it can be used to reach around us, but I agree that here it is inappropriate and overkill.

Can we extend the current list

I think we should absolutely expand the list. I have mixed feelings on hostPath (aka privilege), but we should discuss it on a WG call. Especially with multi-container support coming, the filesystem becomes an extremely interesting channel for intra-pod communication. The Google Cloud SQL proxy comes to mind 😉

I think at this point what we need is someone to drive the feature by putting together the appropriate feature track documentation and running it through the process.

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2020
@vagababov
Contributor

I think we still want this
/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2020
@JRBANCEL
Contributor

Yes, this could be behind a feature flag.
I'll take a look after I add support for Downward API.

@gustavovalverde

Hi, is there a workaround for this or is it a WIP?

@JRBANCEL
Contributor

A workaround is to use a Webhook to inject what you want in the Pod Spec.
Not ideal. This is a WIP, but I don't think anyone is working on it right now.

@skonto
Contributor

skonto commented Sep 28, 2020

@JRBANCEL I could have a look at this.

@JRBANCEL
Contributor

@JRBANCEL I could have a look at this.

Great. You can look at the various features behind feature flags for inspiration, for example: #8126

@skonto
Contributor

skonto commented Sep 29, 2020

Thanks @JRBANCEL, this probably needs an official design document/proposal. I will work on it.

@skonto
Contributor

skonto commented Dec 21, 2020

/assign

@skonto
Contributor

skonto commented May 24, 2021

The feature track doc for this can be found here; feel free to add comments (it has been added to the Knative team drive).
It is a draft version to get the discussion going and hopefully move things forward.

@dprotaso
Member

dprotaso commented Jun 4, 2021

Some thoughts

  1. emptyDir as a scratch space makes sense - especially since it's tied to the pod's lifecycle
  2. I'm not really convinced downloading models (via an initContainer) into an emptyDir really makes a lot of sense

To expand on 2), it seems really inefficient and you're not benefiting from any caching. Any additional pods that are scaled up (even on the same node) will always download the model? Maybe the better pattern is to bundle the models in an OCI image and access them via a sidecar. Then subsequent starts would be much faster, since these model images will be cached by Kubernetes.
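A hedged sketch of the pattern being suggested, with hypothetical image names: the model is baked into its own image, and a sidecar copies it into a shared scratch volume at startup, so subsequent pod starts benefit from the node's image cache:

  containers:
    - name: server
      image: docker.io/example/model-server     # hypothetical serving image
      volumeMounts:
        - name: models
          mountPath: /models
    - name: model-sidecar
      image: docker.io/example/model-artifacts  # hypothetical image containing the model under /artifacts
      command: ["sh", "-c", "cp -r /artifacts/* /models/ && sleep infinity"]
      volumeMounts:
        - name: models
          mountPath: /models
  volumes:
    - name: models
      emptyDir: {}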

@alxndr42

alxndr42 commented Jun 9, 2021

Any additional pods that are scaled up (even on the same node) will always download the model?

Yes, but this is not a major concern for us; our Pods are relatively long-lived.

Maybe the better pattern is to bundle them in an OCI image and access them as a sidecar.

Maybe, but emptyDir is easy to understand and use, whereas I haven't even heard of OCI images before. 😬

@dprotaso
Member

dprotaso commented Jun 14, 2021

whereas I haven't even heard of OCI images before.

OCI is the standard for container images that run in Docker, Kubernetes, etc.

@skonto
Contributor

skonto commented Jul 7, 2021

@dprotaso using init containers to fetch models is an approach already used by Seldon (MLOps) and others. Check here and here. So it is not something new in that domain.

Seldon Core uses Init Containers to download model binaries for the prepackaged model servers.

They also support autoscaling via HPA and KEDA. KFServing (Kubeflow) does the same: kserve/kserve#849.
This does not mean it is the only way or the best one on every occasion. Actually, it's pretty common to deploy a model bundled together with a server that exposes some API for predictions. Caching is one benefit, as you mentioned, and it means shorter scaling times. However, not all environments allow or want to pre-package everything, for example for data governance reasons. Also, if the model is not that big, it should not be an issue. In addition, sometimes people want to experiment, and that can be much easier than building images, so from a UX perspective it is also acceptable.

@skonto
Contributor

skonto commented Jul 12, 2021

@dprotaso I will update the feature track based on the latest comments and will start implementation ASAP if there are no objections. Some basic scenarios like emptyDir support should be covered, IMHO.

@swapkh91

swapkh91 commented Dec 7, 2021

Any update on this?
I have a use case to increase shm-size when deploying Triton Inference Server on GKE using KServe. The only solution I could find online was this.

But this gives a Knative error:

Warning  InternalError  14s (x2 over 29s)  v1beta1Controllers  fails to reconcile predictor: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: expected exactly one, got neither: spec.template.spec.volumes[0].configMap, spec.template.spec.volumes[0].projected, spec.template.spec.volumes[0].secret

@alxndr42

alxndr42 commented Dec 7, 2021

@swapkh91 Support for emptyDir volumes is available, behind a feature flag.
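For reference, a minimal sketch of enabling it, assuming a default knative-serving install (check the flag name against your release's feature-flags docs):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
  namespace: knative-serving
data:
  kubernetes.podspec-volumes-emptydir: "enabled"   # allows emptyDir volumes in Knative Services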

@swapkh91

swapkh91 commented Dec 7, 2021

@swapkh91 Support for emptyDir volumes is available, behind a feature flag.

@7adietri yeah, I tried that by enabling it in the configmap.
Then I changed my yaml as per this.
The above error was resolved, but now I'm getting the below error in the Triton logs:

'boost::interprocess::interprocess_exception' what():  Read-only file system

Can't find anything to resolve this

Does the mountPath have to be /cache as per Knative, or should anything work?

@alxndr42

alxndr42 commented Dec 7, 2021

@swapkh91 I'm sorry, I don't know what the issue is on your particular setup beyond enabling emptyDir support. (And this issue is probably not the appropriate place. But Read-only file system seems worth investigating.)

@swapkh91

swapkh91 commented Dec 7, 2021

@swapkh91 I'm sorry, I don't know what the issue is on your particular setup beyond enabling emptyDir support. (And this issue is probably not the appropriate place. But Read-only file system seems worth investigating.)

@7adietri yeah looking into it. Anyway, thanks for the reply

@skonto
Contributor

skonto commented Dec 8, 2021

Hi @swapkh91!

Does the mountPath have to be /cache as per Knative, or should anything work?

It can be anything.

@swapkh91

swapkh91 commented Dec 8, 2021

Hi @swapkh91!

Does the mountPath have to be /cache as per Knative, or should anything work?

It can be anything.

@skonto thanks for the confirmation. The problem I'm still facing is the read-only thing, as I've stated above. Not sure if it's coming from the Knative side or KServe.

@skonto
Contributor

skonto commented Dec 8, 2021

Probably it's the Triton container? I see a similar bug here; also here things seem to be read-only. Could you try testing a trivial service and comparing with Triton:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: emptydir
  namespace: default
spec:
  template:
    spec:
      containers:
        - imagePullPolicy: Always
          image: docker.io/skonto/emptydir
          volumeMounts:
            - name: data
              mountPath: /data
          env:
            - name: DATA_PATH
              value: /data
      volumes:
        - name: data
          emptyDir: {}

kubectl describe pods in my setup shows (rw):

...
    Environment:
      DATA_PATH:        /data
      PORT:             8080
      K_REVISION:       emptydir-00001
      K_CONFIGURATION:  emptydir
      K_SERVICE:        emptydir
    Mounts:
      /data from data (rw)

What do you see on your side? The Triton error happens at runtime, I suppose.

@swapkh91

swapkh91 commented Dec 8, 2021

Probably it's the Triton container? [...] Could you try testing a trivial service and comparing with Triton: [...] What do you see on your side?

@skonto yeah, it works for a trivial service.
I modified the yaml a bit to see if /dev/shm is accessible, and it still worked:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: emptydir
  namespace: default
spec:
  template:
    spec:
      containers:
        - imagePullPolicy: Always
          image: docker.io/skonto/emptydir
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm
          env:
            - name: DATA_PATH
              value: /dev/shm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory

In the pod description I can see

Environment:
      DATA_PATH:        /dev/shm
      PORT:             8080
      K_REVISION:       emptydir-00001
      K_CONFIGURATION:  emptydir
      K_SERVICE:        emptydir
    Mounts:
      /dev/shm from dshm (rw)

Then I guess it's from the KServe side! Also, KServe currently uses Knative version 0.23.2 (as per the quick_install script), which is also incompatible with emptyDir; I changed it to the current version for my case.

My use case is to increase the shm-size, which can normally be done with docker run --shm-size=1g .... The solution online for doing this in Kubernetes suggests using emptyDir: link

@boudo

boudo commented Nov 8, 2022

Hi @skonto, in your comment here you said:

Hi, rw access is allowed for emptyDir only at the moment, but full rw PVC support is coming. @Phelan164 the code you mention excludes emptyDir volumes. So if emptyDir is used, you should get write access unless something else is wrong in the given setup (not K8s spec related).
All other volume types will remain in read-only mode unless explicitly defined otherwise. There are thoughts, though, about allowing the complete K8s pod spec behind some flag to enable this on demand, e.g. for advanced users.

I would like to know if read and write access is currently supported for PVCs in an InferenceService?

@steren
Contributor

steren commented Dec 13, 2022

With Knative now allowing more than one container, it makes sense to allow emptyDir with medium: Memory so that these multiple containers can exchange data (e.g. logs) via this volume.
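A hedged sketch of what that could look like, assuming multi-container support and the emptyDir feature flag are enabled (image names are illustrative):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: shared-scratch
spec:
  template:
    spec:
      containers:
        - name: app
          image: docker.io/example/app          # serving container
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: scratch
              mountPath: /scratch
        - name: log-forwarder
          image: docker.io/example/forwarder    # sidecar reading from the shared volume
          volumeMounts:
            - name: scratch
              mountPath: /scratch
              readOnly: true
      volumes:
        - name: scratch
          emptyDir:
            medium: Memory                      # in-memory, shared by all containers in the pod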

@zmrubin

zmrubin commented Dec 18, 2022

Bumping, this is a feature also needed for my project! It would be very useful.

@skonto
Contributor

skonto commented Oct 9, 2023

Hi @boudo sorry for the late reply regarding:

I would like to know if read and write access is currently supported for PVCs in an InferenceService?

There is a flag, kubernetes.podspec-persistent-volume-write, that when enabled allows write access to PVCs in a ksvc: https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-persistentvolumeclaim-pvc. For KServe you need to check what it allows at the moment.
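A hedged sketch of how the pieces fit together (claim and mount names are illustrative; flag names are as documented in the linked page, so verify them against your release):

# config-features ConfigMap in the knative-serving namespace
data:
  kubernetes.podspec-persistent-volume-claim: "enabled"
  kubernetes.podspec-persistent-volume-write: "enabled"

# In the Knative Service spec:
spec:
  template:
    spec:
      containers:
        - image: docker.io/example/app          # illustrative image
          volumeMounts:
            - name: data
              mountPath: /data                  # writable because the write flag is enabled
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-pvc              # pre-existing PVC, e.g. backed by RWX storage
            readOnly: false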

@amarflybot
Contributor

+1 for hostPath support.

@skonto
Contributor

skonto commented Dec 20, 2023

@amarflybot hi, could you provide your input here: #13690 (comment).
Also, you can have hostPath via PVCs; have you tried that?
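For anyone wondering what "hostPath via PVCs" looks like, a hedged plain-Kubernetes sketch with illustrative names, paths, and sizes:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: hostpath-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/local/shared                     # directory on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""                          # bind statically to the PV above, not a dynamic class
  volumeName: hostpath-pv

The claim can then be referenced from the Service once the PVC feature flags are enabled.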

@amarflybot
Contributor

Yes, hostPath can be achieved via a PVC. I tried that yesterday and it's working great. Thanks.

@zzuchen

zzuchen commented Jan 10, 2024


Hello, the Knative version currently used in our project is v0.23.0. We hope to mount NFS, but PVCs are only supported after Knative v1.2, and upgrading the Knative version is currently unrealistic for us. Is there any other method? Or is there a third-party component that would let us use NFS mounts on Knative v0.23.0?

@zzuchen

zzuchen commented Jan 10, 2024


I have a very valid use case using Odoo, which saves all generated attachments on an NFS share when using multi-tenant deployments, and this is being used on an actual k8s deployment with Istio.

Knative can make things easier for us, but we can't drop the NFS (that's not even a source of state for us). There should be some way to accomplish this. If it's not an issue with k8s, it shouldn't be a constraint when using Knative. That NFS should not impact a Knative deployment at all.

Dear gustavovalverde, could you please tell us how you mounted NFS on Knative for Odoo?

@secat

secat commented Jul 15, 2024

Do you have any plans to add support for the nfs volume source and for the future oci volume source (see KEP-4639: OCI VolumeSource)?

On our side, we mostly use an nfs volume to share ML models between pods rather than mounting them in container images. However, at the moment we need to create a PersistentVolumeClaim because of the "volume constraint" in the Knative Service.
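For context, the nfs volume source we would like to use directly, instead of wrapping it in a PersistentVolumeClaim, would look something like this in the pod spec (server and path are illustrative):

  volumes:
    - name: models
      nfs:
        server: nfs.example.internal            # illustrative NFS server
        path: /exports/models
        readOnly: true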

In the future, we aim to use the new oci volume source, which fulfills our need (see the oci volume source's user story #3).
