Watching jobs in a specific namespace #122

Closed
ghost opened this issue Nov 16, 2021 · 10 comments · Fixed by #123

ghost commented Nov 16, 2021

Hi, I'm not sure if I'm doing something wrong or if there's an issue but here's the problem I'm facing. I have jobs created in a namespace called tasks. Following the security principle of least privilege, I give my application permission to interact with jobs only in that namespace. I try to watch them with code like this:

{:ok, conn} = K8s.Conn.from_service_account()
operation = K8s.Client.get("batch/v1", "Job", name: "748f77a3-cfc5-4eb6-9597-e722898ccb41", namespace: "tasks")
K8s.Client.watch(conn, operation, [stream_to: self(), recv_timeout: :infinity])

But I receive these messages back:

%HTTPoison.AsyncChunk{ chunk: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"jobs.batch \\\"748f77a3-cfc5-4eb6-9597-e722898ccb41\\\" is forbidden: User \\\"system:serviceaccount:default:manager\\\" cannot watch resource \\\"jobs\\\" in API group \\\"batch\\\" at the cluster scope\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"748f77a3-cfc5-4eb6-9597-e722898ccb41\",\"group\":\"batch\",\"kind\":\"jobs\"},\"code\":403}\n", id: #Reference<0.1813016975.3908304897.184515> }

The message says 'cannot watch resource "jobs" in API group "batch" at the cluster scope', so I tried granting permissions at the cluster level, and when I do that the code above works as expected. I enabled request tracing at the hackney level and noticed that K8s.Client.watch/3 actually triggers several HTTP requests, some of which do not include the namespace, such as this one:

GET /apis/batch/v1/jobs?labelSelector=&fieldSelector=metadata.name%3D748f77a3-cfc5-4eb6-9597-e722898ccb41&resourceVersion=76928358&watch=true

I suspect these are the requests causing 403 errors. I tried having a look at the code and wasn't really able to figure out what's going on but it seems that somehow the namespace gets lost at some point in the succession of requests that are triggered by watch. Note that this doesn't happen for other operations, such as creating jobs.
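
For illustration, here is a minimal sketch of the difference between the cluster-scoped path seen in the trace and the namespaced path that a namespace-only Role would allow. The module `WatchPathSketch` and its function are hypothetical and not part of the k8s library; only the URL layout follows the Kubernetes API convention.

```elixir
defmodule WatchPathSketch do
  # Hypothetical helper for illustration only; not part of the k8s library.
  # Builds the watch path for batch/v1 Jobs, optionally scoped to a namespace.
  def jobs_watch_path(namespace, name) do
    # A list of tuples keeps the query parameter order deterministic.
    query =
      URI.encode_query([
        {"fieldSelector", "metadata.name=#{name}"},
        {"watch", "true"}
      ])

    base =
      case namespace do
        # No namespace: cluster-scoped collection, needs cluster-wide RBAC.
        nil -> "/apis/batch/v1/jobs"
        # With namespace: a namespaced Role granting watch on jobs is enough.
        ns -> "/apis/batch/v1/namespaces/#{ns}/jobs"
      end

    base <> "?" <> query
  end
end
```

Calling `WatchPathSketch.jobs_watch_path(nil, name)` gives the shape of the request that gets the 403, while passing `"tasks"` as the namespace gives the path I would have expected watch to use.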

Collaborator

mruoss commented Nov 19, 2021

This is caused by K8s.Client.Runner.Watch.get_to_list/1 where the :path_params (which hold the namespace) are emptied. Not sure why this is done. @coryodaniel is it really necessary for get_to_list/1 to be so complex? Can't it just change the verb of the operation to :list?

Anyway @amarandon you could try to use K8s.Client.list() directly as that's what watch is working with:

{:ok, conn} = K8s.Conn.from_service_account()
operation = K8s.Client.list("batch/v1", "Job", name: "748f77a3-cfc5-4eb6-9597-e722898ccb41", namespace: "tasks")
K8s.Client.watch(conn, operation, [stream_to: self(), recv_timeout: :infinity])

Owner

coryodaniel commented Nov 19, 2021 via email

Collaborator

mruoss commented Nov 19, 2021

Found this in the dungeons - still not sure why though...

I removed the whole function get_to_list/1 and changed the run for GET to

  def run(%Conn{} = conn, %Operation{method: :get, verb: :get} = operation, rv, http_opts) do
    run(conn, %Operation{operation | verb: :list}, rv, http_opts)
  end

=> all tests still green... (?)

@coryodaniel
Owner

I’m not sure if the Watch API has changed, but previously you couldn’t watch a single resource, you had to watch a list and filter it.

kubernetes/kubernetes#43299

Collaborator

mruoss commented Nov 20, 2021

Reading the Kubernetes API Concepts Docs, I'd say this must have changed.

Kubernetes uses the term list to describe returning a collection of resources to distinguish from retrieving a single resource which is usually called a get. If you sent an HTTP GET request with the ?watch query parameter, Kubernetes calls this a watch and not a get (see Efficient detection of changes for more details).

and

The Kubernetes API allows clients to make an initial request for an object or a collection, and then to track changes since that initial request: a watch. Clients can send a list or a get and then make a follow-up watch request.

Or am I misinterpreting it? Guess I'll have to make a few test requests...

Owner

coryodaniel commented Nov 20, 2021 via email

Collaborator

mruoss commented Nov 22, 2021

I think I am indeed misinterpreting the text, and the issue you linked says the same thing. I did not run integration tests but just tried to watch a single resource. The server returned immediately, ignoring my ?watch query parameter.

curl --request GET \
  --url 'http://localhost:9998/api/v1/namespaces/asdf/configmaps/test?watch=1&resourceVersion=1379' \
  --header 'Content-Type: text/yaml'

returns immediately:

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "test",
    "namespace": "asdf",
    "uid": "aff0c91b-5086-4d02-a133-871a08412d66",
    "resourceVersion": "1379",
    "creationTimestamp": "2021-11-22T07:52:08Z",
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"data\":{\"some\":\"test\"},\"kind\":\"ConfigMap\",\"metadata\":{\"annotations\":{},\"name\":\"test\",\"namespace\":\"asdf\"}}\n"
    },
    "managedFields": [
      {
        "manager": "kubectl-client-side-apply",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2021-11-22T07:52:08Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {"f:data":{".":{},"f:some":{}},"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}}}
      }
    ]
  },
  "data": {
    "some": "test"
  }
}

And now, thinking about it, this makes sense. Imagine the resource does not exist at the moment you start watching: there would be no resourceVersion to start from. That is not technically impossible to solve, but I guess it is just not supported by Kubernetes.

Back to the actual issue at hand, I guess we can try to only remove the name from the :path_params and keep the :namespace.
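
A rough sketch of that idea, using a plain map in place of the real operation struct: drop only the name from the path params, keep the namespace, and carry the name over as a field selector. The module `GetToListSketch`, the map shape, and the keyword-list form of `:path_params` are assumptions made for illustration; the actual change in the library may look different.

```elixir
defmodule GetToListSketch do
  # Hypothetical stand-in for get_to_list/1; the operation is modelled as a
  # plain map here, and :path_params is assumed to be a keyword list.
  def get_to_list(%{verb: :get, path_params: params} = op) do
    # Remove only :name; :namespace (if present) stays in the path params.
    {name, rest} = Keyword.pop(params, :name)

    list_op = %{op | verb: :list, path_params: rest}

    # The name moves into a field selector so the list/watch is still
    # narrowed down to the single resource.
    {list_op, [fieldSelector: "metadata.name=#{name}"]}
  end
end
```

With `path_params: [namespace: "tasks", name: "some-job"]` this yields a `:list` operation that still carries `namespace: "tasks"`, so the resulting request stays within the namespace.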

Author

ghost commented Nov 22, 2021

@mruoss Yes using K8s.Client.list directly works. I've also tested your commit and it works for my use case!

Collaborator

mruoss commented Nov 22, 2021

If you're using K8s.Client.list directly, just like I proposed, you're probably watching ALL the jobs in that namespace. When a list operation is turned into a path for the request, the name is ignored.

In order to only watch the one with the given name, you'd have to pass the fieldSelector: "metadata.name=#{name}" to the call to watch, just like get_to_list/1 does:

{:ok, conn} = K8s.Conn.from_service_account()
operation = K8s.Client.list("batch/v1", "Job", namespace: "tasks")
K8s.Client.watch(conn, operation, [stream_to: self(), recv_timeout: :infinity, fieldSelector: "metadata.name=748f77a3-cfc5-4eb6-9597-e722898ccb41"])

Author

ghost commented Nov 22, 2021

@mruoss Many thanks for the detailed info. For now I was going to pin the version to your commit that addresses the issue in get_to_list/1 until the next release.
