Configure a single general purpose and more permissive network policy #741
berkeley-dsep-infra/datahub#2825 has more details on the FTP situation, and in berkeley-dsep-infra/datahub#2825 there's discussion about just removing the outbound port restrictions. My inclination is that as long as we don't allow arbitrary access inside the cluster (where the user server can just hit the prometheus server, for example!), port restrictions are pointless. So if we can allow unrestricted access to the internet but not to the local cluster network, we're all good.
This sounds like a good approach to me - most users assume that "everything is possible", so if we restrict things it'll usually be confusing to folks. So I'd be +1 on making most outbound traffic possible by default, with the action being a community representative asking for something to be restricted for some reason.
If I understand correctly, this is z2jh's default setting. So the implementation of this would look like a PR that removes all the `singleuser.networkPolicy.egress` overrides from our config?
@sgibson91 I think so, but I want to see how it interacts with non-JupyterHub things in the cluster. Primarily, I want to verify that we can't hit the Prometheus or Grafana servers from user pods, as that can leak information. There is no password auth or anything for Prometheus, so we have to rely on network policy here.
One way to test would be to spin up a test cluster with the support chart and a hub with all ports open, and see if we can hack it.
I think the right call may be to have tight allowances on what is allowed to access Prometheus, rather than specific exceptions for what the user pods can't send traffic to, unless those exceptions are general, like no local IP ranges. So one idea is to set a netpol saying allow 0.0.0.0/0 (all) except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private IP ranges) and the cloud metadata server that's also listed in the z2jh defaults (169.254.169.254). That's a simple rule that may work reliably without exceptions, or not. The z2jh default is to allow all egress except to that cloud metadata server IP. Note that I don't know much about that server and am afraid "cloud metadata server" may be an overloaded term used for different things - I'm just referring to the IP blocked by z2jh by default: 169.254.169.254.
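Roughly, expressed as a z2jh `singleuser.networkPolicy.egress` value (just a sketch of the idea, assuming the chart's ipBlock-style egress rules), that could look like:

```yaml
singleuser:
  networkPolicy:
    enabled: true
    egress:
      # Allow all outbound traffic except the private IP ranges and the
      # cloud metadata server IP that z2jh already blocks by default.
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:
                - 10.0.0.0/8
                - 172.16.0.0/12
                - 192.168.0.0/16
                - 169.254.169.254/32
```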
@sgibson91 I think an easier way is to just test this on the pilot hubs cluster by locally doing a deploy just to the staging hub, starting a server there, and trying to access the prometheus server. Even better would be to run nmap from the user server and see what's reachable.
The config I locally deployed to 2i2c-staging to test this is represented in this draft PR: #746. I then spun up my user server, opened a terminal and tried to ssh to the prometheus server. I also tried to run nmap, but it isn't installed in the user image.
I guess you may need to use an image with nmap already installed...
This is an interesting idea...
Thanks for testing this, @sgibson91! I looked at which ports the prometheus service exposes - it's port 80, while ssh only tests port 22. So you can test it by trying to reach port 80 on the service instead. We should also check if we can access the pod directly, by looking up the pod's IP! Sorry if my use of the word 'access' caused confusion here - I just meant network access, not ssh!
NetworkPolicies often operate at the level of pods, not services, hence the need to test both.
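For example, a pod-level ingress policy in front of Prometheus - the namespace and labels here are hypothetical, so they'd need checking against the real support chart - would look roughly like:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-server-ingress
  namespace: support              # hypothetical: wherever the support chart is deployed
spec:
  podSelector:
    matchLabels:
      app: prometheus             # hypothetical labels - check the actual Prometheus pod
      component: server
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Only same-namespace pods with these labels (e.g. Grafana) may reach
        # the Prometheus server pods; user pods are not listed, so they can't.
        - podSelector:
            matchLabels:
              app: grafana        # hypothetical label
```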
Is this issue also something that we could test as part of CI/CD? (e.g., via #722 plus some test suite that runs against a new hub?) It would be really nice if we could tell our user communities that things like "security policies for the hubs" are also tested as part of our deployments. So we could have tests that do things like: spin up a server, try to do something hub admins generally don't want people to do, confirm that it isn't possible in the staging hub.
I think we should look into this tool for security testing: https://github.com/armosec/kubescape
💯, super interesting!!
@yuvipanda yup, both of those are reachable
mmm... so I guess we would need to go with @consideRatio's idea?
Yes, I will try this now. FYI, I came across this cool resource while thinking about how to write the proposed netpol: https://github.com/ahmetb/kubernetes-network-policy-recipes
We use the Calico network policy plugin, right? So we can use something like... (I'm quite new to writing network policies for Kubernetes, so I'm not sure if I'm approaching this correctly?)
NetworkPolicy as an official k8s resource (enforced by Calico) does not support denying traffic, only allowing it. If a pod's egress is targeted by a netpol, no traffic is allowed besides what the netpols explicitly allow. If a pod's egress isn't targeted by any netpol, it is unrestricted and everything is allowed. So the act of targeting a pod restricts it to what is explicitly allowed, and you can't subtract/deny permissions, only add them.
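A minimal illustration of that: a policy that targets a pod's egress but lists no rules allows nothing at all (labels hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-allow-nothing-example
spec:
  podSelector:
    matchLabels:
      component: singleuser-server   # hypothetical label for the targeted pods
  policyTypes:
    - Egress
  egress: []   # no allow rules, so all egress from the targeted pods is dropped
```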
I agree with this assessment! It seems like the major next steps on this one are to document the changes implemented in #774 for hub admins and engineers. I've updated the top comment with those remaining tasks. @sgibson91 are you OK to make those docs additions as well?
Reopening until I've tackled the docs
@sgibson91 also, in https://staging.us-central1-b.gcp.pangeo.io, the python code can't actually access the hub's public IP/URL
Thanks @yuvipanda - this is the block I need to add, right? yuvipanda/datahub@e8544b0#diff-c7ff155658f0273e3fd2df49dee1551363b934660d0cb6e169938288442eba05R97-R105
Hmmm? Python code from within the k8s cluster can't curl the hub's public IP?
I opened #782
Documentation PRs:
I wasn't really sure what to write for the engineering docs 🤷🏻♀️
@sgibson91 yep, that block adds support for talking to the hub public IP when using ingress. We'd probably also need another block for talking to the public IP when it's using autohttps rather than nginx-ingress though. @consideRatio I think the packets somehow get routed internally and don't go out onto the internet and back? Not sure :)
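Purely for illustration (a placeholder address rather than the real block from that commit), "talking to the hub public IP" would be something along these lines:

```yaml
singleuser:
  networkPolicy:
    egress:
      # Placeholder: substitute the hub's actual public / load balancer IP.
      - to:
          - ipBlock:
              cidr: 203.0.113.10/32
        ports:
          - protocol: TCP
            port: 80
          - protocol: TCP
            port: 443
```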
Perhaps the logic is along the lines that...
IIRC, autohttps is deployed into its own pod in the hub namespace, yes? So we could achieve this with a podSelector rule like the below:
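A sketch of such a rule, assuming the autohttps pod carries z2jh's usual `app: jupyterhub` and `component: autohttps` labels:

```yaml
singleuser:
  networkPolicy:
    egress:
      # A bare podSelector matches pods in the same namespace as the user pods,
      # which is where autohttps should live in a standard z2jh deployment.
      - to:
          - podSelector:
              matchLabels:
                app: jupyterhub
                component: autohttps
```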
@sgibson91 yeah, I eventually ended up with something like that in https://github.com/berkeley-dsep-infra/datahub/blob/6f6506d12ecdd9799ba599d3bf878a00815a6c89/hub/values.yaml#L113. However, I also needed to specify the namespace, which is a bit annoying - ideally I'd want to be able to say 'pods in any namespace with these labels', and that's what the docs suggest we should be able to do, but alas that doesn't seem to work?
Huh, even though the autohttps and user pods are in the same namespace? Or am I wrong in that assumption?
@sgibson91 ah, in the berkeley case there are like ten hubs, so from each hub I want to be able to hit the other hubs' endpoints too if necessary. So I think the same-namespace case works, but I need additional namespaces to work too.
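For reference, the form the upstream NetworkPolicy docs describe for "pods with these labels in any namespace" is a single `to` entry that combines an empty `namespaceSelector` with a `podSelector` (sketch, same hypothetical labels as above):

```yaml
egress:
  - to:
      # One item with BOTH selectors means: pods with these labels in any
      # namespace (an empty namespaceSelector matches all namespaces).
      - namespaceSelector: {}
        podSelector:
          matchLabels:
            app: jupyterhub
            component: autohttps
```

Whether the installed network policy plugin actually enforces that combination is the question here.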
I opened #788. Hopefully this is the last piece! 🤞🏻
I think I've hit all the major points of this now, so I'm going to close this and we can open more issues as things arise 🙌🏻
Issue update
We are currently looking to block outbound egress only to local (private) IP ranges and the cloud metadata server, and to allow everything else.
@sgibson91 and I debugged a failure to make an outbound connection from a user pod to an FTP server exposed on port 21.
We pinpointed the issue to be that we override the z2jh Helm chart's default value for `singleuser.networkPolicy.egress`. The z2jh default is very permissive: it allows all outbound traffic except for traffic to a specific IP (169.254.169.254) associated with the so-called cloud metadata server. By overriding `singleuser.networkPolicy.egress`, we are reducing the egress permissions to the point of excluding outbound FTP access.
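For illustration, an override of this general shape (hypothetical ports, not our exact config) is what produces that effect - anything not listed, such as FTP on port 21, is dropped:

```yaml
singleuser:
  networkPolicy:
    egress:
      # Only HTTP and HTTPS are listed, so other outbound ports are blocked.
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
        ports:
          - protocol: TCP
            port: 80
          - protocol: TCP
            port: 443
```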
Discussion points
Reference
Tasks
Discussion captured from video conv