Linux capabilities are one of those security layers that I wasn’t aware of before working with Kubernetes, and even then I would generally only see them in lengthy, overwhelming podspec examples.
What are Linux capabilities?
Per the Linux man pages:
Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled.
Capabilities are a per-thread attribute.
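Because capabilities are tracked per thread, the kernel reports each thread’s capability sets as hex bitmasks in its /proc status file (CapInh, CapPrm, CapEff, CapBnd and, on newer kernels, CapAmb). The following is a minimal Python sketch, assuming a Linux host with /proc mounted, that reads the calling thread’s sets and lists which capability bit numbers are set; the helper names are hypothetical.

```python
from __future__ import annotations

import threading

# Capability-set fields reported in /proc/<pid>/task/<tid>/status.
# CapAmb only appears on newer kernels.
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict[str, int]:
    """Extract the capability bitmasks from the text of a /proc status file."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            sets[key] = int(value.strip(), 16)  # values are hex strings
    return sets


def set_bits(mask: int) -> list[int]:
    """Return the capability numbers whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def current_thread_cap_sets() -> dict[str, int]:
    """Read the capability sets of the calling thread (Linux only)."""
    tid = threading.get_native_id()  # kernel thread ID (Python 3.8+)
    with open(f"/proc/self/task/{tid}/status") as f:
        return parse_cap_sets(f.read())


if __name__ == "__main__":
    for name, mask in current_thread_cap_sets().items():
        print(f"{name}: {mask:016x} bits={set_bits(mask)}")
```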
Interestingly enough, I haven’t actually seen any examples of capabilities being used outside of Docker/Kubernetes. With regard to containers and container runtimes, capabilities let you selectively grant fine-grained privileges to specific containers without having to hand over full root permissions. Alright, that sounds pretty cool and is directly in line with the whole concept of minimizing the attack surface.
So what capabilities are available and what are the defaults?
The Docker docs have a great breakdown of the capability key names and their descriptions, with separate lists for the default and the additional available capabilities. Edit: the Linux man pages go into more detail about what functionality each capability provides.
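To connect the key names in those docs to the raw bitmasks the kernel reports, a small Python sketch can map capability numbers (the bit positions defined in the kernel’s linux/capability.h) to names and decode a mask. DOCKER_DEFAULTS is an assumption based on Docker’s documented default list at the time of writing, so check it against your engine version; 00000000a80425fb is the mask that set encodes to, and what CapEff commonly shows for a root process in a default container.

```python
from __future__ import annotations

# Capability names in kernel bit order (index == bit number).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID",
    "KILL", "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE",
    "NET_BIND_SERVICE", "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK",
    "IPC_OWNER", "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE",
    "SYS_PACCT", "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE",
    "SYS_TIME", "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE",
    "AUDIT_CONTROL", "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG",
    "WAKE_ALARM", "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF",
    "CHECKPOINT_RESTORE",
]

# Assumed Docker default set; verify against your Docker version.
DOCKER_DEFAULTS = {
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW",
    "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
    "SYS_CHROOT", "KILL", "AUDIT_WRITE",
}


def decode(mask_hex: str) -> set[str]:
    """Turn a hex capability mask (e.g. CapEff from /proc) into names."""
    mask = int(mask_hex, 16)
    return {name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)}


def encode(names: set[str]) -> str:
    """Inverse of decode: build the 16-digit hex mask /proc would show."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return f"{mask:016x}"


if __name__ == "__main__":
    print(sorted(decode("00000000a80425fb")))
    print(encode(DOCKER_DEFAULTS))  # -> 00000000a80425fb
```

Running the decoder against the CapEff line from inside one of your own containers is a quick way to confirm what a given image actually runs with.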
Why would I need to change capabilities and how do I use them?
Maybe you took a look at the aforementioned Docker capabilities docs and thought “yup, this looks fine”. Or maybe you thought “wow, there’s a lot going on here; does every container need all these permissions?”
Either way is technically alright, but if you know exactly what your software needs access to, there is definitely room for improvement from a least-privilege point of view. You can look at this as one more optimization available to make, akin to optimizing container image layer size, or to moving from a full-blown OS base image to a minimal OS base image to an OS-less base.
Alternatively, sometimes an application needs to make a few system modifications that are outside of the scope of normal permissions. That’s OK too, and it is nice that these defaults are in place so that we have to explicitly enable that privilege escalation.
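If you do know exactly what your software needs, diffing that list against the runtime defaults shows which defaults are surplus and which extra privileges you would be opting into. A hypothetical Python helper sketch; DOCKER_DEFAULTS is assumed to match Docker’s documented defaults, and the needed set is purely illustrative:

```python
from __future__ import annotations

# Assumed to match Docker's documented default capabilities; verify for your version.
DOCKER_DEFAULTS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW",
    "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
    "SYS_CHROOT", "KILL", "AUDIT_WRITE",
})


def capability_diff(needed: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_drop, to_add) relative to the runtime defaults."""
    to_drop = sorted(DOCKER_DEFAULTS - needed)  # granted by default but unused
    to_add = sorted(needed - DOCKER_DEFAULTS)   # escalations you must opt into
    return to_drop, to_add


if __name__ == "__main__":
    needed = {"NET_BIND_SERVICE", "NET_ADMIN"}  # hypothetical requirements
    to_drop, to_add = capability_diff(needed)
    print("could drop:", to_drop)
    print("must add:", to_add)
```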
So now for the how.
Docker provides the --cap-add and --cap-drop flags to explicitly manage capabilities as needed. Both the add and drop directives support the ALL alias.
```shell
docker run -it --rm --cap-add=NET_ADMIN ubuntu:14.04 ip link add dummy0 type dummy
```
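How the flags combine with the defaults matters once ALL is involved. Below is a simplified Python model of the resolution, assuming the precedence recent Docker engines apply: privileged mode grants everything, --cap-add=ALL grants every capability except those explicitly dropped, --cap-drop=ALL discards the defaults so only the added capabilities remain, and otherwise the drops are removed from the defaults before the adds are applied. It is a sketch for reasoning about flags, not Docker’s actual code; ALL_CAPS and DOCKER_DEFAULTS are assumptions that depend on your kernel and engine version, and the name normalization (case, optional CAP_ prefix) is assumed.

```python
from __future__ import annotations

from collections.abc import Iterable

# Capabilities defined by recent kernels (assumed to be what "ALL" means).
ALL_CAPS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
})

# Assumed Docker default set; verify against your Docker version.
DOCKER_DEFAULTS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW",
    "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
    "SYS_CHROOT", "KILL", "AUDIT_WRITE",
})


def _normalize(caps: Iterable[str]) -> set[str]:
    """Upper-case each name and strip an optional CAP_ prefix."""
    out = set()
    for cap in caps:
        cap = cap.upper()
        out.add(cap[4:] if cap.startswith("CAP_") else cap)
    return out


def effective_caps(add: Iterable[str] = (), drop: Iterable[str] = (),
                   privileged: bool = False) -> set[str]:
    """Model the capability set a container ends up with."""
    if privileged:
        return set(ALL_CAPS)
    adds, drops = _normalize(add), _normalize(drop)
    if "ALL" in adds:
        return set(ALL_CAPS) - drops  # everything except explicit drops
    if "ALL" in drops:
        return adds  # defaults discarded; only explicit adds remain
    return (set(DOCKER_DEFAULTS) - drops) | adds


if __name__ == "__main__":
    # The command above: defaults plus NET_ADMIN.
    print(sorted(effective_caps(add=["NET_ADMIN"])))
    # Least privilege: start from nothing and add back only what is needed.
    print(sorted(effective_caps(add=["NET_BIND_SERVICE"], drop=["ALL"])))
```

Expressing the rules as set operations makes the least-privilege pattern easy to see: dropping ALL and adding back a short list yields exactly that list, independent of whatever the defaults happen to be.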
The Kubernetes podspec contains the container.securityContext.capabilities.add and container.securityContext.capabilities.drop arrays, where specific capabilities can be added and dropped for any given container.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
        drop: ["CHOWN"]
```
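If you generate manifests programmatically, the same securityContext is just nested data, and kubectl accepts JSON manifests as well as YAML. A Python sketch that rebuilds the pod above; pod_with_capabilities is a hypothetical helper, not part of any Kubernetes client library:

```python
from __future__ import annotations

import json


def pod_with_capabilities(name: str, container: str, image: str,
                          add: list[str], drop: list[str]) -> dict:
    """Build a single-container Pod manifest with explicit capability changes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": container,
                "image": image,
                "securityContext": {
                    # Kubernetes expects names without the CAP_ prefix.
                    "capabilities": {"add": add, "drop": drop},
                },
            }],
        },
    }


if __name__ == "__main__":
    pod = pod_with_capabilities(
        "security-context-demo-4", "sec-ctx-4",
        "gcr.io/google-samples/node-hello:1.0",
        add=["NET_ADMIN", "SYS_TIME"], drop=["CHOWN"],
    )
    # Pipe this output into `kubectl apply -f -` to create the pod.
    print(json.dumps(pod, indent=2))
```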
A keen eye will notice that Docker has the --privileged flag and Kubernetes provides the container.securityContext.privileged podspec option.
Both are basically a way to get around the default capability limits put in place by the container runtime: they hand the container every capability (and lift other restrictions, such as access to host devices), which makes them the modern-day equivalent of “just run it as root” (which you could also actually do). Technically you can follow that advice, but you probably shouldn’t unless you have a good reason to and know exactly what you’re doing.
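Because privileged mode is so easy to slip into a manifest, it can be worth checking for it, along with blanket capability grants, before anything is deployed. A minimal Python sketch that walks a Pod manifest (as a dict, for example parsed from JSON) and reports containers that run privileged or add ALL or SYS_ADMIN; the RISKY_ADDS policy and the audit_pod name are illustrative assumptions you would tune:

```python
from __future__ import annotations

# Illustrative policy: additions treated as "effectively root".
RISKY_ADDS = {"ALL", "SYS_ADMIN"}


def audit_pod(pod: dict) -> list[str]:
    """Return human-readable findings for over-privileged containers."""
    findings = []
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for c in containers:
        ctx = c.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"{c['name']}: privileged=true grants all capabilities")
        added = set((ctx.get("capabilities") or {}).get("add", []))
        for cap in sorted(added & RISKY_ADDS):
            findings.append(f"{c['name']}: adds {cap}")
    return findings


if __name__ == "__main__":
    pod = {"spec": {"containers": [
        {"name": "app", "securityContext": {"privileged": True}},
        {"name": "sidecar",
         "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}}},
    ]}}
    for finding in audit_pod(pod):
        print(finding)
```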