NSA Kubernetes Hardening Guidance [pdf](media.defense.gov)

635 points by kennethko 4 years ago | 165 comments

- Scan containers and Pods for vulnerabilities or misconfigurations.

- Run containers and Pods with the least privileges possible.

- Use network separation to control the amount of damage a compromise can cause.

- Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.

- Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface.

- Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity.

- Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied.
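As a rough illustration of the network separation bullet, here is a minimal Python sketch that builds a default-deny NetworkPolicy and prints it as JSON, which `kubectl apply -f -` accepts as well as YAML. The namespace `my-ns` and the policy name are placeholders, not taken from the guidance, and it assumes a network plugin that actually enforces NetworkPolicy.

```python
import json


def default_deny_policy(namespace, name="default-deny-all"):
    """Build a NetworkPolicy that selects every Pod in the namespace and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector matches all Pods in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies ingress and egress.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # "my-ns" is a placeholder namespace.
    print(json.dumps(default_deny_policy("my-ns"), indent=2))
```

Starting from deny-all and then adding narrow allow policies per service is one common way to limit how far a compromised Pod can reach.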

throwaway984393 4 years ago | |

> and encryption to protect confidentiality

Probably the hardest part about this. Private networks with private domains. Who runs the private CA, updates DNS records, issues certs, revokes, keeps the keys secure, embeds the keychain in every container's cert stack, and enforces validation?

That is a shit-ton of stuff to set up (and potentially screw up) which will take a small team probably months to complete. How many teams are actually going to do this, versus just terminating at the load balancer and everything in the cluster running plaintext?

awsthro00945 4 years ago | | |

For as fundamental and important as encryption-in-transit is, it's always baffled me that there isn't a simpler, easier solution to accomplishing it on private networks. Everyone knows it's important, and everyone wants to do it, but it's just such a pain in the ass and so prone to error that even some top security leaders will tell you not to bother because it's such a footgun.

We really need something to help make the process simpler, like how Let's Encrypt made public HTTPS so much easier to do for even the smallest of websites.

3np 4 years ago | | |

Service mesh solutions like Istio / Consul Connect+Vault can help a lot with this.

Depending on the existing size and complexity of your stack those months can be cut down to weeks or even days.

I don't mean to trivialize the time and expertise needed to set up and manage, but if you can afford to run a microservice architecture on k8s already it's definitely not untenable.

zaat 4 years ago | | |

Encryption today is pretty much a requirement for any regulated business and required practice for any sane shop, with or without Kubernetes. The only difference is that those services are communicating within the cluster's internal network and not across different machines in the server VLANs.

If anything, setting up the whole thing within the Kubernetes ecosystem can be much easier with the available operators and automation frameworks like cert-manager and/or Istio.

devy 4 years ago | | |

> That is a shit-ton of stuff to set up (and potentially screw up) which will take a small team probably months to complete.

Agree! This is why that "Kubernetes Hardening Guidance" is for NSA, not for startups.

Resource needs aside, keeping basic AppSec/InfoSec hygiene is a strong recommendation. Also there are tons of startups that are trying to provide solutions/services to solve that also. A lot of times, it's worth the money.

spectre013 4 years ago | | |

We do this right now in a totally disconnected env. We have a process in place to get images and manifests into our env. All containers have to go through scanning pipelines and have to be approved through a process.

Also, for any container that makes requests but has no mechanism for adding certificates, we have to rebuild the container in the disconnected env to insert the certificates that allow communication.

Makes daily life really interesting

xyst 4 years ago | | |

TLS termination is fine for most use cases (basic web services), but if you are protecting PII you need to/should protect against external and internal threats.

Maybe a disgruntled sys admin decides to capture data coming in from the load balancer between the service(s) and sell it to the highest bidder. If traffic is encrypted between the load balancer and underlying service, it makes it much harder to do.

sdenton4 4 years ago | | |

For historical context:

https://www.washingtonpost.com/world/national-security/nsa-i...

(Google has since added encryption to its internal networks.)

calvinmorrison 4 years ago | | |

Isn't this exactly what HashiCorp's "Consul" can do? Specific services, and set up keys/certs so that all internal traffic is also 'blindly' encrypted? Endpoint services don't care or know about it because it's transparent, but over the internal network it's encrypted?

__app_dev__ 4 years ago | | |

I agree there is a shit-ton of stuff to set up, but based on the recent Pipeline and other hacks where the companies had to pay millions, I would expect more companies to take stuff like this seriously.

_spoonman 4 years ago | | |

That sounds like work for a prime contractor with at least two subs under it. And consultants to help them implement SAFe AGILE.

adolph 4 years ago | | |

Opportunity is knocking

haolez 4 years ago | |

Who scans the vulnerability scanners? Genuine question. How does the community/ecosystem solve this problem of auditability?

nonameiguess 4 years ago | | |

We deal with this by having multiple vulnerability scanners. Product A and Product B both scan your active environment. Product A scans Product B. Product B scans Product A. Additionally, make the vendors of those products sign NDAs so your threat actors, other than insiders, don't necessarily even know who they are. An attacker then needs to not only compromise both, but figure out who they are in the first place.

knownjorbist 4 years ago | | |

For anyone who hasn't read it:

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

tinco 4 years ago | | |

If your threat profile says you need to audit your vulnerability scanners, you audit your vulnerability scanners. There's not really a problem there right?

jhawk28 4 years ago | | |

That was the issue in the SolarWinds hack: https://www.npr.org/2021/04/16/985439655/a-worst-nightmare-c...

sound1 4 years ago | |

I am a mostly non-technical person, but why do we need to resort to firewalls etc. if we can employ a UNIX-like file permission system for network access? Wouldn't it be awesome if we could allow any installed software to contact ONLY whitelisted domains? Of course this excludes web browsers, but you get the idea.

How about our mainstream OSes incorporate a permission system similar to what mobile OSes already have today?
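To make the idea concrete, here is a minimal Python sketch of the check such a permission system would have to perform: each program gets an allowlist of domains, and a connection is permitted only if the destination matches. The program names and domains are made up for illustration, and a real OS would enforce this in the kernel or a proxy rather than inside the application.

```python
# Hypothetical per-program allowlists; names and domains are illustrative only.
ALLOWLIST = {
    "mail-client": {"example-mail.com"},
    "updater": {"updates.example.org"},
}


def is_allowed(program, host):
    """Return True if host equals, or is a subdomain of, a domain allowlisted for program."""
    # Normalize case and drop a trailing dot from fully qualified names.
    host = host.lower().rstrip(".")
    for domain in ALLOWLIST.get(program, set()):
        # Match on a label boundary so "evil-example-mail.com" does not pass.
        if host == domain or host.endswith("." + domain):
            return True
    # Unknown programs and unlisted hosts are denied by default.
    return False
```

Note that this only covers name matching. An enforcement layer would also have to tie DNS answers to the connections actually made, which is where most of the difficulty lies.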

bennysaurus 4 years ago | | |

It's a fair question and certainly is possible to have firewalls on a per-server basis. We do that for incoming traffic primarily. The catch is if that server itself gets compromised then you can't count on those rules still being enforced.

Having dedicated network appliances acting as firewalls means from a security perspective you need to compromise the local machine and then also compromise a dedicated, hardened external system as well. It vastly ups the difficulty barrier.

pavs 4 years ago | | |

Firewalls do a lot more than block ports and services.

zeroxfe 4 years ago | | |

Think of them as defence-in-depth that protects from accidental misconfiguration, software bugs, local exploits, etc.

Meandering 4 years ago | | |

SELinux

remram 4 years ago | | |

You're describing a firewall? How could it be more "UNIX-like"?

gizdan 4 years ago | |

> - Use network separation to control the amount of damage a compromise can cause.

> - Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.

Are we still on this? Why isn't anyone pushing for zero trust? A concept made significantly easier to achieve thanks to container orchestration.

graderjs 4 years ago | |

I do all those things in the pro version of my RBI (remote browser isolation) product, but I don't use k8s.

- Scan for vulns and misconfigs: I regularly update the underlying distro images, and use security scanning software to monitor dependencies, and regularly update them.

- Run with least privilege: I create a separate, temporary user account (no login, no shell) for each browser and service which has no elevated privileges, as well as run that browser and its service in a group and cgroup that restricts disk, bandwidth, CPU, and memory using block quotas, cgroups, tc, iptables and active monitoring and termination.

- Use network separation to isolate: RBI is basically a network isolation layer between the client (where the human interacts), and the server (where the browser actually runs.) I also don't have any privileges (service accounts, SSH keys, trusted IPs) on any of the machines and they're all single tenant and run inside GCE.

- Use firewalls to lock down connectivity + encryption: I use GCE firewall rules and iptables drop rules to block access to GCP metadata endpoints, as well as to other machines in the subnet. Also, every network request is encrypted (HTTP is https/TLS, WebSocket is wss/TLS, WebRTC is encrypted by default).

- Use strong auth to limit user access: For running the processes I use temporary users. For persistent browser sessions I use persistent users (either system native, or in a DB, always with bcrypt salted hashed passwords). For SaaS and resource control I use high entropy random API keys between each service layer. But I could improve my game for keeping secrets out of private git repos and separating code and config, ideally automatically. I could also improve my game to limit administrator access (right now I just have a single role, with God power, but I should create an admin role with power limited to a project, ideally even on a per-customer level).

- Use log auditing: I do this, but only manually, by grepping and inspecting various logs, including last and lastb, as well as the service internal logs. This is likely something I could improve as well.

- Review all k8s settings: I don't use k8s or docker, just run services in this custom sandbox on GCE instances. I see that as both a way to limit attack surface and complexity as well as minimize some overheads for maintenance and performance. In the longer term these things are worth exploring.
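The high-entropy random API keys mentioned under strong auth can be sketched with the Python standard library. This is a generic illustration, not the actual ViewFinder code; the function names are made up, and storing only a digest of the key is an added assumption.

```python
import hashlib
import hmac
import secrets


def new_api_key():
    """Generate a URL-safe key from 32 random bytes (256 bits of entropy)."""
    return secrets.token_urlsafe(32)


def key_digest(key):
    """Return the SHA-256 hex digest of a key, so the raw key need not be stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_key(presented, stored_digest):
    """Compare digests in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(key_digest(presented), stored_digest)
```

A plain fast hash is reasonable here only because the keys are long and random. User-chosen passwords still want a slow hash such as the bcrypt mentioned above.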

Thanks a lot for the TLDR. For more info on my RBI work check out https://github.com/i5ik/ViewFinder

raesene9 4 years ago |

Some useful guidance here, although worth noting that some of it is a bit dated (k8s security can move quickly).

Most notably from a scan through, they're mentioning PodSecurityPolicy, but that's deprecated and scheduled to be removed in 1.25.

There will be an in-tree replacement but it won't work the same way. Out of tree open source options would be things like OPA, Kyverno, jsPolicy, k-rail or Kubewarden.
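For a feel of what these policy engines enforce, here is a minimal Python sketch of the kind of check an admission controller might run against a Pod manifest: flag host networking, privileged containers, and containers that do not disable privilege escalation. It is not the API of OPA, Kyverno, or any other tool named above; the function name and rule set are illustrative only.

```python
def pod_violations(pod):
    """Return a list of human-readable policy violations for a Pod manifest given as a dict."""
    violations = []
    spec = pod.get("spec", {})
    if spec.get("hostNetwork"):
        violations.append("hostNetwork is enabled")
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for c in containers:
        ctx = c.get("securityContext") or {}
        name = c.get("name", "<unnamed>")
        if ctx.get("privileged"):
            violations.append("container %s is privileged" % name)
        # Treated as allowed unless the manifest explicitly sets it to false.
        if ctx.get("allowPrivilegeEscalation", True):
            violations.append("container %s allows privilege escalation" % name)
    return violations
```

An admission webhook would run checks like these on each incoming Pod and reject the request if the list is non-empty.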

nonameiguess 4 years ago | |

We've actually already moved the official guidance from PSPs to OPA and that's what the primary DevSecOps reference implementation has used for about two months now.

"We" being the DoD, but our guidance is the NSA guidance. I'm not sure why it hasn't made it into the policy pdf, but the actual official IAC has been using OPA since April.

ramoz 4 years ago | | |

That's awesome. I know a lot of work is going into things like P1.

I scale some large K8s in fed (not DOD)... ATO is fun. Actually unsure how I'd position something like OPA (actually envisioned them being key back in '17 when working in the Kessel Run realm... called and they hadn't been exposed to fed at the time).

Legit question / maybe dumb - where is DOD at in general perimeter security? Outside looking in & everything before a container runs - network and to OS primarily, cloud envs as well. A lot of Fed needs help here before they can comprehend even a basic Kubernetes authorization. It's also generally more important (at least from a controls perspective) in non-DOD environments than something like security context in pods.

xxpor 4 years ago | |

> Some useful guidance here, although worth noting that some of it is a bit dated.

Is there any digital security guidance from the feds that doesn't apply to? :)

sslayer 4 years ago | | |

Everybody wants small gov, until they don't.

eatmyshorts 4 years ago | |

In-tree replacement is coming in v1.22...as in, just a few weeks away. It uses admission controllers, just like OPA/Kyverno et al, hence the current guidance to use one of those.

sleepybrett 4 years ago | |

PaaS solutions can't cover everything that PSP was covering though.

raesene9 4 years ago | | |

Out of curiosity, which bits were you thinking of? OPA, Kyverno et al. have policies which (AFAIK) hit all the bits of Kubernetes PSS.

__app_dev__ 4 years ago |

I used to study and focus on security a lot more and keep up with trends. After several interviews this year I realize a lot of jobs prioritize leetcode over everything else. It's pretty annoying, and it makes me wonder: if the focus for tech workers is leetcode above all else, then no wonder so many companies have insecure apps and servers.

nwmcsween 4 years ago |

The elephant in the room here is almost all containers according to artifacthub.io, etc are a complete tire fire

nonameiguess 4 years ago | |

The DoD maintains its own registry of hardened container images they call the Iron Bank. I guess they can't issue guidelines to the general public that you should use these, but the DoD has to use them. Which kind of sucks, because they may be hardened, but they also break all the time because the people responsible for hardening them can't possibly understand all the myriad subtleties involved in building and deploying software packaged with dependencies in the same way the actual software vendors do. They make some serious rookie mistakes, like just straight copying executables out of a Fedora image into a UBI image, which works perfectly fine when a brand-new UBI release happens and it's on the same glibc as Fedora, then immediately stops working and all your containers break when Fedora updates.

throwaway984393 4 years ago | | |

They may suck at building containers, but this also sounds like a release management issue. Both the producers and consumers of the release need a test suite to validate the new artifacts before they can make it into a pipeline to eventually deliver to a customer use case. (But also they should 100% not be copying random binaries)

For what it's worth I've seen worse from corporations. Bad hires lead to bad systems.

spectre013 4 years ago | | |

I work on Platform One and we use and deploy new versions of these containers weekly and have never had them break in that way. In the beginning, when I was on the Kubernetes team, we struggled with the containers just not working at all, but they have gotten better.

Now I work on deploying and we run every container from IB and have few issues. If you find them report the images and they will fix them pretty quick.

kenm47 4 years ago | |

There are good free/OSS container scanners. Check out Trivy. No reason not to use one.

rob_c 4 years ago |

A lot of this applies to containers in general. Not complaining, it's well written, but I wish they would break out the non-kube container stuff into general container-sec advice for people.

asymptosis 4 years ago | |

This is a great point. And containers don't even really exist in the first place, so really there should be (at least one of) a family of docs about securing the various namespaces, cgroups etc in modern Linux releases, and a doc about how to secure them in combination with each other.

kgarten 4 years ago |

How do I know that this advice is useful and does not put me in danger?

Example: NSA recommends to use RSA encryption.

https://www.theverge.com/2013/12/20/5231006/nsa-paid-10-mill...

m1keil 4 years ago | |

You don't use this guide as a bible but take it into account and compare with other common security advice in the field. If you get similar results, it's most likely a good list of advice.

jdubs 4 years ago | |

This isn't for regular people, this is telling third parties what they need, in order for them to try to sell something to the nsa.

kenm47 4 years ago |

so... a lot of this can be done with Fairwinds' OSS tool Polaris... https://github.com/FairwindsOps/polaris

feels good that we've been addressing this for a bit already tbh. (disclaimer, I work for fairwinds)

herodoturtle 4 years ago | |

How did you (/they) come up with the name Polaris?

EamonnMR 4 years ago | | |

If I had to guess it's a nautical theme, following Kubes. Fair Winds (sailing), Polaris (North Star, used for navigation.)

pletnes 4 years ago |

What yields the lowest risk - spending a ton of time hardening one cluster, or building multiple clusters to reduce the blast radius of bugs and misconfigurations?

pulketo 4 years ago |

A guide from somebody who hates not knowing everything about you... I have my doubts.

neop1x 4 years ago |

First you should configure some kind of authentication. It is fun to remember this 3-year-old Tesla example [1]: a publicly accessible Kubernetes Dashboard.

[1] https://www.zdnet.com/article/tesla-systems-used-by-hackers-...

andrewmcwatters 4 years ago |

Well that’s… curious. Not sure I’ve ever read the NSA providing hardening guidance on anything before.

sdmike1 4 years ago | |

In general the NSA functions more like 2 agencies, one focused on the "red" side (hacking, breaking crypto, sigint stuff) and one focused on the "blue" side (protecting US assets from being hacked, developing better/new crypto, providing guidance on security).

Both sides are good at their jobs and for what it's worth, my understanding is that the blue side really does want to keep your shit from being hacked.

antonvs 4 years ago | |

They've been doing that for at least a decade, but probably quite a bit longer. Here are their hardening guidelines for RHEL 5, from 2011:

https://apps.nsa.gov/iaarchive/library/ia-guidance/security-...

They have similar guidance for Windows, web browsers, industrial control systems, etc.

andrewmcwatters 4 years ago | | |

Interesting! Thank you for sharing this.

endymi0n 4 years ago | |

For what it's worth, SELinux originated from the R&D Labs of the NSA.

HWR_14 4 years ago | |

> Not sure I’ve ever read the NSA providing hardening guidance on anything before.

The NSA made SELinux, SHA-1, and SHA-256.

SHA-1 was specifically a slight change to SHA-0 that was unjustified at the time but over the next 3-5 years some attacks on SHA-0 that SHA-1 was not vulnerable to surfaced.

beprogrammed 4 years ago | |

It's fine to trust them right up until they give you a magic number.

tablespoon 4 years ago | | |

> It's fine to trust them right up until they give you a magic number.

IIRC, DES had NSA-provided magic numbers in it that made it more secure against a then-not-publicly-known cryptanalytic attack.

soheil 4 years ago |

I keep forgetting NSA's job is to protect instead of maliciously eavesdropping on Americans. Given their prior probability of being a bad actor I'd take any security "guidance" they issue with a huge grain of salt.

wvh 4 years ago | |

It's perfectly possible, laudable even, to read things with a healthy dose of salt and still expand your understanding.

I think this document gives a good general overview, often missing in the fast-paced, crowded and noisy Kubernetes landscape.

lazide 4 years ago | |

Like many large organizations, the government has many groups, often with many of them working to some extent against each other.

AES has also been blessed by the NSA, and I bet you use that extensively too, whether you want to or not.

buggeryorkshire 4 years ago | |

Reasonable. But then again they did come up with SELinux and I haven't seen any backdoors in that.

reacharavindh 4 years ago | | |

Yet.

kgarten 4 years ago | |

Not sure why you are downvoted. Completely agreed.

Keeping 0-days for yourself and making security standards weaker will just weaken your standing.

Even if this information might be useful (and without backdoors or bad advice), I just cannot trust them (so I won't click on the link).

In terms of security, I trust the hacker community much more (going by CCC or other groups' advice is definitely better).

debarshri 4 years ago |

This is really helpful. I wonder if there is a curated list of k8s hardening guidelines for various organisations.

adolph 4 years ago | |

Just search for ‘* awesome list’

https://github.com/magnologan/awesome-k8s-security

(Unaffiliated with above, just popped up for k8s hardening awesome list)

Rd6n6 4 years ago |

Do they have a version for ordinary web app servers?

legrande 4 years ago |

We all know it's the National Insecurity Agency[0], and that the NSA hoards & stockpiles 0day. They very rarely release tools and research papers designed to strengthen our IT infra, since they sit on so much 0day. There's no balance.

I don't buy that they're 50% red team, and 50% blue team. More like 99% red team and 1% blue team.

[0] https://en.wikipedia.org/wiki/Doublespeak

kchoudhu 4 years ago |

Somehow the text is not just...

   kubectl -n my-ns delete pod,svc --all