NSA Kubernetes Hardening Guidance [pdf](media.defense.gov) |
- Run containers and Pods with the least privileges possible.
- Use network separation to control the amount of damage a compromise can cause.
- Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
- Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface.
- Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity.
- Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied.
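To make the first bullet concrete, here is a minimal sketch (not from the guidance itself) of a least-privilege check over a Pod spec that has already been parsed into a Python dict. The function name `least_privilege_findings` is made up for illustration; `privileged`, `allowPrivilegeEscalation` and `runAsNonRoot` are the standard `securityContext` fields. Treating an unset `allowPrivilegeEscalation` as enabled is an assumption of this sketch.

```python
from typing import Any, Dict, List


def least_privilege_findings(pod_spec: Dict[str, Any]) -> List[str]:
    """Return findings for containers that ask for more privilege than they need."""
    findings: List[str] = []
    pod_ctx = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs as a privileged container")
        # Assumption of this sketch: an unset value is treated as "escalation allowed".
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not disabled")
        # The container-level setting wins over the pod-level one when both exist.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not set")
    return findings


risky = {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}
for finding in least_privilege_findings(risky):
    print(finding)
```

A real cluster would enforce this in an admission controller rather than in a script, but the same three fields are what most policies look at first.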
Probably the hardest part about this. Private networks with private domains. Who runs the private CA, updates DNS records, issues certs, revokes, keeps the keys secure, embeds the keychain in every container's cert stack, and enforces validation?
That is a shit-ton of stuff to set up (and potentially screw up) which will take a small team probably months to complete. How many teams are actually going to do this, versus just terminating at the load balancer and everything in the cluster running plaintext?
We really need something to help make the process simpler, like how Let's Encrypt made public HTTPS so much easier to do for even the smallest of websites.
Depending on the existing size and complexity of your stack those months can be cut down to weeks or even days.
I don't mean to trivialize the time and expertise needed to set up and manage, but if you can afford to run a microservice architecture on k8s already it's definitely not untenable.
If anything, setting up the whole thing within the Kubernetes ecosystem can be much easier with the available operators and automation frameworks like cert-manager and/or Istio.
Agree! This is why that "Kubernetes Hardening Guidance" is for NSA, not for startups.
Resource needs aside, keeping basic AppSec/InfoSec hygiene is a strong recommendation. Also there are tons of startups that are trying to provide solutions/services to solve that also. A lot of times, it's worth the money.
Also, for any container that makes requests but has no mechanism for adding certificates, we have to rebuild the container in the disconnected env to insert the certificates it needs to communicate.
Makes daily life really interesting
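For services you control, the rebuild can sometimes be avoided if the client reads its trust anchors at runtime. A rough sketch, assuming the private CA bundle is mounted into the container as a file; the function name and the path in the usage note are hypothetical, and only the standard-library `ssl` module is used:

```python
import ssl
from typing import Optional


def private_ca_client_context(ca_bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that validates servers against a private CA.

    When ca_bundle_path is given, only that bundle is trusted; when it is None,
    the system default trust store is used instead.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_bundle_path)
    # Refuse legacy protocol versions on internal traffic as well.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = private_ca_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

In practice you would call something like `private_ca_client_context("/etc/internal-ca/ca.pem")` (path made up) and pass the context to your HTTP client. This does nothing for third-party images that hard-code their trust store, which is the case described above.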
Maybe a disgruntled sys admin decides to capture data coming in from the load balancer between the service(s) and sell it to the highest bidder. If traffic is encrypted between the load balancer and underlying service, it makes it much harder to do.
https://www.washingtonpost.com/world/national-security/nsa-i...
(Google has since added encryption to its internal networks.)
https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
How about our mainstream OSes incorporate the kind of permission system that mobile OSes already have today?
Having dedicated network appliances acting as firewalls means from a security perspective you need to compromise the local machine and then also compromise a dedicated, hardened external system as well. It vastly ups the difficulty barrier.
> - Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality.
Are we still on this? Why isn't anyone pushing for zero trust? A concept made significantly easier to achieve thanks to container orchestration.
- Scan for vulns and misconfigs: I regularly update the underlying distro images, and use security scanning software to monitor dependencies, and regularly update them.
- Run with least privilege: I create a separate, temporary user account (no login, no shell) for each browser and service which has no elevated privileges, as well as run that browser and its service in a group and cgroup that restricts disk, bandwidth, CPU, and memory using block quotas, cgroups, tc, iptables and active monitoring and termination.
- Use network separation to isolate: RBI is basically a network isolation layer between the client (where the human interacts), and the server (where the browser actually runs.) I also don't have any privileges (service accounts, SSH keys, trusted IPs) on any of the machines and they're all single tenant and run inside GCE.
- Use firewalls to lock down connectivity + encryption: I use GCE firewall rules and iptables drop rules to block access to GCP metadata endpoints, as well as to other machines in the subnet. Also, every network request is encrypted (HTTP is https/TLS, WebSocket is wss/TLS, WebRTC is encrypted by default).
- Use strong auth to limit user access: For running the processes I use temporary users. For persistent browser sessions I use persistent users (either system native, or in a DB, always with bcrypt salted hashed passwords). For SaaS and resource control I use high entropy random API keys between each service layer. But I could improve my game for keeping secrets out of private git repos and separating code and config, ideally automatically. I could also improve my game to limit administrator access (right now I just have a single role, with God power, but I should create an admin role with power limited to a project, ideally even on a per-customer level).
- Use log auditing: I do this, but only manually, using various grepping and inspection of various logs, including last and lastb, as well as the service internal logs. This is likely something I could improve as well.
- Review all k8s settings: I don't use k8s or docker, just run services in this custom sandbox on GCE instances. I see that as both a way to limit attack surface and complexity as well as minimize some overheads for maintenance and performance. In the longer term these things are worth exploring.
Thanks a lot for the TLDR. For more info on my RBI work check out https://github.com/i5ik/ViewFinder
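The firewall bullet above (drop traffic to the metadata endpoint and to neighbours in the subnet) boils down to a small decision rule. Here is a sketch of that rule in Python, not the actual iptables or GCE configuration; the function name and the example subnet are assumptions, and `169.254.169.254` is the well-known link-local address of the cloud metadata server:

```python
import ipaddress

# Link-local address used by the cloud instance metadata server.
METADATA_NET = ipaddress.ip_network("169.254.169.254/32")


def egress_allowed(dest_ip: str, own_subnet: str) -> bool:
    """Return False for destinations the sandbox should never reach."""
    addr = ipaddress.ip_address(dest_ip)
    if addr in METADATA_NET:
        return False  # would expose instance credentials and metadata
    if addr in ipaddress.ip_network(own_subnet):
        return False  # no lateral movement to other machines in the subnet
    return True


print(egress_allowed("169.254.169.254", "10.128.0.0/20"))
print(egress_allowed("10.128.0.7", "10.128.0.0/20"))
```

The real enforcement still has to live in the firewall; a check like this is only useful for testing that the rule set matches what you intended.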
Most notably from a scan through, they're mentioning PodSecurityPolicy, but that's deprecated and scheduled to be removed in 1.25.
There will be an in-tree replacement but it won't work the same way. Out of tree open source options would be things like OPA, Kyverno, jsPolicy, k-rail or Kubewarden.
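For reference, the in-tree replacement ended up being Pod Security admission, which is driven by namespace labels rather than by policy objects. A minimal sketch of generating such a namespace manifest; the helper name and the namespace name `my-ns` are placeholders, while the label keys and the three level names are the ones the feature defines:

```python
import json
from typing import Any, Dict

LEVELS = ("privileged", "baseline", "restricted")


def namespace_manifest(name: str, enforce: str = "baseline", warn: str = "restricted") -> Dict[str, Any]:
    """Build a Namespace manifest carrying Pod Security admission labels."""
    for level in (enforce, warn):
        if level not in LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # Pods violating this level are rejected.
                "pod-security.kubernetes.io/enforce": enforce,
                # Pods violating this level are admitted with a warning.
                "pod-security.kubernetes.io/warn": warn,
            },
        },
    }


print(json.dumps(namespace_manifest("my-ns"), indent=2))
```

This is much coarser than PodSecurityPolicy, which is why the out-of-tree options listed above are still worth a look if you need custom rules.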
"We" being the DoD, but our guidance is the NSA guidance. I'm not sure why it hasn't made it into the policy pdf, but the actual official IAC has been using OPA since April.
I scale some large K8s in fed (not DOD)... ATO is fun. Actually unsure how I'd position something like OPA (actually envisioned them being key back in '17 when working in the Kessel Run realm... called and they hadn't been exposed to fed at the time).
Legit question / maybe dumb - where is DOD at in general perimeter security. Outside looking in & everything before a container runs - network and to OS primarily, cloud envs as well. A lot of Fed needs help here before they can comprehend even a basic Kubernetes authorization. It's also generally more important (at least from a controls perspective) in non DOD environments, than something like security context in pods.
Is there any digital security guidance from the feds that doesn't apply to? :)
For what it's worth I've seen worse from corporations. Bad hires lead to bad systems.
Now I work on deploying and we run every container from IB and have few issues. If you find them report the images and they will fix them pretty quick.
Example: NSA recommends to use RSA encryption.
https://www.theverge.com/2013/12/20/5231006/nsa-paid-10-mill...
feels good that we've been addressing this for a bit already tbh. (disclaimer, I work for fairwinds)
[1] https://www.zdnet.com/article/tesla-systems-used-by-hackers-...
Both sides are good at their jobs and for what it's worth, my understanding is that the blue side really does want to keep your shit from being hacked.
https://apps.nsa.gov/iaarchive/library/ia-guidance/security-...
They have similar guidance for Windows, web browsers, industrial control systems, etc.
The NSA made SELinux, SHA-1, and SHA-256.
SHA-1 was specifically a slight change to SHA-0 that was unjustified at the time but over the next 3-5 years some attacks on SHA-0 that SHA-1 was not vulnerable to surfaced.
IIRC, DES had NSA-provided magic numbers in it that made it more secure against a then-not-publicly-known cryptanalytic attack.
I think this document gives a good general overview, often missing in the fast-paced, crowded and noisy Kubernetes landscape.
AES has also been blessed by the NSA, and I bet you use that extensively too, whether you want to or not?
Keeping 0-days for yourself and making security standards weaker, will just weaken your standing.
Even if this information might be useful (and without backdoors or bad advice), I just cannot trust them (so I won't click on the link).
In terms of security, I trust the hacker community much more (going by the CCC's or other groups' advice is definitely better).
https://github.com/magnologan/awesome-k8s-security
(Unaffiliated with above, just popped up for k8s hardening awesome list)
I don't buy that they're 50% red team, and 50% blue team. More like 99% red team and 1% blue team.
kubectl -n my-ns delete pod,svc --all
You might be playing the long game. I think a CTO might benefit from knowing both app dev and security.
https://www.reddit.com/r/programming/comments/39d0u1/google_...
In my opinion the problem has gotten worse. I spoke with a former Microsoft Product Manager early this year and he mentioned "Highly experienced engineers give the worst interviews" .. based on the current environment. He mentioned it's because the questions that come up are stuff people learned 20 years ago and never used in real-life.
I was invited by several FAANG companies this year to interview and did one interview (and turned down others) but had to cram for 2 months doing leetcode.
I realize it's a game but have mixed feelings. I think the interviews are bad if it's the only option but part of me knows the interview process is so bad it will likely discourage people from being Software Engineers and keep salaries high for many more years.
Employers use it to loosely gauge a programmer's basic skill level, as well as their competency to think clearly and cleverly.
It has its pros and cons.
Not sure this is a valid dichotomy.
If you are spinning up multiple clusters, you are presumably doing so in an automated fashion. If so, then the effort of hardening is very similar. It doesn't really matter where you do it.
Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
Nope. If the clusters are separate it limits how damaging a compromise of the cluster is. This is why cloud providers don’t stick you on the same k8s cluster as another tenant.
> Multiple clusters may have a smaller blast radius, but will have a larger attack surface. Things may be shared between them (accounts? network tunnels? credentials to a shared service?) in which case an intrusion in one puts everyone else at risk.
It’s not really clear what you’re trying to say here. If someone compromises credentials shared between all clusters that’s the same as compromising credentials used by one mega cluster.
Having multiple clusters may help reduce the blast radius of _certain_ attacks, to some degree. However, managing multiple clusters is a lot more difficult than managing one, and you will potentially replicate bad practices, vulnerabilities to multiple places and increase maintenance burden.
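The disagreement here is easier to see with a toy model. This is a deliberately simplified sketch with made-up cluster and tenant names; it ignores attack surface entirely and only models who is reachable after one cluster is compromised:

```python
from typing import Dict, Set


def blast_radius(
    compromised: str,
    tenants_by_cluster: Dict[str, Set[str]],
    shared_credentials: bool,
) -> Set[str]:
    """Return the tenants exposed when one cluster is compromised.

    If credentials are shared across clusters, a compromise of one is treated
    as a compromise of all of them, exactly like a single large cluster.
    """
    if shared_credentials:
        return set().union(*tenants_by_cluster.values())
    return set(tenants_by_cluster[compromised])


clusters = {"cluster-a": {"tenant-1", "tenant-2"}, "cluster-b": {"tenant-3"}}
print(sorted(blast_radius("cluster-a", clusters, shared_credentials=False)))
print(sorted(blast_radius("cluster-a", clusters, shared_credentials=True)))
```

So the isolation benefit only holds to the extent that nothing is shared between clusters, which is the point both sides seem to be circling.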
Multiple clusters protects you from these types of configuration mistakes by reducing the blast radius and providing an additional landing zone to roll out changes over time.
Can you elaborate?
Of course that reduces the benefits of Kubernetes from a cost perspective, and increases administrative overhead, so it's a trade-off.
If "kubernetes" is Greek for the ship's pilot, what's the name for the captain or maybe the admiral of the armada?
Hive for OpenShift [0] - Provisioning of the K8s Fleet
Advanced Cluster Management [1] - Management of the K8s Fleet
[0] https://cloud.redhat.com/blog/openshift-hive-cluster-as-a-se...
[1] https://cloud.redhat.com/products/advanced-cluster-managemen...
[Disclosure: I'm an IBMer]
https://caylent.com/kubernetes-cluster-federation-with-admir...
Google says the Greek for that is Ναυαρχείο. Now if you see a product named Navarxeio or Navarcheio, you'll know what it is.
Sarcastically, we would say "they already have back doors in everything, they just don't want any other Bad Actors getting in their yard"
Well, if the NSA does have loads of 0-days then it's still better for them to give good security advice to strengthen infra, because it will limit the access adversaries have while they still have all the 0-days anyway.
i.e. they are advanced enough to not need to walk through an open door, so they might as well encourage others to close the doors because that will increase national security (while presumably not limiting their own access).
I’ve been recommending it to various localities after my security assessments for years now. https://apps.nsa.gov/iaarchive/library/ia-guidance/security-...
Having said that, the last thing I tried implementing from the NSA was a simple systemd service to disable ptrace [1]. The provided service definition had at least three errors, and the instructions themselves were incomplete. Not exactly a confidence builder, but I'll take a look at this one so thank you.
[0] https://cryptome.org/2016/01/nsa-16-0114.pdf
[1] https://media.defense.gov/2019/Jul/16/2002158062/-1/-1/0/CSI...
While we maybe could estimate the relative sizes of the groups you mention and compare them relative to each other to guess the strategy/policy/tactics it's not clear that would be accurate; or maybe we could infer based on some heuristic or metric (like budget being a proxy for headcount), and even then it's not clear how certain that guess would be, so it's not obvious how "we all know" it's 99/1 vs 50/50, vs any other permutation.
If push came to shove I would probably agree with your premise and conclusion, and I really have no idea, so apologies for being nitpicky; without a background on the technical details it's likely I'm wrong.
[1] https://www.logicallyfallacious.com/logicalfallacies/Alleged...
No individual can speak for all readers here on how they view this agency. Attempting to do so weakens the comment.
(that's another logical fallacy ...) https://www.logicallyfallacious.com/logicalfallacies/Argumen...
False-positives abound on these scanners.
[0] https://software.af.mil/dsop/documents/ [1] https://boards.greenhouse.io/raft
1. 1 cluster was awesome
2. Many clusters means I rebuild the world
3. I wish there was a way to get the benefits of one cluster across multiples.
Which I believe is a solvable problem and partially what we've been poking at in https://github.com/kcp-dev/kcp (although it's still so early that I don't want to get hopes up).
'dwheeler is now the Linux Foundation's Director of Open Source Supply Chain Security.
https://bootstrappable.org/ https://bootstrapping.miraheze.org/
Notably I know the Rust compiler has been verified in this way (or at least certain versions of it have been verified), but it shouldn't be hard to do the same for any language with multiple independent implementations.
Again, as a non-technical person, why a piece of software needs access to the entire internet instead of whitelisted domains specific to its requirements is beyond me, since we already know how the UNIX permission system works. Is it so hard to extend that to networks? Especially since everything is a file in UNIX? Kindly pardon my ignorance :-)
But if the clusters are configured similarly, a flaw in one is likely present in the others. GP's point is that if you invest in hardening, you can easily apply it to multiple clusters.
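The policy being asked for is simple to state. Here is a sketch of the matching rule only, with a hypothetical function name and example domains; actually enforcing it needs an egress proxy or a DNS-aware firewall, since packet filters see IP addresses rather than names:

```python
from typing import Set


def host_allowed(host: str, allowlist: Set[str]) -> bool:
    """Allow a hostname only if it equals or is a subdomain of an allowlisted domain."""
    # Normalize case and drop the trailing dot of a fully qualified name.
    host = host.rstrip(".").lower()
    # Matching on "." + domain prevents "evilexample.com" from matching "example.com".
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)


allowed = {"example.com"}
print(host_allowed("api.example.com", allowed))
print(host_allowed("evilexample.com", allowed))
```

The hard part is not the rule but keeping the list accurate for every piece of software, which is a large part of why per-app network permissions have been slow to arrive on desktop and server OSes.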
> It’s not really clear what you’re trying to say here.
I assume they mean having more clusters present means there are more opportunities to be compromised (e.g. more credentials to leak, more API servers to target, possible version skew, etc.).
That doesn’t matter. The point is that you isolate applications/tenants into different clusters. So if someone exploits their own, they haven’t gained access to some other application.
> assume they mean having more clusters present means there are more opportunities to be compromised (e.g. more credentials to leak, more API servers to target, possible version skew, etc.).
That doesn’t even make sense though. In our strawman scenario these are cookie cutter things. Many is not more vulnerable than one in this case.
Unfortunately small government activists have recognized this and have enacted policies that promote incompetence as much as possible. "Good enough for government work" is a choice, not an inevitability.
One exception, as others have mentioned, is blast radius - with a single cluster, a problem with Kubernetes itself could take down everything.
>It includes hardening strategies to avoid common misconfigurations and guide system administrators and developers of National Security Systems on how to deploy Kubernetes...
> Purpose > NSA and CISA developed this document in furtherance of their respective cybersecurity missions, including their responsibilities to develop and issue cybersecurity specifications and mitigations. This information may be shared broadly to reach all appropriate stakeholders.
NSA has multiple mandates and many stakeholders.
So always encrypt everything unless you’re in a very rare environment with central network control that cannot be compromised or intercepted from a given machine.
AWS EKS on Fargate has a dedicated ENI and kernel per pod; the only way to intercept the traffic is when it crosses a network, or with flow control logs. Or if somebody hacked the control plane, but that's always "Game over man, game over!"
> Works cited:
> [1] Center for Internet Security, "Kubernetes," 2021. [Online]. Available: https://cisecurity.org/resources/?type=benchmark&search=kube... .
> [2] DISA, "Kubernetes STIG," 2021. [Online]. Available: https://dl.dod.cyber.mil.wp-content/uploads/stigs/zip/U_Kubernetes_V1R1_STIG.zip. [Accessed 8 July 2021]
> [3] The Linux Foundation, "Kubernetes Documentation," 2021. [Online]. Available: https://kubernetes.io/docs/home/ . [Accessed 8 July 2021].
> [4] The Linux Foundation, "11 Ways (Not) to Get Hacked," 18 07 2018. [Online]. Available: https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hac... . [Accessed 8 July 2021].
> [5] MITRE, "Unsecured Credentials: Cloud Instance Metadata API." MITRE ATT&CK, 2021. [Online]. Available: https://attack.mitre.org/techniques/T1552/005/. [Accessed 8 July 2021].
> [6] CISA, "Analysis Report (AR21-013A): Strengthening Security Configurations to Defend Against Attackers Targeting Cloud Services." Cybersecurity and Infrastructure Security Agency, 14 January 2021. [Online]. Available: https://us-cert.cisa.gov/ncas/analysis-reports/ar21-013a [Accessed 8 July 2021].
How can k8s and zero-trust cooccur?
> CISA encourages administrators and organizations review NSA’s guidance on Embracing a Zero Trust Security Model to help secure sensitive data, systems, and services.
"Embracing a Zero Trust Security Model" (2021, as well) https://media.defense.gov/2021/Feb/25/2002588479/-1/-1/0/CSI...
In addition to "zero [trust]", I also looked for the term "SBOM". From p.32//39:
> As updates are deployed, administrators should also keep up with removing any old components that are no longer needed from the environment. Using a managed Kubernetes service can help to automate upgrades and patches for Kubernetes, operating systems, and networking protocols. *However, administrators must still patch and upgrade their containerized applications.*
"Existing artifact vuln scanners, databases, and specs?" https://github.com/google/osv/issues/55
Like I said: differently complex but it’s a general solution to the problem and doesn’t require changing more “inner” things as much.
But that's not really the issue. You still have a big plaintext network with a bunch of random stuff talking, no mutual auth and no security controls other than segmentation. That's the tricky problem that mTLS and service meshes attempt to solve.
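To show what "mutual auth" means at the TLS layer, here is a bare-bones sketch of the server side using Python's standard `ssl` module. The function name and the file paths in the usage note are hypothetical; a service mesh does the equivalent in a sidecar and also handles issuing and rotating the certificates, which is the part that is actually hard:

```python
import ssl
from typing import Optional


def mtls_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=client_ca)
    # Without this line the server authenticates itself but accepts anonymous clients.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile is not None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


ctx = mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

A real deployment would pass something like `mtls_server_context("server.pem", "server.key", "clients-ca.pem")` (file names made up) and then authorize based on the identity in the verified client certificate.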
Outside of k8s (where one has greater control over how specifically e.g. Wireguard is deployed). Again, there is no plaintext outside of localhost. Wireguard is mutual auth, I’m not sure why you think it isn’t. Wireguard + firewall is security control since, well, you have mutual auth so rules can be applied per-client.
But 1) kernel hackers won't implement it, 2) app devs are too possessive of their stack/codebase to just use one standard implementation/interface, and 3) security people are too paranoid to leave something "so important" up to the OS so they'd rather everyone implement it poorly/fragmentedly.
We learned recently that for a long time, the primary producer of cryptographic telephones was a single Swiss company. Owned by the CIA.
If security were easy, a lot of intelligence agencies would have a bad day.
Security doesn’t have to be this hard. But the powers that be seem to prefer complex, complicated systems, like DNS or SELinux.
It could be easier. Much easier.
As far as a service mesh goes, check out Linkerd! I find Istio much harder to set up and manage. Linkerd is super simple and has always worked pretty much out of the box for me.
This may be fully acceptable for you, but should not be glossed over.
From my experience, linkerd had the most seamless deployment to get to the most feature-complete out-of-the box experience with monitoring etc. But as it goes with these things there’s a much bigger amortized cost in terms of magic to unwind if you need to integrate it.
How are these containers getting the different secrets they need to identify themselves? Are you attaching IAM roles to them to get secrets from some secret store?