With seccomp-bpf it's pretty simple to have system-wide tripwires on certain files/syscalls/network operations. Even if the attacker gains root, your tripwire will probably alert you before they can disable it.
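To make the tripwire idea concrete, here's a toy model of the decision such a filter makes. It does not install a real filter (that takes libseccomp or the seccomp syscall); the watch list and the `ALERT` action are my own assumptions, and the syscall numbers are x86_64 only. Note that a seccomp filter sees only the syscall number and raw argument values, not dereferenced paths, so a per-file tripwire needs a userspace helper on top.

```python
from dataclasses import dataclass

# x86_64 syscall numbers; other architectures use different values.
SYS_CONNECT = 42
SYS_EXECVE = 59
SYS_PTRACE = 101
SYS_OPENAT = 257

ALLOW = "ALLOW"  # stands in for SECCOMP_RET_ALLOW
ALERT = "ALERT"  # stands in for an action that wakes a monitor (assumption)


@dataclass(frozen=True)
class SeccompData:
    """Simplified view of what a filter is handed for each syscall."""
    nr: int            # syscall number
    args: tuple = ()   # raw argument values (pointers are NOT dereferenced)


# Hypothetical watch list: syscalls this workload should never make.
TRIPWIRED = frozenset({SYS_PTRACE, SYS_CONNECT})


def evaluate(data: SeccompData, watched=TRIPWIRED) -> str:
    """Return ALERT for a watched syscall, ALLOW for everything else."""
    return ALERT if data.nr in watched else ALLOW
```

The point is that the check runs on every syscall before it is serviced, so the alert fires at the first suspicious call rather than after the attacker has had time to clean up.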
Likewise, GCP Dataflow quite trivially allows you to escape onto the worker machines and take the (huge) binaries that implement it. They have some really nice detailed status pages!
First, we did a privilege escalation.
How? They don't say.
Next, we did another privilege escalation.
And how?? They don't say.
what's the point of this
Getting access to the host OS won't give you much other than some internal binaries and config.
I am sure Cloud SQL is far more advanced now than it was then (9 years ago), but security in depth was something we thought about a lot. Running each database in its own VM rather than in a multi-tenant system was for security more than anything else. Implementation-wise, we could have gone multi-tenant just as easily.
This would never fly at Amazon because it would cost them a few cents to have another VM. Microsoft would probably not even notice the issue.
"Our research began when we identified a gap in GCP’s security layer that was created for SQL Server."
It would have been interesting to see how they identified that security gap.
https://bughunters.google.com/about/rules/6625378258649088/g...
Who knows - if Google hadn't detected the intrusion, this attack might be on the black market by now.
Fascinating how sloppy some people are when they set up infrastructure, even though this may be down to bad defaults.
Presenting it as a Cloud SQL problem is disingenuous.
> we identified a gap in GCP’s security layer that was created for SQL Server. This vulnerability enabled us to escalate our initial privilege and add our user to the DbRootRole role, a GCP admin role.
So Google took proprietary software not designed for this use-case and built their own security layer on top of it and ended up with bugs.
Of course that's an issue with the service. Presenting it as anything other than an issue in Cloud SQL seems disingenuous.
Considering that, Google probably has an extensive monitoring system running in the VM, looking for things happening that shouldn't happen... And they have probably also built a filtering infrastructure between the users and the SQL server so that if any vulnerability is found, they can at least filter attempts to exploit it while a fix is being made.
That is categorically false. Not only does Amazon's RDS do that (I can't find where they say so; it might have been at re:Invent one year), but for other services like Fargate they used to waste far more resources due to instance single tenancy, until they adopted Firecracker: https://d1.awsstatic.com/events/reinvent/2019/CON423-R1_REPE...
But isn't this for dedicated containers and not VMs?
More broadly, based on the literature I've seen, I'd agree that GCP takes security seriously, but so does AWS and I haven't seen any good evidence to say one would be "better" than the other.
I would expect both to come up with a robust security model and as part of their defense in depth I'd expect both to enforce single tenancy at a hypervisor level any time they're running anything untrusted or which can be materially/declaratively influenced by customers (e.g. code, SQL, etc)
In Dataflow's case, the container is not treated as the security boundary. There are several important things to note:
- Dataflow's VMs are in customer projects, so there's no risk of cross-tenant access.
- When a Dataflow job is launched, the launcher's identity is checked for the iam.serviceAccountUser IAM role, which means that identity could already launch a VM with the same service account. So Dataflow is not escalating permissions beyond what GCE VMs allow.
- Just as with a VM launched by anyone else, whether other people can log on to those VMs is controlled separately.
- Containers are used in Dataflow only for convenient image delivery, not as a security barrier. The VM is the barrier.
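The launch-time check described above can be sketched roughly like this. It's a simplified model, not Dataflow's actual code: the policy uses the standard IAM `bindings` shape, but `can_act_as` and `launch_job` are hypothetical names, and real IAM also evaluates groups, inherited bindings, and conditions, which this ignores.

```python
ROLE_SA_USER = "roles/iam.serviceAccountUser"


def can_act_as(policy: dict, member: str, role: str = ROLE_SA_USER) -> bool:
    """True if `member` is directly bound to `role` in the given IAM policy."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


def launch_job(policy: dict, launcher: str, service_account: str) -> dict:
    """Refuse to start workers unless the launcher may act as the service account."""
    if not can_act_as(policy, launcher):
        raise PermissionError(
            f"{launcher} lacks {ROLE_SA_USER} on {service_account}"
        )
    # The workers run as the service account, exactly as a VM the launcher
    # created directly would, so nothing is escalated.
    return {"service_account": service_account, "launched_by": launcher}


# Hypothetical policy attached to the worker service account.
policy = {
    "bindings": [
        {"role": ROLE_SA_USER, "members": ["user:alice@example.com"]},
    ]
}
```

With this policy, `launch_job(policy, "user:alice@example.com", "worker-sa")` succeeds, while the same call for `user:bob@example.com` raises `PermissionError`.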
There's two types of cloud providers, the ones who take security seriously and the ones who learn security the hard, public way.
I'm a bit surprised that Azure would get lumped in with the other cut-rate providers but that's becoming more and more obvious with the vulnerabilities of the past few years.
The VMs are somewhat hidden in the UI iirc, but otherwise you can enumerate them via the API, ssh to them, and debug/profile (which I was doing to get cross-language profiling on Dataflow pipelines with py-spy and JVM perf output).
It's just a worker VM in your project.
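For anyone who wants to try this, here's a minimal sketch that builds the `gcloud` commands for listing and logging in to the workers. It only constructs argument lists and runs nothing. The project ID, zone, instance name, and the name regex are placeholders I made up; how your worker VMs are actually named is something to check in your own project.

```python
def list_workers_cmd(project: str, name_regex: str) -> list[str]:
    """Build a `gcloud compute instances list` call filtered by instance name."""
    return [
        "gcloud", "compute", "instances", "list",
        f"--project={project}",
        f"--filter=name~{name_regex}",          # regex match on the VM name
        "--format=value(name,zone.basename())",  # print just name and zone
    ]


def ssh_cmd(project: str, zone: str, instance: str) -> list[str]:
    """Build a `gcloud compute ssh` call for one worker VM."""
    return [
        "gcloud", "compute", "ssh", instance,
        f"--zone={zone}",
        f"--project={project}",
    ]
```

Pass either list to `subprocess.run(cmd, check=True)` once you've filled in real values; from the ssh session you can attach whatever profiler you like.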