What can we learn from the matrix.org compromise?

What can we learn from the matrix.org compromise? (medium.com)

84 points by cyber 7 years ago | 73 comments

pm90 7 years ago |

This is such a poorly written article:

* no detailed analysis of how the attack was undertaken. It's not even clear how the attacker managed to get in (was it a publicly exposed Jenkins? vulnerable bastion? what?)

* no analysis of what the existing matrix.org security perimeter looked like or how it could be made better.

* repetition of security tropes. Use VPN. Use GitHub Enterprise (wait wtf? Why not private repos in GitHub?). Don't use Ansible, use Salt.

Ridiculous. I was looking forward to a nice long read about how this breach was undertaken. Hugely disappointed.

bifrost 7 years ago | |

If you click through to the GH Issues I linked to there are some pretty good data points as to what happened. I didn't feel the need to copypasta.

But yes, publicly exposed jenkins and repos lead to the compromise, not an uncommon story unfortunately.

Perimeter - I didn't see much evidence of one existing and I didn't go probing their networks to find out.

Security tropes are real for a reason, you don't have to believe me though.

Private repos in GitHub are still publicly hosted and are orders of magnitude easier to get into than an in-perimeter repo. They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens.

pm90 7 years ago | | |

> They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens

Can you provide some actual instances of this happening? Genuinely curious, as my org is currently migrating from enterprise to cloud.

bobwaycott 7 years ago | | |

> But yes, publicly exposed jenkins and repos lead to the compromise …

You mean the past-tense verb led, not its metallic homonym lead. :)

nobatron 7 years ago | |

There's a lot wrong with this article.

Firstly, having a private network for your infrastructure isn't a one-stop solution for keeping attackers out.

Secondly, using GitHub Enterprise or self-hosted GitLab doesn't make up for storing secrets in Git.

Looking forward to the proper write-up.

bifrost 7 years ago | | |

I've never claimed it was a "one stop", but it certainly keeps the random internet users to a minimum.

And yes, using GHE or self-hosted GitLab doesn't make up for storing secrets, but it at least keeps them out of the public eye so the effects are less brutal. It's still bad to store secrets in a code repository.

My whole point is that you can reduce risks easily, yet some people don't for some reason.

netsectoday 7 years ago | |

* this idiot claimed "Ansible was used to keep the attacker in the system", when in reality Ansible did what it was supposed to do by altering the correct authorized_keys file, and the attacker leveraged an old default in the sshd config. This is an sshd config issue, not an Ansible one.

The sales-pitch for Salt (against Ansible) is ridiculous and misguided.

I just checked out the Salt SSH module and even if they used Salt they would still have this issue. The answer here is to not use the default /etc/ssh/sshd_config value of #AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2. Uncomment and remove authorized_keys2.
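The check described above can be sketched in a few lines of Python. This is only a rough illustration, not a full sshd_config parser: the function names are made up for this example, it looks only at the global section (it stops at the first Match block), and it ignores Include directives and quoted arguments. It relies on two documented sshd behaviours: commented-out lines are ignored, and when AuthorizedKeysFile is unset the default is `.ssh/authorized_keys .ssh/authorized_keys2`.

```python
# Default OpenSSH falls back to when AuthorizedKeysFile is not set (see sshd_config(5)).
DEFAULT_KEY_FILES = [".ssh/authorized_keys", ".ssh/authorized_keys2"]


def effective_authorized_keys_files(config_text):
    """Return the AuthorizedKeysFile paths from the global section of an sshd_config."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line, which sshd ignores
        parts = line.split()
        keyword, args = parts[0].lower(), parts[1:]
        if keyword == "match":
            break  # simplification: only the global section is inspected
        if keyword == "authorizedkeysfile":
            return args  # sshd uses the first value it finds for a keyword
    return list(DEFAULT_KEY_FILES)


def stray_key_files(config_text):
    """List key files sshd would accept besides .ssh/authorized_keys."""
    return [
        path
        for path in effective_authorized_keys_files(config_text)
        if path != ".ssh/authorized_keys" and path.lower() != "none"
    ]


# The stock file ships the directive commented out, so the built-in default applies.
stock = "#AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2\n"
# The suggested fix: uncomment the directive and drop authorized_keys2.
fixed = "AuthorizedKeysFile .ssh/authorized_keys\n"

print(stray_key_files(stock))
print(stray_key_files(fixed))
```

For the stock line the sketch reports `.ssh/authorized_keys2` as a stray file, and for the fixed line it reports nothing. Running `sshd -T` on the host is the authoritative way to see the effective value.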

KirinDave 7 years ago |

Why aren't people reporting the fact that Matrix.org actually lost control of their network a second time within hours of their first all clear sounding?

I feel like this is an important part of the story for anyone looking for teachable infosec moments.

bifrost 7 years ago | |

I guess I technically glossed over that but I did say "One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system". The attacker was persisted via CM and their public repo, I'm actually surprised this doesn't happen more often.

bifrost 7 years ago | | |

I should clarify this comment a bit since it seems to be the most controversial.

When I say the attacker was persisted via CM, I'm pointing at his own notes, nodding to broken CM, the requirements of supporting the CM and availability of the config files.

I also sanity-checked the sshd_config file on my systems; they're all set to a sane default:

"AuthorizedKeysFile .ssh/authorized_keys"

FWIW I prefer to treat CM data as "valuable" information for this reason.

driminicus 7 years ago | |

Because the second time was a DNS hijack, not a network compromise. I'm a little fuzzy on the details, but it had something to do with Cloudflare's API not revoking some access token.

Either way, a DNS hijack is not great, but not nearly as bad as the initial compromise.

bifrost 7 years ago | | |

It wasn't Cloudflare's API failing to revoke a token; they just didn't revoke all the tokens. Basically human error.

"The API key was known compromised in the original attack, and during the rebuild the key was theoretically replaced. However, unfortunately only personal keys were rotated, enabling the defacement."

KirinDave 7 years ago | | |

See, I'd like to know more too.

Arathorn 7 years ago | |

The rebuilt infra wasn’t compromised; what happened was that we rotated the Cloudflare API key whilst logged into CF with a personal account but then masquerading as the master admin user. Turns out that rotating the API key rotates your personal one, not the one you’re masquerading as, and we didn’t think to manually compare the secret before confirming it had the right value. Hence the attacker was able to briefly hijack DNS to their defacement site until we fixed it.

We will write this up in a full postmortem in the next 1-2 weeks.
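The "compare the secret before confirming" step mentioned above could look something like the following. It is a hypothetical sketch, not the Matrix.org team's procedure and not a Cloudflare API client: how the key is read back from the provider is left to the operator, and the function names and placeholder key strings are invented for illustration. The idea is to record a fingerprint of the compromised key before rotating, then confirm that the key on the account that matters no longer matches it.

```python
import hashlib
import hmac


def fingerprint(secret):
    """Short SHA-256 fingerprint, so the raw secret need not be kept in notes or logs."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16]


def rotation_took_effect(fingerprint_before, secret_after):
    """True only if the secret now in place differs from the one recorded before rotation."""
    return not hmac.compare_digest(fingerprint_before, fingerprint(secret_after))


# Placeholder values; in practice these are the key read from the target
# account before and after the rotation.
before = fingerprint("compromised-api-key")

# Rotating the wrong account's key leaves the target key unchanged.
print(rotation_took_effect(before, "compromised-api-key"))
# A successful rotation yields a different key.
print(rotation_took_effect(before, "freshly-issued-api-key"))
```

The check only helps if the "after" value is read from the account whose key was compromised, not from whichever account happened to be logged in.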

Arathorn 7 years ago |

If it wasn’t clear, this article wasn’t written by the Matrix.org team, nor did the author discuss any of it with us to our knowledge.

We’ll publish our own full post-mortem in the next 1-2 weeks.

Arathorn 7 years ago | |

Also, reading this article more carefully, much of this is just plain wrong:

> One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system.

Fwiw the infra that was compromised was not managed by Ansible; if it had been we would likely have spotted the malicious changes much sooner.

nisa 7 years ago |

It's been a few years since I last used SaltStack, but if you have access to the master you have instant root on all minions, or did that somehow change? salt '*' cmd.run 'find / -delete' and game over?

bifrost 7 years ago | |

Very true, however I'd rather have that problem than an ever multiplying number of user accounts on systems that can su/sudo.

verdverm 7 years ago | | |

Make golden images with packer, or something similar, and then roll your fleet over.

You should not be running package managers on production servers. Or any of the other things salt, ansible, chef, puppet can do.

_frkl 7 years ago | | |

How does saltstack do tasks that require root access? Use the root user directly?

ubercow13 7 years ago |

Why is it considered safer to expose a VPN to the internet than SSH? Is it just that there is one exposed service for the organisation rather than one per machine?

bifrost 7 years ago | |

SSH tunneling is handy, but if you want to push anything else over it, it's a pain for the "layperson". You're not going to have a great time supporting people with it. I've done it, it sucks. Scripts and special SSH config files are the pits. VPNs are way easier: they can support multiple access levels and roles, are often not blocked by other people's packet filters and firewalls, and the good ones can even validate that a host is in "compliance" before it's allowed onto the network.

closeparen 7 years ago | |

You can expose one SSH box per organization (a “bastion”) and deploy SSH configs to clients that make it look like you have direct access to the hosts behind it.
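As a rough illustration of the bastion setup described above, the client config can be generated from a small template. This sketch assumes OpenSSH 7.3 or later on the clients (for the `ProxyJump` option); the hostnames, the user name, and the function name are all hypothetical.

```python
def render_bastion_config(bastion_host, internal_pattern, user):
    """Render an OpenSSH client config that routes internal hosts through one bastion."""
    return "\n".join([
        f"Host {bastion_host}",
        f"    User {user}",
        "",
        # The pattern must not also match the bastion itself, or ssh would
        # try to jump through the bastion to reach the bastion.
        f"Host {internal_pattern}",
        f"    User {user}",
        f"    ProxyJump {bastion_host}",
        "",
    ])


config = render_bastion_config(
    bastion_host="bastion.example.org",
    internal_pattern="*.internal.example.org",
    user="alice",
)
print(config)
```

With that stanza in `~/.ssh/config`, `ssh db1.internal.example.org` hops through the bastion transparently, so only the bastion's SSH port needs to be reachable from the internet.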

acct1771 7 years ago | |

That's probably a solid question for the people implementing and supporting WireGuard in the Linux kernel.

krupan 7 years ago |

Can anyone explain the Jenkins vulnerability that was used to initially gain access? Reading the CVEs didn't give me the impression that they enabled remote exploits.

bifrost 7 years ago | |

My 5 second lazy summaries of the CVEs:

CVE-2019-1003001, CVE-2019-1003002 -> Anyone with read access to Jenkins can own the build environment.

CVE-2019-1003000 -> I didn't get a lot of the details on this but it basically looks like "broken sandboxing, you can run bad scripts".

This is also a good resource: https://packetstormsecurity.com/files/152132/Jenkins-ACL-Byp...

zimbatm 7 years ago |

The attacker gained network access through Jenkins.

Don't deploy a public-facing Jenkins, especially if it has credentials attached to it. It's really hard to secure, especially if pull-requests can run arbitrary code on your agents.

Jenkins / CI is the sudo access to most organizations.

bifrost 7 years ago | |

I agree with you 100% here; I would not deploy any CI publicly unless it's heavily fenced off into "read only" territory.

r1ch 7 years ago |

One thing I learned was where to modify the pageant source code (Windows equivalent of ssh-agent) to make my agent prompt before signing (with the default focus on "no"). This feels much safer and is a very minor inconvenience. I wonder why more agents don't have this built in.

Example: https://twitter.com/R1CH_TL/status/1118559239084158977

forgotmypw 7 years ago |

I'd like to take this opportunity to plug my in-development decentralized, distributed, completely open forum, using PGP as the "account" system, and text files as the data store.

So any reasonably competent hacker can re-validate the entire forum's content and votes, reasonably quickly reimplement the whole thing, and/or fork the forum at any time.

http://shitmyself.com/

ficklepickle 7 years ago | |

This is very interesting! I have so many questions. If you see this, kindly send me an email. It's in my profile. I love the idea!

bifrost 7 years ago | |

Very Cool! I'll check it out!

mjevans 7 years ago |

That medium.com has a paywall and doesn't want to share content? (is what I learned)

bifrost 7 years ago | |

This might work: https://medium.com/@tomsparks/what-can-we-learn-from-the-mat...

tomupom 7 years ago | |

Not getting the same paywall trouble as you but https://outline.com/PZnDHL

inetknght 7 years ago |

I have gone on some long verbal rants about the dark patterns (bordering on malicious behavior) exhibited by key agents such as SSH agent, GPG agent, Pageant, and the like.

What can you learn from the compromise? Never use an agent. Kill it with fire^H^H^H^H -9.