Twitch is hacked, and its source code leaked (kotaku.com)
Yep. From Mexico to Patagonia and Iberia, let's screw a few million users.
This is a far different statement than "You can build something and compete with Steam in a couple of years". Most of the really hard problems are not technical. Success ain't gonna happen without a bunch of pain, sweat, and strategic stumbles on the part of the competition.
Was Twitch built in 10 years, or over just a few?
Steam has been in the making since I was in FUCKING high school. I'm old now, well over 30.
Apples, and blueberries.
Bluebarry, Drewbarry, tomato, ToMaHtoH.
Fuck their stupid ass streaming code, it’s a giant crud app, only their devops team can take credit for scaling, everyone else is not worth a shit, sorry, that’s life, I gotta Leetcode too, and your code isn’t worth me reading it, leaked or not.
It's just a crud app - why do they need more than 10 employees?
Building and maintaining infrastructure simply takes a lot of people, time, relationships and whatnot.
They get good at it over time, which I guess you could consider a kind of secret sauce, but there isn't some secret code that makes the whole thing so much better that you'll now see tons of competitors.
That doesn't stop CV-hungry engineers from finding ways to overcomplicate it.
(I do agree with you on this topic in general)
The thing I learned most from this leak is that the technology side plays very little part in the business being successful or not.
"Building apps is easy as long as you don't have millions of users. For that you have to actually think about bottlenecks, the larger architecture etc."
(I agree with that)
What I wanted to express is that lots of engineers I personally know instead say
"Building apps involves thinking about every bottleneck in advance and optimizing for every possible user scenario and a global user base, regardless if the number of users is only ~100."
I would advocate the exact opposite. If you need to scale to X users focus on making a great platform for X users, even if it’s only 100. If you try to over-engineer instead you’ll prematurely optimize and will make poor decisions that’ll come back to bite you when you actually DO need the scale and the requirements change.
Twitch is a full-featured, very mature application with many moving parts outside of just the video streaming, and building all those parts took an incredible amount of time and effort.
They hit.
It’s sort of like we all hold Golden dice, so we marveled, by our own eyes, at the gold.
Dealer: You rolling those?
Us: no, it’s gold.
They fucking risked it. It’s not an engineering feat, we’re all a bunch of pussies.
Twitch is the easiest site to build, you might as well show me a todo app (which will be sieged and dismantled), scale is solved, we will eat your applications, the barbarians.
Rome falls.
I personally don't give a single fuck but I can see the appeal for some people.
It's a bit like the great pyramids: it's just a big pile of rocks, but we'd be really interested in knowing exactly how they made these big piles.
* Entire git histories
* Internal/Private AWS SDKs
* Encrypted Password dumps and payout reports
It's so comprehensive that I'm very curious about how an attacker got that level of access. I can't think of another large, corporate web 2.0 startup that's gotten owned in a similar fashion. Could the same attack work on Amazon? YouTube?
It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data? Maybe they did try to ransom it, but I'd imagine someone with this kind of access inside Twitch must have had some creative way of making money.
Yes, that included payout data. Anyone with "staff" access to the site (which any employee can have) has access to any streamer's dashboard, which includes payout data.
I don't think this was an attack. Based on the data so far I think it was a disgruntled engineer. Obviously if more gets leaked later I may revise that opinion.
For the longest time, seeing revenue was as simple as navigating to a streamer's dashboard as staff. They did finally gate that away from staff who don't need to see that info, but I am sure there are other ways to obtain revenue reporting info.
I am assuming all data - including personal - has been compromised but so far, the data leaked is data that most staff would have access to in some way or another. Some may find that shocking, but this was not a "high level hack"
Saying that no 'secrets' were leaked is effectively burying the lede.
Notably, the initial leak didn't actually include the password data which the leaker claims to have, just source code and payment data which has been verified by several affected streamers. It's possible that this first leak was just to establish trust so they can ransom or auction password hashes later.
Either way, I can only imagine the chaos inside as they try to figure out what has transpired here.
Password hashes are relatively useless though? Once the leak is announced I imagine most of the big targets will rotate their credentials. Then the next thing you need to do is spend possibly thousands in CPU time bruteforcing bcrypt hashes. Then I'm not sure what you can even do with those.
I'm not criminally creative but I imagine you could make more by abusing trust with payment processors or fraudulent invoices.
I think anyone would be excited to hack Twitch as the site alone - or any big platform for that matter - but this is quite literally someone just downloading the entire Twitch ecosystem and publishing it online.
The volume of data is irrelevant - source code is usually teensy tiny and of far more value to companies than, say, three months of livestream chat logs.
I'm not certain what security hardware you're thinking of - but I'm pretty sure I hate it already since it doesn't effectively guard anything while making everyone's lives difficult. For effective corporate security you need 1) data use policies and 2) access control lists - both of those are generally more effectively implemented at an entirely software level.
I am trying to recall, but I am pretty sure when I worked in Microsoft Office that a build would pull down many tens of gigabytes of data.
125GB in one day from the build system wouldn't be uncommon!
Remember that Twitch handles streams. Good luck implementing this without having all sorts of false alarms everywhere.
Plus, you don't have to exfiltrate 125GB in one go.
So let's say someone with access to all GitHub repos gave the password to someone else, maybe then it was downloaded from another machine?
Or someone stole the credentials and downloaded from another machine?
Or someone got access to such a machine?
Is it not possible to prevent these cases?
How long does such a download take?
Running security at scale in a hypergrowth B2C company is very difficult. It's also completely different from running security at a startup, in a B2B company, or a slower-growth situation. _Every_ security executive and manager I've met has given up in frustration after 12-24 months and gone to take a cushy FAANG job instead.
I'm not surprised at all. My experience in security at a larger SV unicorn was that changes only happened in the immediate aftermath of a security crisis. Otherwise, there was incredible inertia and you just wouldn't be able to get the institutional support you needed to make progress.
Within Amazon those are almost going to be two entirely separate companies, with very different security focuses.
The idea that Amazon is monolithic and uniform wasn't true when I left there in 2006, and I'm certain it is less so now.
And that isn't just because of the merger; fundamentally, they are different business orgs with different focuses.
From what I heard about Twitch-interns over the years, it seems the company is more a third-rate-s**hole that grew too big too fast and accumulated a huge amount of technical debt and fatal security flaws. Making billions doesn't mean anything if you don't invest them back into the important corners of the company. It's considered a miracle that the platform is still working that well in that state. And what comes from the leaks so far supports this view.
That said, it seems they did start to improve one or two years ago, just too late to prevent this critical hit. But considering this was also a strike that avoided the deadly parts (so far), maybe there is a different aim here and the company can grow from this? It will be interesting to see how Amazon reacts.
I mean this as a genuine question, but is there any company that didn't end up like this after an exponential growth phase? I'm not saying it's okay, but this feels par for the course. I've now been at two start ups during that hockey stick growth time and both went through this as well.
I'd be curious if anyone here has worked at a large, fast growing tech company where they didn't accumulate a ton of technical debt during growth. If so, what did the company do to prevent that?
It'd be strange if they don't have two factor auth, of course, but it's just as strange to have this large of a hack.
I think if it is a simple case of an employee account takeover, then the attack would "work" to some extent at any company. Larger companies typically have strict data access requirements, though. Good luck finding the few employees who have raw access to Google password hashes, for example. And even more luck knowing how to get that data if you do.
Yes, IIRC everyone at Amazon has a hardware security key (which is more secure than the standard mobile app TOTP most of us use everywhere online).
Luckily iirc from a conversation with a senior Twitch engineer the Tax information backend has been migrated to Amazon. So hopefully that did not leak... Because that would be full legal name and addresses of a ton of streamers that likely have stalkers.
https://www.theguardian.com/technology/2012/feb/17/facebook-...
…except Mangham didn’t ever get to release his spoils to The Internet?
Linkedin, Microsoft, Yahoo, Google
And if speech is "radical" meaning to the point of illegality, shouldn't the legal system decide, rather than the court of public opinion?
Because you expect Amazon to put security priority over new features and profit? We have very different understandings of what Amazon stands for.
I don't know what you think Amazon stands for, but Amazon runs the largest cloud hosting service in the world - AWS, which not only runs a large number of other large companies but governments as well. I know, first hand, that their datacenter security protocols are state of the art.
Amazon has a much larger attack surface, so if they were playing fast and loose with security, chances are we would know already.
Too bad, it would be nice to see someone go through and document how Twitch works. I've never worked at "web scale" so I'd probably learn a lot.
As someone who has worked at both large and small companies, you'd probably be disappointed.
The download was posted to 4chan today, described by its unidentified source as “part one” of “an extremely poggers leak,”
> Calling Twitch a “disgusting toxic cesspool,”
This will help with ad preroll blockers.
I would love to see someone look deep into Twitch's recommendation system. Last time I tested, the thing they call "Feedback" was a rolling buffer that won't let you exclude more than ~100 things: adding more simply removed the oldest entries, and it started spamming you with things you had already excluded in the past. This looked like a performance optimization (fewer things to track per user).
You get a "twitch commercial break in progress" video for the time the ads are playing.
You can check this by loading a stream with MPV.
>You can check this by loading a stream with MPV
I watch all of my twitch using mplayer. "magic incantations" when generating access token is what produces ad free .m3u8. For example early methods involved setting origin and/or referrer headers to internal Amazon systems.
How would current AWS policies hold up? Obviously the code would be illegally acquired, but do they have detection mechanisms in place?
Many times there is some magic command only one guy knows and he will share with you on slack.
Running a service of any complexity takes years of institutional knowledge.
What language, and framework if they use one, do they use?
WARNING: do not click the link, copy it and paste it in new tab.
(twitch used to sponsor and attend local ember.js meetups)
Now I wonder if the commit history has database dumps or sensitive information, which is a common practice, or if any twitch servers have been accessed through a breach or privileged information found in some of their source code.
And if you want to be perfectly safe, don't visit twitch. Because if that source code has any vulnerabilities they might be exploited against twitch visitors as we speak.
Everyone interested, just download the code :)
The chat had a few Amazon insiders, which was interesting to read their perspectives.
That I've been downvoted on this post is a good indication we're not close to solving this by a long shot.
The fact that, across the industry in general, the controls on how PII is accessed internally are so lightly managed should worry everyone.
(I would also expect that the Amazon retail systems are in most senses "just another tenant" on AWS, albeit with much more liberal quotas!)
Edit: This won't help against a thumbdrive, but that type of thing should be also tracked.
So by copy + paste into a new tab, it will lose the HN referrer.
network.http.sendRefererHeader = 0
Oh, you'd be surprised. Divisions get billed constantly for the AWS resources they consume, and this bill gets taken out of their annual budget. From what I hear, this is a common practice in most large organizations.
Also, the AWS services you can access from within Amazon are almost identical to the AWS services you can access as an external customer. It's equally easy/hard for a random company to achieve web scale, compared to Twitch.
I get your point, and I am not talking about AWS but about Twitch. Each part of the company has its own incentives. Amazon is well known for not caring about quality or its employees. In my experience with corporations there is little to no technical sharing between different parts of the company. AWS could have the best SecOps in the world and Twitch could have no security at all. Is your experience different?
As far as I can tell, there's no data to back up the assertion that these large tech companies are disregarding security in favor of profits, except for Twitch now, which is why this leak is interesting to me.
Amazon is all about sharing efforts within the company. That's the whole point of AWS: it's a monetization of these efforts. Most older AWS services started out as internal services that someone realized were generally useful.
But it could also be a 128GB thumb drive plugged into the system somewhere.
Just log in to FB messenger or Discord and egress it as small data chunks that way. Lots of people have private chats on work computer for practical purposes.
Discord allows for bots, so you could easily write a script to chunk data and egress, and another to re-assemble.
I highly doubt it would be possible to do something like this at AWS, just because hosting multitenant infrastructure and working with the government forces you to implement security since you're being audited and awarded contracts on that basis. Twitch users don't give a crap about the security of the platform. They just want to monetize as quickly as they can, too.
So I'm not hugely surprised that practices and culture would be different even if they have the same parent company, especially since Twitch was an acquisition. Even if not, though, I'd expect security at Prime to be better than Twitch but worse than Marketplace, Marketplace to be worse than AWS, etc. All speculation since I've never worked at any Amazon product, but that's what I would expect.
Quality of life and developer experience are important topics in many ways, but should they really trump security consistently? It's always going to be dependent on people's risk assessment and comfort, but frequently it skews the wrong way because the people making the decisions know that they'll be gone.
My company can shut off my access to all the databases when they stop asking me to troubleshoot any and all data issues. Which will never happen.
You can look at leaked source code for educational purposes in most places (not legal advice). As far as I understand leaks are commonly used in vulnerability research for example (if the bad guys can use it so can bug hunters).
Streaming copyrighted material is a separate issue - but using it for "criticism, comment, news reporting, teaching" should fall under fair use, no?
I can certainly understand why twitch banned this and don't blame them (although I think it's stupid), but I see nothing unethical about openly talking about this code in the public now that it's already there.
Copyright would disagree with you, and I would say that ethically it is basically the same as stealing it yourself. You're profiting off of someone else having done the dirty work for you.
> this isn't someone's personal life being exposed.
Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should be downloading or encouraging people to download and peruse that stuff.
Is it though? The "wrench theory" applies here. It's not unthinkable that an employee was stalked on social media and had their key stolen.
More secure != perfectly secure.
I don't know which protocols they use (obviously), but if they use WebAuthn, everything is public-key signatures. Even if you leak everything from the server, public keys buy you nothing.
This leak has made me understand clearly that code quality is not what makes a product great.
I guess that’s something.
The jenkinsfiles are mostly nice and clean though. I’ve definitely seen worse of those.
But I also say it like that because, well, I've seen code that causes (objectively easy-to-fix) crashes but still ships because of one reason or another: laziness, politics, inexperience. It's a part of software engineering I'm still trying to accept.
(This is a joke but also, at many companies, it’s not. Twitch was once small and grew. Who knows what ancient all-access switches are still critical to running the systems, marked “tech debt” in someone’s backlog)
Once you have "DevOps" the devs are ops, your head count drops, and all that pesky security and other things those dirty sysadmins wanted are gone
kinda /s, sometimes I think that is really what managers think about devops
I know, I know, service contracts, but my point is that engineers sometimes need at least temporary access to provide support.
It protects the company against rogue employees (not even strictly malicious, but also curious employees who want to see more than they should). It limits exposure if an employee's account gets hacked (my pet theory for this Twitch hack). And if something does go wrong, logs help track down the issue/leak.
And at the end of the day, there should be a lightweight way to request access. Many times I've seen people request access that they didn't actually need. And most other times they have access pretty quickly.
And you can try to prevent them from accessing live/real customer data, but the cost is that they will never be able to debug issues in production. Most companies, even very large ones, are just not able to pay that cost. Not to mention that once you have access to the codebase there are a million ways to leak customer data anyway -- it is a lost battle.
For the rest of the stuff, there's a sliding scale. In no universe does your average twitch developer need raw access to password hashes, for example.
This is just a guess but I wouldn't be surprised if companies have to start taking stricter precautions with their security in a WFH world.
The issue is that building these systems accurately so they are NOT a constant annoyance is difficult, expensive, and takes a large team to support well.
If someone who doesn't have a business need to upload lots of traffic begins uploading large amounts of data, you may ask questions. Maybe you kick off a scripted playbook that then checks for increased logins to other privileged systems, or for large transfers of data from internal sources to the user's desktop.
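A minimal sketch of the kind of check such a playbook might start from, comparing today's upload volume against the same user's own history. The function name, the 3-sigma threshold, and the 1 GB floor are all assumptions for illustration, not anything a real product is known to use.

```python
from statistics import mean, pstdev


def is_unusual_upload(history_bytes, today_bytes, sigmas=3.0,
                      floor_bytes=1_000_000_000):
    """Return True when today's upload is far above this user's own baseline.

    history_bytes: past daily upload totals for the same user (hypothetical input).
    sigmas and floor_bytes are illustrative thresholds, not recommendations.
    """
    # Ignore small transfers entirely so ordinary days never raise a flag.
    if today_bytes < floor_bytes:
        return False
    # No baseline at all and a large transfer: worth a question.
    if not history_bytes:
        return True
    baseline = mean(history_bytes)
    spread = pstdev(history_bytes)
    return today_bytes > baseline + sigmas * spread
```

Comparing against a per-user baseline is what keeps a build engineer who moves 125GB every day from tripping the same alarm as someone who normally uploads a few megabytes.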
Anything else is found quickly. I certainly wouldn't even dream of someone extracting the repo.
And if it's a remote FB/VNC connection, what is preventing you from just recording the screen? Not really hard...
Most companies I've seen could have all their code extracted with one malformed NFS packet. These are "air gapped" systems holding the type of industrial secrets that we don't want to leak to China. Practically the only real line of defense they have is employee screening, which does not really stop the lone guy.
The payout data likely wasn’t ripped from a DB but rather dashboards which customer service or partnerships likely had access to. Tier1 or Tier2 support kinda stuff.
This smells like a stolen backup or maybe network access and http scanning, finding the internal GitHub and maybe a support admin cred that allowed dashboard view.
I would classify that as access to production systems.
The other access rights that come from staff access are either incidental or a miss/debt in the architecture.
The leak includes source code of multiple active websites and applications that are operated under the umbrella of Twitch/Amazon.
Why would an intern have access to this data?
monorepos are a thing at several companies (e.g. Google).
I don't think anybody is streaming this stuff on twitch with the intention to make money, any more than someone sharing it on a blog is trying to make money. Sure, in that edge case I'd agree with you, but it seems like the exception to the rule (after all people can just go look at the code themselves for free). I'm not talking about the guy who stole the code and is likely ransoming Amazon with it - I'm talking about people that just like to talk about code because it's something they like to do (there's an entire category for it on twitch already).
> Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should be downloading or encouraging people to download and peruse that stuff.
My limited understanding is none of this information actually has been leaked yet, and is likely part of a future ransom (I could be wrong, I haven't looked because I don't care). I don't condone sharing that either, but that's not what the guy streaming was sharing. I'm talking about discussing the source code which is already publicly available.
> Copyright would disagree with you
I know very little about copyright so I'll just assume you're right. I still see no ethical problem with openly discussing this code publicly though. Anyway, agree to disagree.
The main security benefit is unphishability. With YubiKey/WebAuthn, cryptography is used so you can't give the code to the wrong website. Phishing is a pretty major cause of account hacks generally, so pragmatically that is a very big win.
With a Yubikey, you need to use your password to log in to your computer, and then need to auth using Yubikey.
With OTP app, you need to use your password to log into your computer, passcode for phone, and then auth.
In both cases, it's something you know and something you have. You could argue that the app-based option is a bit more secure in that you need two passwords. On the flip side, if your phone gets pwned, someone can get access completely remotely.
Everything is a tradeoff.
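To make the OTP-app side of that comparison concrete, here is a minimal sketch of how an authenticator app derives its code, following the HOTP/TOTP construction from RFC 4226 and RFC 6238. The function names are my own, and a real deployment would use a vetted library rather than this.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current 30-second window."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)
```

Nothing in the code depends on which website is asking, so a code typed into a phishing page is just as valid when the attacker replays it to the real site within the time window. That is the gap the hardware-key approach closes.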
"your average twitch developer" needs access to the password hashes or at least the code that checks these hashes the moment they need to debug an issue which involves logging in, and from then its all downwards.
On the flip side while many smaller repos _can_ have independent ACLs, you are very unlikely to set those up until you reach a certain scale -- and then when you reach that scale it gets hard to implement ACLs across everything at once. So your engineers probably all have access to all your repos until you reach a very large size anyway. So the question becomes just "can someone write a for-loop over all of the repo names and check them all out," and it's like, yeah, that's not terribly hard, I as a programmer can do that pretty easily in bash.
Ideal repo size should not in my view be directed at "how do I prevent compromise to the external world," because VCS is not designed to give you the superpower of being resilient around being compromised. Rather VCS is trying to give you the superpower of time travel. So you should probably scope your repo to "what is the unit that makes sense to time travel with?" -- in other words if you are adamant that you have these independent services which operate decoupled and running this one backwards by a year should not affect that one, then those services should be in separate repos. If on the other hand they have some moderate coupling and rewinding this service by 1 year would break the APIs that that service uses to communicate... then those should ideally be in the same repo so that you can coordinate changes between them to their shared protocol.
Happens at my company. We have rudimentary ACLs, but I'm not sure how they're implemented, because you can find things via explicit searching, or via "organic finding" through links from repo to repo, but they won't be surfaced if you just search for code.
Google, for example, has a small number of subdirectories in the tree that only certain engineers can view (the really sensitive stuff, like the actual ranking algorithms for search and ads) but the build system is set up to allow you to still link against it.
Assume some end users used the same passwords on other, non-twitch accounts. That's what makes hacked passwords valuable, no matter where they came from.
Never implemented auth myself.
But mistakes such as salting with just the username are sometimes made even by very large companies and in that case, hashes could be the same.
Ideally that would be useless because things are properly salted and you don't know the salt, however with access to all of the source code as we have here I think it isn't as clear cut, as it may be possible to reverse out the salts as well.
I'm not a cybersec guy so please take my speculation with a grain of salt.
Salting isn't really supposed to make a hashing algorithm secure by being secret but by being unique. Unique salts make hashing more secure because an attacker can't re-use a single rainbow table for multiple hashed passwords. That, combined with a sufficiently computationally difficult hashing algorithm, makes it prohibitively expensive to reverse the hashes of all your users.
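A small sketch of that point. It uses a plain precomputed lookup table as a simplified stand-in for a rainbow table, and a deliberately fast hash (SHA-1) purely to keep the illustration short; the wordlist and function names are made up, and this is not a storage scheme anyone should copy.

```python
import hashlib
import os

# Attacker's precomputed table: hash -> password, built once and reused everywhere.
WORDLIST = ["password", "letmein", "hunter2"]
PRECOMPUTED = {hashlib.sha1(w.encode()).hexdigest(): w for w in WORDLIST}


def unsalted(password: str) -> str:
    return hashlib.sha1(password.encode()).hexdigest()


def salted(password: str, salt: bytes) -> str:
    # The salt is stored next to the hash; it is unique per user, not secret.
    return hashlib.sha1(salt + password.encode()).hexdigest()


salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = salted("hunter2", salt_a)  # user A
hash_b = salted("hunter2", salt_b)  # user B, same password

print(PRECOMPUTED.get(unsalted("hunter2")))  # table hit on the unsalted hash
print(PRECOMPUTED.get(hash_a))               # table miss on the salted hash
print(hash_a == hash_b)                      # same password, different hashes
```

The attacker can still guess "hunter2" against each salted hash individually; what the salt removes is the ability to do that work once and reuse it across every user.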
This may not be enough to protect high-value users or those who use fairly common or easily guessable passwords. This is part of why it is so important that you don't reuse passwords. It's also why your application should reject all known passwords using something like https://haveibeenpwned.com/Passwords or any of the "common password" lists you can find online.
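A hedged sketch of what that check could look like against the Pwned Passwords range API, which, as I understand it, takes the first five hex characters of the SHA-1 and returns `SUFFIX:COUNT` lines for everything sharing that prefix. The `fetch_range` callable is a hypothetical hook standing in for the HTTP request, so the full hash never has to leave your server.

```python
import hashlib


def pwned_count(password: str, fetch_range) -> int:
    """Return how many times a password appears in the breach corpus.

    fetch_range(prefix) is an injected callable (an assumption of this sketch)
    that returns the response body for the given 5-character hash prefix.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in fetch_range(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

At signup or password change you would reject (or at least warn on) any password where the count is greater than zero.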
Edit: If you do include a secret that is stored separately and is added to the password and salt when hashing, this is called "peppering", and these peppers are generally not unique per user.
How long does it actually take in practice to break something like this? I would love it if someone could prove it to me.
I just googled it and found https://hashtoolkit.com/decrypt-sha1-hash/b85ffa7dae2cbed04e... along with other results.
Even with bcrypt it's not that hard to find a solution to a hash if it didn't use enough rounds.
I learned a bunch of this when a company I worked for was breached and we wanted to see just how easy it was to crack the weaker passwords in our db.
With regards to crypto mines being used for breaking hashes, if you have one based on GPUs, yes, you could reuse GPU mining hardware for cracking hashes, albeit with relatively low hashrates for current best practice hashing algorithms.
If you're looking at something like Bitcoin's hashrate and thinking that it could be used to break SHA2 hashes, as far as I understand ASIC miners, this is not possible, as ASIC miners are designed only for mining, and they don't really accept non-mining related inputs (ie, no arbitrary inputs to be hashed, unless it matches Bitcoin's specific steps for iterating over nonces).
I'm really curious where people get their ideas about salting. It's not just a word. It doesn't make one password any more difficult to crack. It makes cracking every password in a given database more difficult to do. A password's salt is public information.
If you're saying that Twitch runs their developer environment in a lousy manner (and you have proof of this), then please go ahead.
But to imply that an intern/average developer would be given access to all this branching information is ignorant.
Maybe the super secure siloed world doesn't really exist outside of military/government organizations.
"Hey, pick through everything I say with a fine-toothed comb and treat it as the official company stance!"
I suspect that's a lot more controlled these days, but it wasn't very uncommon for signified staff to be trolling along with everyone else.
The LinkedIn password leak contained hashed (but not salted) passwords, and some of those were cracked and exploited in the wild.
My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is a-z, A-Z, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money at to make it wall-clock quicker.
It's just a simple brute force problem, I don't see what there is to question (beside the choice of SHA1 for password hashing...).
Were the hashes of previously unused passwords brute forced, or were passwords reused across sites from a previous plain text dump and exploited? Because there's a big difference between those two things. If your password is reused and was originally compromised, you're screwed regardless, and having the leaked hashed passwords doesn't leave you in any worse a situation than before.
> My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is a-z, A-Z, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money at to make it wall-clock quicker.
So practically infeasible to exploit? The claims that are being made (even in this thread) are that having a mining rig would let you brute force a SHA1 hash, but based on the numbers
> It's just a simple brute force problem, I don't see what there is to question
If it's "just a simple brute force problem", and SHA1 is the only issue, then my question is what's the password in the hash above? You (and others here, on reddit, online) are telling us that this is a trivial problem.
That only tells you the passwords are the same.
You can do this anyway. But the space requirements of a rainbow table are so large that including an account's username in the password would make a rainbow table completely unfeasible.
Salts are there to ensure that two accounts on the same website which have identical passwords nevertheless have different password hashes.
Can I try again? Sha1 e7b7cdf949007abe7e8a190ba8eae56c60018c1f
Took me 6 minutes to try all 1.4 trillion passwords. So either you have a strong password or I messed something up. What is it?
In theory if your password was weak enough to be on this list it would take on average 3 minutes to break it on a GTX 1080.
SHA1 is not a very secure/expensive hashing algorithm, which makes it significantly cheaper to break, even with a unique salt.
Your idea of what a rainbow table is appears to be unrelated to what a rainbow table actually is. A rainbow table is prepared in advance, not generated in the process of cracking an individual password.
Ok, so how long does it take to break the hash I've provided if it's not very secure?
I believe there are documented instances where previously unleaked passwords were cracked. Of course not 128-bit random strings, but still passwords more "complex" than what you previously posted. If you have 100 million hashes to try, you will crack some. People generally have bad passwords, and especially did in 2012, even if the plaintexts weren't available anywhere...
> So practically infeasible to exploit?
It depends on how strong the password is and how much money you have to spend. For 32 USD I get an hour with a p4d.24xlarge that has 8 graphics cards, which in total can do about 175 * 10^9 hashes per second. 20 hours (and 640 USD) of machine time (not wall clock time) on that machine can do what 30 days on my old PC does.
> If it's "just a simple brute force problem" […]
If you can give me a bound on the number of combinations, and an AWS account to bill, I and many others would gladly attempt to crack your hash :-). But if your second hash is >9 alphanumeric characters we will probably just burn electricity to no avail.
I don't even know what you are arguing?
EDIT: Now that you have some numbers for hashing rates and cost, you can figure out how expensive different passwords are to crack with different approaches. Two common dictionary words with two numbers appended? 6 random alphanumeric characters? Then think about how expensive the cheapest non-leaked password in a database of 100 million users is...
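For anyone who wants to redo that arithmetic, here is a rough sketch. The hash rates and the hourly price are the ones quoted in this thread; the function names are made up for illustration, and the results are worst-case times for exhausting the whole keyspace of a fixed-length alphanumeric password, not measurements.

```python
ALPHABET_SIZE = 62  # a-z, A-Z, 0-9, as assumed in the thread

# Figures quoted in the comments above (not benchmarked here).
GTX_1060_HASHES_PER_SEC = 6300e6
P4D_HASHES_PER_SEC = 175e9
P4D_USD_PER_HOUR = 32


def keyspace(length, alphabet_size=ALPHABET_SIZE):
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def seconds_to_exhaust(length, hashes_per_sec, alphabet_size=ALPHABET_SIZE):
    """Worst-case seconds to try every candidate of the given length."""
    return keyspace(length, alphabet_size) / hashes_per_sec


def cost_usd(length, hashes_per_sec, usd_per_hour, alphabet_size=ALPHABET_SIZE):
    """Worst-case rental cost to exhaust the keyspace at a given hourly price."""
    hours = seconds_to_exhaust(length, hashes_per_sec, alphabet_size) / 3600
    return hours * usd_per_hour


for length in (6, 9):
    secs = seconds_to_exhaust(length, GTX_1060_HASHES_PER_SEC)
    print(f"{length} chars on the 1060: {secs:.0f} s ({secs / 86400:.2f} days)")

print(f"9 chars on p4d.24xlarge: "
      f"{seconds_to_exhaust(9, P4D_HASHES_PER_SEC) / 3600:.1f} h, "
      f"{cost_usd(9, P4D_HASHES_PER_SEC, P4D_USD_PER_HOUR):.0f} USD")
```

The output lands in the same ballpark as the rounded figures quoted above. On average a hit comes after about half the keyspace, so expected time and cost are roughly half the worst case.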
Is it bad to store plaintext passwords? Yes, obviously. Is some hashing better than none? Yes, obviously. Is salting your hashes much better than not? Yes, because with a salt, your first password wouldn't have turned up on Google / in rainbow tables. Is it even better to use a proper PBKDF? Yes, with a pretty aggressive PBKDF, brute forcing even low-complexity passwords becomes expensive very quickly, and we get the benefits of salting "built in".
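As a minimal sketch of the "proper PBKDF" option, here is PBKDF2-HMAC-SHA256 from Python's standard library. The iteration count, the 16-byte salt, and the function names are illustrative assumptions, not anything Twitch or anyone in this thread is known to use.

```python
import hashlib
import hmac
import os

# Illustrative cost setting (an assumption); tune it to your own hardware.
DEFAULT_ITERATIONS = 600_000


def hash_password(password, iterations=DEFAULT_ITERATIONS):
    """Return (salt, iterations, digest) for storage next to the account."""
    salt = os.urandom(16)  # fresh random salt for every account
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected_digest)
```

Every guess now costs the attacker `iterations` HMAC computations instead of one SHA1, and the per-account salt comes along for free.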
Can SHA1 / MD5 hashes be cracked even if the _exact_ password-hash pair has not been leaked previously? Yes, very much so.
I managed to lock myself out of a dogecoin wallet. I have the hash of the passphrase, so I figured I'd give it a go cracking it. After a few weeks (and a larger than usual power bill) I sent it to some friends with good mining rigs to take a stab at it, willing to split the amount 50/50. It's only the passphrase, not the full wallet, so I'm not worried about someone stealing the doge.
The passphrase is probably 15-25 characters, mostly not dictionary words or simple letter/number/symbol substitution, only symbols easy to type on a US keyboard. I'm now about 6 months into trying to crack that password with probably a few hundred dollars of electricity used overall between myself and friends (I don't know their power bill), excluding hardware cost as it was already owned, and I'm not even halfway through the search space.
Can it be done? Sure. Will I be able to crack that password with a cost that's less than the value of the DOGE in the wallet? Probably not. Right now it's really more of a gamble that I'll get lucky with the rigs running. I had to tone down some of my rigs as it was getting quite hot over the summer, but over the winter I'll be chugging away as the waste heat is just additional home heat. I'll probably need to rent a considerable amount of GPU power on a cloud provider to crack it, at which point maybe it'll take me days to crack it but ultimately cost me many, many thousands of dollars in GPU-time.
Imagine two people have accounts on each of two websites:
        eBay      YouTube
Alice   sunlight  bobrules
Bob     bobrules  bobrules
A password reuse attack dumps the YouTube database, cracks Bob's password, and then accesses Bob's eBay account. The fix for this is that Bob should use different passwords on his different accounts. Hashing helps by making step 2 ("crack Bob's password") more difficult. Salting does not affect this attack in any way. Note that the attacker didn't bother to dump the eBay database.
The attack that salting protects against dumps the YouTube database, cracks Bob's password, and then accesses Alice's YouTube account.
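To make the second attack concrete, here is a small sketch using the YouTube column of the table above. SHA1 is used only because it is the hash under discussion, and prepending a random 16-byte salt is an assumed scheme for illustration.

```python
import hashlib
import os


def unsalted_sha1(password):
    """Plain SHA1 of the password, as in an unsalted dump."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def salted_sha1(password, salt):
    """SHA1 over salt + password (assumed layout: salt prepended)."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


# Both users picked the same password on YouTube.
youtube_passwords = {"Alice": "bobrules", "Bob": "bobrules"}

unsalted = {user: unsalted_sha1(pw) for user, pw in youtube_passwords.items()}

salts = {user: os.urandom(16) for user in youtube_passwords}
salted = {
    user: salted_sha1(pw, salts[user]) for user, pw in youtube_passwords.items()
}

# Without salts, the dump itself reveals that Alice and Bob share a password.
print(unsalted["Alice"] == unsalted["Bob"])  # True
# With per-account salts the two hashes differ (barring a freak collision).
print(salted["Alice"] == salted["Bob"])
```

With unsalted hashes, cracking Bob's entry hands over Alice's account at no extra cost. With salts, each account has to be attacked separately.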
Now, realistically, you can't use a rainbow table on passwords of any noticeable length, and a salt may push the password over the edge of that threshold. If that's really what you want... enforce a minimum password length.
So the answer is "It's too expensive to figure out in practice, unless you're being explicitly targeted by someone with nation state level credentials", i.e. it's pretty much fine?
> Using a more appropriate hashing algorithm with a sufficient cost factor can massively increase the amount of compute needed.
But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)
> Preventing the re-use of that computational effort on additional users is why unique salts are important.
The person who "cracked" my first hash found it in a list of passwords which was actually gotten from a plain text dump 15 years ago. That wasn't found by reversing a hash, so the compute wasn't reused. You are right that once it's cracked, it's cracked and that's that, but if your password _isn't_ cracked it's moot whether it's hashed with SHA1 or something more secure, as per above?
SHA1 is "more than enough" for this specific interaction in which you chose a complex password and/or your only opponents are unmotivated/non-incentivized HN commenters that don't have a password cracker at their immediate disposal. That doesn't mean anything outside of this context.
If your opponent was a motivated hacker with dedicated password cracking machines (which do not require anything even close to a nation-state budget, btw), your SHA1 hash would be much more likely to be cracked. If you were a specific target of a hacker group, such as an employee of a company that is being targeted by an attack or someone known to have a BTC wallet with $10 million in it, your SHA1 hash would be much more likely to be cracked. If your password was a relatively simple phrase like "dog$aregreat2019", like the vast majority of user passwords are, it would almost certainly be cracked.
SHA1 is not even anywhere close to "enough" for general password hashing use. Don't think otherwise just because a couple of random HNers failed your little game.
edit: The premise of your "challenge" is also not equivalent to the goals of most hackers. Unless you are a specifically known and prioritized target (because you're a celeb, VIP, wealthy person or something like that), the goal of a hacker is not to take one specific hash and crack it, because the success of that will depend a lot on the complexity of your password. The goal of most hackers in a breach like this Twitch one is more like "just throw it all at the wall and see what sticks". They take a massive database of thousands of hashes and spend a few hours to see what can be cracked, taking advantage of the fact that while some people may have complex passwords, most do not. After a few hours, maybe they crack 90% of the SHA1 hashes in a leak. Maybe your password was complex enough that it was in the 10% that wasn't cracked; good for you, but just because your password remained uncracked doesn't mean SHA1 is "enough". The hackers still got the other 90%.
Absolutely not, and that is a ridiculous conclusion to draw. State-level resources are absolutely not required to break SHA1.
> but if your password _isn't_ cracked it's moot whether it's hashed with SHA1 or something more secure, as per above?
Again, absolutely not. The algorithm and cost setting have a huge impact on the practical likelihood that an attacker will crack your password.
Anything under 9 characters I can brute force in minutes. 9 character passwords would take me 9 hours.
Obviously if someone has a nest of the latest GPUs then they could go a lot faster.
But yes if your password is uwv&6qu_brusb618_$@618jg then it doesn’t really matter how you hash it.
No, it doesn't. You could reuse uwv&6qu_brusb618_$@618jg everywhere and it wouldn't get cracked. If the plaintext password leaked, then you'd be in more trouble.
What matters is whether your password is easy to guess, not whether you've reused it. If you have all unique passwords, they can still all be trivial to crack.
"Salts defend against attacks that use precomputed tables (e.g. rainbow tables)" https://en.wikipedia.org/wiki/Salt_(cryptography)
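A toy version of that precomputed-table idea, for illustration only. This is a plain lookup dictionary, not a real chain-based rainbow table, and the word list and the salt are made-up stand-ins.

```python
import hashlib

# Stand-in for a large list of common passwords (hypothetical contents).
COMMON_PASSWORDS = ["123456", "password", "bobrules", "sunlight"]


def sha1_hex(data):
    """Hex SHA1 digest of a bytes value."""
    return hashlib.sha1(data).hexdigest()


# Built once, in advance, and reusable against any unsalted SHA1 dump.
PRECOMPUTED = {sha1_hex(pw.encode("utf-8")): pw for pw in COMMON_PASSWORDS}


def lookup(hash_hex):
    """Return the plaintext if the hash is in the table, else None."""
    return PRECOMPUTED.get(hash_hex)


leaked_unsalted = sha1_hex(b"bobrules")

example_salt = b"\x8f\x12\xa0\x55"  # hypothetical per-account salt
leaked_salted = sha1_hex(example_salt + b"bobrules")

print(lookup(leaked_unsalted))  # bobrules
print(lookup(leaked_salted))    # None
```

The table was built without knowing the salt, so the salted hash simply isn't in it, even though the password is a common one.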
Even if I can only hash a million a day, if your password is one of the top million most popular, and I have a good list, I'll have your password in a day. And if you re-used it...
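Roughly what that looks like against a single salted entry. The word list is a tiny made-up stand-in, and the sketch assumes the salt is stored next to the hash in the dump and prepended before hashing.

```python
import hashlib

# Tiny stand-in for a "top million passwords" list (hypothetical contents).
WORDLIST = ["123456", "password", "qwerty", "bobrules", "letmein"]


def salted_sha1_hex(password, salt):
    """Assumed scheme for this sketch: SHA1 over salt + password."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hex, salt, candidates):
    """Hash each candidate with the account's own salt until one matches."""
    for guess in candidates:
        if salted_sha1_hex(guess, salt) == target_hex:
            return guess
    return None


# One leaked record: the salt sits right beside the hash.
record_salt = bytes.fromhex("a1b2c3d4e5f60718")
record_hash = salted_sha1_hex("bobrules", record_salt)

print(dictionary_attack(record_hash, record_salt, WORDLIST))  # bobrules
```

The salt forces this loop to be rerun for every account, but it does nothing to protect a password that is on the list.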
Salts do make naïve brute-force, all-possible-strings approaches useless, yes.