Adding a SHA1 collision vulnerability test hoses WebKit's source repository (bugs.webkit.org)

292 points by raingrove 9 years ago | 59 comments

fanf2 9 years ago |

OK, this is quite a serious vulnerability in Subversion. SVN depends more on raw file SHA1 hashes than git does: git prepends a header before hashing, which prevents raw SHA1 collisions from translating directly into easy svn-style repository corruption.

The reason svn is broken is its "rep-sharing" feature, i.e. file content deduplication. It uses a SQLite database to share the representation of files based on their raw SHA1 checksum - for details see http://svn.apache.org/repos/asf/subversion/trunk/subversion/...

You can mitigate this vulnerability by setting enable-rep-sharing = false in fsfs.conf - see documentation in that file or in the source at http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...

This feature was introduced in svn 1.6, released in 2009, and made more aggressive in svn 1.8, released in 2013: https://subversion.apache.org/docs/release-notes/

SVN exposes the SHA1 checksum as part of its external API, but its deduplication could easily have been built on a more secure foundation. Their decision to double down on SHA1 in 2013 was foolish.

acqq 9 years ago | |

> this is quite a serious vulnerability in Subversion

I rather believe it's a minor bug, and that once it is fixed they can keep using SHA1 as before, without the denial of service when somebody tries this. For example, if somebody tries to commit two files with the same SHA1 but different MD5, they can reject the second one before accepting it. Or, if two different files with the same SHA1 were both accepted and only one content is stored, SVN can still continue to work. You then can't get the second file back unless you, for example, wrap it in some archive format before committing it, but that's your problem; SVN would still work for everything else.

In short, it sounds like a denial of service at the moment, but I think that DOS can be avoided without changing the hash algorithm.
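A minimal sketch of the "reject the second one" idea, assuming a content-deduplicating store keyed by one hash and cross-checked with a second. The names `DedupStore`, `CollisionError` and `key_fn` are hypothetical and chosen for illustration; this is not how Subversion implements rep-sharing.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CollisionError(Exception):
    """Raised when two different contents map to the same dedup key."""


def sha1_key(content: bytes) -> str:
    # Raw SHA-1 of the content, used as the deduplication key.
    return hashlib.sha1(content).hexdigest()


class DedupStore:
    """Toy content store: one stored copy per key, with a secondary check."""

    def __init__(self, key_fn: Callable[[bytes], str] = sha1_key) -> None:
        self._key_fn = key_fn
        # key -> (secondary MD5 checksum, content)
        self._blobs: Dict[str, Tuple[str, bytes]] = {}

    def put(self, content: bytes) -> str:
        key = self._key_fn(content)
        check = hashlib.md5(content).hexdigest()
        existing = self._blobs.get(key)
        if existing is None:
            # First time this key is seen: store the content.
            self._blobs[key] = (check, content)
        elif existing[0] != check:
            # Same key, different secondary checksum: refuse instead of
            # silently sharing the wrong representation.
            raise CollisionError(f"key {key} already holds different content")
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key][1]
```

The key function is injectable only so that a collision can be simulated with a deliberately weak key (for example, the first hex digit of the SHA-1), since real SHA-1 collisions are not something a test can generate. MD5 is itself weak, so a real implementation would presumably want a stronger secondary hash.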

However, I'm sure that SVN is not the only code base that had never been tested until now with two different files that have the same SHA1.

stsp 9 years ago | | |

Apache Subversion developer here.

Andreas Stieger (SUSE, SVN) has written a pre-commit hook script which rejects commits of shattered.io style PDFs

https://svn.apache.org/viewvc/subversion/trunk/tools/hook-sc...

This is the first mitigation available. If you are responsible for an SVN server at risk, please make use of this hook.

If somebody could make a similar hook for Windows and post it here or to dev@subversion.apache.org that would be highly appreciated.

(edit: switched script link to HTTPS)

fanf2 9 years ago | |

I didn't realise yesterday that svn also uses SHA1 for deduplicating pristine files in its working copy. So disabling rep-sharing isn't enough to prevent broken checkouts: you need to prevent any SHA1 collisions from being committed. See the link from Stefan Sperling (stsp) to the collision rejection script elsewhere in this thread. There is more info from Stefan about what they need to do to fix this at http://mail-archives.apache.org/mod_mbox/subversion-dev/2017...

phaemon 9 years ago |

As mentioned in a previous comment ( https://news.ycombinator.com/item?id=13722469 ), git doesn't see these files as identical, since it hashes header+content, which breaks the identical-SHA trick.
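For reference, a short Python sketch of the difference between a raw file SHA-1 and the ID git assigns to a blob. Git hashes the bytes `blob <length>\0` followed by the content; the function names `raw_sha1` and `git_blob_sha1` are made up for this example.

```python
import hashlib


def raw_sha1(content: bytes) -> str:
    """SHA-1 of the file bytes alone, as printed by sha1sum."""
    return hashlib.sha1(content).hexdigest()


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 as git computes it for a blob: header, NUL byte, then content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty file is a convenient known case.
    print(raw_sha1(b""))       # da39a3ee5e6b4b0d3255bfef95601890afd80709
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header is fed into SHA-1 before the file bytes, the hash state at the point where the two PDFs diverge is not the one the collision was computed for, so the two files end up with different blob IDs.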

Of course, I first tested this on our main production repository at work because...oh, wait, I didn't because what were you thinking?!

mikeash 9 years ago | |

I don't think they meant to test it on the production repository. Rather, they added a test for something in WebKit, and it didn't occur to them that it would be "testing" the repository too.

tveita 9 years ago | |

It could be made to work on Git, but you'd need to make a collision that included the git blob header. The resulting files would not have the same SHA-1 hash until the header was added though, so they wouldn't be useful except for testing Git itself.

My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't have a secondary hash to detect the corruption. It would simply restore the wrong file without noticing anything was amiss.

sverige 9 years ago | | |

> It would simply restore the wrong file without noticing anything was amiss.

Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?

jmount 9 years ago |

(from the link) "For the record: the commits have been deleted, but the SVN is still hosed." That is pretty much my memory of working with SVN. I remember SVN fouling its database a few times. Sure I've broken git a few times, but I am always able to (as Jenny Bryan says) "burn the whole thing down" and take state from another copy of the repository.

I really tried with SVN (wanted something better than CVS) for quite a long time.

mst 9 years ago | |

I've done surgery on svn repos to unhose things a few times over the years, usually due to PEBCAK rather than svn shitting itself. It's actually pretty doable, up to and including the equivalent of interactive rebase.

I much prefer that git's designed to let me do such things and provides tools for doing so, but you can totally rewire svn repos with vi and a bunch of swearing if necessary.

(and I was using svk for a merge tool at the time so I did have the option to burn it down and rebuild from scratch; unhosing svn repos wasn't quite unpleasant enough for me to want to do so)

Then again, I started off doing more ops than dev and have also happily hand-edited mysql replication logs to unfuck things after a partial failover, so I may have more of a masochistic streak than you do :)

lima 9 years ago | | |

I fondly remember editing the raw metadata in a Gluster cluster to recover it after a three-node split brain :)

Negitivefrags 9 years ago | |

To provide a counter anecdote, my company has used SVN for 10 years across hundreds of thousands of commits for a repo that is now 1.2TB in size and not once have we needed to restore from backups.

The bugs you can expect from software that assumed no hash collisions are going to be pretty arbitrary. There was that Stack Overflow post about what happens in Git with collisions, and it didn't seem great either; it's just that what gets hashed happens not to collide in this case.

lumisota 9 years ago |

Isn't it the SVN repo that's "hosed", not the Git repo as suggested by the title?

PuffinBlue 9 years ago | |

Yes, the mailing list post backs this up:

https://lists.webkit.org/pipermail/webkit-dev/2017-February/...

daenney 9 years ago | |

Yup.

> For the record: the commits have been deleted, but the SVN is still hosed.

afandian 9 years ago |

Reminds me of when I worked at an antivirus company. We had to be careful with the EICAR file in test code because it would set off AV alarms. http://www.eicar.org/86-0-Intended-use.html

isp 9 years ago |

New SVN attack category: denial-of-service by SHA-1 collision.

dsp1234 9 years ago | |

New SaaS service: Repository SHA-1 collision detection

raziel2p 9 years ago |

A bit hard for me to tell what happened here, maybe because I don't know anything about SVN. The two PDFs with equal SHA1 hashes were committed to the repository with git, but converting that to an SVN commit failed because... SVN can't handle two separate files with the same SHA1 hash?

espadrine 9 years ago | |

This might be at fault:

> Subversion 1.8 avoids downloading pristine content that is already present in the cache, based on the content's SHA1 or MD5 checksum.

https://subversion.apache.org/docs/release-notes/1.8.html#pr...

wyldfire 9 years ago | |

It's likely some part of the svn implementation that assumes that the SHA1 signatures guarantee uniqueness within a repo. And they might use that hash as an identifier.

I'm guessing shattered-1.pdf and shattered-2.pdf have identical hashes but distinct contents. It's not clear to me why this results in a "checksum mismatch."

    Checksum mismatch: LayoutTests/http/tests/cache/disk-cache/resources/shattered-2.pdf
    expected: 5bd9d8cabc46041579a311230539b8d1
        got: ee4aa52b139d925f8d8884402b0a750c

EDIT: see https://news.ycombinator.com/item?id=13725312 for the answer

phaemon 9 years ago | | |

Heh, because those are the md5 checksums which don't match.

  $ sha1sum shattered*
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
  38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

  $ md5sum shattered*
  ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf
  5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf

As you can see.
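The same comparison can be scripted. The following is a rough Python sketch that computes both digests in one pass and flags a pair of files whose SHA-1 matches while their MD5 differs; `file_digests` and `is_sha1_collision` are hypothetical helper names, not part of any SVN or WebKit tooling.

```python
import hashlib
from typing import Tuple


def file_digests(path: str, chunk_size: int = 65536) -> Tuple[str, str]:
    """Return (sha1_hex, md5_hex) for a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return sha1.hexdigest(), md5.hexdigest()


def is_sha1_collision(path_a: str, path_b: str) -> bool:
    """True when the SHA-1 digests match but the MD5 digests differ."""
    sha1_a, md5_a = file_digests(path_a)
    sha1_b, md5_b = file_digests(path_b)
    return sha1_a == sha1_b and md5_a != md5_b
```

Given the digests printed above, `is_sha1_collision("shattered-1.pdf", "shattered-2.pdf")` would be expected to return `True`.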

mintplant 9 years ago | |

They were directly committed to the SVN repository, apparently breaking SVN's tooling even after the commit had been deleted. The git-to-SVN mirror script was the first place where a failure was noticed and was initially thought to be the only broken bit.

fapjacks 9 years ago |

I have to just say here that WebKit is one of the most over-the-top software projects I've ever tried to dig into, in my twenty years of programming. Building it inside a vanilla container was impossible following their directions exactly and required so much research on my part to get working. I'm used to a bit of back-and-forth with just about every project, but WebKit was ridiculous. After two workdays of trying, I'd been able to build a WebKit from the source, but at that point had to concede to the universe the futility of trying to build a golang-based Phantom, as my friend and former coworker originally wanted. And that also gave me mad respect for Phantom's author and immediately taught me why they do not often incorporate new WebKit versions into the project instead of just pegging to the first one they can get to build.

paulddraper 9 years ago |

Site is down.

sigjuice 9 years ago |

This is why a git clone is not a real backup.