Ask HN: How do orgs detect conflicting hashes in sub-second timing?

Today I was considering how organizations such as Bitly, Imgur, Reddit, and even GitHub handle hash collision prevention. For GH it seems easier, since commit hashes are long enough to be effectively unique. For the others, whose hashes span only ~6 characters or so, there are bound to be collisions from a statistical standpoint. (IIRC, I read an article about a hash collision on GH here on HN a few years ago.)

To my mind, these orgs must have a suite of tools/algos that query multiple services to check whether a hash has already been taken, and those processes have to be optimized for time. (E.g., when a user makes a post, what's a reasonable time budget for a lookup?)

So, what algorithmic considerations go into checking for such collisions while keeping runtime to an acceptable minimum?
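To make the statistical point concrete, here's a rough sketch of what I have in mind. It assumes things I don't actually know about any of these sites: that the short codes are 6 characters drawn uniformly at random from a 62-symbol alphabet (a-z, A-Z, 0-9), and that "is this taken?" can be modeled as a set lookup. The names `collision_probability` and `allocate_code` are made up for illustration; in a real system the set would presumably be a unique index or key-value store, which is exactly the part I'm asking about.

```python
import math
import secrets
import string

# Assumed alphabet: 62 symbols. Real services may use something different.
ALPHABET = string.ascii_letters + string.digits


def collision_probability(n_items: int, code_length: int = 6,
                          alphabet_size: int = len(ALPHABET)) -> float:
    """Birthday-bound approximation of P(at least one collision)
    among n_items codes drawn uniformly at random."""
    space = alphabet_size ** code_length
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


def allocate_code(taken: set, code_length: int = 6,
                  max_attempts: int = 10) -> str:
    """Draw random codes until an unused one turns up.
    `taken` is a stand-in for whatever store enforces uniqueness."""
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(code_length))
        if code not in taken:
            taken.add(code)
            return code
    raise RuntimeError("no free code found; the code space may be nearly full")


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(n, collision_probability(n))
    print(allocate_code(set()))
```

Under those assumptions, the chance of at least one collision passes 50% somewhere in the few-hundred-thousand range, so at scale a check-and-retry step (or a scheme that avoids random codes altogether) seems unavoidable.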