The end-to-end refresh of our server hardware fleet (code.facebook.com)

206 points by sloanesturz 9 years ago | 145 comments

cortesoft 9 years ago |

I really dislike this sort of naming scheme (Bryce Canyon, Honey Badger, Mono Lake, etc)

The names tell you nothing. You can't tell which one came before which, or even what they are. You just have to KNOW that information. A good naming scheme tells you information about the thing named.

wmf 9 years ago | |

Historically, code names were chosen specifically to give up no information and it seems like that tradition continues, perhaps unintentionally.

ozim 9 years ago | |

I don't see the problem. I work with a naming scheme like XYZ1530, and to know which server is which or what is installed we have documentation; the name is only a reference for finding information in the docs.

So if you work daily with a server you know by heart what is on it; if not, any "descriptive" name would only mislead you, because things have probably changed a lot since it was named.

I think the same applies to hardware components: you have to look them up in the documentation anyway, because some dimension could have changed after a year.

walshemj 9 years ago | | |

But names like XYZ1530 are harder to recall, and these are names for a class of server.

erobbins 9 years ago | |

I work here (FB) and I completely agree. I can never keep the names straight.

noir_lord 9 years ago | |

100℅ agreed, drives me crazy with Ubuntu, I have to google the names to get the versions.

Never understood what was wrong with 16.04.2 vs Xenial Xerus (had to google the xerus part just now).

rrdharan 9 years ago | | |

The Ubuntu names are chronologically in alphabetical order.

chrisseaton 9 years ago | | |

Is writing ℅ instead of % some kind of meme? I'm seeing it everywhere all of a sudden, but I can't imagine you can type such an obscure character by accident and so frequently. Wondering if I'm not getting the joke, or if it's some kind of obtuse political or technical statement about something.

XorNot 9 years ago | |

The idea is to avoid people assigning bias, i.e. if two things are called Mk 1 and Mk 2, people are likely to want the Mk 2 despite having no practical basis for that.

LeifCarrotson 9 years ago | | |

The whole point of refreshing the hardware fleet from Mk 1 to Mk 2 is that it has a practical basis that they will benefit from.

> Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory size increase from 12 GB to 16 GB. In tests with popular image classification models like ResNet-50, we were able to reach almost 100 percent improvement in throughput compared with Big Sur

Mk 2 is better than the Mk 1 in several important ways. They're not creating Mk 2 for no reason!

hkmurakami 9 years ago | |

Is there some information in Intel codenames I'm perhaps unaware of?

krylon 9 years ago | | |

I haven't been able to keep those straight for years. Maybe this is just me getting old, but I miss the old days, when you could easily tell that a Pentium is faster than an 80486, and that a Pentium 133 is faster than a Pentium 100.

These days, CPU speed matters less than it did back then, but there still are CPU-hungry applications (I'm looking at you, Autodesk Inventor!), and if I had to put together a PC from scratch (which I think I'll actually do sometime this year), I would be kind of lost.

throwawayish 9 years ago | | |

Intel probably intentionally advertises with their weirdo socket names (1156 -> 1155 -> 1150 -> 1151) just to confuse people more. Heck, they probably choose the pin counts in such a strange order just to be more confusing. It's not like they have usable names (Socket H, H2, ...).

walrus01 9 years ago | | |

A long time ago Intel CPU core codenames were geographic features in or near Oregon.

devonkim 9 years ago | |

The naming conventions seem like a way more fun variant of US Intelligence Community naming conventions. But even there, there's some scheme by which terms unintentionally reveal a little about their classification and originating agency.

JohnJamesRambo 9 years ago |

So much equipment and money devoted to something as pointless as Facebook. I wish it could go to something cooler and more useful. Something that hasn't been shown in studies to make us feel lonelier.

hyperbovine 9 years ago |

The last time they did this it flooded the market with dirt cheap E5-2560s. Are we in for a new updated deal?

loser777 9 years ago |

I hope some of the old hardware makes it to ebay, but it looks like many of the form factors are proprietary.

ssttoo 9 years ago | |

s/proprietary/opensource/

http://www.opencompute.org/ :)

jacquesm 9 years ago | |

What do you intend to do with it?

I've found that I can do almost any kind of short-lived experiment cheaper on AWS than with hardware that I own. If it is longer running then it might become viable to own the hardware.

hueving 9 years ago | | |

It's sad to me this is becoming the status quo. Using other massively centralized companies for compute resources is a sad future.

It's bad for privacy, it's bad for diversity to protect against SPOFs, it's bad for general computing hardware (vendors primarily target the giants), it's bad for users via vendor lock-in, and it's bad for open source projects in the infrastructure space.

I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out I have to build again on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).

Sorry about the rant, but is there anything that would get you to stop giving the keys to the kingdom to Amazon?

archimedespi 9 years ago | | |

eh, owning hardware is fun and a great learning experience

rb2k_ 9 years ago | |

A lot of them already are:

http://www.ebay.com/sch/i.html?&_nkw=open+compute+server

scurvy 9 years ago | | |

That's all regular Quanta gear. No idea if it was owned by FB or another OCP adopter. OCP is popular with mineral companies.

nananonymous 9 years ago | | |

If facebook is just now announcing their upgrade should we expect these prices to go down?

nodesocket 9 years ago |

How long until Facebook joins the public cloud business with Amazon, Google, and Microsoft?

wmf 9 years ago | |

Never? Their infrastructure is cool but it's only around half of what a public cloud would need.

nodesocket 9 years ago | | |

Never... I'm not convinced. Roll back to when Amazon was pre-AWS. Everybody thought they were crazy announcing they were getting into the datacenter and cloud business. I'd say it has worked out well for $AMZN.

motoboi 9 years ago | | |

Well, they could double it then and reap the benefits of scale.

randartie 9 years ago | | |

Half in what sense?

CoolGuySteve 9 years ago | |

Ya, this is nice and all but until I can rent time on one of these servers I don't really care all that much. Are these OpenCompute designs hosted anywhere other than Facebook?

It feels like they're bragging more than anything.

tristor 9 years ago | | |

Yes, you can host your apps on OpenCompute hardware today with Rackspace Cloud OnMetal among other providers. You might find the list of involved companies for the OpenCompute Project a good start. You can also buy or fabricate your own OpenCompute compatible hardware thanks to its open design.

rattray 9 years ago | |

They bought Parse, and then shut it down with no mention of something else.

It's hard to imagine that they're currently planning on getting into the Cloud space.

jchrisa 9 years ago | |

Parse

dexterdog 9 years ago |

100 million hours of video played per day. Are people actually watching this video or are they just inflating the number?

ploggingdev 9 years ago | |

Facebook has over a billion daily active users, so 100 million hours divided over 1 billion users is 0.1 hours/user, which is 6 minutes per user. Seems reasonable. Of course, there are lots of people who don't watch any videos and, on the flip side, a lot of people who watch a lot of videos on Facebook. Edit: as pointed out below, the autoplaying videos might skew the numbers quite a bit.
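
For anyone who wants to check the per-user arithmetic, here is a minimal sketch. The 1 billion figure is the round number used above rather than an exact user count, and the function name is made up for illustration.

```python
# Figures as quoted in the thread; the user count is a round-number assumption.
HOURS_PLAYED_PER_DAY = 100_000_000
DAILY_ACTIVE_USERS = 1_000_000_000


def average_minutes_per_user(hours_per_day: float, users: float) -> float:
    """Return the mean viewing time per user, in minutes per day."""
    return hours_per_day / users * 60


minutes = average_minutes_per_user(HOURS_PLAYED_PER_DAY, DAILY_ACTIVE_USERS)
print(f"{minutes:.1f} minutes per user per day")
```

This is only a mean: it says nothing about how viewing is distributed across users or how much of it is autoplay.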

Veratyr 9 years ago | |

They've inflated metrics in the past: http://www.businessinsider.com/facebook-video-views-exaggera...

I'm not sure if there are any standards between platforms for these things that would allow you to compare, though. I'd say, for example, that you should exclude watches that last less than 5s or so. YouTube and Netflix may not have thought to do it because it doesn't make much sense to them, but Facebook really needs to, since I assume most of their video watches are automatic (accidental) while scrolling through the feed.

dexterdog 9 years ago | | |

It does matter to Netflix. They don't publish their numbers and just use them for internal metrics so you can bet that they are honest with themselves about their numbers.

mgkimsal 9 years ago | |

given a lot of auto-play video in my own feed, I'm presuming it's not all actually 'watched'. if they'd give a separate number on videos 'listened' to (where I actually unmute the audio), I'd take that number more seriously.

Strom 9 years ago | | |

This wouldn't be too accurate anymore either, because now even audio autoplays.

aurelianito 9 years ago | |

Autoplayed videos stress their server infrastructure even if no one is watching. It is OK to count them in the context of this article.

wlesieutre 9 years ago | |

Given that you can't scroll past a video without it playing, it's got to be a mix

lucaspiller 9 years ago | | |

You can disable this in the settings.

jabl 9 years ago |

I'm disappointed in the "open rack" designs. For a really minor improvement in density they have broken compatibility with standard 19" gear.

One could argue that at FB scale it's worth it, but then MS seems to manage just fine with 19".

oso2k 9 years ago |

It's interesting. If they wanted to, they could compete with the likes of HP, Dell, Lenovo, and Cisco if they could ramp up production to accommodate customers. I wonder who does their manufacturing on the backend.

wmf 9 years ago | |

Facebook uses Quanta/QCT, Celestica, and Accton for a lot of their manufacturing. You can buy Facebook servers from companies like Hyve, AMAX, and Stack Velocity but they aren't really aiming at the mainstream server market.

quickben 9 years ago |

Is it that much cheaper to custom build, if you can't eBay them off at their half-life point to recover some of the cost?

wmf 9 years ago | |

Most of the cost is in processors and RAM so those parts can be sold at end of life. There are server recycling companies that specialize in this.

saycheese 9 years ago | | |

Here's an example of one: http://cashforelectronicscrapusa.com

taf2 9 years ago |

When can we start hosting our services with Facebook - similar to aws, gce, etc?

trustfundbaby 9 years ago |

Anyone know what cpus/gpus they use in these?

jacquesm 9 years ago | |

"Built in collaboration with our ODM partner QCT (Quanta Cloud Technology), the current Big Basin system features eight NVIDIA Tesla P100 GPU accelerators. These GPUs are connected using NVIDIA NVLink to form an eight-GPU hybrid cube mesh — similar to the architecture used by NVIDIA's DGX-1 system. This setup, combined with the NVIDIA Deep Learning SDK, utilizes this new architecture and interconnects to improve deep learning training across all GPUs.

Compared with Big Sur, Big Basin will bring us much better gain on performance per watt, benefiting from single-precision floating-point arithmetic per GPU increasing from 7 teraflops to 10.6 teraflops. Half-precision will also be introduced with this new architecture to further improve throughput."
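
To put the quoted spec changes in relative terms, here is a rough sketch. It uses only the numbers quoted from the article in this thread (7 to 10.6 teraflops single-precision per GPU, 12 GB to 16 GB memory); the helper name is made up for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100


# Single-precision throughput per GPU, in teraflops, as quoted above.
fp32_gain = percent_increase(7.0, 10.6)
# GPU memory size in GB, as quoted from the article elsewhere in this thread.
memory_gain = percent_increase(12.0, 16.0)

print(f"FP32 throughput per GPU: +{fp32_gain:.0f}%")
print(f"GPU memory: +{memory_gain:.0f}%")
```

The raw single-precision ratio comes out well below the "almost 100 percent" ResNet-50 improvement the article reports, so peak teraflops per GPU is presumably not the whole story.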

iDemonix 9 years ago | | |

So I just run the setup CD, right?

wmf 9 years ago | |

Facebook uses Xeon D and Xeon E5 CPUs. https://www.servethehome.com/facebook-at-open-compute-summit...

yeukhon 9 years ago | | |

This makes sense. No onboard graphics card.

ksec 9 years ago | |

No mention of AMD. No wonder Intel isn't worried. They have probably locked up contracts with Amazon, Microsoft, Google, Facebook, Oracle, IBM, OVH, Baidu, Alibaba, Salesforce, SAP, DO, etc., along with dozens of other slightly smaller players.

Which got me thinking: what % of the "server" market do these dozens of players own? 50%?