The end-to-end refresh of our server hardware fleet (code.facebook.com)
The names tell you nothing. You can't tell which one came before which, or even what they are. You just have to KNOW that information. A good naming scheme tells you information about the thing named.
So if you work daily with a server you know by heart what is on it; if not, any "descriptive" name would only mislead you, because things have probably changed a lot since it was named.
I think the same for hw components, you have to look it up anyway in documentation, because some dimension could be changed after a year.
Never understood what was wrong with 16.04.2 vs Xenial Xerus (had to google the Xerus part just now).
> Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory size increase from 12 GB to 16 GB. In tests with popular image classification models like ResNet-50, we were able to reach almost 100 percent improvement in throughput compared with Big Sur
Mk 2 is better than the Mk 1 in several important ways. They're not creating Mk 2 for no reason!
These days, CPU speed matters less than it did back then, but there still are CPU-hungry applications (I'm looking at you, Autodesk Inventor!), and if I had to put together a PC from scratch (which I think I'll actually do sometime this year), I would be kind of lost.
I've found that almost any kind of short lived experiment I can do cheaper on AWS than doing it with hardware that I own. If it is longer running then it might become viable to own the hardware.
It's bad for privacy, it's bad for diversity to protect against SPOFs, it's bad for general computing hardware (vendors primarily target the giants), it's bad for users via vendor lock-in, and it's bad for open source projects in the infrastructure space.
I think hackers justify it to themselves by pretending it's a commodity like electricity, but it's far from that. If my utility goes out, I can turn on a generator and get exactly the same electricity. If Amazon goes out I have to build again on another cloud from a (hopefully recent) backup or just sit dead (like the recent S3 outage).
Sorry about the rant, but is there anything that would get you to stop giving the keys to the kingdom to Amazon?
It feels like they're bragging more than anything.
It's hard to imagine that they're currently planning on getting into the Cloud space.
I'm not sure if there are any standards between platforms for these things that allow you to compare, though. I'd say, for example, that you should exclude watches that last less than 5s or so. YouTube and Netflix may not have thought to do it because it doesn't make much sense to them, but Facebook really needs to, since I assume most of their video watches are automatic (accidental) while scrolling through the feed.
One could argue that at FB scale it's worth it, but then MS seems to manage just fine with 19".
"Compared with Big Sur, Big Basin will bring us much better gain on performance per watt, benefiting from single-precision floating-point arithmetic per GPU increasing from 7 teraflops to 10.6 teraflops. Half-precision will also be introduced with this new architecture to further improve throughput."
Which got me thinking: what % of the "Server" market do these dozens of players own? 50%?
I don't think it's fair to judge the merits of an entity purely by what it brings to the world, without also considering what it in turn takes from it.
- Saw the first pictures of her grandkids on FB
- Organized a school reunion in NY with people flying in from all over the world on FB
- Chats weekly with her daughter working in Singapore over FB messenger
- Writes crazy long prayers for sick friends on FB
- Found former students living locally on FB that she socializes with.
Whether or not FB makes you lonelier seems to really be situational, and at least auntie (and many others) consider it a tremendous positive thing in their lives.
Some people have gone to jail for 30 years for liking the wrong post on FB.
Guess that's one way to get lonely.
Irrespective of whether you find Facebook (the website) useful, Facebook (the company) is amazing. The technologies they have developed, e.g. Hive, HBase, Cassandra, ORC, now power the Big Data Analytics movement which is transforming enterprises around the world. And frankly nothing in IT is changing the world for more people across more facets of their life than that. Likewise their work on the machine learning front has been incredibly valuable.
And provided you don't make social media the core of your life then you will find it to be a useful tool.
I was curious about that as I'd say personally I've found the opposite effect that it lets you connect with friends more easily, so I tried to look up the studies.
Most articles seem to quote Ethan Kross, who found that people who reported they were lonely used Facebook more and concluded that Facebook made them lonely, but it seems to me cause and effect would be more likely to run the other way. I mean, if I'm physically on my own I'll chat to friends online. I can't recall chatting online causing me to be physically alone.
There is the most recent one.
Sounds like fake news to me.
I will probably get myself a new desktop computer sometime this year, and those specs sound pretty sweet. But I want to keep an eye on power consumption, too, and I don't want my desktop to either melt or have its fans create a tornado in my living room... ;-)
It's a great 4K video/photo editing & OpenShift lab machine. Literally 300 Chrome tabs and it hasn't cracked past 16GB of RAM usage in RHEL. Just need to spin up some VMs now.
eBay E5-2670
We already have a well developed and widely understood system for naming items in a sequence, it's called "numbers." I suggest people stop trying to be clever and use it.
But that does make the decision what CPU is best for a given use case and budget much more complex, too.
(Like I said, the impact of the CPU on overall system performance is less today than twenty years ago for many use cases, so it is not that much of a problem.)
Is it electronic-scrap or is it electronics-crap?
http://www.inc.com/profile/cj-environmental
From the FAQs, "How does material get processed and refined? All materials are unique and subject to different methods of processing. Newer computers are refurbished and given new homes to maximize ROI for our customers. Scrap product is crushed or shredded before the refining process. All material is processed in accordance with all Federal, State and Local regulations. To learn more about our licensing and compliance measures contact us."
There have been many similar cases, prison sentences ranging from 15 to over 30 years.
Facebook enables these people to express themselves, but it also helps the authorities to track down anyone who has the wrong opinion. If they really wanted they could do something about it.
[1] https://www.theguardian.com/world/2015/aug/07/man-jailed-for...
Edit: Here is the story about the single like: http://www.ibtimes.co.uk/thai-man-faces-32-years-prison-liki...
Very quiet and solid.
All the comments I heard were positive about how they were diversifying by leveraging expertise they had been forced to develop for their own core platform, not that they were crazy. I'm sure there were some who said "crazy", but it definitely wasn't everyone.
Sorry for linking to an image, but Google Finance link to this chart did not work. Sigh!
Request for a startup: Make a finance interface as good as Bloomberg terminals for the web.
Here's an article from 2006 regarding the earnings. AWS isn't mentioned. It covers a drop in operating income and the announcement of Groceries and Baby and Toy stores.
http://www.slate.com/articles/business/moneybox/2006/07/the_...
That CPU has a TDP of 115W but I can't find much information on the idle power usage.
I would also like to see a write-up of the GPU pass-through setup since that's something I've been wanting to have on my local system for ages, i.e. vm host system => dev vm 1 ... vm N, Windows VM + dedicated GPU for gaming etc.
[0] https://www.reddit.com/r/VFIO/comments/5yo83m/ryzen_pcie_pas...
[1] https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...
[2] https://www.reddit.com/r/VFIO
Three years ago, in 2014, AWS was adding the equivalent hardware every day of what ran Amazon.com in 2004, when it was only a $700-million company. [1]
[1] https://www.enterprisetech.com/2014/11/14/rare-peek-massive-...
For companies that have instances running long term it can very well be cost effective to own the hardware. My email server, web server and DNS server are on my own hardware with a co-location facility that I trust.
But for experimental stuff where you need to spin up a hundred machines for an hour or two you just can't beat the cloud (and that's my only use case for the cloud, though I can see others go much further).
I don't like the monoculture any more than you do, but to see this as me having given the 'keys to the kingdom to Amazon' is several steps too far.
Whenever I'm experimenting I rarely need a burst of 100 instances, it's usually 1 or 2 instances to run things and I prefer to run them on my own hardware.
It's like leveraging Oracle-specific database features. You're a fool to do so.
I work for a company (Pivotal) that's had such a product -- Pivotal Cloud Foundry -- for several years. It creates an abstraction layer for apps or container images, your choice.
Deploy with BOSH to raw metal, OpenStack, vSphere, AWS, Azure or GCP. BOSH creates an abstraction layer over the IaaS.
We're also the main driving force behind Spring, Spring Boot and Spring Cloud Services; the latter is in part a generalisation and integration of Netflix OSS.
We cooperate a lot with Google and Microsoft. For example: https://cloud.google.com/solutions/cloud-foundry-on-gcp
What's your use case?
Are you just futzing around at home? Sure, use a server in your bedroom. Who cares?
Are you delivering a service to other people? Then owning hardware is probably a bad idea. If it's in your house, your users are hosed if you lose power or your internet cuts out. Putting it in a DC just means you're handing the same keys to someone else, but your self-managed hardware is definitely going to be less reliable than Amazon's infrastructure.
Owning hardware is a bad deal for everyone involved unless you're big enough to build your own HADR infrastructure.
I don't buy this. I've seen many multi-datacenter self-managed deployments provide better uptime than Amazon web services. You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact. Guess when Amazon does maintenance? That's right, you don't know and one screw up can mean instances in "degraded status" (a.k.a. you might as well terminate it and launch a new one) or all of S3 is down during critical business hours.
Of course your own hardware in a single data-center is going to be exposed to high probability of failures, but that's the equivalent of using a single instance in EC2 (which I have lost two of in the last 7 years of managing 15 or so of them for a small company).
I will admit that it takes strong ops skills to maintain high uptime on your own hardware, but that's just due to a lack of good open source tooling in this area. I would rather see a movement to improve tooling rather than continue to boost the stranglehold the public cloud is putting on everyone.
Self-managed, multi-DC? Congrats on having a lot of money to blow, I guess.
Yes, with enough money you can match Amazon for uptime or scalability or whatever metric you prefer. For the same money you can probably buy triple the capacity in Amazon or your preferred cloud provider, so this is mostly a game for people with really deep pockets, really large scale, or really poor budgeting.
> You are forgetting that when you own the hardware, you can actually orchestrate maintenance windows with live migrations, etc and then take down an entire datacenter with no impact.
How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.
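The overbuild arithmetic behind that question can be sketched as follows. This is a minimal illustration, assuming load is spread evenly across identical sites and the goal is to keep serving full peak load after losing a given number of them; the function name is hypothetical and not taken from any tool mentioned in the thread.

```python
def overbuild_fraction(num_dcs: int, tolerated_failures: int = 1) -> float:
    """Extra capacity needed, as a fraction of peak load, so the surviving
    datacenters can carry everything when `tolerated_failures` sites are down.
    Assumes identical sites with evenly balanced load."""
    survivors = num_dcs - tolerated_failures
    if survivors < 1:
        raise ValueError("need at least one surviving datacenter")
    # Each site must be sized for 1/survivors of the peak load, so the total
    # provisioned capacity is num_dcs / survivors times the peak load.
    return num_dcs / survivors - 1.0


for n in (2, 3, 4):
    print(f"{n} DCs: {overbuild_fraction(n):.0%} overbuild")
```

Under these assumptions two sites need 100% spare capacity, while four sites need about a third, which is why the number of DCs matters so much to the cost argument.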
Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.
> all of S3 is down during critical business hours.
I have trouble believing people when they claim to do significantly better than Amazon (or another favorite cloud provider) for infrastructure uptime. If you stand up a fairly complex system comprised of a number of loosely-coupled services, you're going to end up experiencing some outages, because you'll face the same challenges as Amazon and those guys aren't idiots. You'll lose your message queue due to a bug, or you'll lose a network switch and realize your failover takes 30 minutes to complete instead of the 5 seconds you hoped for, or you'll accidentally DDOS a subsystem when exercising a failover or a system upgrade, or something else. Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.
Local datacenters in the city had even less.
I'm not sure where AWS is supposed to get that famous reliability from, but it's not in uptime. (I can't comment on storage reliability, because I only write a few terabytes of data a month, but otherwise there's RAID 5 or other RAID setups to keep the data valid.)
AWS has its advantages in its immense scalability within seconds, and it has its advantages in convenience.
But its uptime isn't much better than most home connections.
Home statistics:
Power downtime since 2006 is 29 minutes.
Internet downtime since 2006 is 6 hours in 2014 and two 30-minute outages in 2016.
This is on a 100/40 DSL line nowadays (all but one of the outages happened when switching ISPs), without any uninterruptible power supply, battery or generator.
For comparison, this is equivalent to an uptime of 99.99%, the same as AWS advertises, but better than what they delivered in this or the last year.
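As a rough check of that figure, a minimal Python sketch can turn the reported outages into an availability percentage. The 11-year observation window (2006 through 2017) is an assumption, since the post gives no exact start or end dates, and the function and variable names are purely illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24


def availability_percent(downtime_hours: float, window_years: float) -> float:
    """Return the percentage of the observation window the service was up."""
    total_hours = window_years * HOURS_PER_YEAR
    return 100.0 * (1.0 - downtime_hours / total_hours)


# Downtime figures quoted in the post.
internet_downtime_hours = 6 + 2 * 0.5  # 6 h in 2014, two 30-minute outages in 2016
power_downtime_hours = 29 / 60         # 29 minutes in total

WINDOW_YEARS = 11  # assumption: 2006 through 2017

internet = availability_percent(internet_downtime_hours, WINDOW_YEARS)
power = availability_percent(power_downtime_hours, WINDOW_YEARS)
print(f"internet: {internet:.4f}%  power: {power:.4f}%")
```

With that window, the internet connection comes out slightly above 99.99% and the power supply higher still, which matches the "four nines" claim in the post.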
Here's one for you:
> uptime
02:52:56 up 714 days, 16:53, 1 user, load average: 0.00, 0.00, 0.00
Which is pretty average for a small, underutilized server. Essentially the uptime here is a function of how reliable the power supply is. But that's not what AWS is offering.
They offer a far more complex solution which by the very nature of its complexity will have more issues than your, and my, simple computers.
The utility lies in the fact that if you tried to imitate the level of complexity and flexibility that AWS offers, you'd likely not even get close to their uptimes.
So you're comparing apples and oranges, or more accurately, apples and peas.
That needs a dollar-for-dollar qualification, or something to that effect. It's possible but very expensive.
There are for instance long running (and I mean really long running, many years or even decades) experiments where any amount of downtime would cause a do-over.
One of my customers had something like this on the go. The amount of money they spent on their power and network redundancy was off the scale, but they definitely had better uptime than Amazon.
Their problems were more along the lines of 'this piece of equipment is nearly eol, how do we replace it without interrupting the work it does'.
If your goal is to build out scale more reliably than Amazon, at the same or lower cost, that's tough and you're unlikely to achieve it unless your scale is approaching that of Amazon (and you have really good people).
Putting a rack in a COLO is still self-managed for the purpose of what I'm talking about. It's easy to get multiple data centers where you are renting the space and electricity but you still own the hardware and can make agreements with various ISPs for service.
>How many DCs are you talking about here? Are you self-managing in 4+ DCs? Or are you running in 2 DCs and your capacity is overbuilt by 100+%? In either case, deep pockets are nice to have.
See comment above.
>Also, does your maintenance strategy seriously involve bringing down entire DCs? This is kind of absurd and makes half of me jealous of the bathtub full of cash you must bathe in. It makes the other half of me question some engineering decisions you've apparently made.
See comment above. "bringing down a DC" doesn't mean shutting everything off, it means from the perspective of your end users, your service is not available there.
> because you'll face the same challenges as Amazon and those guys aren't idiots.
No, but they have much different priorities. If all I want is static asset hosting, the loosely-coupled micro-service architecture you are referring to is completely overkill and results in the very instability you are claiming is normal.
>Complex systems fail and when people tell me they built an "internet scale" system with better uptime than Amazon, I'm left to assume that they probably just do a bad job of tracking uptime or else that their systems are not at the scale they imagine. Everyone who builds large systems experiences outages.
Nobody except Google and Microsoft is building something as complex as the entire AWS stack. The vast majority of AWS users are using a tiny percentage of the features that come with AWS and can get by on much simpler systems that are easier to reason about and maintain.
When you dump the majority of what Amazon is actually running, you have a much simpler system and architecture and actually can beat Amazon's uptime.
The previous posters said that I should use AWS, because anything I set up myself will have more downtime than AWS.
Now. I've actually set up a few systems.
Some on rented dedicated servers, some on actual hardware at home.
Including web apps, databases backing dozens of services, etc.
As mentioned above, all of them have better uptime than AWS.
How am I comparing apples with peas if this is exactly the point made above — that even for simple services I should use AWS?
A single instance of something simple outperforming something complex does not mean anything when it comes to statistical reliability. In other words, if a million people do what you do, in general more of them will lose their data or have downtime than the same people hosting their stuff on Amazon. The only reason you don't see it is that there is a good chance that you are one of the lucky ones if you do things by yourself.
And that's because your setup is extremely simple. The more complex it gets the bigger the chance you'll end up winning (or rather, losing) that particular lottery.
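The "lottery" intuition can be sketched numerically. This is a minimal illustration, assuming every component fails independently with the same probability in a given period; the 0.1% figure is a made-up example value, not a measurement, and the function name is hypothetical.

```python
def p_any_failure(num_components: int, p_fail_each: float) -> float:
    """Probability that at least one of num_components fails, assuming
    independent failures with the same per-component probability."""
    return 1.0 - (1.0 - p_fail_each) ** num_components


# Illustrative number only: a 0.1% chance of failure per component per period.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} components: {p_any_failure(n, 0.001):.1%}")
```

Under these assumptions the chance of seeing at least one failure grows from a tenth of a percent for a single component to well over half for a thousand, which is the sense in which a more complex setup is more likely to lose.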
Or maybe because I have less complexity in my stack, so it’s easier to guarantee that it works.
Getting redundant electricity and network lines, and getting redundant data storage solutions is easy.
Ensuring that of 3 machines behind a loadbalancer at least 2 work is also easy.
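The effect of that kind of redundancy can be estimated with a short binomial calculation. This is a sketch under the assumption that machines fail independently, which shared power, network or software faults would violate; the 99% per-machine figure is an example value only and the function name is hypothetical.

```python
from math import comb


def at_least_k_of_n_up(k: int, n: int, p_up: float) -> float:
    """Probability that at least k of n machines are up, assuming each
    machine is up independently with probability p_up."""
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


# Illustrative value only: each machine is up 99% of the time.
p = 0.99
print(f"single machine:  {p:.6f}")
print(f"at least 2 of 3: {at_least_k_of_n_up(2, 3, p):.6f}")
```

With these example numbers, requiring two of three machines to be up gives noticeably better availability than any single machine, as long as the independence assumption roughly holds.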
Ensuring that, in a complex system of millions of interconnected machines and services which have never been rebooted or tested in a decade (see the AWS S3 post-mortem), none will ever fail is a lot harder.
You're also likely rolling the dice on your uptime, since a hardware failure becomes catastrophic unless you are building redundancy (in which case you're almost certainly spending far more than you would with Amazon).
But then again, you have only one special case, and can’t run anything else on that.