Jeremy Edberg shares some of the lessons learned scaling Reddit, advising on pitfalls to avoid.
Bugmenot is great for sites like this.
is what i use, much simpler.
So, instead of discussing the topics in the video, the majority of commenters here are discussing the flaws of the website it's hosted on or debating whether or not reddit is profitable. Neither of which has ANYTHING to do with scaling.
I expected better, people. Seriously.
In this and some other technical topics, people end up discussing their personal tastes in the web site's design, their individual UI frustrations with some button on the web site, the font, the color, and other random, irrelevant topics; like now, the profitability.
Except that reddit is not profitable.
http://blog.reddit.com/2013/08/reddit-myth-busters_6.html
http://www.reddit.com/r/TheoryOfReddit/comments/1ihwy8/rathe...
When that blog post went up I was as surprised as you to see it wasn't profitable when I was there.
I can definitely see millions of networked memcache calls being a bottleneck, and if the batching adds another ms per req on average, but removes the bottleneck, then they can serve a lot more users at a cost of 1ms per req.
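A rough sketch of the batching trade-off described above, in Python. `FakeMemcache` is a hypothetical in-memory stand-in that only counts round trips; it is not the API of any real memcache client, and nothing here is claimed to be how reddit implemented it.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a cache server; counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One simulated network round trip per key.
        self.round_trips += 1
        return self._data.get(key)

    def get_many(self, keys):
        # One simulated round trip for the whole batch; misses are omitted.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def fetch_one_by_one(cache, keys):
    # Naive approach: a separate call for every key.
    return {k: v for k in keys if (v := cache.get(k)) is not None}


def fetch_batched(cache, keys):
    # Batched approach: collect the keys, then ask once.
    return cache.get_many(keys)


keys = [f"vote:{i}" for i in range(100)]
data = {k: 1 for k in keys}

naive = FakeMemcache(data)
batched = FakeMemcache(data)

# Both approaches return the same data; only the number of calls differs.
assert fetch_one_by_one(naive, keys) == fetch_batched(batched, keys)
print(naive.round_trips, batched.round_trips)
```

The batched call may take a little longer than any single lookup, but the page pays for one round trip instead of one per key, which is the trade described above.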
Is there anything in TFA that would support my theory? I don't know. I don't care enough to endure InfoQ. (I did for a Rich Hickey talk once, lo these various months, and yea it were a minor inconvenience).
Edit: whoa jedbergo!
1. Use AWS
2. Use Postgres
3. Use AWS
4. Use Cassandra
5. Use python, so later you can write C when shit needs to go super fast
That's what I got.
Those are some of the important lessons, although use (postgres|cassandra) are really too prescriptive. More like "use the right tool or tools for the job".
Also, using consistent key hashing where appropriate is another important lesson that I should emphasize more.
And "build for 3" is another important lesson. It makes scaling much easier.
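For readers unfamiliar with the consistent key hashing mentioned above, here is a minimal sketch in Python. The `HashRing` class, the node names, and the choice of SHA-256 with 100 virtual nodes per server are all illustrative assumptions, not anything taken from the talk or from reddit's code.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("cache-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved went to the new node; the rest stayed where they were.
assert all(after[k] == "cache-d" for k in moved)
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

With plain `hash(key) % number_of_servers`, adding a fourth server would remap most keys; with the ring, only the keys that now belong to the new server move.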
Memcache has no guarantees about durability, but is very fast, so the vote data is stored there to make rendering of pages as quick as possible.
Cassandra is durable and fast, and gives fast negative lookups because of its bloom filter, so it was good for storing a durable copy of the votes for when the data isn't in memcache.
Postgres is rock solid and relational, so it was a good place to store votes as a backup for Cassandra (we could regenerate all the data in Cassandra from Postgres if necessary) and also for doing batch processing, which sometimes needed the relational capabilities.
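The layering described above can be sketched roughly as follows, in Python. This is a toy model: plain dicts stand in for memcache, Cassandra, and Postgres, and the class and method names are hypothetical. The fall-through to the relational tier on a read and the write order are assumptions for illustration; the comment above only says the Cassandra data could be regenerated from Postgres if necessary.

```python
class VoteStore:
    """Hypothetical three-tier vote store; plain dicts stand in for each tier."""

    def __init__(self, cache, durable, relational):
        self.cache = cache            # stand-in for memcache: fast, may lose data
        self.durable = durable        # stand-in for Cassandra: durable copy
        self.relational = relational  # stand-in for Postgres: backup copy

    def put_vote(self, user, link, direction):
        key = (user, link)
        # Assumed order: write the backup first, then the faster copies.
        self.relational[key] = direction
        self.durable[key] = direction
        self.cache[key] = direction

    def get_vote(self, user, link):
        key = (user, link)
        if key in self.cache:
            return self.cache[key]
        if key in self.durable:
            # Cache miss: repopulate the cache from the durable copy.
            self.cache[key] = self.durable[key]
            return self.cache[key]
        if key in self.relational:
            # Assumed recovery path: rebuild the faster tiers from the backup.
            self.durable[key] = self.relational[key]
            self.cache[key] = self.relational[key]
            return self.cache[key]
        return None  # no vote recorded


store = VoteStore(cache={}, durable={}, relational={})
store.put_vote("alice", "t3_abc", 1)
store.cache.clear()    # simulate a memcache eviction or restart
store.durable.clear()  # simulate losing the durable copy
vote = store.get_vote("alice", "t3_abc")
```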
A lot of those people aren't interested in that content, so it will suddenly get an influx of downvotes.
I'm glad there is an explanation based on user behavior for this phenomenon because admin level vote tampering is such a tired theory.
It's important to keep in mind that reddit was already doing twice as much traffic as Digg before they launched v4.
I, for one, went from a /r/php lurker to a real user at that time, so I thought there were dozens of us, dozens!
I'll probably be back to check out more of the videos, but definitely not because of the site. If the editing is good, YouTube is just fine, otherwise SlideShare plus an audio file is just perfect.
Edit:
[F] First Byte Time
[C] Keep-alive Enabled
[C] Compress Transfer
[A] Compress Images
[A] Progressive JPEGs
[F] Cache static content
[ ] Effective use of CDN
Source: http://www.webpagetest.org/result/130816_PE_AYH/
A full transcript with interleaved slide images takes a few minutes to read at most, and lets you control the pace of information absorption. A video with slides, especially when you cannot 2x the talking speed, is a painfully slow data transfer method.
Video + Slides = analog modem
Transcript w/ embedded slides = Google fiber :-)
Of course, if you're into audio books instead of reading, maybe you consider that a feature.
// I push video for a living. It's great for visual explanation like DIY instruction (e.g. woodworking, swapping RAM on a Mac Mini), emotional content, personal story telling, etc. Systems architecture is generally not in one of those categories.
I don't have a solid block of 40 undisturbed minutes to listen to a talk. Give me a transcript and I can read a paragraph here and there as I do other things at my own pace. I might have ten minutes here, ten minutes there. I don't want to be constantly pausing/unpausing the video, or worse - switching between the video and my music.
Plus, if I concentrate, I could read a 40 minute talk in 20 minutes or less.
Basically, when I'm reading, I control the pace. I rarely watch videos that are longer than about 5 minutes (that aren't entertainment, which is entirely different).
Funny story about that. When we were owned by Conde, the accounting was a little different (they took on some of the charges, like Akamai), and so we were actually told we were slightly profitable.
When that blog post went up I was as surprised as you to see it wasn't profitable when I was there.
Firstly, as with Wikipedia, if Reddit were forced to close because of money issues, Reddit could simply post a 'donate now or reddit shuts down' post and they would likely be rolling in millions of dollars.
Second, simply because reddit itself is not profitable does not mean people are not making a lot of money off reddit. The moderator system lends itself very well to a kind of 'corporate capture' of communities where moderators can be (and are) bought off for very tidy sums.
From what I remember, this is kind of why they started Reddit Gold.
Reddit succeeded largely because the company that bought them is making money elsewhere.
I don't think grandparent should be downvoted, he raises a good point. Tons and tons of people use Reddit, but Reddit has a hard time making a living.
Check out AllAdvantage: they paid people to surf. They had millions of users, but ultimately failed because their "business model" was idiotic.
Getting millions of users is pretty easy if you pay them to be users. In the end, though, it's only worthwhile if you can build a sustainable business which at least doesn't lose money hand over fist.
If Reddit hadn't got bought and supported by other profitable businesses, I doubt it would have survived.
Yep, the site is still in the red. We are trying to finish the year at break-even (or slightly above, to have a margin of error) though. [1]
1: http://www.reddit.com/r/TheoryOfReddit/comments/1ihwy8/rathe...
Someday, the money will run out, and they'll have to try and turn a profit.
Their business model is fundamentally bad.
Source?
[1] http://www.reddit.com/r/CrazyIdeas/comments/1i7crj/antigold_...
[2] http://blog.reddit.com/2013/08/reddit-myth-busters_6.html
Regardless of what you think of either language, "rewrite in X" is not a magic incantation that will spontaneously solve all your architectural issues. Designing a good architecture involves balancing many components, of which your primary implementation language is an important, but not exclusive, element. There are also the organisational issues -- hiring, spending time not adding new features, etc.
Maybe they enjoy it? Maybe it makes them happy?
Not everything is about money.
Such nonsense makes me angry. Surely you've heard about companies called Google, Facebook, LinkedIn, that succeeded without being bought and went a long time without making a profit.
Hi, welcome to your first day on the Internet. Since you're new, let me tell you how things work around here.
There are probably dozens of web sites similar to Wikipedia. But Wikipedia is on the first page of search engine results for just about anything you search for. Why is that? Because people have learned that they can trust them over the last 12.5 years.
When you go to Wikipedia, you know that when you're looking for information on the Battle of Hastings that you aren't going to see ads for anatomy enlargement pills. You won't see any advertising at all in fact. You know that the community at large does a decent job at removing biased information. You know that a company can't buy their way into hiding negative information or promoting positive information.
This level of trust is what causes people to link to Wikipedia thousands of times per day.
So let's say Wikipedia takes your advice. They put a small unobtrusive text advert on each page. Suddenly you're searching for information on acne and an ad for "Acbegone" pops up that promises to cure your problem for 3 easy payments of $19.95. Acbegone ends up becoming a huge advertiser with Wikipedia, spending $1 million per month on advertising. Suddenly Wikipedia gets The Phone Call. "Hi, this is Acbegone. We'd love to continue advertising on your site but your article on acne mentions 10 other products. Get rid of those and we'll double our ad spend with you. Don't get rid of them and we'll be forced to stop advertising." Wikipedia can't make do without the income they've become accustomed to so they make editorial decisions to not mention any product, but still there's that ad from Acbegone. Suddenly Wikipedia seems like one huge cheesy ad. People stop trusting it. People stop linking to it. It stops coming up in search engine results.
For a real world example, see http://en.wikipedia.org/wiki/Digg#Digg_v4
Look at that - a link to Wikipedia.
If he's right I guess I wasted about 300 hours studying for the Financial Reporting and Analysis section in the Chartered Financial Analyst (CFA) program.
But yes, I believe he was trying to point out that I couldn't have been involved if I didn't know, but doesn't understand the ins and outs of G&A and other such accounting practices.
Reddit is an incredibly valuable service. Maybe a lot of people on Hacker News don't see this, but reddit has basically become the Geocities of online discussion communities. The subreddit system has eliminated the "eternal september" problem, since all non-casual users will trickle into the communities that match their interests. Even if reddit loses 90% of its users, it will still be a highly relevant online community. I am certain that they can turn a (modest) profit if they really try.
Reddit will probably never become a massive money machine. But regardless, it is a very influential community. Even "community" is arguably an understatement at this point; it is really closer to infrastructure. As I have said here before, I would be willing to bet that it is still around in 10 years, with a significant number (millions) of users.
I can - sort of - confirm they are not trying hard. Last time I tried to advertise on Reddit, I failed because they could not accept CC payments from mainland Europe ... Just think of all the ad revenue they are losing.
If you advertise on Reddit, you're advertising to a violently anti-corporate anti-advertising audience, who may love you, but very well may hate you. You could be subject to a witch hunt at the drop of a hat.
I very much doubt advertisers would be lining up to advertise to that crowd. They're hardly big spenders either.
We operate a dozen of our own colos, with a virtual colo on AWS for insta-scalable multi-region redundancy, and an Amazon "colo" costs the same as about eight of our own when spun up and serving at least a gigabit of traffic.
However, the difference is less if you're going from zero sys admins to 24/7 sys admins. I'd SWAG the crossover is once your AWS budget exceeds 4 full time sys admins willing to do shift work.
Being a forum where low wage/students/anti-corporate/anti-advertising types go and share memes is the elephant in the room problem.
For example, I wouldn't be surprised if the Internet's largest right-wing community turned out to be one of the subreddits.
I'd rather not be hassled as a user.
I would also click on adverts, and buy things if they're useful to me, but I don't think I'd ever donate to a website.
Was it fraud if you got a dog to play with the mouse, and never looked at the screen? The dog might have still been looking at the adverts!
I can make millions of dollars selling condom wrappers, but just because I have made millions of dollars does not mean that I have done something important. I may catch quite a bit of hate for this, but a large portion of HN's content is on things that make money, but are not truly important.
1. Make money doing whatever it takes. eg come up with some crappy website, sell it to google, then shut it down.
2. The money problem is solved!
3. Spend money solving world hunger, diseases, philanthropy.
Which IMHO is pretentious BS.
Everyone goes to Google.com to search for things. You know that when you do a search, you're going to see helpful related adverts.
That level of conflict of interests and possible abuse, privacy concerns etc, means that the entire world uses google as their search engine. Oh and they make billions in profit.
Your hypothesis about an advertiser asking wikipedia to alter content surely applies to google search results.
Google indexes other people's content. All Google has to say is, "Sorry, we're not in control of the content others make, our automated systems follow an algorithm we're unable to make one-off tweaks to." It could conceivably cost Google $1MM to make a one-off tweak to their algorithm in terms of programming and testing time.
Wikipedia on the other hand is all content. They have no plausible response other than, "Yeah, it would take 5 minutes to update that but we won't do that for you." Hell, all they'd really have to do is let the advertiser update it as they want and then instruct editors to do nothing.
It really is just different for this and a number of other reasons.
>and then instruct editors to do nothing.
Yeah good luck with getting wikipedia editors to comply with that request!
A site like wikipedia would likely have thousands upon thousands of advertisers. They wouldn't be dependent on a few big advertisers. If an advertiser came to wikipedia and asked them to change a page, wikipedia would just say "no", publish the details to make the advertiser look like a douche (cue internet witch hunt, boycott, naming and shaming, etc.), and not care about the 0.000% temporary drop in revenue.