1,000,000 daily users with no cache (slideshare.net)

114 points by Sato 14 years ago | 36 comments

Udo 14 years ago |

That was an interesting read. I'd love to watch the actual presentation if it's available somewhere.

One thing that struck me as odd from a software design point of view is the "tile table". The name and the absurdly high number of I/Os per second suggest that each tile was stored in a separate database record? The game looks like a Farmville clone, so wouldn't it have been more economical to store the entire farm of a player in one blob?

Is there someone from Wooga here to shed some light on the architectural decisions that went into the game?

bermanoid 14 years ago | |

I've got to agree - 1M daily users and 50k db hits per second indicates something is being done in a ridiculously inefficient manner somewhere, and I'm really surprised that the solution was to fiddle around on the server-side.

Suppose your players are active for an average of, say, 20 minutes - that's a damn generous upper bound for a Facebook game. That would mean that 1M / (24*3) = 13,888 users are active (on average) at any time. Which means that they must each be generating db events at a rate of about 3.6 events / sec / user, which is ludicrous (at the very least, these should probably be somehow combined so that each client only hits the db once in a while, preferably in response to a user action).
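The estimate above can be reproduced with a short calculation. This is only a back-of-envelope sketch: the 20-minute session length is the commenter's guess, the 50,000 operations per second figure is taken from the discussion, and the function names are made up for illustration. It also assumes sessions are spread evenly over the day, which real traffic is not.

```python
# Back-of-envelope check of the per-user event rate.
# All inputs are assumptions from the comment, not measured values.
DAILY_USERS = 1_000_000
DB_OPS_PER_SEC = 50_000
SESSION_MINUTES = 20  # assumed (generous) average play time per user per day


def concurrent_users(daily_users: int, session_minutes: float) -> float:
    """Average number of users online at once, assuming evenly spread sessions."""
    sessions_per_day = 24 * 60 / session_minutes  # 72 twenty-minute slots
    return daily_users / sessions_per_day


def ops_per_user_per_sec(db_ops_per_sec: float, active_users: float) -> float:
    """Database operations each active user would have to generate per second."""
    return db_ops_per_sec / active_users


active = concurrent_users(DAILY_USERS, SESSION_MINUTES)
rate = ops_per_user_per_sec(DB_OPS_PER_SEC, active)
print(f"{active:,.0f} concurrent users, {rate:.1f} db ops/sec/user")
```

Shortening `SESSION_MINUTES` lowers the concurrent user count and raises the per-user rate proportionally, which is the point made in the next paragraph.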

And in reality, the average play session is probably quite a bit shorter than 20 minutes, which means the event rate is even higher per active user. What the hell are they doing that is causing so much database traffic?

Is the game actually doing something a lot more sophisticated than I would assume it is based on the looks of it (to be fair, I've never played it), or is the client-side design just really that messed up? I mean, it's cool that they can handle that much traffic, and all, and the server guys should be proud, but IMO they should really be able to do better on the client-side to prevent this level of scaling from being necessary...

joelhaasnoot 14 years ago | | |

Have you played these games? Everything you do on your farm/plot/area is a click on a tile. You're essentially trying to click on as many tiles in as little time as possible for your own sanity: there's no "select all".

jrirei 14 years ago | |

The presentation was recorded by InfoQ. I hope they will make it available in the next 1-2 months. There's a lot I said but did not put on the slides.

You are correct in your assumption that each Tile had been a record in a MySQL table (now it is a value in a Redis hash).

Actually we considered using a "blob" approach. But in the client we cannot batch requests as we cannot foresee when the user will simply kill the Flash client to go to some other site. So when a user request (i.e. a game event) arrives in the server there is no way to know if another request will follow. So we have to persist that change right away.
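A minimal sketch of the "persist every event right away" model described above, not Wooga's actual code. `TileStore` is a hypothetical in-memory stand-in for a hash-per-user store (one field per tile), and `handle_click`, the user ID, and the tile IDs are invented for illustration.

```python
class TileStore:
    """Hypothetical in-memory stand-in for a store with one hash per user."""

    def __init__(self):
        self._hashes = {}  # user_id -> {tile_id: state}
        self.writes = 0    # counts individual persisted changes

    def set_tile(self, user_id, tile_id, state):
        self._hashes.setdefault(user_id, {})[tile_id] = state
        self.writes += 1

    def get_farm(self, user_id):
        return dict(self._hashes.get(user_id, {}))


def handle_click(store, user_id, tile_id, new_state):
    """Stateless handler: nothing guarantees a later request will arrive,
    so the single changed tile is persisted before returning."""
    store.set_tile(user_id, tile_id, new_state)


store = TileStore()
for tile_id in ("3:4", "3:5", "3:6"):
    handle_click(store, "player-1", tile_id, "planted")

# The client may disappear at this point; every click is already stored.
print(store.writes, store.get_farm("player-1"))
```

The cost is one write per click, which is how a click-heavy game ends up with such a high write rate.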

This is using a stateless server. In a later game called Magic Land we are going for a stateful Erlang server. There we keep the whole user state in RAM while the user plays and persist state changes every minute or so. Here we can do without any database and just use S3 for persistence. Works just great. On the upcoming Erlang User Conference we will give an update on that project and slides will be available at Slideshare next week, too. In the meantime please have a look at this old slide set to explain the concept in detail: http://www.slideshare.net/wooga/erlang-the-big-switch-in-soc...
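The stateful approach can be sketched roughly as follows. This is an illustration in Python rather than Erlang, and every name in it (`BlobStore`, `UserSession`, the 60-second interval, the tile IDs) is an assumption; `BlobStore` is just a dictionary standing in for an object store such as S3. Time is passed in explicitly so the example is deterministic.

```python
import json


class BlobStore:
    """Hypothetical stand-in for an object store: one blob per user."""

    def __init__(self):
        self.blobs = {}
        self.puts = 0

    def put(self, key, data):
        self.blobs[key] = data
        self.puts += 1


class UserSession:
    """Keeps one user's whole state in memory and persists it periodically."""

    def __init__(self, user_id, blob_store, flush_interval=60.0, now=0.0):
        self.user_id = user_id
        self.blob_store = blob_store
        self.flush_interval = flush_interval
        self.state = {}
        self.dirty = False
        self.last_flush = now

    def apply_event(self, tile_id, new_state, now):
        # Mutate in-memory state only; write out if the interval has elapsed.
        self.state[tile_id] = new_state
        self.dirty = True
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        # Write the whole state as a single blob, but only if it changed.
        if self.dirty:
            self.blob_store.put(self.user_id, json.dumps(self.state))
            self.dirty = False
        self.last_flush = now


blobs = BlobStore()
session = UserSession("player-1", blobs)
for second in range(120):  # one game event per second for two minutes
    session.apply_event(f"tile-{second % 10}", "planted", now=float(second))
session.flush(now=120.0)   # final flush when the session ends
print(blobs.puts)
```

Here 120 events turn into two blob writes. The trade-off is that up to one interval of changes is lost if the server process dies before a flush.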

Udo 14 years ago | | |

Thanks for clearing this up.

> Actually we considered using a "blob" approach. But in the client we cannot batch requests as we cannot foresee when the user will simply kill the Flash client to go to some other site. So when a user request (i.e. a game event) arrives in the server there is no way to know if another request will follow. So we have to persist that change right away. <

That's an understandable dilemma. I love thinking about stuff like this and see how other people are dealing with these challenges, so please forgive my Sunday morning quarterbacking ;-) Wouldn't the issue have been solvable by creating a relatively simple persistent software layer between the app code manipulating the tiles and the backend storage? I understand that you moved to this model with your Erlang game, but I'd like to know if a persistence/caching layer was considered for the farming game?

More generally, somewhere in here is an idea for a great Node.js server project that takes coarse grained datasets from a contentious database and serves as an interface for finer grained portions of that data.

tyler 14 years ago |

"with no cache" is a bit of a misleading statement, considering the entirety of their data set is stored in RAM. Turns out you don't really need memcached if you don't read anything from disk.

Sato 14 years ago | |

Not really.

Consider their scenario: 100,000 db operations and 50,000 updates per sec. In this case, a simple cache costs more than going without one. Eviction is expensive.

Also, it's no surprise that replication doesn't scale, because updates get propagated.

I'm not sure of the details, but they succeeded in relaxing the tight requirements of ACID transactions. So this is a good case of when an RDBMS (or traditional database) fails.

I guess their design is more like an MMO's; I hope to hear from the guy.

tyler 14 years ago | | |

No. Memcached on a reasonable server will do millions of requests per second. 50,000 updates per second is nothing for any modern cache.

Also, replication has nothing to do with whether or not "without a cache" is a meaningful statement. The point is that by holding their entire data set in RAM, they've nullified the need for a cache. Effectively, their database is their cache.

And considering the data isn't even written to disk for about 15 minutes, it's really more cache than database anyway.

snorkel 14 years ago | |

Although Redis can be used as a cache, they're using it as a primary store instead.

fleitz 14 years ago |

I really want to know what would have happened if they just bought a couple of 24-drive arrays and stacked them full of SSDs. 50,000 IOPS sounds like it could be handled with a couple gigs of BBWC and a decent drive array. Typically, you want about 200 15K spindles per CPU spread over a couple controllers, jammed full of delicious battery backed RAM.

srgseg 14 years ago |

In this Wooga presentation they talk about how DB hosting in the cloud is 20x more expensive than on rented dedicated servers, or 5x more expensive per DAU across the entire variety of servers required.

http://www.slideshare.net/wooga/games-for-the-masses-scaling...

iconfinder 14 years ago |

Why did you go for Ruby instead of PHP?

Would you have had the (roughly) same database issues if you didn't have that many writes to the db?

jrirei 14 years ago | |

We did go for Ruby in order to increase developer productivity (having a very small team of just two developers), and for good code quality/high test coverage. We were sure we needed to refactor a lot later on. So Ruby seemed like a good choice. But Ruby is NOT good at waiting on a database / network latency. But I guess with PHP we would have had exactly the same problems.

iconfinder 14 years ago | | |

Did you compare bare bones PHP with Ruby or PHP with a library such as Zend?

I'm curious because I'm considering recoding a large part of a website and am trying to avoid scaling issues.

Thank you for the answer.

gibybo 14 years ago |

50,000 writes/sec seems insanely high for a Flash game. I would love to hear more about what those writes are doing.

snorkel 14 years ago | |

It's a massively multiplayer game with 1 million+ users, so it seems they're constantly transmitting the state of every active player to their server farm acting as the game hub.

henrikschroder 14 years ago |

I'm amazed that some people still do services without using memcached or similar. It's not very difficult, and brings enormous benefits.

So, why not?

judofyr 14 years ago | |

They used Redis. Which I would say is pretty similar to memcached.

henrikschroder 14 years ago | | |

I thought so too, but that means the slide title is wrong: they're not doing that much traffic without a cache.

snorkel 14 years ago | |

Redis is like memcached on steroids.

rmoriz 14 years ago |

In another presentation of Wooga they described why they went off-cloud with the later games using cheap dedicated servers from http://hetzner.de/

Since hetzner recently upgraded the hardware but also limited the different options it would be very interesting to see what Wooga takes out of this...

revorad 14 years ago |

Can any DB veterans offer any insight on how this would scale if they had used PostgreSQL instead of MySQL?

henrikschroder 14 years ago | |

At the end of the day, all transactional RDBMSs need to do the same things, and if they're mature and optimized enough, they'll all reach the same physical limits of what the underlying disk can handle. There will be small differences, but all numbers will be in the same order of magnitude.

Switching from MySQL to PostgreSQL in the above scenario wouldn't do much for performance. Some operational tasks might change, and complex queries might change in speed, but the baseline of simple updates/selects per second won't really change.

To get orders of magnitude more performance, you need to use a different model. Redis is a NoSQL key-value store which keeps the entire dataset in memory and occasionally flushes changes to disk. Of course that's going to be faster than a system which flushes all changes to disk individually.

teh 14 years ago | | |

> Of course that's going to be faster than a system which flushes all changes to disk individually.

Even that is tunable these days. You can have unlogged tables, turn off fsync, or set a high commit delay, etc.

Sato 14 years ago |

I just notified @jrirei of this discussion.

fgielow 14 years ago |

How do you people think this would cope with using NoSQL approaches, such as MongoDB, instead of SQL based ones?