One month of Dirty Dancing

Dirty Dancing’s launch on September 13th 2011 has been quite exciting. Since launch, we’ve witnessed over 1.1m eyeballs. Scaling brought its own set of complications, all of which we managed to resolve in relatively good order. In the first days after launch we hit some interesting numbers, perhaps not comparable to what companies like Zynga achieve, but great nonetheless. We’ve been growing consistently since launch, averaging over 20k installs per day, with over a million wall posts made and invites sent. However, it isn’t the game’s stats and interactions I’d particularly like to focus on; rather, I’d like to look at the challenges we’ve encountered.

For launch, our original analysis of the servers indicated we could support at least 100k DAU, which seemed reasonable based on the dozens of tests we had made. What gave us the most trouble, however, were bursts of traffic: some bursts were heavy enough to equate to around 600k DAU. These had some rather interesting effects on the servers, making it hard for them to stay up or even chug along at a normal rate. This was our biggest problem since the launch of the game, and we resolved it by setting up proper database replication to spread the load over multiple nodes.
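For the curious, the core of the fix was the classic primary/replica split: writes go to a single primary so replication stays consistent, while reads fan out across the replicas. Here is a minimal sketch of the routing idea in Python; the host names are hypothetical and the real setup has more moving parts (health checks, replication lag handling), but the principle is this simple:

```python
import itertools

# Hypothetical node layout: one primary for writes, replicas for reads.
PRIMARY = "db-primary.internal"
REPLICAS = itertools.cycle([
    "db-replica-1.internal",
    "db-replica-2.internal",
    "db-replica-3.internal",
])

def pick_host(sql: str) -> str:
    """Route writes to the primary and spread reads round-robin across
    the replicas. Anything that mutates state must hit the primary."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
        return PRIMARY
    return next(REPLICAS)

if __name__ == "__main__":
    print(pick_host("SELECT coins FROM players WHERE id = 42"))   # a replica
    print(pick_host("UPDATE players SET coins = coins + 10"))     # the primary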

On October 1st, Facebook turned on forced HTTPS support. We use Zeus Traffic Manager to balance our traffic, which made the migration simple: the balancer terminates HTTPS, passing the decrypted traffic to the web nodes and encrypting the responses again on their way back to the client. This meant no changes were needed to our server-side code to support HTTPS. We did have to change some things on the client side, but it was a rather simple change that replaces all instances of HTTP with HTTPS when running in secure mode.
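The client-side change amounted to little more than swapping the scheme on our URLs. Roughly, in Python (our client isn’t written in Python, and the host name and flag name here are made up for illustration, but the logic really was this simple):

```python
def absolutize(url: str, secure: bool) -> str:
    """Rewrite an http:// URL to https:// when the session is secure.
    Facebook tells the app, via the signed request, whether the user
    loaded the page over HTTPS."""
    if secure and url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url

assert absolutize("http://cdn.example.com/sprite.png", secure=True) \
       == "https://cdn.example.com/sprite.png"
assert absolutize("http://cdn.example.com/sprite.png", secure=False) \
       == "http://cdn.example.com/sprite.png"
```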

Since the launch, our databases have registered over 15b queries and stored over 25GB worth of data. That forced some rethinking of how data is stored, how indexes are generated, and how we run some of our queries throughout the platform. In some cases we managed to cut stored data size (including indexes) by 80% and the number of queries by 75%. These have been particularly interesting challenges, as the work had to be done in a way that didn’t affect current players and kept downtime to a minimum.
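The biggest query-count wins came from a pattern most social games hit sooner or later: fetching rows one at a time inside a loop. A sketch of the kind of change involved, assuming a DB-API cursor with %s parameters (the table and column names are invented for illustration):

```python
# Before: one query per friend -- N round trips to the database.
# for fid in friend_ids:
#     cursor.execute("SELECT id, level FROM players WHERE id = %s", (fid,))

def fetch_players(cursor, friend_ids):
    """Fetch all friends' rows in a single batched query instead of
    issuing one SELECT per id. For a 50-friend bar at the bottom of
    the game, this turns 50 queries into 1."""
    if not friend_ids:
        return []
    placeholders = ", ".join(["%s"] * len(friend_ids))
    cursor.execute(
        "SELECT id, level FROM players WHERE id IN (%s)" % placeholders,
        tuple(friend_ids),
    )
    return cursor.fetchall()
```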

Although the game has been running smoothly for the past few weeks, the problems we face now are ones to be addressed with data. Data is something I’ve always loved and could never really get enough of, but the hard part for me has always been finding a properly indexable way of storing and querying it. Our platform accomplished that goal and has given us more than enough data on our users; the problem now lies in building software to analyse that data so we can react more efficiently.
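As a taste of what that analysis layer looks like, most of the early questions reduce to grouping raw event rows by day and counting. A toy sketch, with hypothetical event names and fields:

```python
import collections
import datetime

# Hypothetical raw event rows: (date, user_id, event_name).
events = [
    (datetime.date(2011, 10, 1), 101, "install"),
    (datetime.date(2011, 10, 1), 102, "install"),
    (datetime.date(2011, 10, 2), 101, "wall_post"),
]

def daily_counts(events, name):
    """Count occurrences of one event type per day -- the building
    block behind installs-per-day and posts-per-day charts."""
    counts = collections.Counter(
        day for day, _user, event in events if event == name
    )
    return sorted(counts.items())

print(daily_counts(events, "install"))  # [(datetime.date(2011, 10, 1), 2)]
```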