Sometimes I like to think of the past as a good measure of how much I have learned. Scaling is one domain in which I have learned a lot over the past year. Two years ago, I assumed that scaling merely meant upgrading servers; the idea of putting several servers together to work side by side baffled me.
I like to poke a little fun at my past self, but only because today I am proud of what I have accomplished: I have scaled a product that can comfortably handle hundreds of thousands, even millions, of users with a very small pool of servers.
However, the point of this post is not to poke fun at myself but to share some of the mistakes I have made and how to get around them. I won't give away all of my secrets, of course, such as the key to building extremely responsive APIs like igapi.
Databases – Index index index
I can’t stress this enough. People rarely remember to index their databases properly. Too often you create a new table without knowing how its records will be queried or how many records it will eventually hold. You absolutely must know both of those things, or you risk having your databases crash, and crash miserably. If you are using a database engine that locks the whole table on inserts (MyISAM, for example), you will find yourself with a queue of hundreds or thousands of queries before the database goes off the grid.
One key thing to know when creating primary keys is that you may be better off designing them without auto increment, building them instead from the values you actually look records up by. Why? Depending on the engine you use, it can reduce the time it takes to perform an insert and the chances of a lock, and it makes the table easier to query later.
One particularly bad column type to index, in my experience, is a datetime field. Whenever possible, store it as a UNIX timestamp instead. In my queries on MySQL, the datetime values ended up being converted before they could be compared with the values I was passing in, and that made them far slower. This became a very real problem when querying millions of records to build reports: a query on a datetime column took up to 30 seconds to execute, while the same query on a UNIX timestamp column took less than a hundredth of a second.
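To make the difference concrete, here is a minimal sketch that asks the database how it plans to run a lookup, before and after adding an index. It uses SQLite (bundled with Python) as a stand-in for MySQL, and the `scores` table and `idx_scores_user` index are hypothetical names chosen for illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each step SQLite plans for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row has four columns; the last one describes the step in words.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one score row per user per game.
conn.execute("CREATE TABLE scores (user_id INTEGER, game_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(u, u % 50, (u * 7) % 1000) for u in range(10_000)],
)

lookup = "SELECT score FROM scores WHERE user_id = ?"
before = query_plan(conn, lookup, (42,))  # no index yet: a full table scan

# Index the column the application actually filters on.
conn.execute("CREATE INDEX idx_scores_user ON scores (user_id)")
after = query_plan(conn, lookup, (42,))   # now a direct search through the index

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reads every row while the second jumps straight to the matching entries. In MySQL, prefixing the query with `EXPLAIN` gives the equivalent information; a `type` of `ALL` in its output means a full table scan.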
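One way to apply this is a natural composite key: the columns you already look records up by become the primary key, so no auto-increment id is needed. The sketch below assumes a hypothetical `user_achievements` table and uses SQLite as a stand-in; whether it actually speeds up your inserts depends on your engine, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (user, achievement) pair. The pair itself is the
# primary key, so there is no auto-increment surrogate id. WITHOUT ROWID tells
# SQLite to store rows in primary-key order, similar to a clustered index.
conn.execute(
    """
    CREATE TABLE user_achievements (
        user_id        INTEGER NOT NULL,
        achievement_id INTEGER NOT NULL,
        unlocked_at    INTEGER NOT NULL,  -- UNIX timestamp
        PRIMARY KEY (user_id, achievement_id)
    ) WITHOUT ROWID
    """
)

def unlock(conn, user_id, achievement_id, unlocked_at):
    # INSERT OR IGNORE turns a repeated unlock into a no-op instead of a duplicate row.
    conn.execute(
        "INSERT OR IGNORE INTO user_achievements VALUES (?, ?, ?)",
        (user_id, achievement_id, unlocked_at),
    )

unlock(conn, 7, 3, 1_300_000_000)
unlock(conn, 7, 3, 1_300_000_500)  # same key: ignored
unlock(conn, 7, 9, 1_300_000_900)

# The key doubles as the lookup path: "which achievements does user 7 have?"
rows = conn.execute(
    "SELECT achievement_id FROM user_achievements WHERE user_id = ? ORDER BY achievement_id",
    (7,),
).fetchall()
print(rows)  # [(3,), (9,)]
```

Because the key encodes how the data is queried, the common lookup needs no secondary index, and duplicates are rejected by the key itself.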
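A minimal sketch of the timestamp approach follows, again with SQLite standing in for MySQL and a hypothetical `events` table. The report bounds are converted to integers once, in application code, and the indexed column is compared bare. One likely factor in slow date-range queries is wrapping the indexed column in a function inside the `WHERE` clause, which stops the database from using the index; precomputed integer bounds avoid that. In MySQL, `UNIX_TIMESTAMP()` and `FROM_UNIXTIME()` convert between the two representations when you need to.

```python
import sqlite3
from datetime import datetime, timezone

def to_unix(dt):
    """Convert a timezone-aware datetime to integer seconds since the UNIX epoch."""
    return int(dt.timestamp())

conn = sqlite3.connect(":memory:")
# Hypothetical table: the event time is stored as a plain integer and indexed.
conn.execute("CREATE TABLE events (created_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")

start = datetime(2011, 1, 1, tzinfo=timezone.utc)
# One event per hour for 60 days.
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [(to_unix(start) + hour * 3600,) for hour in range(24 * 60)],
)

# Report range: all of January 2011 (UTC), as a half-open interval [lo, hi).
lo = to_unix(datetime(2011, 1, 1, tzinfo=timezone.utc))
hi = to_unix(datetime(2011, 2, 1, tzinfo=timezone.utc))
count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE created_at >= ? AND created_at < ?",
    (lo, hi),
).fetchone()[0]
print(count)  # 744 (31 days * 24 hours)
```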
Sharding
Sharding was one of the biggest buzzwords I kept hearing over the past few years. I never really knew what it was until I was tasked with scaling a large product. Then I wondered: why do companies shard? Is it necessary? The first answer is simple: sharding splits your data across many database servers so that each one carries only part of the load. But is it necessary? Not at all.
You must think I am crazy for saying no, but I have my reasons, and they come down to the newer generation of databases, NoSQL among them. MySQL specifically does not scale vertically, at least not well. That is why companies turned to sharding, but they did so because of technological limitations, limitations that no longer exist today, partly thanks to newer technologies like Percona (a MySQL-compatible server) and Riak (a NoSQL store). I’ve been using Percona and am a huge fan of it; its performance is astonishing compared with a typical InnoDB setup.
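For readers who, like me back then, have never seen sharding up close, here is a minimal sketch of the core idea: hash a key and use it to pick the one server that owns that key's rows. The shard list and connection strings are hypothetical, and simple modulo routing is only one of several schemes.

```python
import zlib

# Hypothetical connection strings for four database servers.
SHARDS = [
    "mysql://db0.internal/app",
    "mysql://db1.internal/app",
    "mysql://db2.internal/app",
    "mysql://db3.internal/app",
]

def shard_for(user_id, shards=SHARDS):
    """Pick the shard that owns all rows for this user."""
    # crc32 gives the same value in every process, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    checksum = zlib.crc32(str(user_id).encode("utf-8"))
    return shards[checksum % len(shards)]

# Every query for a given user is sent to the same server.
owner = shard_for(12345)
print(owner)
```

The catch with plain modulo routing is that adding a fifth server changes the owner of most keys, so data has to be moved; schemes like consistent hashing reduce that, at the cost of more moving parts. That operational overhead is a big part of why I prefer to avoid sharding when the database itself can keep up.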
Caching
I’ve always used caching. For App Broker, I would periodically generate portions of content, such as leaderboards, save them to an HTML file, and load that file on each page view. In other apps, I would periodically perform all the heavy calculations, save the results to a database, and load those results on the fly. Those methods worked, but they were still partly inefficient.
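The file-based approach described above can be sketched roughly as follows. This is not App Broker's actual code: the cache path, the five-minute refresh interval, and the `render_leaderboard` and `load_rows` functions are hypothetical stand-ins for the real query and template.

```python
import html
import os
import tempfile
import time

CACHE_SECONDS = 300  # hypothetical refresh interval: rebuild at most every 5 minutes

def render_leaderboard(rows):
    """Build the (expensive) HTML fragment; stands in for the real query + template."""
    items = "".join(f"<li>{html.escape(str(name))}: {score}</li>" for name, score in rows)
    return f"<ol>{items}</ol>"

def get_leaderboard_html(path, load_rows, max_age=CACHE_SECONDS, now=None):
    """Serve the cached fragment, rebuilding it only when the file is missing or stale."""
    now = time.time() if now is None else now
    try:
        fresh = now - os.path.getmtime(path) < max_age
    except FileNotFoundError:
        fresh = False
    if not fresh:
        fragment = render_leaderboard(load_rows())
        # Write to a temporary file, then atomically swap it in,
        # so concurrent readers never see a half-written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(fragment)
        os.replace(tmp, path)
        return fragment
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Every page view still touches the disk, and each web node keeps its own copy of the file, which is part of why this kind of approach is only partly efficient.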
I discovered memcached 15 months ago… and I LOVE it. It is simple to use and blisteringly fast: data is stored in RAM and retrieved in thousandths of a second. I think of this way of caching as very similar to NoSQL, because you store and retrieve data by key.
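The usual way to use a key-value cache like memcached is the cache-aside pattern: look up a key, and only on a miss do the heavy work and store the result. The sketch below uses a small in-process `DictCache` as a stand-in so it runs without a server; with a real memcached you would pass a client object instead (for example pymemcache's `Client`, whose `get(key)` returns `None` on a miss and whose `set(key, value)` stores a value). The `user_stats:` key scheme and the `compute` function are hypothetical.

```python
import json

class DictCache:
    """In-process stand-in for a memcached client: get/set values by string key."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss, like a memcached client

    def set(self, key, value):
        self._data[key] = value

def get_user_stats(cache, user_id, compute):
    """Cache-aside: try the cache first, fall back to the expensive computation."""
    key = f"user_stats:{user_id}"        # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)        # hit: no database work at all
    stats = compute(user_id)             # miss: do the heavy work once...
    cache.set(key, json.dumps(stats))    # ...and store a serialized copy
    return stats
```

Compared with writing results to files or a database table, the cached value lives in memory shared by all web nodes, and nothing is computed until someone actually asks for it.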
One of the downsides of memcached is that you have to be extremely careful about what you store. Without thinking, you can quickly fill your web nodes’ memory with data you don’t really need, or keep data around much longer than it is useful. There are plenty of tips and tricks to keep in mind, but I will save those for a later post.
There is a lot more I have learned about scaling; I’ve only scratched the surface here. I will be posting more scaling tips and tricks in the future. Leave some feedback on what you’d like to see in future posts.
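Two habits help keep memory in check: give every entry an expiry time that matches how long it is actually useful, and cap how much the cache may hold. Memcached itself supports both, with per-item expiry times and a memory limit (its `-m` option, in megabytes) beyond which it evicts rarely used items. The in-process sketch below mimics those two ideas with a hypothetical `BoundedTTLCache` class so the trade-off is easy to see; it is an illustration, not how memcached is implemented.

```python
import time
from collections import OrderedDict

class BoundedTTLCache:
    """Keeps at most `max_items` entries, each for at most `ttl` seconds."""

    def __init__(self, max_items, ttl, clock=time.monotonic):
        self.max_items = max_items
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expires_at), least recently used first

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]      # expired: drop it rather than serve stale data
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key, value, ttl=None):
        expires_at = self.clock() + (self.ttl if ttl is None else ttl)
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)
```

A short `ttl` keeps data from outliving its use, and `max_items` puts a hard ceiling on memory; the cost is more cache misses, so both values should follow how often the underlying data really changes.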