Algorithms vs the world

[EDIT: For some odd reason, WordPress decided it wanted to delete the rest of this post… so here’s to writing it again]

Haven’t you ever told yourself while in school: “I won’t ever use this again, why am I learning this?” Well, I do all the time. Being fresh out of university, I still have those thoughts. However, I’ve recently discovered that I am beginning to take concepts learned in school and adapt them to real-world scenarios. More importantly, I am using algorithms. There are two kinds of algorithms I am primarily using these days: those that revolve around tree structures and those that revolve around statistics (as in that really confusing class you need to take when enrolled in Computer Science).

Tree structures are fairly straightforward. As the platform I am building with my team evolves, I am focusing my efforts on efficiency. Efficiency comes in many forms, one of which is developing quick algorithms to process what I consider to be linear data. Building trees out of linear data has, in many cases, not only removed the need to loop through the data but also made search drastically faster than it had been. Although trees are most often applied to categories, as we are doing with item categories, there are certainly many more applications, like analysing friend connections (much like family trees).

What I found surprising is the lack of support for tree-structured data in some coding languages, like PHP. I’ll admit, PHP is an easy language to learn and provides a lot of opportunity to expand; however, it has drawbacks such as its 64-bit limitations (which used to be a pain to resolve) and its inability to thread connections. I’ve had several people suggest moving everything over to Python, but I’ll keep that for a future post. While in university I mostly worked in Java; although it had its quirks, I loved features like linked lists. Rather than continue to harp on PHP, I’ll dive into my solution. I had two problems to solve: how do I build this tree, and how do I find all branches associated with a search I am performing? To my surprise, when I pulled up one of the Perl projects I had done in university, the answer was staring at me! I saw several inefficiencies in it, though, and decided to improve on it, which resulted in a function of less than 20 lines that is able to build the tree as well as search it.
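To make the two problems concrete, here is a minimal sketch in Python. It is not the function described above, only an illustration of the same idea under some assumptions: the flat data is taken to be rows of `(id, parent_id)`, and the names `build_tree` and `find_branch` are hypothetical.

```python
from collections import defaultdict


def build_tree(rows):
    """Group flat (id, parent_id) rows into a parent -> children map."""
    children = defaultdict(list)
    for node_id, parent_id in rows:
        children[parent_id].append(node_id)
    return children


def find_branch(children, root):
    """Return root followed by every descendant of root, depth first."""
    branch = [root]
    for child in children.get(root, []):
        branch.extend(find_branch(children, child))
    return branch


# Hypothetical item categories; None marks a top-level category.
rows = [(1, None), (2, 1), (3, 1), (4, 2), (5, None)]
tree = build_tree(rows)
print(find_branch(tree, 1))  # [1, 2, 4, 3]
```

The tree is built in a single pass over the rows, and a search for a category only visits the nodes underneath it instead of scanning every row again.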

The results of this algorithm encouraged me to continue on this path. I have taken it upon myself to begin moving away from limiting the data returned from the databases, towards returning all the raw data and processing it with algorithms. Not only has searching the raw data been faster, it has also saved a lot of memory in cache. As I move forward, I will be getting rid of all loops. Loops are useful and have their place in code; however, in the majority of instances it is simply better to recurse through data or iterate through it. This was something I confirmed 3 years ago, in Java, where iterating through 1200 instances of data was 3-5 times faster than processing the data in a loop.

My other algorithm is one that likely wouldn’t be possible without my wonderful stats classes. Yeah… they weren’t so wonderful; in fact, I doubt many in my classes would disagree with me when I say that. I often found myself staying up late in the days before an assignment was due, attempting to come up with the same answers that were provided to us (yes, the classes were so difficult that they provided the answers, knowing many of us wouldn’t even figure out the steps to get to them). The analyses I performed in my classes were often linear: we wanted to run tests on the data we had, without trying to predict what might happen in the days after.

As much as I disliked the class itself, the content was actually exciting. I’ve always dreamt about the future (don’t we all?), except I dream of the future of the given data. I don’t have a crystal ball (I wish I had one), so instead I am moving on a path where I can predict data. I’ve already started developing algorithms that help me predict revenue, or at least issues in revenue. I am using revenue trends over a period of 2-3 weeks to determine the revenues over the next 24 hours. These predictions are used to alert developers to issues which may not be immediately visible, or to major issues affecting a game or application. This has been extremely useful for pointing developers towards issues.
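As a rough illustration of that kind of prediction, the sketch below fits a straight line to recent daily revenue and projects it one day ahead. It is an assumption made for the sake of example, not the algorithm described above: the least-squares trend, the 20% tolerance and the names `fit_trend`, `forecast_next` and `should_alert` are all hypothetical, and the revenue figures are made up.

```python
def fit_trend(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ...

    Needs at least two values. Returns (slope, intercept).
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_next(values):
    """Project the fitted trend one step past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * len(values)


def should_alert(values, actual, tolerance=0.2):
    """True when actual revenue is further from the forecast than tolerance allows."""
    expected = forecast_next(values)
    return abs(actual - expected) > tolerance * abs(expected)


# Made-up daily revenue for the last four days.
history = [100, 110, 120, 130]
print(forecast_next(history))      # 140.0
print(should_alert(history, 100))  # True
```

In practice the history would cover the 2-3 weeks of revenue mentioned above, and a straight line would probably be too simple for data with weekly cycles; the point is only that a forecast gives something to compare the actual revenue against.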

Although what I’ve developed so far is only a ripple in the ocean, it is beginning to form tools which I can test on before developing a large-scale prediction system, one that can take hundreds or thousands of variables in an attempt to show the effects of a certain change before making the change. This means I want to be able to predict things like “If I spend $1000, how many new users will I get?”, “If I add a new Christmas item, will my revenue increase, and by how much?” or even “If I give a bonus of 2 points to users on every interaction, how will that affect retention?”. The business I work in has so many variables that looking at analytics linearly requires a great deal of knowledge and often multiple theories.

Looking back on all the applications and games I have built, you will see they are all centered around stats or predictions. App Broker is one clear example of a game I developed because I was fascinated by what caused applications’ DAU and MAU to either skyrocket or fall off a cliff. I was not doing much analysis on the data through the game, but I opened the game up to users and had them provide their theories by purchasing shares in applications. Another example was Auto Mania, which I used to gather data on users’ preferences in order to see what users liked in cars and what led them to a purchase. The results were fascinating and in some cases more valuable to the automakers partnered with Auto Mania than doing surveys and focus groups.

As you can see, this has been a passion of mine for a while, but only recently, with the platform I’ve been designing at Social Game Universe, have I been able to store data in such a way that analysis can be done on it very easily. I’ve spent a good 3-4 years developing methods to store data to power analytics and perform extensive tests on data, whether it is to analyse the current state of a product or to see how a particular change will affect the application, without necessarily taking a risk to find the results.

Analytics, in my view, have mainly been linear. There has been a lot of innovation in web analytics over the past few years, but few have ventured into adding variables and predicting the outcomes. That is my passion and those are the algorithms I’d like to build. Oddly enough, tree structures also help on my quest for deeper analytics, as they give me the ability to structure data in a very logical way prior to an analysis. I’ve had a few successful tests on that front, which is something I will also be saving for a future blog post.