The rise of NoSQL is an opportunity for new RDBMS solutions

It should come as no surprise that NoSQL has become popular over the past few years. This popularity has been driven in large part by the app revolution. Many new apps are hitting millions of users in less than a week, some in a day. This presents a scaling problem for app developers who seek a large audience.

Scaling a typical RDBMS like MySQL or MS SQL Server from 0 to 1 million users has never been easy. You have to set up master and slave servers, shard and balance the data, and ensure you have resources in place for any unexpected events. NoSQL is being touted as the solution to those problems. It shouldn’t be.

NoSQL’s use cases have been mistakenly focused on scalability because of the complexity of standing up an RDBMS and scaling it. Application developers aren’t necessarily interested in becoming server-side wizards. They prefer to focus on building out their apps, not scaling the servers powering those apps. These developers are looking for something low cost that can keep up with their needs as they grow. Developers have flocked to NoSQL for these very reasons.

However, when developers look to grow their apps and introduce more complex functionality, they sometimes hit roadblocks due to the way data is stored in NoSQL. A solution to one of those roadblocks is MapReduce, which brings index-like functionality to NoSQL.
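
As a rough illustration of that idea, here is a minimal sketch in Python (the record layout and field names are invented for this example) of how a map phase and a reduce phase can build something like a secondary index over plain key-value records:

```python
from collections import defaultdict

# Hypothetical documents, as a key-value store might hold them:
# each has an id and a city field we want to "index" on.
docs = [
    {"id": 1, "city": "Montreal"},
    {"id": 2, "city": "Toronto"},
    {"id": 3, "city": "Montreal"},
]

# Map phase: emit (city, doc id) pairs from each record.
mapped = [(doc["city"], doc["id"]) for doc in docs]

# Reduce phase: group ids by city, producing an index-like view
# that answers "which documents are in city X?" without a scan.
index = defaultdict(list)
for city, doc_id in mapped:
    index[city].append(doc_id)

print(dict(index))  # {'Montreal': [1, 3], 'Toronto': [2]}
```

A real NoSQL MapReduce job distributes the map and reduce phases across nodes, but the shape of the computation is the same.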

My goal isn’t to dispute the importance of NoSQL, but to promote the reality that not all database systems are alike: each serves its own purpose. NoSQL’s strength is terrific key-value access with great performance. To some, the lack of a schema is a benefit that lets the application control how data is stored, limiting the need to interface with and configure the database.

Over the years I’ve been looking to build products that need RDBMS-like storage that scales. NoSQL just couldn’t do it for me. Many would agree, but do not know of anything better. Luckily for me, I found VoltDB. To this day, salespeople continue to contact me to pitch their NoSQL solutions. I ask: How can NoSQL solve my problems? Are you ACID compliant? How can I merge data from multiple tables? How can I use my data to build out analytics? Most of the time, the sales teams can only sell me on one problem: scaling. They often forget that I would have to sacrifice functionality to solve my scaling problems. One should never have to compromise.
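
To make the multi-table question concrete, here is a small sketch using Python’s built-in SQLite module (the table names and data are invented for illustration) of the kind of join-plus-aggregate a relational system answers in a single query, and which a plain key-value store cannot express directly:

```python
import sqlite3

# Two hypothetical tables: users and their orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Merge data from multiple tables in one declarative statement:
# total spend per user, joined across users and orders.
rows = con.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.name ORDER BY u.name
""").fetchall()

print(rows)  # [('Alice', 40.0), ('Bob', 40.0)]
```

In a key-value store, answering the same question means fetching and joining the records in application code, or maintaining a precomputed view yourself.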

If NoSQL is known for scale, how well does VoltDB do it? A picture is worth a thousand, or in this case, a million words.

VoltDB benchmarks from May 2013.

The chart above shows VoltDB achieving one million transactions per second (the benchmark is a year old). Place this benchmark next to the top NoSQL solutions and you will find yourself with an equal or better performing solution. Best of all, VoltDB does it without sacrificing the common features we need from relational database systems.

The switch from a traditional database like MySQL or MS SQL Server to VoltDB is simple and can often be measured in hours or days. A switch from a traditional RDBMS to NoSQL, on the other hand, is likely to be measured in weeks; days if you are lucky.

VoltDB is a NewSQL solution. NewSQL is a term coined to avoid the poor-scaling stigma attached to RDBMS or typical SQL solutions.

NewSQL solves today’s data problems without creating new complexities. Ever heard of trying to fit a square peg in a round hole? NoSQL is that square peg, doing its best to go through that round hole by solving scaling problems with a different approach and causing new complexities along the way: complexities that arise when pulling complex data sets for analytics, adding BI support, updating schemas, or normalizing data.

Many of today’s biggest companies, including Twitter, use NoSQL systems. Had NewSQL been around when they had issues scaling, would the problem still have been solved with NoSQL? Chances are NewSQL would also have been their solution. NewSQL builds on decades of research and innovation in relational databases, which have matured and have been solving many of the world’s most complex data problems.

In case this argument for NewSQL doesn’t quite bring it home for you, I will be writing another article supported by detailed use cases. Until then, please let me know what you think here or on Twitter @francispelland.

If your database has export capabilities, use it. Now!

It should come as no surprise that ETL is a process that should go out the door. If you read my two prior posts, you will see how newer databases with export functionality provide far better ways to capture data and send it to data warehouses.

The difference in data quality is stark. With an ETL process for capturing data and loading it into databases, you have to work through several sources, some of which may never contain the data you need. Sometimes it feels like you are a magician making data appear. Then you have the export process, which sends all the data you choose, in as raw a form as possible, for the data gurus to play around with and mold into terrific stories.

Sending transactional data to warehouse databases as it happens

Data warehouses work best when data is stored by transaction. Storing individual transactions in a data warehouse, as seen in my last post, allows the data to be used in many different ways and, in many cases, makes it future-proof. One of the greater challenges I’ve come across, and I’m sure many others have as well, is finding the best and most efficient way of storing those transactions.

Whether you are looking to build an end-to-end analytics solution with complete ad hoc capability or an engine that can make contextual recommendations based on activity, the data stored in the warehouse will be very similar. Then comes the question of how best to store that data without slowing requests or adding strain to servers. This is accomplished through various means, including asynchronously pushing data to your warehouse, appending to a CSV file for import into the warehouse database, or, if your database supports it, exporting data when it is most efficient.
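
The first two of those means can be combined in a simple pattern. Here is a hedged sketch in Python (file name, fields and values are all hypothetical): the request path only enqueues each transaction, while a background worker drains the queue and appends rows to a CSV file that can later be bulk-loaded into the warehouse:

```python
import csv
import queue
import threading

# Transactions are enqueued as they happen; a background worker
# appends them to a CSV file for later bulk import into the
# warehouse, keeping the request path fast.
txn_queue = queue.Queue()

def warehouse_writer(path):
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            txn = txn_queue.get()
            if txn is None:      # sentinel: shut down cleanly
                break
            writer.writerow(txn)

worker = threading.Thread(target=warehouse_writer,
                          args=("transactions.csv",))
worker.start()

# On the request path, recording a transaction is a cheap enqueue.
txn_queue.put(("2013-05-01T12:00:00", "user42", "purchase", 19.99))

txn_queue.put(None)   # flush and stop the worker
worker.join()
```

The same shape works if the worker pushes batches over the network to the warehouse instead of writing a file; the point is that the slow I/O happens off the request path.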

Many will perform this process through a more complicated method known as Extract, Transform, and Load, or ETL for short. Many have their reasons to prefer ETL over gathering data from transactions as they happen. In my experience, it leads to a lot less flexibility, as the data being extracted for warehouse purposes may be too limited.

The challenges of scaling your data vertically

There are many reasons for which databases must be scaled. The majority of the time, scaling is done to overcome performance issues as the product grows. Though NoSQL is making a lot of noise these days, it is no surprise that SQL is still extremely popular. In general, the same principles are followed when scaling out any SQL product, be it MySQL, MS SQL Server, Oracle or even DB2. With big data, however, scaling is often done to balance the data across multiple hardware nodes or clusters.

Technical Redundancy – A Crucial Business Requirement

This post comes in light of recent events in New Jersey and New York, which were hit by Hurricane Sandy. As with Katrina, it has been a very difficult time, and it is nice to see people helping each other. Businesses too were affected by Sandy, suffering power loss or loss of hardware due to flooding. Individuals and businesses alike will be changed forever.

While working for General Motors, I was given the opportunity to learn about and work on disaster recovery and business resumption plans. This meant a tremendous amount of research into something I knew little about. To my surprise, a lot of horror stories came out of Katrina, with many businesses effectively shutting down and liquidating. The owners of those businesses wrote about their losses, hoping that others would learn from their mistakes. GM, as you can imagine, has a significant number of employees, business apps and data required to run day-to-day operations. If the headquarters is hit by a tornado or blocked by disgruntled union workers, how do we ensure continuity as if nothing happened? Working on the Disaster Recovery Plan (DRP) and Business Resumption Plan (BRP) was an eye-opening experience for me.

Just to make sure I am not confusing anyone: a DRP is a plan used to recover data and ensure that the tools used by the business are restored. A BRP is the plan executed when the physical business location is no longer operable and remote locations must be set up to resume business as normal. Each business will have different requirements for resuming operations, including timelines and the services that are crucial to operations.

I operate under the assumption that anything that can go wrong will go wrong, and that the edge cases, while rare, will happen when you least expect them. For instance, who knew that, of all things, a CAW blockade would require GM to execute its BRP? Looking at Amazon over the past few months, they’ve had numerous large-scale failures. Sandy caused major disruptions and forced multiple websites and services to shut down as their backup generators ran out of fuel.

I’ve asked many small and medium-sized business owners to describe their disaster recovery process. To my disbelief, most are unprepared or do not understand the severity of potential events. I live in a world filled with paranoia, so I ask them, “What if your hosting provider disappears tomorrow?”, which is often followed by a puzzled look. Amazon could never crash, right? What about a push to production that accidentally purges live data? Or an intern who runs a query that deletes data? Companies and developers assume that edge cases never happen because they pay attention and can fix problems as they arise. They need plans for when things go terribly wrong, even if they never do. I won’t claim that I haven’t made mistakes or that I have everything implemented, but I have the plans. Now, if I had the money to execute my plans, I’d perhaps be in a better position to convince everyone to follow my lead.

Regardless of your situation, you should plan. I won’t get into business resumption too much: unless you have a decently sized company or a corporation, you won’t necessarily need it, since your developers could likely work from home and be as productive as they are in the office. If you operate over VPN and run a variety of services in house, then you will more than likely need a BRP. I may cover that in another blog post if I get requests. Plan the implementation of your DRP as cash allows, along with the scale at which you deploy it.

Only the best powering Lightning

It is Sunday night, CBC is going well and there are no server hiccups at all, so I figured I’d take a bit of time to post some details and benchmarks we’ve hit with Lightning. Lightning is the name of our new platform. Not only does it sound better, it also ties in with a few other upcoming products that support Lightning. Lightning is a name that reflects the goals we are looking to accomplish.
