Extracting, transforming, and loading data into a warehouse database

The ETL process is a common practice in nearly all companies that have data they want to analyse or repurpose. Often a product cannot be changed because of resource constraints, because the developers working on it aren’t aware of the goals of those working with warehouse data, or because replacing ETL with the far better process I described in my last post would cost too much.

I’m not in any way going to say that ETL is bad, because, unfortunately, for many it is required and there is no other way to get the data. Some of my greatest data challenges came out of ETL processes early on, before I developed a full platform capable of capturing data and sending it to the warehouse in a straightforward, easy-to-use format.

Sending transactional data to warehouse databases as it happens

Data warehouses work best when they store data by transaction. Storing individual transactions in a data warehouse, as seen in my last post, allows the data to be used in many different ways and, in many cases, makes it future-proof. One of the greater challenges I’ve come across, and I’m sure many others have too, is finding the best and most efficient way of storing transactions.
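
A minimal sketch of what storing by transaction can look like, assuming a hypothetical `transactions` table in an in-memory SQLite database (the schema and sample rows are made up for illustration). Because each transaction is kept as its own row rather than as a pre-aggregated total, different questions can be answered later from the same data.

```python
import sqlite3

# Hypothetical schema: one row per transaction, not pre-aggregated totals.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        product TEXT NOT NULL,
        amount REAL NOT NULL,
        created_at TEXT NOT NULL  -- ISO 8601 timestamp
    )
    """
)

# Hypothetical sample transactions.
rows = [
    (1, "book", 12.50, "2015-03-01T10:00:00"),
    (2, "book", 12.50, "2015-03-01T11:30:00"),
    (1, "lamp", 40.00, "2015-03-02T09:15:00"),
]
conn.executemany(
    "INSERT INTO transactions (customer_id, product, amount, created_at) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# The same raw rows answer two different questions without any change
# to how the data was captured.
revenue_by_day = conn.execute(
    "SELECT substr(created_at, 1, 10) AS day, SUM(amount) "
    "FROM transactions GROUP BY day ORDER BY day"
).fetchall()

spend_by_customer = conn.execute(
    "SELECT customer_id, SUM(amount) "
    "FROM transactions GROUP BY customer_id ORDER BY customer_id"
).fetchall()

print(revenue_by_day)     # [('2015-03-01', 25.0), ('2015-03-02', 40.0)]
print(spend_by_customer)  # [(1, 52.5), (2, 12.5)]
```

Had only daily totals been stored, the per-customer question could not be answered at all.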

Whether you are looking to build an end-to-end analytics solution with complete ad hoc capability or an engine that can make contextual recommendations based on activity, the data stored in the warehouse will be very similar. Then comes the question of how best to store that data without slowing requests or adding strain to servers. This can be accomplished in various ways, including asynchronously pushing data to your warehouse, appending to a CSV file that is later imported into the warehouse database, or, if your database supports it, exporting data when it is most efficient to do so.
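
One way to combine the first two options is sketched below, assuming a hypothetical `record_transaction` function called from the request path and a hypothetical CSV file that a separate, scheduled bulk import would load into the warehouse. The request only enqueues the event; a background thread does the file I/O, so the request is not slowed by it.

```python
import csv
import os
import queue
import tempfile
import threading

# Hypothetical CSV file that a later bulk import would load.
CSV_PATH = os.path.join(tempfile.mkdtemp(), "transactions.csv")
FIELDS = ["customer_id", "product", "amount"]

events = queue.Queue()


def record_transaction(event):
    """Request path: enqueue the event and return immediately."""
    events.put(event)


def csv_writer():
    """Background worker: drain the queue and append rows to the CSV file."""
    with open(CSV_PATH, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:
            writer.writeheader()  # only write a header for a new, empty file
        while True:
            event = events.get()
            if event is None:  # sentinel value: stop the worker
                break
            writer.writerow(event)
            f.flush()


worker = threading.Thread(target=csv_writer, daemon=True)
worker.start()

record_transaction({"customer_id": 1, "product": "book", "amount": 12.5})
record_transaction({"customer_id": 2, "product": "lamp", "amount": 40.0})

events.put(None)  # signal shutdown and wait for pending rows to be written
worker.join()

with open(CSV_PATH, newline="") as f:
    imported = list(csv.DictReader(f))
print(imported)
```

An in-process queue like this loses any events still waiting if the process crashes, so a production setup might use a durable message queue instead; the trade-off is extra infrastructure.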

Many perform this process through a more complicated method known as Extract, Transform, and Load, or ETL for short, and many have their reasons for preferring ETL to gathering data from transactions as they happen. In my experience, it leads to much less flexibility, because the data extracted for warehouse purposes may be too limited.
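
For comparison, a bare-bones ETL job might look like the following sketch, assuming a hypothetical `orders` table in an operational database and a hypothetical `fact_orders` table in the warehouse (both in-memory SQLite here). The extract step also shows the flexibility problem: any column it does not select never reaches the warehouse.

```python
import sqlite3

# Hypothetical operational (source) database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "total_cents INTEGER, status TEXT, referrer TEXT)"
)
source.executemany(
    "INSERT INTO orders (customer_id, total_cents, status, referrer) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1250, "paid", "email"), (2, 4000, "paid", "search"),
     (3, 999, "cancelled", "email")],
)

# Hypothetical warehouse database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders "
    "(order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)


def extract(conn):
    # Only the selected columns leave the source; `referrer` is dropped,
    # so the warehouse cannot answer questions about it later.
    return conn.execute(
        "SELECT id, customer_id, total_cents, status FROM orders"
    ).fetchall()


def transform(rows):
    # Keep paid orders only and convert cents to currency units.
    return [
        (order_id, customer_id, total_cents / 100)
        for order_id, customer_id, total_cents, status in rows
        if status == "paid"
    ]


def load(conn, rows):
    conn.executemany(
        "INSERT INTO fact_orders (order_id, customer_id, total) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)  # [(1, 1, 12.5), (2, 2, 40.0)]
```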

Building a data warehouse

Today’s databases are, in many cases, built for specific purposes. Some of the more common ones we see every day are relational databases, document-oriented databases, operational databases, triplestores, and column-oriented databases (c-stores). Relational, document-oriented, operational, and triplestore databases are typically used to solve frontend database problems. Column-oriented and similar databases, on the other hand, focus on solving warehousing and backend database problems. None of these products is restricted to those problems, though each is often best suited for them.
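
Why column orientation suits warehousing can be shown with a toy in-memory analogy. This is only an illustration of the two layouts using hypothetical data, not how a real c-store engine is built: row storage keeps each record together, while column storage keeps each column together, so an aggregate over one column touches only that column’s values.

```python
# The same three hypothetical records, first in a row-oriented layout.
rows = [
    {"id": 1, "product": "book", "amount": 12.5},
    {"id": 2, "product": "lamp", "amount": 40.0},
    {"id": 3, "product": "book", "amount": 12.5},
]

# Row-oriented access: fetch one whole record (typical frontend workload).
record = rows[1]

# Column-oriented layout: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Column-oriented access: an aggregate reads a single column
# (typical warehouse workload).
total_amount = sum(columns["amount"])

print(record)        # {'id': 2, 'product': 'lamp', 'amount': 40.0}
print(total_amount)  # 65.0
```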