If your database has export capabilities, use them. Now!

It should come as no surprise to anyone that ETL is a process that should go out the door. If you read my two prior posts, you will see how newer databases with export functionality provide far better ways to capture data and send it to data warehouses.

The difference in data quality is stark. With an ETL process for capturing data and loading it into databases, you have to work through several sources, some of which may never have the data you need. Sometimes it feels like you have to be a magician to make data appear. With the export process, by contrast, all the data you choose is sent in as raw a form as possible, for the data gurus to play around with and mold into terrific stories.


Extracting, Transforming, Loading data into a warehouse database

The ETL process is common practice in nearly all companies that have data they want to analyse or repurpose. Often it persists because the product cannot be updated due to resource constraints, because the developers working on the product aren’t aware of the goals of those working with warehouse data, or because the cost of updating the product is too high to replace ETL with the far better process I described in my last post.

I’m not in any way going to say that ETL is bad, because unfortunately, for many, that process is required and there is no other way to get the data. Some of my greatest data challenges came out of ETL processes early on, before I developed a full platform capable of capturing the data and sending it to the warehouse in a straightforward, easy-to-use format.


Sending transactional data to warehouse databases as it happens

Data warehouses work best when they store data by transaction. Storing individual transactions in a data warehouse, as seen in my last post, allows the data to be used in many different ways and in many cases makes it future-proof. One of the greater challenges I’ve come across, and I’m sure many others have too, is finding the best and most efficient way of storing transactions.
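To illustrate why per-transaction storage keeps the data flexible, here is a minimal Python sketch. The rows, the field names, and the `total_by` helper are all made up for the example and are not taken from any particular warehouse; the point is only that the same raw rows can answer different questions after the fact.

```python
from collections import defaultdict

# Hypothetical transaction rows; field names and values are illustrative only
transactions = [
    {"user_id": "u1", "product": "book", "amount": 12.0},
    {"user_id": "u2", "product": "book", "amount": 8.0},
    {"user_id": "u1", "product": "pen", "amount": 5.0},
]


def total_by(rows, key):
    """Sum the 'amount' of each row, grouped by the given field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


# The same stored rows support different questions later on
spend_per_user = total_by(transactions, "user_id")
revenue_per_product = total_by(transactions, "product")
```

Had only a pre-aggregated total per user been stored, the per-product view could not be recovered later.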

Whether you are looking to build an end-to-end analytics solution with complete ad hoc capability or an engine that can make contextual recommendations based on activity, the data stored in the warehouse will be very similar. Then comes the question of how best to store that data without slowing requests or adding strain to servers. This can be accomplished in various ways, including asynchronously pushing data to your warehouse, appending to a CSV file that is later imported into the warehouse database, or, if your database supports it, exporting data when it is most efficient.

Many will perform this process through a more complicated method known as Extract, Transform, and Load, or ETL for short. Many have their reasons to prefer ETL over gathering data from transactions as they happen. In my experience, it leads to a lot less flexibility, as the data being extracted for warehouse purposes may be too limited.
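As a rough sketch of the first two options combined, the Python example below queues each transaction from the request path and lets a background thread append it to a CSV file for later import. The `TransactionLogger` class, the column names, and the file path are all hypothetical, and a production setup would also need to think about durability, file rotation, and the import step itself.

```python
import csv
import os
import queue
import threading

# Hypothetical column names, for illustration only
FIELDS = ["transaction_id", "user_id", "action", "amount"]


class TransactionLogger:
    """Queue transactions and append them to a CSV file off the request path."""

    def __init__(self, path):
        self._path = path
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, transaction):
        # Called from request handlers: only enqueues, never touches the disk.
        self._queue.put(transaction)

    def close(self):
        # Send a stop signal, then wait for the worker to drain the queue.
        self._queue.put(None)
        self._worker.join()

    def _run(self):
        # Write a header only when the file is new or empty.
        is_new = not os.path.exists(self._path) or os.path.getsize(self._path) == 0
        with open(self._path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if is_new:
                writer.writeheader()
            while True:
                item = self._queue.get()
                if item is None:
                    break
                writer.writerow(item)
                f.flush()
```

A request handler would call `logger.log({...})` with one dictionary per transaction, and `logger.close()` would be called once at shutdown so that queued rows are written before the process exits.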
