Extracting, Transforming, Loading data into a warehouse database

ETL is common practice at nearly every company that has data it wants to analyse or repurpose. Often the product itself cannot be changed: resources are constrained, the developers working on the product aren’t aware of the goals of those working with warehouse data, or the cost of updating the product is too high to replace ETL with the far better process I described in my last post.

I’m not in any way going to say that ETL is bad, because unfortunately for many that process is required and there is no other way to get the data. Some of my greatest data challenges came out of ETL processes early on, before I developed a full platform capable of capturing the data and sending it to the warehouse in a straightforward, easy-to-use format.

Sending transactional data to warehouse databases as it happens

Data warehouses work best by storing data by transaction. Storing individual transactions in a data warehouse, as seen in my last post, allows the data to be used in many different ways and in many cases makes it future proof. One of the greater challenges I’ve come across, and I’m sure many others have too, is finding the best and most efficient way of storing transactions.

Whether you are looking to build an end-to-end analytics solution with complete ad hoc capability or an engine that can make contextual recommendations based on activity, the data stored in the warehouse will be very similar. Then comes the question of how best to store that data without slowing requests or adding strain to servers. This can be accomplished in various ways, including asynchronously pushing data to your warehouse, appending to a CSV file that is later imported into the warehouse database, or, if your database supports it, exporting data when it is most efficient to do so.
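
A minimal sketch of the first two options combined, assuming a single application process: the request path only puts the transaction on an in-memory queue, and a background thread appends it to a CSV file that can be imported into the warehouse later. The class name `TransactionLogger` and the field names in the usage note are hypothetical, chosen only for illustration.

```python
import csv
import os
import queue
import threading


class TransactionLogger:
    """Hypothetical helper: queue transactions and append them to a CSV file
    from a background thread, so the caller never waits on disk I/O."""

    def __init__(self, path, fieldnames):
        self._path = path
        self._fieldnames = fieldnames
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, transaction):
        # Called from the request path: enqueueing returns immediately.
        self._queue.put(transaction)

    def close(self):
        # A None sentinel tells the worker to stop after writing what is queued.
        self._queue.put(None)
        self._worker.join()

    def _drain(self):
        # Only write the header when the file is new or empty.
        needs_header = (
            not os.path.exists(self._path) or os.path.getsize(self._path) == 0
        )
        with open(self._path, "a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=self._fieldnames)
            if needs_header:
                writer.writeheader()
            while True:
                item = self._queue.get()
                if item is None:
                    break
                writer.writerow(item)
                handle.flush()
```

Calling `logger.record({"user_id": 1, "action": "purchase", "amount": "9.99"})` returns as soon as the item is queued, and `logger.close()` flushes whatever is left before shutdown. Anything still in the queue is lost if the process crashes, so a real system would likely want a durable queue in between.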

Many perform this process through a more complicated method known as Extract, Transform, and Load, or ETL for short, and they have their reasons to prefer ETL over gathering data from transactions as they happen. In my experience, it leads to a lot less flexibility, as the data being extracted for warehouse purposes may be too limited.
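
For comparison, a bare-bones ETL job might look like the sketch below. It uses Python's built-in `sqlite3` module for both sides purely for illustration; the `orders` and `order_facts` tables and their columns are hypothetical. Note that the job can only ever extract what the product database already stores, which is the limitation described above.

```python
import sqlite3


def extract(source):
    # Extract: pull raw rows out of the product database.
    return source.execute(
        "SELECT id, email, total_cents FROM orders"
    ).fetchall()


def transform(rows):
    # Transform: normalise emails and convert cents to a decimal amount.
    return [
        (order_id, email.strip().lower(), total_cents / 100)
        for order_id, email, total_cents in rows
    ]


def load(warehouse, rows):
    # Load: write the reshaped rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS order_facts "
        "(order_id INTEGER PRIMARY KEY, email TEXT, total REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO order_facts VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


def run_etl(source, warehouse):
    load(warehouse, transform(extract(source)))
```

`run_etl` takes two open `sqlite3` connections. Because the load step uses `INSERT OR REPLACE`, running the job twice against the same data does not create duplicate rows.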

Building a data warehouse

The databases of today are in many cases built for specific purposes. Some of the more common ones we see every day are relational databases, document-oriented databases, operational databases, triplestores, and column-oriented databases / C-Store. Typically relational, document-oriented, operational, and triplestore databases are used to solve frontend database problems. Column-oriented and similar databases, on the other hand, focus on solving warehousing and backend database problems. These products don’t have to be used for those problems, though they are often best suited for them.
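
To illustrate why column-oriented storage tends to suit warehousing, here is a toy sketch in plain Python with made-up order data. It is not how any particular database is implemented; it only shows that once the same records are stored by column, an aggregate has to touch one column instead of every full record.

```python
# Row-oriented layout: each record is stored together, which suits
# frontend work such as "fetch order 2 and show all of its fields".
rows = [
    {"order_id": 1, "country": "CA", "total": 20.0},
    {"order_id": 2, "country": "US", "total": 35.5},
    {"order_id": 3, "country": "CA", "total": 12.5},
]


def to_columns(records):
    """Pivot a list of records into one list per field."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented layout: each field is stored together, so an analytic
# query such as "total revenue" reads only the one column it needs.
columns = to_columns(rows)
total_revenue = sum(columns["total"])
```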

The challenges of scaling your data vertically

There are many reasons a database must be scaled. Most of the time it is scaled to overcome performance issues as the product grows. Though NoSQL is making a lot of noise these days, it is no surprise that SQL is still extremely popular. In general the same principles are followed when scaling any SQL product, be it MySQL, MsSQL, Oracle or even DB2. When dealing with big data, however, scaling is often done to balance the data across multiple hardware nodes or clusters.
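
One common way to balance data across nodes is to hash a key and use the result to pick a node. The sketch below assumes three database nodes with hypothetical names and is only an illustration of the idea, not a recommendation for any specific product.

```python
import hashlib

# Hypothetical node names, used only for illustration.
NODES = ["db-node-0", "db-node-1", "db-node-2"]


def node_for(key, nodes=NODES):
    """Pick the node that stores `key` by hashing it.

    A stable hash (rather than Python's built-in hash()) keeps the
    mapping identical across processes and restarts.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

With simple modulo placement like this, adding or removing a node changes where most keys land, so existing data has to be moved. Schemes such as consistent hashing exist to reduce that reshuffling.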

Starting a new series. The BIG data series!

Over the next few weeks I will be starting up a new series which I am hoping will have at least a dozen posts.  I am looking to cover a lot of the high level concepts around big data.  Scaling, data format, software, and using the data will be amongst the topics I will cover.

It is no surprise that I am looking for a job. Over the past few months I’ve had many interviews and have noticed that all companies had similar problems. BIG DATA. Most of these companies had issues scaling their databases for either performance or storage reasons.  Many others were simply unable to pull and use their data efficiently because it would take seconds to get results from a fairly simple query.

I’ll be looking to keep the posts bite-sized, as this one will be. I look forward to getting feedback or suggestions on topics related to Big Data that I should cover.

Stay tuned: the first post, coming shortly, will be on scaling.

What about these electric cars? Are they really worth the hype?

This post follows up on my last one, which focused more on Google and the driverless car. This time it is Tesla, led by a very smart and ambitious Elon Musk. I am not knocking the accomplishments or the vision, but a lot of what comes out of Tesla seems, in my opinion, to be repeating the past. We all know about “Who Killed the Electric Car?”. Has Tesla learned from that? Are they doing things differently?

My opinion, which is based on facts, is that Tesla is repeating some of the same mistakes GM made in the 90s. Electric cars carry something called range anxiety. Replacement batteries are expensive. Charging your battery takes much longer than filling up your tank with gas. These aren’t easy problems to address; they require a lot of money and effort to solve.

Why is the tech industry desperate to disrupt the auto industry if it doesn’t know what it is all about?

Technology blogs and journalists are praising the likes of Google, Tesla and many others. This is great; those companies are creating terrific products. They are innovating in a market that has been relatively slow to turn around and that has been plagued with problems of varying degrees. So why, you wonder, am I writing this after the title I put up? The short answer… Google, Tesla, and others are NOT the solution or the disruptors of the auto industry.

I’ve gone off on several rants on Facebook, Google+ and in some cases on Twitter. I really do not want to discredit the advancements Google has made on self-driving cars. I also am not looking to discredit Tesla for releasing an electric car. But those products are only scratching the surface of what a complete car that can fully disrupt the industry should be. I know, you are probably saying “well, Tesla is selling cars”. Yep, you are right. They are good cars. But let me ask you this: are they available to anyone? Is it an inconvenience for you to buy one?

Find me on User Experience Stack Exchange

I’ve recently joined the User Experience chapter of Stack Exchange. Though I’ve been on for only a week, this is a community I am excited about joining. It is a terrific community for exchanging knowledge and providing answers to questions on user experience.

I’ve been getting a lot more requests by email lately with User Experience questions, which I am happy to answer. However, if you are comfortable doing so, posting on Stack Exchange would allow others to benefit from the feedback that I and others in the community provide.

Check out User Experience on Stack Exchange or my profile.

Looking to get my PMP and need some advice

I’ve been looking to get my PMP for quite some time now. I think the best time for me to do that would be now. I’ve gotten a lot of suggestions so far on Facebook about possible local options, including Cheetah Learning and the University of Toronto.

I am wondering if others have some suggestions. I would prefer cheaper options and ones that would allow me to do this as soon as possible. Many of the options that are available will take me up to the end of summer or later before I can get my PMP. Online may be more work, something that isn’t a problem for me.

This is something I am looking to get not only to boost my chances of new employment, but also as valuable knowledge I can bring to wherever I go next.