Finances: Marvin’s first skills

With Marvin’s core architecture designed, I started developing the framework.  The framework is intended to handle multiple use cases, from using personal information to provide contextual experiences to controlling devices.  Marvin’s key differentiator is that it is powered by data rather than simple action-based triggers.  This is why the core of the framework contains APIs that handle data and extends into analytics.  Behind the scenes, an ETL process feeds into various services, including machine learning.

I’ve been feeding the last seven years of financial data, just over 9,000 transactions, into Marvin’s core databases. The transactions look something like this:

| Date | Description | Original Description | Amount | Transaction Type | Category | Account Name |
|---|---|---|---|---|---|---|
| 6/22/2017 | Ooma | OOMA, INC 08887116662 CA | XX.XX | debit | Home Phone | Smart Cash Platinum Plus MasterCard |
| 6/22/2017 | Costco | COSTCO WHOLESALE W159 AJAX ON | XX.XX | debit | Home Supplies | Smart Cash Platinum Plus MasterCard |
| 6/21/2017 | Transfer to Chequing | TRANSFER OUT | XX.XX | debit | Transfer | General Savings |
| 6/21/2017 | Transfer from General Savings | TRANSFER IN | XX.XX | credit | Transfer | Chequing |
| 6/20/2017 | Costco | COSTCO WHOLESALE W1128 OSHAWA ON | XX.XX | debit | Groceries | Smart Cash Platinum Plus MasterCard |
| 6/20/2017 | Costco | WWW COSTCO CA 905-264-8337 ON | XX.XX | debit | Sporting Goods | Smart Cash Platinum Plus MasterCard |
| 6/20/2017 | Taunton Endo | TAUNTON ENDO OSHAWA ON | XX.XX | debit | Doctor | Smart Cash Platinum Plus MasterCard |

These transactions provide a very good base for generating some initial learning models. Because each record includes a date, retailer, amount, transaction type (credit or debit), and category label, a variety of skills can be identified and built on top of them.  These skills help me understand my finances better while also helping me improve them.
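As a sketch of how one of these records could be turned into model inputs, here is a minimal example. The `Transaction` class and `features` function are hypothetical illustrations based on the columns above, not Marvin’s actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shape mirroring the export columns above
@dataclass
class Transaction:
    posted: date
    description: str
    amount: float
    txn_type: str   # "debit" or "credit"
    category: str
    account: str

# Turn one transaction into simple numeric features a learning model could use
def features(txn: Transaction) -> dict:
    return {
        "day_of_week": txn.posted.weekday(),  # 0 = Monday
        "month": txn.posted.month,
        "amount": txn.amount,
        "is_debit": 1 if txn.txn_type == "debit" else 0,
    }

txn = Transaction(date(2017, 6, 22), "Costco", 45.00, "debit",
                  "Groceries", "Smart Cash Platinum Plus MasterCard")
print(features(txn))
```

From features like these, a model can start picking up patterns such as which categories dominate on weekends or which retailers account for most debits.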

Continue reading…

Disruptive Companies, Passionate Teams

I’ve come to the realization that I like companies that try to disrupt a market, whether it be phones, IoT devices, or software. I love rooting for the next up-and-coming success story, even if it means some growing pains along the way. I tend to be a very passionate person when it comes to technology.  I find myself poring over details, specs, and features with each new thing. It’s rewarding to engage with their teams over the details, as you get to see their drive and dedication.

The products trying to disrupt the market are often the ones that bring their own special flair. Even though they may not offer all the same features as the industry leaders, you can tell they are making a difference through the details. The communications coming out of these teams show their passion.  They rarely generate hostility; their message tends to be softer, and they normally try to rally others to join them.  Perhaps because of that more neutral, rallying tone, these teams often offer the best support. Good support leaves a good and lasting impression on the market, because the best form of marketing is word of mouth.

Because of my passion for technology and great products, I often find myself going out of my way to find these new and upcoming products or technologies.  Over the years, I’ve bought into or crowdfunded numerous products that helped meet a personal or professional need.  I don’t always buy into products solely for the product itself; I often do it for the experience and the support. I’ve had the opportunity to follow a number of companies from just starting out to flourishing into businesses now competing against industry leaders or going through large acquisitions. These teams were huge on gathering feedback and interacting with their communities of “evangelists”.

Continue reading…

Marvin: Automating my Smart Home Devices

Part of designing Marvin is carefully ensuring that all devices within my home fit well into the ecosystem.  I severely underestimated the time it would take to plan out each part of that ecosystem.

I’ve been buying and implementing smart home / IoT devices over the last year or so, always with the idea that they would eventually be managed by a central device or hub.  For a while, my hub of choice was the Wink Hub, but as smart home hubs evolved, SmartThings continued to get better.  While Wink is a terrific hub, SmartThings allows for far more complex automation routines and even lets you create your own SmartApps.

While the hub itself was a difficult choice, the other devices were unfortunately not a whole lot simpler.  For one, I’ve been avoiding the ongoing maintenance costs that some devices carry, like Nest Cams. Secondly, I tend to go for devices that support Windows as well as Android and iOS. Thirdly, the devices need to be user friendly enough that I would enjoy tinkering with them. Lastly, I try to choose devices that have mostly favorable reviews.  Finding devices that meet all four requirements has been surprisingly difficult.

Continue reading…

Should Products Ensure Developer Happiness?

These days I’m fully immersed in figuring out how to build Marvin.  In doing so, I’ve been researching products, services, and technologies that could help me do it. While diving into each one and evaluating whether to use it, I’ve found myself looking for the very same thing every time.  That is, are they working to ensure their developers are happy?  Would I be happy working with them?

One of the biggest gaps I’ve been looking to fill is financial transaction data.  To do that, I’ve been evaluating Plaid, Xignite, Yodlee, and Finicity.  All are very capable and established services.  However, of the four, Plaid is the only one that makes me feel they care about developers. They have a healthy set of SDKs, great documentation, and a welcoming dev portal.

Companies everywhere are beefing up their engineering and development teams.  In many technology-focused companies, the decision makers are the people working directly with the product or service: the developers.  Developers normally choose the product that is most effective at helping them achieve their goals.  Many products today continue to focus on selling to the business-minded buyer, emphasizing features and pricing.  While that is certainly not a bad strategy, it tends to completely overlook the developers who would actually be working with the product or service.

I believe we are reaching a tipping point where you need to appeal to developers and their happiness in order to succeed. Many products brand themselves as a platform, normally with some externally facing API.  Those that don’t have an API likely have one near the top of their roadmap.  The API can serve multiple purposes: to bring a richer experience, to share data with other tools, or to build new functionality that isn’t supported within the parameters of the service. The market is flooded with technology companies, each trying to cater to a niche. Even while a company is using your product or service, its goals may differ from yours, which is why it looks for API access to run its own business logic.

Continue reading…

Marvin – Personalized and Smart Virtual Assistant

Over the last two years, I’ve been buying into Internet of Things (IoT) devices. While the devices themselves may be smart, they do not make a smart home.  In comes Marvin.  Unlike many who have gone with Jarvis as a name, I chose the name Marvin from The Hitchhiker’s Guide to the Galaxy. Marvin has a brain the size of a planet, which is fitting considering the information it will be processing.
Continue reading…

The rise of NoSQL is an opportunity for new RDBMS solutions

It should come as no surprise that NoSQL has become popular over the past few years. This popularity has been driven in large part by the app revolution. Many new apps are hitting millions of users in less than a week, some in a day. This presents a scaling problem for app developers who seek a large audience.

Scaling a typical RDBMS like MySQL or MSSQL from 0 to 1 million users has never been easy.  You have to set up master and slave servers, shard and balance the data, and keep resources in place for any unexpected events. NoSQL is being touted as the solution to those problems. It shouldn’t be.

NoSQL’s use cases have been mistakenly focused on scalability because of the complexity of standing up and scaling an RDBMS. Application developers aren’t necessarily interested in becoming server-side wizards.  They prefer to focus on building out their apps, not scaling the servers powering those apps.  These developers are looking for something low cost that can keep up with their needs as they grow, and they have flocked to NoSQL for these very reasons.

However, when developers look to grow their apps and introduce more complex functionality, they sometimes hit roadblocks due to the way data is stored in NoSQL.  A solution to one of those roadblocks is MapReduce, which brings index-like functionality to NoSQL.
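To illustrate the idea, here is a minimal, store-agnostic sketch of the map, shuffle, and reduce steps over schemaless documents; the document shapes and field names are made up for the example rather than taken from any particular NoSQL product:

```python
from collections import defaultdict

# Hypothetical documents in a schemaless store
docs = [
    {"type": "purchase", "category": "Groceries", "amount": 54.10},
    {"type": "purchase", "category": "Groceries", "amount": 23.75},
    {"type": "purchase", "category": "Doctor", "amount": 80.00},
]

# Map: emit (key, value) pairs, much like a NoSQL view definition
def map_doc(doc):
    if doc["type"] == "purchase":
        yield doc["category"], doc["amount"]

# Shuffle: group emitted values by key
grouped = defaultdict(list)
for doc in docs:
    for key, value in map_doc(doc):
        grouped[key].append(value)

# Reduce: collapse each group into one value (a total per category here)
totals = {key: sum(values) for key, values in grouped.items()}
print(totals)
```

Real systems run the map and reduce functions across many nodes and persist the result as a queryable index, but the shape of the computation is the same.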

My goal isn’t to dispute the importance of NoSQL, but to promote the reality that not all database systems are alike; each serves its own purpose. NoSQL’s strength is terrific key-value access with great performance.  To some, the lack of a schema is a benefit that lets the application control how the data is stored, limiting the need to interface with and configure the database.

Over the years I’ve been looking to build products that need RDBMS-like storage that scales. NoSQL just couldn’t do it for me.  Many would agree, but don’t know of anything better. Lucky for me, I found VoltDB. To this day, salespeople continue to contact me to pitch their NoSQL solutions. I ask: How can NoSQL solve my problems? Are you ACID compliant? How can I merge data from multiple tables? How can I use my data to build out analytics? Most of the time, the sales teams can only sell me on one problem: scaling. They forget that I would have to sacrifice functionality to solve my scaling problems. One should never have to compromise.

If NoSQL is known for scale, how well does VoltDB do it? A picture is worth a thousand, or in this case, a million words.

VoltDB benchmarks from May 2013.

The chart above shows VoltDB achieving 1 million transactions per second (in a year-old benchmark). Place that next to the top NoSQL solutions and you will find yourself with an equal or better performing system.  Best of all, VoltDB does it without sacrificing the common features we need from relational database systems.

The switch from a traditional database like MySQL or MsSQL to VoltDB is simple and can often be measured in hours or days.  A switch from a traditional RDBMS to NoSQL on the other hand is likely to be measured in weeks, days if you are lucky.

VoltDB is a NewSQL solution. NewSQL is a term coined to avoid the poor-scaling stigma attached to RDBMS or typical SQL solutions.

NewSQL solves today’s data problems without creating new complexities.  Ever heard of trying to fit a square peg in a round hole? NoSQL is that square peg, doing its best to get through the round hole by solving scaling problems with a different approach, and creating new complexities along the way: complexities that arise when pulling complex data sets for analytics, adding BI support, updating schemas, or normalizing data.

Many of today’s biggest companies, including Twitter, use NoSQL systems. Had NewSQL been around when they hit their scaling issues, would they still have solved the problem with NoSQL? Chances are NewSQL would have been their solution too. NewSQL builds on decades of research and innovation in relational databases, which have matured while solving many of the world’s most complex data problems.

In case this argument for NewSQL doesn’t quite bring it home for you, I will be writing another article supported by detailed use cases. Until then, please let me know what you think here or on Twitter @francispelland.

If your database has export capabilities, use them. Now!

It should come as no surprise that I think ETL is a process that should go out the door. If you read my two prior posts, you will see how newer databases with export functionality provide far better ways to capture data and send it to data warehouses.

The difference in data quality is stark.  With an ETL process for capturing data and loading it into databases, you have to work through several sources, some of which may never have the data you need.  Sometimes it feels like you have to be a magician to make data appear.  An export process, on the other hand, sends all the data you choose, in as raw a form as possible, for the data gurus to play with and mold into terrific stories.

Continue reading…

Extracting, Transforming, Loading data into a warehouse database

The ETL process is common practice in nearly every company with data it wants to analyse or repurpose. Often, products cannot be changed due to resource constraints, the developers working on the product aren’t aware of the goals of those working with warehouse data, or the cost to update the product is too high to replace ETL with the far better process I described in my last post.

I’m not in any way saying that ETL is bad; unfortunately, for many, the process is required and there is no other way to get the data.  Some of my greatest data challenges came out of ETL processes, before I developed a full platform capable of capturing and sending data to the warehouse in a straightforward, easy-to-use format.

Continue reading…

Sending transactional data to warehouse databases as it happens

Data warehouses work best when they store data by transaction.  Storing individual transactions in a data warehouse, as seen in my last post, allows the data to be used in many different ways and, in many cases, makes it future proof. One of the greater challenges I’ve come across, and I’m sure many others have too, is finding the best and most efficient way of storing those transactions.

Whether you are building an end-to-end analytics solution with complete ad hoc capability or an engine that makes contextual recommendations based on activity, the data stored in the warehouse will be very similar. Then comes the question of how best to store that data without slowing requests or adding strain to servers.  This can be accomplished through various means, including asynchronously pushing data to your warehouse, appending to a CSV file that is imported into the warehouse database, or, if your database supports it, exporting data when it is most efficient.
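As a rough sketch of the first option, asynchronous pushing, here is a minimal queue-and-worker pattern, with SQLite standing in for the warehouse; the table name and rows are invented for the example:

```python
import queue
import sqlite3
import threading

# SQLite stands in for the warehouse database in this sketch
warehouse = sqlite3.connect(":memory:", check_same_thread=False)
warehouse.execute("CREATE TABLE events (name TEXT, amount REAL)")

buffer = queue.Queue()

def writer():
    # Background worker: drain the queue and write to the warehouse,
    # keeping the database work off the request path
    while True:
        row = buffer.get()
        if row is None:          # sentinel: shut down the worker
            break
        warehouse.execute("INSERT INTO events VALUES (?, ?)", row)
        warehouse.commit()

worker = threading.Thread(target=writer)
worker.start()

# The request path just enqueues and returns immediately
buffer.put(("Costco", 54.10))
buffer.put(("Ooma", 9.99))
buffer.put(None)
worker.join()

count = warehouse.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2
```

A production version would batch inserts and handle retries, but the principle is the same: the transaction is captured the moment it happens, and the warehouse write happens off to the side.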

Many perform this process through the more complicated method known as Extract, Transform, and Load, or ETL for short. Many have their reasons to prefer ETL over gathering data from transactions as they happen, but in my experience it leads to a lot less flexibility, as the data being extracted for warehouse purposes may be too limited.

Continue reading…