I recently got interested in analyzing and tracking my monthly spending patterns. There are tools out there such as Mint which help you do just that. However I was more interested in having access to historical raw credit card transaction data for analysis. Furthermore, I found that services like Mint did not do a very good job of categorizing transactions accurately. For example, If I would buy something on Amazon or Google, the charge would end up being bucketed into a general Shopping category. This was not very useful if I was trying to figure out how much I was spending per month on purchases from Amazon for example.
Naturally, I decided to build something of my own which was more suited to my use case cases which included:
- Access to raw historical credit card transaction data
- More granular transaction categorization
The rest of this post details how I built spendthrift — a small application for tracking my monthly expenses.
Getting Credit Card Transaction Data
Banks and credit card companies don’t make it easy for you to programmatically query for credit card transaction data. Thankfully Plaid makes that super easy. Plaid provides client SDKS in various programming languages which you can use to connect to your credit or debit card accounts and query for transaction data.
To get stared with Plaid, I needed to create an account and request API access keys. Plaid does provide a free tier for development and testing with the ability to connect to up 100 live credentials. This is more than enough unless you are planing to write a commercial application to be used by hundreds or thousands of users. You can signup for an account with Plaid here.
Once I had created an account and had my API access keys, the next step was to authenticate with my bank and credit card provider to get access tokens for use with the plaid API. The simplest way to do so was to use the quickstart application provided by Plaid. After obtaining the access tokens, I was all set to begin querying for transaction data using the plaid API.
I used Plaids ruby SDK to create a simple wrapper class to query for credit card transaction data. The structure of the class is as follows:
initialize method of this class, I create an instance of the
Plaid::Client object which is used for all communication between the app and the Plaid servers. I used
vault to manage all my access tokens. The method
load_secrets_from_vault simply loads the app secrets from a local instance of a
vault server that I have running. The method
get_credit_card_accounts fetches account meta-data (account ids in particular for use with the transaction query later on) for only credit card accounts I have with my bank. This was needed since I only wanted to track transaction activity on my credit card accounts(because these are what I use day to day expenses). Finally, the method
get_transactions fetches all the transactions for a given data range. Since the number of transactions could be large, the Plaid API returns a paginated response which the
get_transactions function is structured to handle.
After retrieving transaction data, I needed to clean it a bit. Specifically I wanted to remove pending transactions and credit card payments. Furthermore I wanted to sanitize the transaction categorization a bit. Plaid returns a hierarchical taxonomy of categories associated with a transaction. For my use case I was not really interested in the hierarchy of the categories — so I ended up concatenating them to get a fully qualified descriptive category name. Furthermore, sometimes a transaction does not have a category associated with it. In this case, instead of leaving the category blank, I ended up using the vendor name associated with the transaction. Lastly, the transaction object returned by Plaid has quite a few attributes. I leaned towards extracting only the ones I needed. The
DataSanitize class code snippet below captures these clean up methods.
Finally, I wrote a small module to do a monthly rollup of all transactions and generate a monthly expenditure report.