Wilk

@wilk

From CSV to Buxfer: an unexpected journey — Introduction

Part 1, the infrastructure: an introduction to the challenge and an overview of the general project infrastructure

Introduction

Let me share with you, reader, this adventure with Buxfer and a variety of development tools and programming languages.
What’s this, you’re wondering?
This is the first part of a journey I made to learn something new and, of course, to do something useful.
Actually, that’s called hacking. You got the point.

Journey

This journey took me through the following milestones:

  1. Part 1 (this part): Introduction
  2. Part 2: Cleaner
  3. Part 3: Collector
  4. Part 4: Goxfer
  5. Part 5: Conclusions

But, wait, what’s Buxfer?

Buxfer helps you see all your accounts at one place, understand where your money goes, reduce unwanted spending, and save for future goals.

In other words, it’s an online storage service for your financial data with cool charts, grids and very useful features, like budgets, forecasts and reminders.

I discovered it last year. Before that, I was saving my transactions in Excel files or using programs like GnuCash.
None of the solutions I tried satisfied me, and I was always looking for something better.
To be frank, at the beginning my research focused on free software; then I also started looking at premium solutions, and that’s when Buxfer showed up.

Why Buxfer?

The requirements of my research were:

  • an online responsive web application, so I can use it with any device with just a browser
  • web APIs, so I can put/get data to/from it
  • automagic charts, grids, forecasts, so I can see in real time what’s going on
  • tags, so I can organize my data not only hierarchically but also in a flat way

Buxfer satisfies those requirements.
Of course, it’s not the only one and maybe it’s not the best one: frankly, I don’t know and I don’t care. Buxfer is the best for me and my needs.
Anyway, one thing that caught my attention during the research was the team (you can find it on the About page): a Software Engineer, a Designer and 3 Investors.
Woah.
So few. And they did it. They did it well.
So, why not give them a chance?

The challenge

Load all of my data on Buxfer and start analysing it.
The goal is clear, but the problem is that my data has been built and rebuilt many times, with different structures.
Let’s take a look, year by year:

  • 2014: Excel files
  • 2015: different Excel files
  • 2016: GnuCash files

As you can see, I was smart enough to change the data structure year after year, breaking the rule of “backwards compatibility”. Very clever, huh?
Now, regrets aside, let’s face it: I’ve got a lot of heterogeneous data that I need to put online.

Getting Started

If you have a dream, you can spend a lifetime studying, planning, and getting ready for it. What you should be doing is getting started.
Drew Houston

Ok, first of all, what do I need to do to achieve my goal?

  1. choose the technology stack
  2. set up the development environment
  3. export the data in a common format
  4. define the data model
  5. clean the data (part 2)
  6. collect the data (part 3)
  7. use Buxfer APIs to put the data online (part 4)
  8. draw the conclusions (part 5)

Technology stack

So, what kind of language am I going to use?
I’m very good with JavaScript but you know what? I use it every day, so, just no.
Python is kinda fun and very useful for data manipulation: check!
Oh, and don’t forget virtualenv along with Python, because I want to have everything isolated inside the project.
I’ll also need something fast that can perform HTTP calls in parallel, because I want to load all the data in just a few seconds (unfortunately, the Buxfer APIs only let you add one transaction at a time): what about GoLang?
It’s fast, it has native support for parallelism and it has explicit types: I need it!
Along with Python and GoLang, I’ll also need pip and glide, the package managers. Well, actually glide is not the official one for GoLang but it’s really good.

Where to store data?
An important point is the data storage.
I don’t need a schema or relations between tables: I just need something to fill with transactions.
So, I think I’m going with a document-oriented database. MongoDB? Yes, why not? I don’t have special storage requirements and I do know MongoDB, so, go for it!

What else?
Well, I don’t want to install a specific version of Python, GoLang or MongoDB on my operating system.
I need containers. Docker FTW.
Just Docker?
Nope, let’s use docker-compose to facilitate the setting-up/tearing-down process.

Environment setup

I need the workbench ready, so that when things get rough I’ll be prepared to face them with all the shiny tools listed above.

The very first thing to set up is the project folder:

.
├── LICENSE
├── README.md
├── go
└── python

The go and python folders will hold the project source code.
Now it’s time to set up Docker and docker-compose, so let’s add the docker-compose.yml file:

.
├── LICENSE
├── README.md
├── docker-compose.yml
├── go
└── python

For now, docker-compose.yml is empty but it will be filled with services later.
Instead, let’s define the docker structure inside each sub-project (go and python):

docker
├── Dockerfile
└── entrypoint.sh

I’ve just added a Docker definition for each language (Python and GoLang), which is where I’ll put the dependencies and requirements of each service.
docker-compose will use those Dockerfiles to build all the images, and the entrypoints will be used just to pass commands to the containers.
Good, for now, that’s enough.
The project folder structure looks like this:

.
├── LICENSE
├── README.md
├── docker-compose.yml
├── go
│   └── docker
│       ├── Dockerfile
│       └── entrypoint.sh
└── python
    └── docker
        ├── Dockerfile
        └── entrypoint.sh

Ok, now it’s time to draft the docker-compose.yml file!
What I need is:

  1. a service to set up the Python container
  2. a service to set up the GoLang container
  3. a service for each program (cleaner, collector, goxfer)
  4. a service for the DB
  5. a global volume for the DB

The resulting docker-compose.yml defines the following:

setup-python and setup-golang will be used just once to build the images.
cleaner, collector and goxfer form the very core of this project: the last two depend on the mongodb service.
mongodb, well, it’s the db.
db_mongo is a global volume to store the database data.

I’m going to define each service and Dockerfile later in this journey.
For now, that’s enough!

Data export

Ok, this is easy.
Programs like Excel and GnuCash usually have an export procedure that lets you get your data in a common format.
CSV, that’s it.
With GnuCash you can also get a JSON of your data but I don’t want to parse different formats.
Let’s use CSV.
By the way, there are a lot of libraries for parsing CSV: for instance, Python has native support for it.
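
To give an idea of what that native support looks like, here is a minimal sketch using Python’s standard csv module. The sample line is the one used in the Data model section of this article; the read_rows helper is a hypothetical name made up for this illustration, not something taken from the project.

```python
import csv
import io

# Sample line in the shape of an exported file (taken from this article).
sample = '04/06/2016,Abbigliamento,maglietta,"5,00"\n'


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings).

    Hypothetical helper, for illustration only.
    """
    return list(csv.reader(io.StringIO(text)))


rows = read_rows(sample)
print(rows)  # [['04/06/2016', 'Abbigliamento', 'maglietta', '5,00']]
```

Note that the quoted amount keeps its comma: the csv module treats "5,00" as a single field, so there is no need to split lines by hand.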

Data model

Another important part of this journey is to define a common model to use inside each program (cleaner, collector and goxfer):

Model {
  description: String,
  amount: Float,
  tags: String[],
  account: String,
  date: String
}
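
On the Python side, one possible way to express this model is a dataclass. This is only a sketch: the class name Transaction and the to_document helper are assumptions made for this example, and the sample values are illustrative.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Transaction:
    """Python counterpart of the Model above (the class name is an assumption)."""
    description: str
    amount: float
    tags: List[str]
    account: str
    date: str


def to_document(transaction):
    """Turn a Transaction into a plain dict, e.g. to store it as a document."""
    return asdict(transaction)


# Illustrative values only.
example = Transaction(
    description="maglietta",
    amount=5.0,
    tags=["Abbigliamento"],
    account="expenses",
    date="04/06/2016",
)
print(to_document(example))
```

A plain dict is handy here because a document-oriented database stores exactly that kind of structure.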

Buxfer needs that information: description, tags, account, etc.
So, what I’m going to do is convert the following data into that model:

04/06/2016,Abbigliamento,maglietta,"5,00"

This CSV row can be read like this:

<date>,<tag>,<description>,<amount>

But wait: where’s the account?
Actually, the account is encoded in the file name, like expenses.csv: when I read a CSV file, I know whether its rows have to be marked as expenses or income.
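
Putting the pieces together, the conversion from a CSV row to the model could be sketched like this. It is not the actual cleaner code: row_to_model and parse_amount are hypothetical names, the account is assumed to be the file name without its extension, and amounts are assumed to use a comma as the decimal separator with no thousands separator.

```python
from pathlib import Path


def parse_amount(raw):
    """Convert an amount like '5,00' to a float.

    Assumption: comma is the decimal separator, no thousands separator.
    """
    return float(raw.replace(",", "."))


def row_to_model(row, filename):
    """Map a <date>,<tag>,<description>,<amount> row to the common model.

    Assumption: the account is the file name without extension,
    e.g. 'expenses.csv' gives 'expenses'.
    """
    date, tag, description, amount = row
    return {
        "description": description,
        "amount": parse_amount(amount),
        "tags": [tag],
        "account": Path(filename).stem,
        "date": date,  # kept as a string, as in the model
    }


row = ["04/06/2016", "Abbigliamento", "maglietta", "5,00"]
print(row_to_model(row, "expenses.csv"))
```

The date is left untouched on purpose: the model declares it as a string, and any reformatting can happen later, once the target format is known.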

End of part 1

I know, I know, this part was tricky, but I needed it to draw the whole picture and to define the structure of this project.
Before starting, it’s always a good idea to write the “blueprint” of what you’re going to do.
Now that I’ve got the project skeleton, I’m able to start developing the main features.

If you enjoyed this article don’t forget to share it!
See you in Part 2: Cleaner!

Spoiler

Source code is already available here: https://github.com/wilk/from-csv-to-buxfer
