How Piwik built a Google Analytics alternative out of an open-source project

Written by flagsmith | Published 2021/02/26

Maciej and Piotr, founders of open-source analytics software Piwik, joined Ben Rometsch, CEO of Flagsmith (an open-source feature flagging product), on The Craft of Open Source Podcast to discuss the company, from its inception as part of an open-source project to its present state and what lies ahead.

The Genesis of Piwik

Piwik began as an open source statistics system back in 2007. Maciej began contributing to the project and using it for the advertising business he owned at the time. Over time, he became a core team member and pioneered the idea of offering professional services and support alongside the open source project, which was later renamed Matomo.
The team created a company called Piwik PRO, which earned revenue by offering customised services. By 2014, they realised that the platform was nearly ten years old and relied on legacy systems, so they decided to rebuild it and create an enterprise version. Unfortunately, one of the founders did not share this vision: he wanted to keep things entirely open source and free.
As a result, the company split in 2016. With Maciej as CEO, the team bought out the founder who disagreed with the company’s direction. This type of split is not unusual in open-source projects; similar events took place with MySQL and Drupal, for instance.
The team began to build a product known as Piwik PRO Analytics Suite, alongside supporting clients who were still using the open-source Piwik. 

Developing Piwik PRO Analytics

The commercial analytics suite and the open-source project were developed by two separate teams from 2016 onwards. 
The first team worked on the core platform and open-source product and was made up of the original contributors who had made the project’s first commits on GitHub.
The second team of developers worked on the commercial Piwik PRO platform, concentrating on creating a cloud version and expanding its capabilities. This team worked entirely within the private repository on the cloud platform.
Having two separate teams meant that, from the perspective of outside users, developers, and the open-source community, not much changed when the commercial business was created. In fact, the changes were positive for the open-source community, as there was now more investment in the platform and paid developers working on it.
The decision was made to start from scratch rather than adapt and update the existing Piwik system. One reason was that the developers wanted to rebuild the whole thing in a new codebase so that it could keep customers with high volumes of traffic happy. Maciej and Piotr also saw traction on the business side and potential for growth, and they secured funding to create a proprietary product and offer a migration service from one platform to the other.
From a technical point of view, they didn’t want to build a modern microservices-based platform chunk by chunk on top of old technology and legacy systems. They wanted newer backend technology with a fresh database, and chose ClickHouse, a column-oriented database designed for analytical workloads. ClickHouse suits platforms that handle large volumes of data with heavy aggregation and filtering. The team then built a new UI in React, which allowed them to create a common components library and to replace and add features across the platform. The primary services are written in Python, and the tracker uses Rust for its speed.

What Is A “Tracker” In Analytics Software? 

The tracker is the part of the platform that receives all data requests and carries out some transformations. It’s the endpoint that receives behavioral data captured by Piwik PRO when it is tracking a website or mobile app. Therefore, the tracker is the endpoint with the highest volume of traffic.
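To make "receives data requests and carries out some transformations" concrete, here is a minimal sketch of what one such transformation might look like. It is written in Python for readability (the article says the real tracker is written in Rust), and the parameter names `site_id`, `visitor`, `url`, and `action`, as well as the function name, are hypothetical and not taken from Piwik PRO.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def parse_tracking_request(request_url: str, received_at: datetime) -> dict:
    """Turn one raw tracking request into a normalized event record.

    All query parameter names used here are illustrative assumptions.
    """
    query = parse_qs(urlsplit(request_url).query)

    def first(name: str, default: str = "") -> str:
        # parse_qs maps each key to a list of values; keep only the first one.
        values = query.get(name)
        return values[0] if values else default

    page = urlsplit(first("url"))
    return {
        "site_id": int(first("site_id", "0")),
        "visitor_id": first("visitor"),
        "action": first("action", "pageview").lower(),
        "host": page.netloc.lower(),          # normalize host casing
        "path": page.path or "/",             # drop the page's own query string
        "received_at": received_at.astimezone(timezone.utc).isoformat(),
    }
```

A production tracker would also validate input, reject malformed requests, and hand the record to a queue or database; the sketch only shows the parse-and-normalize step.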
To decide which programming language to use for the tracker, the company held an internal competition. Developers submitted proofs of concept, which were then benchmarked against the open-source version. Rust came out on top, outperforming the other entries on speed, which is why it is used for the tracker.
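The article does not describe how the benchmark was run. As a rough illustration of the general idea, the sketch below times several candidate implementations of the same task against a baseline and first checks that they produce the same result. The two candidate functions and the payload are invented for the example; the real competition compared implementations in different languages.

```python
import timeit


def parse_with_split(line: str) -> dict:
    """Baseline candidate: parse 'k=v&k=v' using str.split."""
    return dict(pair.split("=", 1) for pair in line.split("&"))


def parse_with_partition(line: str) -> dict:
    """Second candidate: same result using str.partition."""
    result = {}
    for pair in line.split("&"):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


def benchmark(candidates, payload, repeat=5, number=10_000):
    """Return {function name: best seconds per call}.

    The first candidate is treated as the baseline; every other candidate
    must return the same output before its timing is recorded.
    """
    expected = candidates[0](payload)
    timings = {}
    for func in candidates:
        assert func(payload) == expected, f"{func.__name__} disagrees with baseline"
        best = min(timeit.repeat(lambda: func(payload), repeat=repeat, number=number))
        timings[func.__name__] = best / number
    return timings


if __name__ == "__main__":
    results = benchmark(
        [parse_with_split, parse_with_partition],
        "site_id=7&visitor=abc&action=pageview",
    )
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes; checking correctness first avoids crowning a fast but wrong entry.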

On-premise Deployment

Piwik PRO Analytics, despite being a cloud-based platform, is deployed on-premise. This seems like a slightly unusual decision at first glance but has its merits. 
First of all, although it is technically an on-premise deployment, the installation isn’t carried out by Piwik PRO’s team visiting the customer’s physical premises. Instead, it is done remotely. The team verifies that the customer’s infrastructure is capable of running the installation scripts. Then a time for the installation is agreed upon, and a nominated team member deploys the platform during that dedicated slot.
The main reason for the on-premise approach is that Piwik PRO’s customers tend to be enterprises that need highly available analytics. The complexity of deploying a large-scale analytics platform means it cannot simply run on a virtual machine, for instance. It needs to run on physical servers to achieve high performance.

Keeping Up With Google Analytics

There’s no hiding the fact that all analytics platforms are in a race to keep up with Google Analytics while trying to offer extra value.
Piwik PRO has done a good job of keeping up with the functionality offered by the paid version of Google Analytics. Not everyone knows that Google Analytics starts charging once you cross a certain traffic threshold, and high-traffic users end up paying a lot of money annually.
Google’s new version of Analytics for 2021 is designed to compete with Segment and Amplitude, which means the platform has become much more complex. Piwik PRO, by contrast, seeks to offer a more user-friendly analytics platform with its own unique features and flavour, rather than competing directly with Google, which is hard to do.
Piwik PRO has also kept up to date with changes in privacy laws. The company quickly adapted the platform to comply with GDPR and respect individuals’ data privacy. This gives it a competitive edge in today’s world of increasing data privacy awareness.

The Future of Piwik?

Both Maciej and Piotr think that focusing on data privacy and customisation is probably a wise move in today’s global climate. They discussed the idea that Piwik PRO could be configured to collect the minimum amount of data, so that site visitors won’t be irritated by a consent window. Different solutions would be available depending on the business use case, so companies could pick their own version of the platform and still remain compliant.
When pressed on whether they have any open-source components in their business at the moment, they said no, but added that there’s a good chance there will be in the future, hinting that announcements on the subject may come soon.
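The podcast recap does not say how this minimal-data mode would work. One common way to approach data minimisation is to keep only a small allow-list of fields unless the visitor has consented, and to truncate IP addresses. The sketch below illustrates that idea under stated assumptions: the field names, the allow-lists, and the /24 and /48 truncation lengths are all hypothetical choices, not Piwik PRO behaviour, and whether a given configuration removes the need for a consent window depends on the applicable law.

```python
import ipaddress

# Hypothetical field names; a real product would define its own schema.
ALWAYS_ALLOWED = {"site_id", "path", "action"}
CONSENT_ONLY = {"visitor_id", "user_agent", "referrer"}


def anonymize_ip(ip: str) -> str:
    """Zero the low bits of an address: keep /24 for IPv4 and /48 for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def minimize_event(event: dict, has_consent: bool) -> dict:
    """Drop fields the visitor has not consented to and truncate the IP."""
    allowed = ALWAYS_ALLOWED | (CONSENT_ONLY if has_consent else set())
    minimal = {key: value for key, value in event.items() if key in allowed}
    if "ip" in event:
        minimal["ip"] = event["ip"] if has_consent else anonymize_ip(event["ip"])
    return minimal
```

For example, calling `minimize_event` without consent on an event that carries a visitor ID and a full IPv4 address returns a record with no visitor ID and with the last octet of the address zeroed.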
Another area of development for Piwik is integrations with other software and frameworks. This applies to both the open-source side of things and the commercial side of the business.
This article just scratches the surface of what was covered on the podcast. To hear the whole conversation, listen to the full episode of The Craft of Open Source Podcast.

Published by HackerNoon on 2021/02/26