
What Should You Do to Trust Event Data? Part 1 – Events Catalogue

by Shagane Mirzoian, June 28th, 2023

Too Long; Didn't Read

Unlock the power of event data and leverage it for insightful analysis. Start with simple, practical steps: define events once and reuse the definitions at the code, documentation, and data-validation level.


When it comes to event-based analytics, most teams struggle to leverage its power and get stuck at “our event data is a mess!”. I’ve been there, done that, so I have some learnings to share: theory plus practical steps on how to get it right.


Today we will talk about best practices for organizing event definition data and why you are likely documenting yesterday’s news.

Why should I care about the events catalog?

Google Sheets is usually the place where teams store their events along with triggers, definitions, and properties. While it's a good option to start with, you cannot use it as a long-term solution.


When documentation relies on manual upkeep, it is only a matter of time before it becomes outdated. One day you will not update the document after yet another iteration with the dev team, and it will store wrong information about the events being sent. Also, as the number of teams grows, you will inevitably face multiple files aiming to describe the existing events (and of course they will contradict each other).


While multiple documents try to describe the nature of events, the real behaviour is hidden in the source code and never revealed


Additionally, Google Sheets has a limited structure for capturing complex events, is not handy for validating event consistency, and, most importantly, is not connected to the source code that actually fires the events in your application.


So the main lesson on event definitions is this: there should be one source of truth for the events, and it should not be maintained manually.


The magic here is in reusing the event definition data. When stored properly, definitions can serve multiple valuable purposes: software engineers can leverage them to generate the code that fires the events, and they can be presented in a user-friendly interface to act as comprehensive event documentation for analysts and product managers.


Additionally, they can be used to validate the events sent to your data warehouse, particularly if you store the raw data. What you get as a result: your documentation always shows what is actually sent and stored – pure magic.
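To make the code-generation idea concrete, here is a minimal sketch in Python. The definition format and the generated helper are purely illustrative inventions, not the output of any specific tool:

```python
# Hypothetical sketch: generate a typed tracking helper from an event
# definition. The definition shape and the emitted code are illustrative.

def generate_tracker(event: dict) -> str:
    """Emit Python source for a function that fires this event."""
    params = ", ".join(f"{p}: str" for p in event["properties"])
    payload = ", ".join(f'"{p}": {p}' for p in event["properties"])
    return (
        f'def track_{event["name"]}({params}):\n'
        f'    send_event("{event["name"]}", {{{payload}}})\n'
    )

definition = {"name": "signup_completed", "properties": ["plan", "platform"]}
print(generate_tracker(definition))
```

Because the helper is generated from the same definition the documentation is rendered from, code and docs cannot drift apart.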



Define once and reuse – the pure magic of making events work



How can this be achieved in practice?

  1. Start by communicating with the development team to define an event description format they can use to generate code. For example, you can use JSON or YAML to define each event in a separate file. Remember that these files should contain all the information you want to have in your documentation: event name, trigger, properties with possible values and their meaning, supported platforms, comments, etc.
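For instance, a single event file in JSON could look like the sketch below. The exact schema is up to your team; every field name and value here is illustrative:

```json
{
  "name": "purchase_completed",
  "trigger": "Fired when the user lands on the order confirmation screen",
  "platforms": ["ios", "android", "web"],
  "owner": "@shagane",
  "properties": {
    "payment_method": {
      "type": "string",
      "allowed_values": ["card", "paypal", "apple_pay"],
      "description": "How the user paid for the order"
    },
    "order_value": {
      "type": "number",
      "description": "Total order value in USD"
    }
  },
  "comments": "order_value excludes shipping costs"
}
```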


  2. Choose a tool for rendering the chosen format into a frontend interface. It's important to do this research in advance so you can amend the schema if needed. In my previous project, we used JSON Schema Reader from Atlassian to transform JSON files into handy documentation.


    How it can look – parsing a JSON event definition into a page any teammate can enjoy using
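If you don't have a rendering tool yet, even a small script gets you readable pages. A minimal sketch in Python, assuming the illustrative definition fields from step 1 rather than any prescribed schema:

```python
import json

def render_event_doc(definition: dict) -> str:
    """Render a single event definition as a Markdown documentation page."""
    lines = [
        f"# {definition['name']}",
        f"**Trigger:** {definition['trigger']}",
        "",
        "| Property | Type | Description |",
        "| --- | --- | --- |",
    ]
    for prop, spec in definition["properties"].items():
        lines.append(f"| {prop} | {spec['type']} | {spec['description']} |")
    return "\n".join(lines)

definition = json.loads("""
{
  "name": "purchase_completed",
  "trigger": "User lands on the order confirmation screen",
  "properties": {
    "payment_method": {"type": "string", "description": "How the user paid"}
  }
}
""")
print(render_event_doc(definition))
```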


  3. Collect the event definitions in the chosen format and place them in a single location, e.g. a GitHub repository. This might not be a simple exercise, but it is surely worth it. To make it easier, ask the engineers whether they can pull directly from the code which events are fired and with what structure; in that case, you will only have to fill in the definitions.


  4. Set up a process for how events are added, reviewed, and updated. To provide inspiration, let me share an example of how it can be set up.


    1. Each event should have an analyst who owns it – someone who knows the business logic, the edge cases, and the metrics calculated from this event. This person should be the one to update the event as the product evolves and to review changes made by other analysts. To enforce this at the role level, you can set GitHub code owners for each event in your repository.
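A CODEOWNERS file in the repository makes this enforceable – GitHub will automatically request a review from the owner when their file changes. The paths and handles below are made up for illustration:

```
# .github/CODEOWNERS – map each event file to its owning analyst
/events/purchase_completed.json   @maria-analytics
/events/signup_*.json             @shagane
```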


    2. Updating an event should be approved by both the event owner and the development team. Software engineers can challenge unneeded properties, understand their priorities, raise questions and edge cases, and validate that the required logic can be implemented. It's a win-win for analysts and engineers, as the review step reduces misunderstanding and ambiguous naming. To enable this, each change to an event's structure should be made by opening a Pull Request (PR) that requires approvals from the dev team before it can be merged.


    3. Provide a help page for cases when an analyst does not know whom to ask for approval, decide on SLAs for PRs, collect feedback, and iteratively update the process.


  5. Validate the data you receive. Sometimes we do not get what we send – data injections introduce unauthorized data, bots generate fake events, and encoding issues lead to data loss or misinterpretation. These are all serious problems, as they can skew metrics and lead to inaccurate analysis results → wrong decisions → customer loss. To keep this under control, you should validate that you receive what you expect, and that you don't receive what you don't expect.
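A minimal sketch of such a check in Python, assuming the illustrative definition structure from earlier (your own schema and field names may differ):

```python
def validate_event(payload: dict, definition: dict) -> list[str]:
    """Compare an incoming event payload against its definition.

    Returns a list of human-readable problems; an empty list means valid.
    """
    problems = []
    expected = definition["properties"]
    # Flag properties we expect but did not receive.
    for prop in expected:
        if prop not in payload:
            problems.append(f"missing property: {prop}")
    # Flag properties we received but never defined (a common sign of drift).
    for prop in payload:
        if prop not in expected:
            problems.append(f"unexpected property: {prop}")
    # Check allowed values where the definition restricts them.
    for prop, spec in expected.items():
        allowed = spec.get("allowed_values")
        if allowed and prop in payload and payload[prop] not in allowed:
            problems.append(f"bad value for {prop}: {payload[prop]!r}")
    return problems
```

Running this against the raw data in your warehouse turns the definitions into an automatic data-quality monitor – any drift between the documented schema and what is actually sent shows up as a concrete list of problems.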


To sum up, event documentation is probably the most underestimated tool for improving data quality and ease of use. Define once and reuse – this is the mantra for a trustworthy data description. When the same definition is used in the documentation, the production code, and data validation, you unlock trustworthy events and the insights that come from them.


P.S. Don't forget to subscribe to my blog – I post and will keep posting about events and the insights they bring.

See you soon.