Hunting for Customers With PostgreSQL

Written by billbeaux | Published 2020/09/22
Tech Story Tags: sql | business | startup | startup-advice | tutorial | postgresql | postgres | customer-rentention

TLDR With just graphing tools, it’s near impossible to spot trends in customer usage. Postgres' statistical functions help you rapidly sift through the noise. The good news: a single proactive outreach can reverse a downward trend. I explore these functions further in our template on Linear Regression in SQL. The output of the CTE action_data looks like this:Finding slope of all customer usage trendlines. Contact the customers with the worst slope! Their usage of your product is plummeting. You could be letting great customers slip away!via the TL;DR App

Now more than ever, you have to hold onto every customer you can. Customers don’t love your product one minute, then fall out of love with it the next. Normally there’s a declining trend in usage of your product over time.
The good news: a single proactive outreach can reverse a downward trend. 🙌
The challenge: efficiently finding customers with declining usage. 🤦‍♀️

Why SQL?

With just graphing tools, it’s near impossible to spot trends in customer usage. It’s neon spaghetti:
And examining customers one by one? That would take ages.

Using statistical functions

PostgreSQL’s statistical functions help you rapidly sift through the noise. I explore these functions further in our template on Linear Regression in SQL (full disclosure: I am an avid SQL writer at PopSQL, a modern SQL editor built for teams).
Below we’ll use just the regr_slope() function:
| Function         | Argument Type    | Return Type      | Description                                                                   |
|------------------|------------------|------------------|-------------------------------------------------------------------------------|
| regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
This function first creates a trendline that fits your data. It then tells you the slope of that line (your algebra teacher was right! You will use this stuff in your real job 🤓 ).
What regr_slope() is doing.
The slope of that light blue line is the output of the regr_slope() function:

SQL you can copy / paste

Here’s the query (I’ll break it down below):
-- DATA PREP
with action_data AS (
  select
    extract('week' from time) as week, -- quick hack to turn week into a numeric so we can use regr_slope() function
    team_id, -- we're looking at team usage
    (count(name) / count(distinct user_id))::numeric as actions_per_user -- we want to see actions per user to normalize
  from events
  where time between '{{start_date}}' and '{{end_date}}' -- your date range, make sure your start_date is a Monday and end_date is a Sunday
  group by 1,2
)

-- FINDING SLOPE OF ALL CUSTOMER USAGE TRENDLINES
select
  team_id,
  count(week) as weeks_considered,
  round(regr_slope(actions_per_user,week)::numeric,2) as slope
from action_data
group by 1
having count(week) >= 5 -- let's say we want at least 5 weeks of data
  and regr_slope;
Data prep
The first part of the query in the CTE action_data is just data prep.
The extract() function turns our timestamp into a numeric, as regr_slope() doesn’t accept timestamps as parameters. This hack works great except at the change of calendar years (workaround at the bottom of the post).
For the extract() function, a week goes from Monday to Sunday. Avoid partial weeks by starting your date range with a Monday and ending on a Sunday 👍
The output of the CTE action_data looks like this:
| week | team_id | actions_per_user |
|------|---------|------------------|
| 12   | 93336   | 28               |
| 13   | 93336   | 6                |
| 14   | 93336   | 10               |
| ...  | ...     | ...              |
| 12   | 92982   | 26               |
| 13   | 92982   | 1                |
| 14   | 92982   | 2                |
| ...  | ...     | ...              |
Finding slope of all customer usage trendlines
In the second part of the query, the regr_slope() creates a trendline between actions_per_user and week for each team, then returns the slope of that trendline. Again, visualized:
If the slope is negative, then the usage trend is negative. The more negative the output, the steeper the decline in their usage. We added the weeks_considered column to ensure we had enough data points to see a trend.
You can see that in the output:
| team_id | weeks_considered | slope |
|---------|------------------|-------|
| 97003   | 5                | -5.70 |
| 77503   | 9                | -4.93 |
| 95535   | 5                | -4.23 |
| 92982   | 5                | -3.11 |
| …       | ...              | …     |
☎️ Contact the customers with the worst slope! Their usage of your product is plummeting. You could be letting great customers slip away!

Seeing the SQL

Once more, to help you visualize what PostgreSQL is doing, I've graphed the trends from the output above for each of these teams:
If you'd like to see an individual customer's behavior, here’s the query I used:
select
    extract('week' from DATE_TRUNC('week',time)) as week,
    (COUNT(name) / COUNT(distinct user_id))::integer as actions_per__user
from events
where team_id = 97003 -- or 77503, 95535, 92982
group by 1;

Try it yourself?

Run this template against a sample database that mirrors real startup data. You can also run this query on your own data, so long as you have a table that tracks events and includes a timestamp and a user/team ID.

Bonus Content

  • The inverse is also true: customers with strong positive slopes are growing 📈
  • Here's the aforementioned workaround if your data spans a changing calendar year. It involves casting from integer to text and back to integer. But it works!
with action_data AS (
 select
  (extract('year' from time)::text || extract('week' from time)::text)::integer as yearweek,
... -- rest of query continues as above
Photo by Sarah Brown on Unsplash (with some emoji game by the author).

Written by billbeaux | Head of Growth @ PopSQL
Published by HackerNoon on 2020/09/22