Learn how to group data in SQL using the GROUP BY clause. In this article, I’ll show you this process by using a sample of marketing data.
Those who work with data know that grouping data can be a chore. If you are struggling with a spreadsheet or some other tool to group various sets of data, read this tutorial and learn how to do the work more easily (and faster) using SQL. We’ll be working on marketing sample data.
Have you ever found yourself in a situation where you need to analyze data and present the results in a form other than the one in which the data is stored? In this tutorial, you will see how helpful SQL can be for effective data manipulation when it comes to grouping data in SQL.
You work for a worldwide digital influencer marketing agency. The agency has branches all over the world, and the CMO has decided to carry out a promotion planning event. The plan is to host a forum in each of the locations, with as many influencers attending as possible.
Each influencer has a manager who takes care of the influencer during events, provides information, and handles settlements. One manager can take care of only one influencer at a time. Managers with more than one influencer to serve will have to recruit an event assistant.
You have been asked to provide the following information:
This information is internally stored in the agency’s SQL database, which contains a table with data on associated influencers.
Your job is to analyze the data stored in the influencer table and provide the requested information to the CMO.
The table below is named influencer and contains sample data you'll be working on throughout the tutorial.
In order to graphically present the grouping process, you have to sort the sample data by location and influencer_manager. The influencer table data will be used as a starting point to explain the process of grouping data in SQL.
Let's start with the following query:
SELECT *
FROM influencer
ORDER BY location, influencer_manager
Now, the influencer table looks like this:
As you can see, the data in the table is sorted in ascending order, first by location and then by influencer’s manager.
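To see this two-level sort on a small scale, here is a minimal sketch that runs the same ORDER BY through Python's built-in sqlite3 module. The four rows and every name in them are invented for illustration; only the table name and the column names come from the article, and the column order in CREATE TABLE is an assumption.

```python
import sqlite3

# Hypothetical rows (influencer_name, location, influencer_manager).
# These are NOT the article's sample data; the names are made up.
rows = [
    ("Ann", "Rome", "Mia"),
    ("Bob", "London", "Zoe"),
    ("Cid", "London", "Mia"),
    ("Dee", "Brighton", "Zoe"),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", rows)

# Sort by location first; rows sharing a location are then sorted by manager.
sorted_rows = conn.execute(
    "SELECT influencer_name, location, influencer_manager "
    "FROM influencer "
    "ORDER BY location, influencer_manager"
).fetchall()

for row in sorted_rows:
    print(row)
```

The two London rows end up next to each other, and within London the row managed by Mia comes before the one managed by Zoe.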
The tasks defined by the CMO all require data grouping. In SQL, data grouping is performed using the GROUP BY clause.
The SQL GROUP BY clause allows you to group data based on defined criteria. You can group individual data by one or more table columns. To do the grouping properly, you often need to apply aggregate functions to the column(s) within the SELECT statement.
The syntax of the GROUP BY clause is the following:
SELECT count(column_name1), column_name2, column_name3, ...
FROM table_name
WHERE condition
GROUP BY column_name2, column_name3, ...
ORDER BY column_name2, column_name3, ...
When it comes to syntax, it’s important to keep in mind the placement of the GROUP BY clause in the SELECT statement. Always write the GROUP BY clause after the FROM clause and the WHERE clause (if one exists), and before the ORDER BY clause (if one exists).
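The clause order can be tried out with a short sketch like the one below, which runs a query containing WHERE, GROUP BY, and ORDER BY through Python's built-in sqlite3 module. The rows, the names in them, and the filter on Rome are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
# Hypothetical rows, invented for this example only.
conn.executemany(
    "INSERT INTO influencer VALUES (?, ?, ?)",
    [
        ("Ann", "Rome", "Mia"),
        ("Bob", "London", "Zoe"),
        ("Cid", "London", "Mia"),
        ("Dee", "Sydney", "Zoe"),
    ],
)

query = """
SELECT count(influencer_name) AS number_of_influencers, location
FROM influencer
WHERE location <> 'Rome'   -- 1. filter individual rows
GROUP BY location          -- 2. group the rows that are left
ORDER BY location          -- 3. sort the grouped result
"""
result = conn.execute(query).fetchall()
print(result)  # [(2, 'London'), (1, 'Sydney')]
```

Because WHERE runs before the grouping, the Rome row never reaches a group, so no Rome line appears in the result.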
As you can see, to use the GROUP BY clause, you need to know the basics of querying data in SQL. If you feel like brushing up on those basics, take a look at some online courses. There are many Internet resources for learning SQL, like Udemy, Coursera, and edX, but they often lack interactivity and may be too challenging for beginners.
I recommend sites where you gain hands-on experience in writing code from the very first exercise, such as SQL Basics by Vertabelo Academy.
You will be amazed how fast you can progress through practice!
If you’re already familiar with writing basic SELECT statements, learning how to group data in SQL will be a piece of cake. Let's see how GROUP BY works in practice!
It’s time to solve the tasks presented in the scenario. First, you’ll have to write a basic GROUP BY query. Afterward, you’ll go through a slightly more complicated example to see how grouping can be performed on more than one table column.
The Query:
SELECT location, count(location) as number_of_influencers
FROM influencer
GROUP BY location
ORDER BY location
Let’s take a closer look at the process of building the above query:
1. SELECT the location column. You’ll group your data using this column.
SELECT location
FROM influencer
2. Add another column to the SELECT statement by applying the COUNT function to one of the table columns to get the number of influencers.
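Since the COUNT function is used here to count the table rows, it doesn't matter which table column you apply it to, as long as that column contains no NULL values: COUNT(column_name) skips rows where the column is NULL, while COUNT(*) counts every row.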
SELECT location, count(location) as number_of_influencers
FROM influencer
3. In most database systems, the previous query will not work without a GROUP BY clause, because it selects a plain column (location) alongside an aggregate function. Therefore, insert a GROUP BY clause, and use the location column for grouping.
Since the column contains five different values (Brighton, London, Rome, Los Angeles, and Sydney), the final query result will have five rows.
SELECT location, count(location) as number_of_influencers
FROM influencer
GROUP BY location
The table below displays how the grouping is done: the data is grouped based on distinct values in the location column, where each drawn rectangle contains data for a particular location. In other words, each rectangle represents one data group.
The COUNT function will count the number of rows within each group and return the number of influencers within a location.
4. Apply an ORDER BY clause on the location column to sort the final query results in ascending order based on the values in that column.
SELECT location, count(location) as number_of_influencers
FROM influencer
GROUP BY location
ORDER BY location
Query Result:
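To try the finished query yourself, here is a minimal sketch using Python's built-in sqlite3 module. The five rows are invented for illustration and are not the article's sample data, so the counts differ from the article's result. One row deliberately has no manager (NULL), which is an assumption added to show how count(column) and count(*) can differ.

```python
import sqlite3

# Hypothetical rows (influencer_name, location, influencer_manager).
rows = [
    ("Ann", "Rome", "Mia"),
    ("Bob", "London", "Zoe"),
    ("Cid", "London", "Mia"),
    ("Dee", "Brighton", "Zoe"),
    ("Eve", "London", None),  # assumed case: no manager assigned yet (NULL)
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", rows)

# The article's query: one result row per distinct location.
per_location = conn.execute(
    "SELECT location, count(location) AS number_of_influencers "
    "FROM influencer "
    "GROUP BY location "
    "ORDER BY location"
).fetchall()
print(per_location)  # [('Brighton', 1), ('London', 3), ('Rome', 1)]

# count(*) counts every row in a group; count(column) skips NULLs.
count_check = conn.execute(
    "SELECT location, count(*), count(influencer_manager) "
    "FROM influencer "
    "GROUP BY location "
    "ORDER BY location"
).fetchall()
print(count_check)  # London: 3 rows, but only 2 non-NULL managers
```

With three distinct locations in the made-up data, the grouped result has three rows, in the same way that five distinct locations yield five rows.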
Let’s verify whether managers will have to hire assistants for the forum.
The Query:
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location
ORDER BY influencer_manager
Now, we’ll take a closer look at the process of building the above query:
1. SELECT the influencer_manager and location columns; you will group your data using these columns.
SELECT influencer_manager, location
FROM influencer
2. Just as we did in the previous example, add another column to the SELECT statement by applying the COUNT function to one of the table columns (to get the number of influencers per manager).
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
3. Again, the query from the previous step won’t work in most database systems without a GROUP BY clause, since it selects plain columns alongside an aggregate function. You should now apply a GROUP BY clause to both the influencer_manager and location columns.
Grouping by two columns creates one group for each distinct combination of manager and location. You can picture it as first grouping the data by manager and then splitting each manager's rows into subgroups by location.
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location
The COUNT function will count the rows within the subgroup of each group and return the number of influencers per manager at each location.
4. Apply an ORDER BY clause on the influencer_manager column to sort the query results in ascending order by the name of a manager.
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location
ORDER BY influencer_manager
Query Result:
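The two-column grouping can be reproduced with a small sketch like the following, again using Python's built-in sqlite3 module and invented rows rather than the article's sample data. As an assumption for this sketch, location is added to ORDER BY as a tie-breaker so the row order is predictable. The two checks at the end are written in plain Python and are only one possible way to read the result.

```python
import sqlite3

# Hypothetical rows (influencer_name, location, influencer_manager).
rows = [
    ("Ann", "Rome", "Mia"),
    ("Bob", "London", "Zoe"),
    ("Cid", "London", "Mia"),
    ("Dee", "London", "Zoe"),
    ("Eli", "Sydney", "Zoe"),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", rows)

# One group per distinct (manager, location) combination.
grouped = conn.execute(
    "SELECT influencer_manager, location, "
    "count(influencer_name) AS number_of_influencers "
    "FROM influencer "
    "GROUP BY influencer_manager, location "
    "ORDER BY influencer_manager, location"  # location added as tie-breaker
).fetchall()
for row in grouped:
    print(row)

# Managers with more than one influencer at a single location
# would need an event assistant there.
needs_assistant = [(m, loc) for m, loc, n in grouped if n > 1]

# Managers whose influencers are spread over several locations.
locations_per_manager = {}
for manager, location, _ in grouped:
    locations_per_manager.setdefault(manager, []).append(location)
multi_location = sorted(
    m for m, locs in locations_per_manager.items() if len(locs) > 1
)

print(needs_assistant)  # [('Zoe', 'London')]
print(multi_location)   # ['Mia', 'Zoe']
```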
Et voilà! As you can see, there are managers with more than one local influencer, and there are managers with influencers from different locations all over the world.
Thanks to this knowledge, we can now, for example, plan the budget for hiring managers’ assistants or spot managers who’d have to decide on attending only one event location.
In this article, I’ve reviewed the basics of the SQL GROUP BY clause. Grouping data is a common operation when it comes to data manipulation. Being able to group data easily is a very useful skill when working with datasets of different sizes, and you now know how the SQL GROUP BY clause makes grouping easy: all you need is some basic SQL knowledge and a couple of minutes to write a query! So if you do have some basic SQL knowledge, it would be a pity not to extend it with GROUP BY and get the most out of it when it comes to data grouping.
Thank you for reading! ;-)