Learn How To Group Data in SQL Using The GROUP BY Clause [Tutorial]

Written by kate | Published 2019/10/18

TL;DR: In SQL, data grouping is performed using the GROUP BY clause. You can group data by one or more table columns, and you often need to apply aggregate functions to columns in the SELECT statement to get a meaningful result for each group.

Learn how to group data in SQL using the GROUP BY clause. In this article, I’ll show you this process by using a sample of marketing data.
Those who work with data know that grouping data can be a chore. If you are struggling with a spreadsheet or some other tool to group various sets of data, read this tutorial and learn how to do the work more easily (and faster) using SQL. We’ll be working on marketing sample data.
Have you ever found yourself in a situation where you need to analyze data and present the results in a form other than the one in which the data is stored? In this tutorial, you will see how helpful SQL can be for effective data manipulation when it comes to grouping data in SQL.

Problem

Scenario

You work for a worldwide digital influencer marketing agency. The agency has branches all over the world, and the CMO has decided to carry out a promotion planning event. The plan is to host a forum in each of the locations with as many influencers attending as possible.
Each influencer has a manager who takes care of the influencer during events, provides information, and handles settlements. One manager can take care of only one influencer at a time. Managers with more than one influencer to serve will have to recruit an event assistant. 
You have been asked to provide the following information:
  1. The number of influencers per location (to determine how big the event is going to be)
  2. The number of influencers per manager at each location (to determine how many event assistants will have to be recruited)
This information is internally stored in the agency’s SQL database, which contains a table with data on associated influencers.
Your job is to analyze the data stored in the influencer table and provide the requested information to the CMO.

Sample Data Table

The table below is named influencer and contains sample data you'll be working on throughout the tutorial.
To make the grouping process easier to follow, first sort the sample data by location and influencer_manager. The influencer table data will be used as a starting point to explain the process of grouping data in SQL.
Let's start with the following query:
SELECT *
FROM influencer 
ORDER BY location, influencer_manager
Now, the influencer table looks like this:
As you can see, the data in the table is sorted in ascending order, first by location and then by influencer’s manager.
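If you want to follow along without access to the agency's database, the sorting step can be sketched with Python's built-in sqlite3 module. The column names (influencer_name, location, influencer_manager) and the five locations come from this tutorial; the influencer and manager names below are invented for illustration and are not the original sample data.

```python
import sqlite3

# Hypothetical rows: (influencer_name, location, influencer_manager).
# All names are made up; only the column names and locations come from the tutorial.
ROWS = [
    ("Gia Ivers", "Sydney", "Pia Quinn"),
    ("Ava Stone", "London", "Mark Reed"),
    ("Eve Grant", "Rome", "Omar Vale"),
    ("Cara Diaz", "London", "Nina Holt"),
    ("Finn Hale", "Los Angeles", "Omar Vale"),
    ("Ben Cole", "London", "Mark Reed"),
    ("Dan Frost", "Brighton", "Nina Holt"),
    ("Hugo Jens", "Sydney", "Pia Quinn"),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", ROWS)

# Same query as in the text: sort by location, then by manager.
sorted_rows = conn.execute(
    """
    SELECT *
    FROM influencer
    ORDER BY location, influencer_manager
    """
).fetchall()

for row in sorted_rows:
    print(row)

conn.close()
```

Rows that share both a location and a manager may come back in any order relative to each other, because the query does not sort by influencer name.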

Solution

What is the SQL GROUP BY Clause?

The tasks defined by the CMO all require data grouping. In SQL, data grouping is performed using the GROUP BY clause.
The SQL GROUP BY clause allows you to group data based on defined criteria. You can group individual data by one or more table columns. To do the grouping properly, you often need to apply aggregate functions to the column(s) within the SELECT statement.
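To see what an aggregate function does per group, here is a minimal sketch using Python's built-in sqlite3 module. The rows are invented, and the influencer without a manager is a made-up case added only to show that COUNT(*) counts every row in a group while COUNT(column) skips rows where that column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
# Hypothetical rows; None is stored as NULL.
conn.executemany(
    "INSERT INTO influencer VALUES (?, ?, ?)",
    [
        ("Ava Stone", "London", "Mark Reed"),
        ("Ben Cole", "London", "Mark Reed"),
        ("Cara Diaz", "London", None),  # invented case: no manager assigned
        ("Eve Grant", "Rome", "Omar Vale"),
    ],
)

# One output row per distinct location; each COUNT is evaluated per group.
per_location = conn.execute(
    """
    SELECT location,
           COUNT(*) AS all_rows,
           COUNT(influencer_manager) AS rows_with_manager
    FROM influencer
    GROUP BY location
    ORDER BY location
    """
).fetchall()

print(per_location)  # [('London', 3, 2), ('Rome', 1, 1)]
conn.close()
```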

Syntax of GROUP BY Clause

The syntax of the GROUP BY clause is the following:
SELECT count(column_name1), column_name2, column_name3, ...
FROM table_name
WHERE condition
GROUP BY column_name2, column_name3, ...
ORDER BY column_name2, column_name3, ...
When it comes to syntax, it’s important to keep in mind the placement of the GROUP BY clause in the SELECT statement. Always write the GROUP BY clause after the FROM and/or WHERE clauses and before the ORDER BY clause (if one exists).
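The clause order can be checked with a short sketch in Python's built-in sqlite3 module. The data is invented for illustration, and the WHERE condition (excluding one hypothetical manager) is just an example of a filter applied before grouping. The second query deliberately puts GROUP BY after ORDER BY to show that SQLite rejects it; other databases report the error with different wording.

```python
import sqlite3

# Hypothetical rows: (influencer_name, location, influencer_manager).
ROWS = [
    ("Ava Stone", "London", "Mark Reed"),
    ("Ben Cole", "London", "Mark Reed"),
    ("Cara Diaz", "London", "Nina Holt"),
    ("Dan Frost", "Brighton", "Nina Holt"),
    ("Eve Grant", "Rome", "Omar Vale"),
    ("Finn Hale", "Los Angeles", "Omar Vale"),
    ("Gia Ivers", "Sydney", "Pia Quinn"),
    ("Hugo Jens", "Sydney", "Pia Quinn"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", ROWS)

# Correct order: FROM, WHERE, GROUP BY, ORDER BY.
# WHERE removes rows first; the remaining rows are then grouped and counted.
filtered_counts = conn.execute(
    """
    SELECT location, COUNT(influencer_name) AS number_of_influencers
    FROM influencer
    WHERE influencer_manager <> 'Mark Reed'
    GROUP BY location
    ORDER BY location
    """
).fetchall()
print(filtered_counts)

# Wrong order: GROUP BY written after ORDER BY is a syntax error.
try:
    conn.execute(
        """
        SELECT location, COUNT(influencer_name)
        FROM influencer
        ORDER BY location
        GROUP BY location
        """
    )
    misplaced_rejected = False
except sqlite3.OperationalError as err:
    misplaced_rejected = True
    print("Rejected:", err)

conn.close()
```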
As you can see, to use the GROUP BY clause, you need to know the basics of querying data in SQL. If you feel like brushing up on those basics, take a look at some online courses.
There are many Internet resources for learning SQL, like Udemy, Coursera, and edX, but they often lack interactivity and may be too challenging for beginners.
I recommend sites where you gain hands-on experience in writing code from the very first exercise, such as the SQL Basics course by Vertabelo Academy.
You will be amazed how fast you can progress through practice!
If you’re already familiar with writing basic SELECT statements, learning how to group data in SQL will be a piece of cake. Let's see how GROUP BY works in practice!

SQL GROUP BY Examples

It’s time to solve the tasks presented in the scenario. First, you’ll have to write a basic GROUP BY query. Afterward, you’ll go through a slightly more complicated example to see how grouping can be performed on more than one table column.

Task #1: Get the Number of Influencers Per Location

The Query:
SELECT location, count(location) as number_of_influencers
FROM influencer 
GROUP BY location
ORDER BY location
Let’s take a closer look at the process of building the above query:
1. SELECT the location column. You’ll group the data using this column.
    SELECT location
    FROM influencer 
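2. Add another column to the SELECT statement by applying the COUNT function to one of the table columns to get the number of influencers.
COUNT applied to a column counts the rows in which that column is not NULL, so as long as the column you pick contains no NULL values, it doesn't matter which table column you apply it to. (COUNT(*) counts all rows, regardless of NULL values.)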
SELECT location, count(location) as number_of_influencers
FROM influencer
3. In most database systems, the previous query will not work without a GROUP BY clause, because it selects a regular column (location) alongside an aggregate function. Therefore, insert a GROUP BY clause, and use the location column for grouping.
Since the column contains five different values (Brighton, London, Rome, Los Angeles, and Sydney), the final query result will have five rows.
SELECT location, count(location) as number_of_influencers
FROM influencer 
GROUP BY location
The table below displays how the grouping is done: the data is grouped based on distinct values in the location column, where each drawn rectangle contains data for a particular location. In other words, each rectangle represents one data group.
The COUNT function will count the number of rows within each group and return the number of influencers within a location.
4. Apply an ORDER BY clause on the location column to sort the final query results in ascending order based on the values in that column.
SELECT location, count(location) as number_of_influencers
FROM influencer
GROUP BY location
ORDER BY location
Query Result:
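The exact counts depend on the rows in your table. As a minimal sketch, here is the Task #1 query run with Python's built-in sqlite3 module against a small invented influencer table. Only the column names and the five locations come from this tutorial; the names and the resulting counts are hypothetical and do not reproduce the original sample data.

```python
import sqlite3

# Hypothetical rows: (influencer_name, location, influencer_manager).
ROWS = [
    ("Ava Stone", "London", "Mark Reed"),
    ("Ben Cole", "London", "Mark Reed"),
    ("Cara Diaz", "London", "Nina Holt"),
    ("Dan Frost", "Brighton", "Nina Holt"),
    ("Eve Grant", "Rome", "Omar Vale"),
    ("Finn Hale", "Los Angeles", "Omar Vale"),
    ("Gia Ivers", "Sydney", "Pia Quinn"),
    ("Hugo Jens", "Sydney", "Pia Quinn"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", ROWS)

# The final Task #1 query from the steps above.
influencers_per_location = conn.execute(
    """
    SELECT location, count(location) as number_of_influencers
    FROM influencer
    GROUP BY location
    ORDER BY location
    """
).fetchall()

# Five distinct locations in the data, so five result rows.
for location, number_of_influencers in influencers_per_location:
    print(location, number_of_influencers)

conn.close()
```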

Task #2: Get the Number of Influencers Per Manager at Each Location

Let’s verify whether managers will have to hire assistants for the forum.
The Query:
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location
ORDER BY influencer_manager
Now, we’ll take a closer look at the process of building the above query:
1. SELECT the influencer_manager and location columns; you will group your data using these columns.
SELECT influencer_manager, location
FROM influencer
2. Just as we did in the previous example, add another column to the SELECT statement by applying the COUNT function to one of the table columns (to get the number of influencers per manager).
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
3. Again, in most database systems the query from the previous step won’t work without a GROUP BY clause, since it mixes regular columns with an aggregate function. You should now apply a GROUP BY clause to both the influencer_manager and location columns.
Each distinct combination of manager and location now forms its own group. You can picture this as grouping the data by manager first and then splitting each manager's rows by location, which creates location subgroups within each manager group.
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location
The COUNT function will count the rows within the subgroup of each group and return the number of influencers per manager at each location.
4. Apply an ORDER BY clause on the influencer_manager column to sort the query results in ascending order by the name of a manager.
SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
FROM influencer
GROUP BY influencer_manager, location 
ORDER BY influencer_manager
Query Result:
Et voilà! As you can see, there are managers with more than one local influencer, and there are managers with influencers from different locations all over the world.
Thanks to this knowledge, we can now, for example, plan the budget for hiring the managers’ assistants or spot managers who’d have to decide on attending only one event location.
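As a rough sketch of how this analysis could be reproduced, the Task #2 query is run below with Python's built-in sqlite3 module on an invented influencer table. All influencer and manager names are hypothetical. The two follow-up lists (managers who need an assistant, and managers who appear at more than one location) are derived in Python from the query result and are an illustration only.

```python
import sqlite3
from collections import Counter

# Hypothetical rows: (influencer_name, location, influencer_manager).
ROWS = [
    ("Ava Stone", "London", "Mark Reed"),
    ("Ben Cole", "London", "Mark Reed"),
    ("Cara Diaz", "London", "Nina Holt"),
    ("Dan Frost", "Brighton", "Nina Holt"),
    ("Eve Grant", "Rome", "Omar Vale"),
    ("Finn Hale", "Los Angeles", "Omar Vale"),
    ("Gia Ivers", "Sydney", "Pia Quinn"),
    ("Hugo Jens", "Sydney", "Pia Quinn"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, location TEXT, influencer_manager TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", ROWS)

# The final Task #2 query: one row per (manager, location) combination.
per_manager_location = conn.execute(
    """
    SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
    FROM influencer
    GROUP BY influencer_manager, location
    ORDER BY influencer_manager
    """
).fetchall()
conn.close()

# Managers serving more than one influencer at a single location.
needs_assistant = sorted({mgr for mgr, _loc, n in per_manager_location if n > 1})

# Managers whose influencers are spread over more than one location.
locations_per_manager = Counter(mgr for mgr, _loc, _n in per_manager_location)
multi_location = sorted(m for m, k in locations_per_manager.items() if k > 1)

print(per_manager_location)
print("Need an assistant:", needs_assistant)
print("Must pick one location:", multi_location)
```

Because the query sorts only by influencer_manager, the rows for a manager with several locations can appear in either order; add location to the ORDER BY clause if you need a fully deterministic listing.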

Summary

In this article, I’ve reviewed the basics of the SQL GROUP BY clause. Grouping data is a common operation when it comes to data manipulation.
Being able to group data easily is a very useful skill when working with datasets of different sizes, and you know now how the SQL GROUP BY clause makes grouping easy: All you need is some basic SQL knowledge and a couple of minutes to write a query!
So if you do have some basic SQL knowledge, it would be a pity not to extend it with GROUP BY and get the most out of it when it comes to data grouping.
Thank you for reading! ;-)




Written by kate | Kate is a marketer with a passion for databases.
Published by HackerNoon on 2019/10/18