Learn how to group data in SQL using the GROUP BY clause. In this article, I’ll walk you through the process using a sample of marketing data.

Those who work with data know that grouping data can be a chore. If you are struggling with a spreadsheet or some other tool to group various sets of data, read this tutorial and learn how to do the work more easily (and faster) using SQL.

Have you ever found yourself in a situation where you need to analyze data and present the results in a form other than the one in which the data is stored? In this tutorial, you will see how helpful SQL can be for this kind of data manipulation.

Problem Scenario

You work for a worldwide digital influencer marketing agency. The agency has branches all over the world, and the CMO has decided to carry out a promotion planning event. The plan is to host a forum in each of the locations, with as many influencers attending as possible. Each influencer has a manager who takes care of the influencer during events, provides information, and handles settlements. One manager can take care of only one influencer at a time. Managers with more than one influencer to serve will have to recruit an event assistant.

You have been asked to provide the following information:

1. The number of influencers per location (to determine how big the event is going to be)
2. The number of influencers per manager at each location (to examine how many event assistants will have to be recruited)

This information is stored internally in the agency’s SQL database, which contains a table with data on associated influencers. Your job is to analyze the data stored in the influencer table and provide the requested information to the CMO.

Sample Data Table

The table below is named influencer and contains the sample data you'll be working on throughout the tutorial.
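If you'd like to run the queries from this tutorial yourself, here is a minimal sketch of a stand-in influencer table. It assumes SQLite (via Python's built-in sqlite3 module) purely for convenience, since the article doesn't say which database the agency uses. The column names and the five locations come from the tutorial; the influencer and manager names are invented placeholders, not the article's actual sample data.

```python
import sqlite3

# Hypothetical rows: column names and locations follow the tutorial,
# but every influencer and manager name here is an invented placeholder.
ROWS = [
    ("Influencer A", "Manager 1", "Brighton"),
    ("Influencer B", "Manager 1", "Brighton"),
    ("Influencer C", "Manager 2", "London"),
    ("Influencer D", "Manager 3", "London"),
    ("Influencer E", "Manager 2", "Rome"),
    ("Influencer F", "Manager 4", "Los Angeles"),
    ("Influencer G", "Manager 4", "Los Angeles"),
    ("Influencer H", "Manager 3", "Sydney"),
]

# An in-memory database disappears when the script ends, so nothing is left behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, influencer_manager TEXT, location TEXT)"
)
conn.executemany("INSERT INTO influencer VALUES (?, ?, ?)", ROWS)

# Print the raw, ungrouped rows as a starting point.
for row in conn.execute("SELECT * FROM influencer"):
    print(row)
```

With a table like this in place, each query in the tutorial can be passed to conn.execute(...) as a string.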
In order to present the grouping process graphically, you have to sort the sample data by location and influencer_manager. The sorted influencer table will be used as a starting point to explain the process of grouping data in SQL.

Let's start with the following query:

    SELECT *
    FROM influencer
    ORDER BY location, influencer_manager

Now, the influencer table looks like this:

As you can see, the data in the table is sorted in ascending order, first by location and then by the influencer’s manager.

Solution

What is the SQL GROUP BY Clause?

The tasks defined by the CMO all require data grouping. In SQL, data grouping is performed using the GROUP BY clause.

The SQL GROUP BY clause allows you to group data based on defined criteria. You can group individual rows by one or more table columns. To do the grouping properly, you often need to apply aggregate functions to the column(s) within the SELECT statement.

Syntax of GROUP BY Clause

The syntax of the GROUP BY clause is the following:

    SELECT count(column_name1), column_name2, column_name3, ...
    FROM table_name
    WHERE condition
    GROUP BY column_name2, column_name3, ...
    ORDER BY column_name2, column_name3, ...

When it comes to syntax, it’s important to keep in mind the placement of the GROUP BY clause in the SELECT statement. Always write the GROUP BY clause after the FROM and/or WHERE clauses and before the ORDER BY clause (if one exists).

As you can see, to use the GROUP BY clause, you need to know the basics of querying data in SQL. If you feel like brushing up on those basics, take a look at some online courses.

There are many Internet resources for learning SQL, like Udemy, Coursera, and edX, but they often lack interactivity and may be too challenging for beginners. I recommend sites where you gain hands-on experience in writing code from the very first exercise, such as the SQL Basics course by Vertabelo Academy. You will be amazed how fast you can progress through practice!
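To make the clause order from the syntax section concrete, here is a small sketch that uses WHERE, GROUP BY, and ORDER BY in one statement. It assumes SQLite through Python's built-in sqlite3 module, and the five rows are invented placeholders rather than the article's sample data; only the table and column names follow the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influencer ("
    "influencer_name TEXT, influencer_manager TEXT, location TEXT)"
)
# Hypothetical rows, invented only to give the query something to group.
conn.executemany(
    "INSERT INTO influencer VALUES (?, ?, ?)",
    [
        ("Ava", "Kim", "London"),
        ("Ben", "Kim", "London"),
        ("Cleo", "Lee", "Rome"),
        ("Dan", "Lee", "Sydney"),
        ("Eve", "Kim", "Sydney"),
    ],
)

# Clause order: FROM, then WHERE, then GROUP BY, then ORDER BY.
# WHERE filters individual rows before any groups are formed.
query = """
    SELECT location, count(influencer_name) AS number_of_influencers
    FROM influencer
    WHERE location <> 'Rome'
    GROUP BY location
    ORDER BY location
"""
result = conn.execute(query).fetchall()
print(result)
```

Moving the WHERE line below the GROUP BY line turns the statement into a syntax error, which is an easy way to convince yourself that the order matters.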
If you’re already familiar with writing basic SELECT statements, learning how to group data in SQL will be a piece of cake. Let's see how GROUP BY works in practice!

SQL GROUP BY Examples

It’s time to solve the tasks presented in the scenario. First, you’ll write a basic GROUP BY query. Afterward, you’ll go through a slightly more complicated example to see how grouping can be performed on more than one table column.

Task #1: Get the Number of Influencers Per Location

The Query:

    SELECT location, count(location) as number_of_influencers
    FROM influencer
    GROUP BY location
    ORDER BY location

Let’s take a closer look at the process of building the above query:

1. SELECT the location column. You’ll group your data using this column.

    SELECT location
    FROM influencer

2. Add another column to the SELECT statement by applying the COUNT function to one of the table columns to get the number of influencers. Here COUNT is used to count table rows, so it doesn't matter which column you apply it to, as long as that column contains no NULL values: COUNT(column) skips NULLs, whereas COUNT(*) counts every row.

    SELECT location, count(location) as number_of_influencers
    FROM influencer

3. In most databases, the previous query will not work without a GROUP BY clause, because it selects a plain column (location) alongside an aggregate function. Therefore, add a GROUP BY clause and use the location column for grouping. Since the column contains five different values (Brighton, London, Rome, Los Angeles, and Sydney), the final query result will have five rows.

    SELECT location, count(location) as number_of_influencers
    FROM influencer
    GROUP BY location

The table below displays how the grouping is done: the data is grouped based on distinct values in the location column, where each drawn rectangle contains data for a particular location. In other words, each rectangle represents one data group. The COUNT function will count the number of rows within each group and return the number of influencers within a location.

4.
Apply an ORDER BY clause on the location column to sort the final query results in ascending order based on the values in that column.

    SELECT location, count(location) as number_of_influencers
    FROM influencer
    GROUP BY location
    ORDER BY location

Query Result:

Task #2: Get the Number of Influencers Per Manager at Each Location

Let’s verify whether managers will have to hire assistants for the forum.

The Query:

    SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
    FROM influencer
    GROUP BY influencer_manager, location
    ORDER BY influencer_manager

Now, we’ll take a closer look at the process of building the above query:

1. SELECT the influencer_manager and location columns; you will group your data using these columns.

    SELECT influencer_manager, location
    FROM influencer

2. Just as we did in the previous example, add another column to the SELECT statement by applying the COUNT function to one of the table columns (to get the number of influencers per manager).

    SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
    FROM influencer

3. Again, in most databases the query from the previous step won’t work without a GROUP BY clause, since it selects plain columns alongside an aggregate function. You should now apply a GROUP BY clause to both the influencer_manager and location columns. Grouping by two columns creates one group for each distinct combination of manager and location, which you can picture as manager subgroups within the location groups. The order in which the columns are listed in GROUP BY doesn't change which groups are formed.

    SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
    FROM influencer
    GROUP BY influencer_manager, location

The COUNT function will count the rows within each of these subgroups and return the number of influencers per manager at each location.

4. Apply an ORDER BY clause on the influencer_manager column to sort the query results in ascending order by manager name.
    SELECT influencer_manager, location, count(influencer_name) as number_of_influencers
    FROM influencer
    GROUP BY influencer_manager, location
    ORDER BY influencer_manager

Query Result:

Et voilà! As you can see, there are managers with more than one local influencer, and there are managers with influencers from different locations all over the world. Thanks to this knowledge, we can now, for example, plan the budget for hiring the managers’ assistants or spot managers who will have to decide on attending only one event location.

Summary

In this article, I’ve reviewed the basics of the SQL GROUP BY clause. Grouping data is a common operation when it comes to data manipulation.

Being able to group data easily is a very useful skill when working with datasets of different sizes, and you now know how the SQL GROUP BY clause makes grouping easy: all you need is some basic SQL knowledge and a couple of minutes to write a query!

So if you do have some basic SQL knowledge, it would be a pity not to extend it with GROUP BY and get the most out of it when it comes to data grouping.

Thank you for reading! ;-)