Data Mastery: SQL — Subqueries There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, ). see my article on Python’s Pandas library This series , will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for: Data Mastery: SQL Software Engineers who want to analyze their creation’s data Product Managers, Marketers, and others who want to be data-driven Beginning Data Scientists, Data Engineers, Data Analysts, or Business Intelligence Developers preparing for interviews See my resource list of the books I used to prepare for my big interview Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning. This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract. Subqueries The subquery is a query within a query. You can use it in a JOIN, a step in transforming data, or in a condition. This is accomplished by isolating a query inside parantheses ( ) and using it as part of a greater query. Lets revisit our three tables from earlier in the series. daily_user_score table: date | userid | sessionid | score ------------------------------------------ 2018–09–01 | 983489272 | 125 | 112 2018–09–01 | 234342423 | 34 | 112 2018–09–01 | 567584329 | 207 | 618 2018–09–02 | 983489272 | 126 | 410 2018–09–02 | 983489272 | 127 | 339 user_revenue table : userid | revenue -------------------- 983489272 | 100 234342423 | 33 567584329 | 57 305852827 | 8 867736361 | 29 user_messages: date | userid | num_messages ------------------------------------- 2018–09–01 | 983489272 | 6 2018–09–01 | 234342423 | 3 2018–09–01 | 305852827 | 4 2018–09–01 | 867736361 | 2 2018–09–02 | 874726454 | 15 In a previous article, we wanted to see for all users how much they paid and their scores on 2018–09–01. This was our query: SELECT r.userid, r.revenue, d.score FROM user_revenue AS r LEFT OUTER JOIN daily_user_score AS d ON d.userid = r.userid AND d.date = ‘2018–09–01’; Say we wanted to see their total scores. Here is a query doing that utilizing a subquery: SELECT r.userid, r.revenue, d.total_score FROM user_revenue AS r LEFT OUTER JOIN ( SELECT userid, SUM(score) AS total_score FROM daily_user_score GROUP BY userid ) AS d ON d.userid = r.userid; Make sure you assign an alias to any column with an aggregation function in a subquery. That way you can use the column you created in the greater query, as we see in the SELECT statement. Here is the resulting data: userid | revenue | score --------------------------- 983489272 | 100 | 861 234342423 | 33 | 112 567584329 | 57 | 618 305852827 | 8 | NULL 867736361 | 29 | NULL Try it yourself Find the average number of messages and average score per day. Answer SELECT COALESCE(m.date, s.date) as date, m.avg_messages, s.avg_score FROM ( SELECT date, AVG(num_messages) as avg_messages FROM user_messages GROUP BY date ) AS m FULL OUTER JOIN ( SELECT date, AVG(score) AS avg_score FROM daily_user_score GROUP BY date ) AS s ON m.date = s.date; Here is the resulting data: date | avg_messages | avg_score ------------------------------------- 2018–09–01 | 3.75 | 280.6 2018–09–02 | 15 | 374.5 Thanks for reading! If you have questions feel free to comment & I will try to get back to you. Connect with me on Instagram @ljglass Connect with me on LinkedIn Check out my essentials list on Amazon Search for me using my nametag on Instagram!