There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).
This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:
See my resource list of the books I used to prepare for my big interview
Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.
This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.
One scenario you might come across is you need to find all unique values from a column where that value will repeat. You will know it if the question includes: “Find the users who…” or “Show the countries which…” or “Find only the dates that…”
First check in the schema if there are other columns that would cause the target column/s to have repeated rows. If there is such a column, you will need to use the word DISTINCT after the word SELECT in your query.
What do I mean by this? Let’s revisit our table called daily_user_score.
This table records users’ scores for each game they play a day and has four columns: date, userid, sessionid, and score.
Say we want a list of all our game’s players. Highlighted in our data we can see the date and sessionid columns have created multiple records of 983489272 in the userid column.
date | userid | sessionid | score
------------------------------------------
2018–09–01 | 983489272 | 125 | 112
2018–09–01 | 234342423 | 34 | 112
2018–09–01 | 567584329 | 207 | 618
2018–09–02 | 983489272 | 126 | 410
2018–09–02 | 983489272 | 127 | 339
Here is the query that will find all the users who play our game.
SELECT DISTINCT userid
FROM daily_user_score;
This is what the query returns:
userid
---------
983489272
234342423
567584329
Write a SQL query that finds all the different scores achieved.
SELECT DISTINCT score
FROM daily_user_score;
This query returns:
score
-----
112
618
410
339
Thanks for reading! If you have questions feel free to comment & I will try to get back to you.
Connect with me on Instagram @lauren__glass
Connect with me on LinkedIn
Check out my essentials list on Amazon
Search for me using my nametag on Instagram!