There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).
This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:
See my resource list of the books I used to prepare for my big interview
Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.
This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.
There are many versions of SQL such as MySQL, SQL Server, Presto, Postgres, Spark, etc. There will be small syntax differences between the different systems but the basics are the same.
SQL deals with relational databases which means the system can check if the columns in one table match up with columns in another. This allows data to be stored across simple and organized tables. It also means you can leverage the relationships in a query and select only the data you need.
For now we will start with a single table called daily_user_score that records users’ scores for each game they play a day. It has four columns: date, userid, sessionid, and score.
A basic SQL query requires you to list the specific column names you want to see in a comma separated list after the word SELECT. It also requires you to put a table name after the word FROM. This query extracts all the columns from the table:
SELECT *
FROM daily_user_score;
OR
SELECT date, userid, sessionid, score
FROM daily_user_score;
Both of these queries return all the data in this table:
date | userid | sessionid | score
------------------------------------------
2018–09–01 | 983489272 | 125 | 112
2018–09–01 | 234342423 | 34 | 112
2018–09–01 | 567584329 | 207 | 618
2018–09–02 | 983489272 | 126 | 410
2018–09–02 | 983489272 | 127 | 339
Write a SQL query that extracts only the userid and their score from this table.
SELECT userid, score
FROM daily_user_score;
This query returns:
userid | score
-----------------
983489272 | 112
234342423 | 112
567584329 | 618
983489272 | 410
983489272 | 339
Thanks for reading! If you have questions feel free to comment & I will try to get back to you.
Connect with me on Instagram @lauren__glass
Connect with me on LinkedIn
Check out my essentials list on Amazon
Search for me using my nametag on Instagram!