paint-brush
The Most Commonly Used SQL Queries by Data Scientistsby@starktechie
340 reads
340 reads

The Most Commonly Used SQL Queries by Data Scientists

by Sarang S. BabuFebruary 20th, 2023
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

SQL (Structured Query Language) is a programming tool or language that is widely used by data scientists to extract and manipulate information from databases. As a data scientist, understanding SQL is essential for extracting and analyzing the data you need to make informed decisions for your organization. Some of the most commonly used SQL queries include SELECT, JOIN, GROUP BY, WHERE, COUNT, SUM, UPDATE, DELETE, and INSERT.

People Mentioned

Mention Thumbnail
featured image - The Most Commonly Used SQL Queries by Data Scientists
Sarang S. Babu HackerNoon profile picture

A sizable database might be burdensome If you don't have a reliable method for organizing, managing, and locating the required data. The same database can become a pleasant and practical resource if you are familiar with the most frequent and straightforward SQL queries.

SQL (Structured Query Language) is a programming tool or language that is widely used by data scientists and other professionals working with data to extract and manipulate information from databases. As a data scientist, understanding SQL is essential for extracting and analyzing the data you need to make informed decisions for your organization.


SQLis a declarative language, meaning that you specify what you want the database to do, rather than telling it how. The SQL queries you use are essentially instructions that tell the database what data to retrieve, how to filter it, and how to organize it. Just like a search engine uses complex algorithms but we don’t need to be aware of it to use it.

There are many different types of SQL queries, and each has its own specific use. Some of the most commonly used SQL queries by data scientists include SELECT, JOIN, GROUP BY, WHERE, COUNT, SUM, UPDATE, DELETE, and INSERT. Each of these queries plays a specific role in the data analysis process, and understanding how and when to use them is key to effectively working with data.

In this article, we will discuss commonly used SQL queries by data scientists and explore the different ways in which these queries can be used to extract and analyze data. As the article ends, you will have a sound understanding of how to use SQL to extract and analyze data, and you will be able to use these queries to gain valuable insights from your data.


Queries

Note that to demonstrate the queries, we will take the help of the following sample database. This example database contains two tables: "EMPLOYEES" and "DEPARTMENTS". The "EMPLOYEES" table contains information about employees, such as their ID, name, age, and department. The "DEPARTMENTS" table contains information about departments, such as their ID, department name, and location.



EMPLOYEES TABLE

ID

NAME

AGE

DEPARTMENT

1

Abhinav

25

IT

2

Vaishnavi

30

HR

3

Shubhangi

35

IT

4

Dhruvin

40

HR

5

Aditi

45

Finance


DEPARTMENTS TABLE

ID

DEPARTMENT

LOCATION

1

IT

Hyderabad

2

HR

Kolkata

3

Finance

Pune


SELECT

The SELECT statement is the most fundamental and often used SQL query. It is used to select specific columns from a table and retrieve data from them. The SELECT statement has the following syntax:


SELECT column1, column2, …

FROM table_name;


For example, in the EMPLOYEES table, to select name and age, the query will be:

SELECT name, age FROM employees;


By doing this, the name and age columns from the employees table will be retrieved. The output for the above query will be:

NAME

AGE

Abhinav

25

Vaishnavi

30

Shubhangi

35

Dhruvin

40

Aditi

45


In order to display all the columns, you can use an asterisk ‘*’ rather than mentioning all the columns in the SELECT statement.


For example,


SELECT * FROM employees;


The output for the above query will be

NAME

AGE

DEPARTMENT

Abhinav

25

IT

Vaishnavi

30

HR

Shubhangi

35

IT

Dhruvin

40

HR

Aditi

45

Finance


WHERE

The WHERE clause is used to filter data based on a specific condition. It is often used in conjunction with the SELECT statement to retrieve a subset of data that meets a certain criteria. The WHERE clause has the following syntax:


SELECT column1, column2 FROM table_name WHERE condition;


For example, if you wanted to select the "name" and "age" columns from the "employees" table, but only for employees who are older than 30, the query would be:


SELECT name, age FROM employees WHERE age > 30;


This query selects the name and age columns from the employees table, but only for employees who are older than 30. The output would be:


NAME

AGE

Shubhangi

35

Dhruvin

40

Aditi

45


JOIN

Data from many tables are combined using the JOIN clause. It is often used to retrieve data from multiple tables that are related to each other through a common column. The JOIN clause has the following syntax:


SELECT column1, column2 FROM table1 JOIN table2 ON table1.column = table2.column;


For instance, the following query might be used to pick the "name," "department," and "location" columns from the "employees" table and the "location" column from the "departments" table:


SELECT employees.name, employees.department, departments.location FROM employees JOIN departments ON employees.department = departments.department;


This query uses a JOIN clause to combine data from the employees and departments tables. It selects the name and department columns from the employees table and the location column from the departments table, where the department values in both tables match. The output would be:


NAME

DEPARTMENT

LOCATION

Abhinav

IT

Hyderabad

Vaishnavi

HR

Hyderabad

Shubhangi

IT

Kolkata

Dhruvin

HR

Kolkata

Aditi

Finance

Pune


AGGREGATE FUNCTIONS

Aggregate functions are used to perform calculations on a set of data and return a single value. Some of the most commonly used aggregate functions in SQL are:


  • COUNT: returns the number of rows in a table or the number of rows that match a specific condition.
  • SUM: returns the sum of all values in a specific column.
  • AVG: returns the average of all values in a specific column.
  • MIN: returns the minimum value in a specific column.
  • MAX: returns the maximum value in a specific column.


For example, if you wanted to find the average salary of employees in the "employees" table, the query would be:


SELECT AVG(age) FROM employees;


Output: 32.4


This query selects the average age of all employees in the employees table. It uses the AVG() aggregate function to calculate the average age of all employees.


GROUP BY, AND HAVING

The GROUP BY clause is used to group data based on a specific column. It is often used in conjunction with aggregate functions such as COUNT, SUM, and AVG to retrieve summary information about the data. The GROUP BY statement has the following syntax:


SELECT column1, COUNT(column2) FROM table_name GROUP BY column1;


For example, if you wanted to count the number of employees in each department, the query would be:


SELECT department, COUNT(*) FROM employees GROUP BY department;


This query uses the GROUP BY clause to group the employees by department and the COUNT() function to count the number of employees in each department. The output would be:


DEPARTMENT

COUNT(*)

IT

2

HR

2

FINANCE

1


The HAVING clause is used to filter data based on aggregate functions. It is similar to the WHERE clause, but it is used in conjunction with the GROUP BY clause to filter data after it has been grouped. The HAVING clause, using the COUNT aggregate function will have the following syntax:


SELECT column1, COUNT(column2) FROM table_name GROUP BY column1 HAVING COUNT(column2) > number;


For example, if you wanted to count the number of employees in each department, but only for departments with more than 5 employees, the query would be:


SELECT department, COUNT() FROM employees GROUP BY department HAVING COUNT() > 5;


This query uses the GROUP BY clause to group the employees by department, the COUNT() function to count the number of employees in each department, and the HAVING clause to filter only the departments that have more than 5 employees. Since no department has more than 5 employees, this query would return an empty set.


ORDER BY

The ORDER BY clause is used to sort data based on a specific column. It is often used in conjunction with the SELECT statement to retrieve data in a specific order. The ORDER BY clause has the following syntax:


SELECT column1, column2 FROM table_name ORDER BY column1 ASC/DESC;


For example, if you wanted to select the "name" and "age" columns from the "employees" table and sort them by age in descending order, the query would be:


SELECT name, age FROM employees ORDER BY age DESC;


This query selects the name and age columns from the employees table and orders the rows by age in descending order. The output would be:


NAME

AGE

Aditi

45

Dhruvin

40

Shubhangi

35

Vaishnavi

30

Abhinav

25


INSERT

The INSERT statement is used to insert new data into a table. The INSERT statement has the following syntax:


INSERT INTO table_name (column1, column2, column3) VALUES (value1, value2, value3);


For example, if you wanted to insert a new employee into the "employees" table with the name "John Smith", age 30, and salary $50,000, the query would be:


INSERT INTO employees (name, age, department) VALUES ("Rahul Gandhi", 30, PR);

The output of the query would be the insertion of a new row in the employees table with the following values:


ID

NAME

AGE

DEPARTMENT

6

Rahul

30

PR


The INSERT INTO statement is used to add a new row to the employees table.

The values being inserted are specified in the VALUES clause, in the order that they correspond to the columns in the table. The values being inserted in this query are "Rahul Gandhi", 30 and "PR" for the columns name, age and department.


The ID is auto-incrementing, so it will be assigned automatically.


As long as the syntax is correct and the specified columns and values match, this query will insert a new row into the employees table with the specified values.


UPDATE

The UPDATE statement is used to modify existing data in a table. The UPDATE statement has the following syntax:


UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;


For example, if you wanted to update the age of all employees in the "employees" table who have the department "IT" to 18, the query would be:


UPDATE employees SET age = 18 WHERE department = "IT";


This query would update the age of all employees in the "IT" department to 18. Update query can be helpful in situations like setting the salary of a group of employees to a particular value.


DELETE

The DELETE statement is used to delete existing data in a table. The DELETE statement has the following syntax:


DELETE FROM column

WHERE condition;


As an example,

DELETE FROM employees

WHERE department = 'IT';


This query will delete all rows from the employees table where the department is 'IT'. It is important to use the WHERE clause in a DELETE query to prevent accidentally deleting all rows in a table.


Advanced Query: TRIGGERS

Triggers are special types of stored procedures that are automatically executed in response to certain events on a particular table or view in a database. Triggers can be used to enforce data integrity, maintain data consistency, or audit changes to the data.


Triggers are defined using the CREATE TRIGGER statement, and specify the event that will trigger the execution of the trigger, such as INSERT, UPDATE, or DELETE statements on a specific table, as well as the actions that will be taken when the trigger is executed.


For example, a trigger can be created to automatically update the "updated_at" field of a table whenever any data in the table is modified. The syntax for this trigger would be something like this:


CREATE TRIGGER update_date

AFTER UPDATE ON employees

FOR EACH ROW

BEGIN

    UPDATE employees

    SET updated_at = CURRENT_TIMESTAMP

    WHERE id = OLD.id;

END;


In this example, the trigger is named "update_date" and it is set to fire after an update statement on the "employees" table. The "FOR EACH ROW" clause specifies that the trigger should fire once for each row affected by the update statement.


The "BEGIN" and "END" keywords define the body of the trigger, which contains the statement that updates the "updated_at" field with the current timestamp.


Triggers can also be used to enforce referential integrity by cascading updates or deletes to related tables, or to prevent certain actions from being performed if certain conditions are not met.


It is important to keep in mind that triggers can have a significant impact on performance, especially if they are not written or used properly. They should be used with caution and only when absolutely necessary.


Conclusion

In this article, we discussed the various queries in SQL that are commonly used in the field of data science.


  • SQL is a powerful language for managing and querying relational databases
  • SELECT, INSERT, UPDATE, and DELETE are the basic SQL commands for reading and modifying data in a database
  • Aggregate functions, JOINs, and GROUP BY clauses allow for advanced data manipulation and analysis
  • Triggers can be used to automatically execute certain actions in response to specific events in the database.
  • These SQL commands and features provide a wide range of tools for effectively working with and understanding the data in a database.
  • A good understanding of SQL can greatly aid in data analysis and management tasks.
  • SQL can be used to implement the CRUD operations in the database and enables the developer to perform complex data manipulation and analysis.