The first step in optimizing a query is looking at it’s execution plan. An execution plan is a visual representation of the operations performed by the database engine in order to return the data required by your query.
The execution plan for a query is your view into the SQL Server query optimizer and query engine. It will reveal which objects a query uses, object like:
type of uses:
- in which order
- how they were accessed (seek or scan)
- types of joins
- data size
- calculated columns derivation
- foreign keys access
within the database, and how it uses them.
There are 3 types of execution plans:
Estimated execution plan:
To get a query’s estimated execution plan:
- Switch to the database in concern
- highlight the query in concern
- click Query
- click Display Estimated Execution Plan
The shortcut for this is Ctrl+ L. When you get this plan, some sections of the query might show a very high percentage as cost.
Remember that this is just an estimate, and many a times what we see as the most expensive section comes up as the most cheapest or cheap.
Btw, each symbol in the execution plan is an operator, and each operator has additional information within its properties that define the work performed by that operator.
Actual execution plan:
To get a query’s actual execution plan:
- Switch to the database in concern
- highlight the query in concern
- click Query
- click Include Actual Execution Plan
The shortcut for this is Ctrl+ M. One of the queries gave this actual execution plan.
Look at the actual vs estimated numbers, looks like the estimate is good enough. One of other queries gave this execution plan:
Looks like the estimates and actuals are far different.
This proves that actual execution plan are always good to tune compared to estimates. Btw, we read execution plans from left to right and top to bottom.
The estimated percentage costs at the front level are still estimates. So to tune the queries, it looks like we need to go through each of the blocks one-by-one. This looks like a lot of pain. This is where the 3rd type of execution plans comes into the picture — the cached execution plan.
Cached execution plan:
Just execute this
EXEC sp_BlitzCache @top=10, @sort_order='duration'. This query will give the top 10 queries with the most execution time. In a column it also shows the possible problems like missing indexes, downlevel cardinality estimates and implicit conversions in the Warnings column. In the Query Plan column, it gives the actual execution plan map link which can be clicked upon.
The single largest use for execution plans is query tuning. It’s the most common tool for understanding what you want to tune. It’s invaluable for identifying and fixing poorly performing code. I can’t run through every potential cause of poor query performance here, or even close, but we can look briefly at a few of the common culprits and the warning signs you’ll find in the execution plan.
Poorly Designed Queries
Like with any language, there are common code smells within T-SQL. I’ll list out a few of them as well as what you might see in the execution plan.
Search Conditions with Functions
Placing a function on a column in the
WHERE clause (e.g.
WHERE SomeFunction(Column) = @Value ) can lead to very poor performance because it renders the predicate non-SARGable, which means that SQL Server cannot use it in an index seek operation. Instead, it will be forced to read the entire table or index. In the execution plan, look out for a Scan operator against a table or index where you expected a Seek operation, i.e. a single or small set of lookups within an index.
Having a view that calls to other views or has
JOIN operations to other views can lead to very poor performance. The query optimizer goes through a process called simplification, where it attempts to eliminate tables that are not needed to fulfil the query, helping to make it faster. For simple queries involving views or functions, the optimizer will simply access the underlying base tables, and perform table elimination in the usual fashion. However, use of nested views and functions will lead quickly to a very complex query within SQL Server, even if the query looks simple in the T-SQL code. The problem is that beyond a certain point, SQL Server can’t perform the usual simplification process, and you’ll end up with large complex plans with lots of operators referencing the tables more than once, and so causing unnecessary work.
Incorrect Data Type Use
Of course, to do this it must apply the conversion function to the column, in the
WHERE clause and we are back in the situation of having a non-SARGable predicate, leading to scans where you should see seeks. You may also see a warning indicating an implicit conversion is occurring.
Row-by-row approach is expressed via cursors and sometimes as
WHILE loops that often lead to extremely poor performance. In such cases, rather than a single execution plan, you’ll see one plan for each pass through of the loop, and SQL Server essentially generates the same plan over and over, to process each row.
Common Warning Signs in the Execution Plan
These are red crosses or yellow exclamation marks superimposed on an operator. There are sometimes false positives, but most of the time, these indicate an issue to investigate. Costly Operations — the operation costs are estimated values, not actual measures, but they are the one number provided to us so we’re going to use them. The most costly operation is frequently an indication of where to start troubleshooting
Within an execution plan, the arrows connecting one operator to the next are called pipes, and represent data flow. Fat pipes indicate lots of data being processed. Of course, this is inevitable if the query simply has to process a large amount of data, but it can often be an indication of a problem. Look out for transitions from very fat pipes to very thin ones, indicating late filtering and that possibly a different index is needed. Transitions from very thin pipes to fat ones indicates multiplication of data, i.e. some operation within the T-SQL is causing more and more data to be generated.
Over time you’ll start to recognize operators and understand quickly both what each one is doing and why. Any time that you see an operator and you don’t understand what it is, or, you don’t understand why it’s there, then that’s an indicator of a potential problem.
We say this all the time, scans are not necessarily a bad thing. They’re just an indication of the scan of an index or table. If your query is
SELECT * FROM TableName, with no
WHERE clause then a scan is the best way to retrieve that data. In other circumstances, a scan can be an indication of a problem such as implicit data conversions, or functions on a column in the
Let’s take a look at a few of the warning signs that can appear, and what we can do in response.
DECLARE @LocationName AS NVARCHAR(50);
SET @LocationName = 'Paint';
SELECT p.Name AS ProductName ,
l.Name AS LocationName
FROM Production.Product AS p
JOIN Production.ProductInventory AS pi ON pi.ProductID = p.ProductID
JOIN Production.Location AS l ON l.LocationID = pi.LocationID
WHERE LTRIM(RTRIM(l.Name)) = @LocationName;
And the problems are as follows:
- The INDEX SCAN on the
Location.Namecolumn – there’s an index on that column a SEEK is expected - culrit:
LTRIM(RTRIM(l.Name)), in the
- The FAT PIPE coming out of the Clustered Index Scan on the ProductInventory table — in this example, it’s processing over 1000 rows at this stage, and eventually only returning 9 rows. — culprit: missing index on the column.
- The HASH MATCH join — for the number of rows returned I’m expecting a NESTED LOOPS join — culprit: bad or missing indexes.
At the top of some execution plans, you may even see an explicit Missing Index suggestion, indicating that the optimizer recognized that if it had a certain index, it might be able to come up with a better plan. Note these, but do not assume they’re accurate. Test them before applying them.
A view is just a query that the optimizer resolves just like any other query. However, if you decide to JOIN a view to a view, or nest views inside of each other, it causes problems for SQL Server. Firstly, as the number of referenced objects increases, due to deep nesting, the optimizer, which only has so much time to try to optimize the query, will give up on simplification and just build a query for the objects defined. For each object, the optimizer must estimate the number of rows returned and, based on that, the most efficient operation to use to return those rows. However, once we go beyond three levels deep on obfuscation, the optimizer stops assigning costs and just assumes one row returned. All of this can result in very poor choices of plan and a lot of avoidable work.
To illustrate how nesting views can hurt the server, consider this:
FROM dbo.PersonAddress AS pa
JOIN dbo.CustomerSales AS cs
ON cs.CustomerPersonID = pa.PersonID
LEFT JOIN dbo.ContactEmailAddress AS cea
ON cea.ContactPersonID = pa.PersonID
WHERE pa.City = 'Redmond';
All the 3 referenced objects are views. And the execution plan will be so complex like this:
The simplication of the above query can be done by actually accessing the underlying tables like this:
SELECT p.LastName + ', ' + p.FirstName AS PersonName ,
FROM Person.Person AS p
JOIN Person.EmailAddress AS ea
ON ea.BusinessEntityID = p.BusinessEntityID
JOIN Person.BusinessEntityAddress AS bea
ON bea.BusinessEntityID = p.BusinessEntityID
JOIN Person.Address AS a
ON a.AddressID = bea.AddressID
LEFT JOIN Person.BusinessEntityContact AS bec
ON bec.PersonID = p.BusinessEntityID
JOIN Sales.Customer AS c
ON c.PersonID = p.BusinessEntityID
JOIN Sales.SalesOrderHeader AS soh
ON soh.CustomerID = c.CustomerID
WHERE a.City = 'Redmond';
The execution plan is much simplified now:
Lack of Database Constraints
Consider this query:
SELECT soh.OrderDate ,
p.Name AS ProductName
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
ON p.ProductID = sod.ProductID
WHERE p.Name = 'Water Bottle - 30 oz.'
AND sod.UnitPrice < $0.0;
With no constraints in place, the execution plan for this query will look as shown:
This query returns zero rows, since there are no products with a price less than $0, and yet SQL Server still performs an index scan, two index seeks and a couple of nested loop joins. The reason is that it has no way to know they query will return zero rows.
However, what if we have a constraint in place on the SalesOrderDetail table that requires that the UnitPrice for any row be greater than zero (this constraint already exists in the AdventureWorks database).
ALTER TABLE Sales.SalesOrderDetail WITH CHECK
ADD CONSTRAINT CK_SalesOrderDetail_UnitPrice CHECK ((UnitPrice>=(0.00)));
If we rerun the above code with this constraint in place, then the execution plan is interesting:
Missing or out-of-date Database Statistics
The statistics can be turned off or on by the following command:
SET AUTO_UPDATE_STATISTICS OFF | ON;
It is recommended to not set them off to avoid stale statistics.
Maybe it’s not the Database?
It’s also possible that the database is not the culprit, the application can cause performance problems. So, profile the application code as well.