How non-persisted computed column indexes make your JSON queries high performance Recently I’ve been working with JSON in . SQL Server 2016 a lot One of the hesitations many people have with using in is that they think that querying it must be really slow — SQL is supposed to excel at relational data, not string parsing right? JSON SQL Server It turns out that performance is pretty good with the standalone SQL Server JSON functions. Even better is that it’s possible to make queries against JSON data run at ludicrous speeds by . In this post I want to take a look at how SQL is able to parse* with such great performance. using indexes on JSON parsed computed columns *“Parse” here is actually a lie — it’s doing something else behind the scenes. You’ll see what I mean, keep reading! Computed Columns in SQL Server The only way to get JSON indexes working on SQL server is to use a . A computed column is basically a column that performs a function to calculate its values. computed column For example, let’s say we have a table with some car JSON data in it: DROP TABLE IF EXISTS dbo.DealerInventory;CREATE TABLE dbo.DealerInventory(Id int IDENTITY(1,1) PRIMARY KEY,Year int,JsonData nvarchar(300)); INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Volkswagen", "Model" : "Golf" }'); INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Honda", "Model" : "Civic" }'); INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Subaru", "Model" : "Impreza" }'); SELECT * FROM dbo.DealerInventory; /* Output:Id Year JsonData----- -------- ---------------------------------------------1 2017 { "Make" : "Volkswagen", "Model" : "Golf" }2 2017 { "Make" : "Honda", "Model" : "Civic" }3 2017 { "Make" : "Subaru", "Model" : "Impreza" }*/ We can add a new computed column to the table, “Make”, which parses and extracts the Make property from each row’s JSON string: ALTER TABLE dbo.DealerInventoryADD Make AS JSON_VALUE(JsonData, '$.Make'); SELECT * FROM dbo.DealerInventory; /* Output:Id Year JsonData Make-- ----- ------------------------------------------- ----------1 2017 { "Make" : "Volkswagen", "Model" : "Golf" } Volkswagen2 2017 { "Make" : "Honda", "Model" : "Civic" } Honda3 2017 { "Make" : "Subaru", "Model" : "Impreza" } Subaru*/ By default, the above Make computed column is non-persisted, meaning its values are never stored to the database (persisted computed columns can also be created, but that’s a topic for a different time). Instead, every time a query runs against our table, SQL Server will calculate the value for each row. dbo.DealerInventory The performance of this isn’t great — it’s essentially a scalar function running for each row of our output :(. However, when you combine a computed column with an index, something interesting happens. Time to dive in with DBCC Page is an undocumented SQL Server function that shows what the raw data stored in a SQL page file looks like. . DBCC Page Page files are how SQL Server stores its data In the rest of this post we’ll be looking at how data pages (where the actual table data in SQL is stored) and index pages (where our index data is stored) are affected by non-persisted computed columns — and how they make JSON querying super fast. First, let’s take a look at the existing data we have. We do this by first turning on trace flag 3604 and using to get the page ids of our data. Additional details on the column definitions in and can be found in . DBCC IND DBCC IND DBCC PAGE Paul Randal’s blog post on the topic DBCC TRACEON(3604); -- "Sandbox" is the name of my databaseDBCC IND('Sandbox','dbo.DealerInventory',-1); If you look at the results above, row 2 contains our data page (indicated by PageType = 1) and the PagePID of that page is 305088 (if you are playing along at home, your PagePID is most likely something else). If we then look up that PagePID using we get something like this: DBCC PAGE DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS You can see our three rows of data highlighted in red. The important thing to note here is that our computed column of the parsed “Make” value is truly non-persisted and no where to be found, meaning it has to get generated for every row during query execution. Now, what if we add an index to our non-persisted computed column and then run again: DBCC IND CREATE NONCLUSTERED INDEX IX_ParsedMake ON dbo.DealerInventory (Make) DBCC IND('Sandbox','dbo.DealerInventory',-1); You’ll now notice that in addition to data page 305088 (PageType = 1), we also have an index page 305096 (PageType = 2). If we examine both the data page and the index page we see something interesting: DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS DBCC PAGE('Sandbox',1,305096,3) WITH TABLERESULTS Nothing has changed with our data page: But our index page contains the parsed values for our “Make” column: What does this mean? I thought non-persisted computed columns aren’t saved to disk! Exactly right: our non-persisted computed column “Make” isn’t saved to the data page on the disk. However if we create an index on our non-persisted computed column, the computed value persisted on the index page! is This is basically a cheat code for indexing computed columns. SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page. This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast. Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast. 💚 Enjoyed this post? Please recommend it by giving it a green heart below. Do you use computed column indexes for any of your tables? Tell me about them in the comments below!