Wednesday, October 19, 2011
SQL Server Good Guidance on Missing Indices
http://beyondrelational.com/blogs/martinschmidt/archive/2011/09/27/missing-indexes.aspx?utm_source=brnewsletter&utm_medium=email&utm_campaign=2011Oct18 One of the most common things I encounter when asked to help with performance problems, is wrong or inadequate indexing. Creating the optimal indexes for a system, is no where near a trivial excercise. You need to consider the read/write ratio as well as how you write your queries. This is not the scope of this blog post. SQL Server has a number of dynamic management views/functions, which is very useful for different purposes. At least if you know they are there, and how to use them. For missing index information, these views are important: view source print? 1.SELECT * FROM sys.dm_db_missing_index_group_stats 2.SELECT * FROM sys.dm_db_missing_index_details 3.SELECT * FROM sys.dm_db_missing_index_groups sys.dm_db_missing_index_group_stats tracks information about indexes which could have been useful to the SQL Server. It holds information about how many times the SQL Server could have used that index, and how great an impact it would have had on the query. This DMV does not give you the details about the table and columns that the index should have been created on. This is however available in the sys.dm_db_missing_index_details DMV, where you get information about the database_id, object_id and the columns in the index. The last DMV is only used to bind the two DMV’s together. The DMV’s contains data on server level, so it will show details about missing index in all your databases. Let me show how a basic query on these DMV’s could look like: view source print? 1.SELECT * 2.FROM 3. sys.dm_db_missing_index_groups g 4. INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle 5. INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle On my local SQL Server, this gives me an empty resultset: So, let me create some test data: view source print? 01.CREATE DATABASE IndexDemo 02.GO 03.USE IndexDemo 04.GO 05.CREATE TABLE DemoData ( 06. Id INT IDENTITY PRIMARY KEY, 07. Val1 INT, 08. Val2 INT, 09. Val3 CHAR(4000) 10.) 11.GO 12.INSERT INTO DemoData (Val1, Val2, Val3) 13.VALUES (1,1, '') 14.GO 15. 16.INSERT INTO DemoData (Val1, Val2, Val3) 17.SELECT Val1, Val2, Val3 18.FROM DemoData 19.GO 15 I have created a table with a clustered index on the Id column, and filled the table with 32K rows. Now I will create a query to return the Id and Val1 column, given a specific value for Val2: view source print? 1.SELECT Id, Val1 2.FROM DemoData 3.WHERE Val2 = 123 So, now let me try to run the missing index query again, but this time only looking at a limited number of columns: view source print? 01.SELECT 02. user_seeks, 03. last_user_seek, 04. avg_total_user_cost, 05. avg_user_impact, 06. database_id, 07. object_id, 08. equality_columns, 09. inequality_columns, 10. included_columns 11.FROM 12. sys.dm_db_missing_index_groups g 13. INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle 14. INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle Now one row is returned. Let’s go through the returned columns, and see what the values mean: user_seeks This tells you how many times the SQL Server could have used this missing index to lookup data. The higher this number is, the more reason to build the index. How high this is, depends on your system. On high volume OLTP setups, numbers of less the 10.000 is not worth worrying about, and on smaller systems a value of 20-100 might be high enough to consider the index. last_user_seek This tells you when the missing index was last needed. If this value is equal to last saturday at night, it _could_ mean that you have a nightly job running on saturday, which might need this index. avg_total_user_cost This value does not have a unit, but it’s a value that tells you something about the estimated cost of the queries that needed the index. The higher the value is, the more resource intensive is the query that needs the index. If the value is very low (perhaps 0.01) then the query that needed the index was not very resource intensive. Think of a query that executes in 2ms compared to a query that takes 2000ms. You will probably gain more by adding an index to satisfy the 2000ms query than the 2ms one. avg_user_impact This is the expected improvement of the query if you build the query. The closer the value is to 100, the more the query will benefit from building the index. database_id This is the database where you need to build the index. object_id This is the object_id of the table that could need an index. equality_columns These columns are the ones that you should build the index on. Equality columns means that the predicate used the column like this: “Val2 = x”, which was exactly how I wrote my demo query. inequality_columns You should also add these columns to your index. Inequality columns are used for range predicates, like: “Val2 > x”. My missing index query returned NULL in this column, because my demo query had no such predicate. included_columns If columns are used in the SELECT part of your query, and not in the predicate, you can add these columns as included columns in the index. This way all needed data is available in the index, and no key lookup is needed on the clustered index. But beware! The more columns you add to your index, the more write IO and space you need. My rule of thumb says, that the total number of columns (equality_columns + inequality_columns + included_columns) should not exceed 5. If the total number of columns exceeds 5, that does not mean that you should not build the index – it just means that you need to think carefully about what you are doing! So, normally I run this query, and returns the rows ordered by user_seeks desc. Then I look at the top 10-20 rows, and see if any of them have avg_user_impact in the 90-100 range. Then I look for the number of columns in the index, and then I decide whether or not the index should be build. This is NOT a query that gives you the answer to everything, but it can definately help you spot the indexes that you should add to your system. Finally you can actually modify the query to give you the full CREATE INDEX statement, so you just need to copy that to a query window and execute it. The full query I usually use, looks like this: view source print? 01.DECLARE @DBName VARCHAR(100) 02.DECLARE @TableName VARCHAR(100) 03.--If this line is commented in, the missing index list will only contain 04.--missing indexes for the given database 05.--SET @DBName = 'MyDatabase' 06.--SET @TableName = 'MyTable' 07. 08.;WITH CTE 09.AS 10.( 11.SELECT 12. DB_NAME(d.database_id) AS DatabaseName, 13. user_seeks, 14. user_scans, 15. avg_total_user_cost, 16. avg_user_impact, 17. d.equality_columns, 18. d.inequality_columns, 19. d.included_columns, 20. 'USE ' + DB_NAME(d.database_id) + '; CREATE NONCLUSTERED INDEX IX_' + 21. replace(replace(replace(replace(isnull(equality_columns, '') + 22. isnull(inequality_columns, ''), ',', '_'), '[', ''),']', ''), ' ', '') + 23. CASE WHEN included_columns IS NOT NULL 24. THEN '_INC_' + replace(replace(replace(replace(included_columns, ',', '_'), '[', ''),']', ''), ' ', '') 25. ELSE '' END + ' ON ' + statement + ' (' + 26. CASE 27. WHEN equality_columns IS NOT NULL AND inequality_columns IS NOT NULL 28. THEN equality_columns + ', ' + inequality_columns 29. WHEN equality_columns IS NOT NULL AND inequality_columns IS NULL 30. THEN equality_columns 31. WHEN equality_columns IS NULL AND inequality_columns IS NOT NULL 32. THEN inequality_columns 33. END + ')' + 34. CASE WHEN included_columns IS NOT NULL THEN ' INCLUDE (' + 35. replace(replace(replace(included_columns, '[', ''),']', ''), ' ', '') + ')' 36. ELSE '' END + 37. CASE WHEN @@Version LIKE '%Enterprise%' THEN ' WITH (ONLINE = ON)' 38. ELSE '' END AS CreateIndex 39.FROM 40. sys.dm_db_missing_index_groups g 41. INNER JOIN sys.dm_db_missing_index_group_stats gs on gs.group_handle = g.index_group_handle 42. INNER JOIN sys.dm_db_missing_index_details d on g.index_handle = d.index_handle 43.WHERE 44. (DB_NAME(d.database_id) = @DBName 45. OR @DBName IS NULL) 46.) 47.SELECT * FROM CTE 48.WHERE CreateIndex LIKE '%'+@TableName+'%' OR @TableName IS NULL 49.ORDER BY user_seeks DESC Feel free to modify to match your naming standards.