Wednesday, October 19, 2011
SQL Server Good Guidance on Missing Indices
http://beyondrelational.com/blogs/martinschmidt/archive/2011/09/27/missing-indexes.aspx?utm_source=brnewsletter&utm_medium=email&utm_campaign=2011Oct18
One of the most common things I encounter when asked to help with performance problems, is wrong or inadequate indexing. Creating the optimal indexes for a system, is no where near a trivial excercise. You need to consider the read/write ratio as well as how you write your queries. This is not the scope of this blog post.
SQL Server has a number of dynamic management views/functions, which is very useful for different purposes. At least if you know they are there, and how to use them. For missing index information, these views are important:
view source
print?
1.SELECT * FROM sys.dm_db_missing_index_group_stats
2.SELECT * FROM sys.dm_db_missing_index_details
3.SELECT * FROM sys.dm_db_missing_index_groups
sys.dm_db_missing_index_group_stats tracks information about indexes which could have been useful to the SQL Server. It holds information about how many times the SQL Server could have used that index, and how great an impact it would have had on the query. This DMV does not give you the details about the table and columns that the index should have been created on.
This is however available in the sys.dm_db_missing_index_details DMV, where you get information about the database_id, object_id and the columns in the index.
The last DMV is only used to bind the two DMV’s together.
The DMV’s contains data on server level, so it will show details about missing index in all your databases.
Let me show how a basic query on these DMV’s could look like:
view source
print?
1.SELECT *
2.FROM
3. sys.dm_db_missing_index_groups g
4. INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
5. INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
On my local SQL Server, this gives me an empty resultset:
So, let me create some test data:
view source
print?
01.CREATE DATABASE IndexDemo
02.GO
03.USE IndexDemo
04.GO
05.CREATE TABLE DemoData (
06. Id INT IDENTITY PRIMARY KEY,
07. Val1 INT,
08. Val2 INT,
09. Val3 CHAR(4000)
10.)
11.GO
12.INSERT INTO DemoData (Val1, Val2, Val3)
13.VALUES (1,1, '')
14.GO
15.
16.INSERT INTO DemoData (Val1, Val2, Val3)
17.SELECT Val1, Val2, Val3
18.FROM DemoData
19.GO 15
I have created a table with a clustered index on the Id column, and filled the table with 32K rows. Now I will create a query to return the Id and Val1 column, given a specific value for Val2:
view source
print?
1.SELECT Id, Val1
2.FROM DemoData
3.WHERE Val2 = 123
So, now let me try to run the missing index query again, but this time only looking at a limited number of columns:
view source
print?
01.SELECT
02. user_seeks,
03. last_user_seek,
04. avg_total_user_cost,
05. avg_user_impact,
06. database_id,
07. object_id,
08. equality_columns,
09. inequality_columns,
10. included_columns
11.FROM
12. sys.dm_db_missing_index_groups g
13. INNER JOIN sys.dm_db_missing_index_group_stats gs ON gs.group_handle = g.index_group_handle
14. INNER JOIN sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle
Now one row is returned. Let’s go through the returned columns, and see what the values mean:
user_seeks
This tells you how many times the SQL Server could have used this missing index to lookup data. The higher this number is, the more reason to build the index. How high this is, depends on your system. On high volume OLTP setups, numbers of less the 10.000 is not worth worrying about, and on smaller systems a value of 20-100 might be high enough to consider the index.
last_user_seek
This tells you when the missing index was last needed. If this value is equal to last saturday at night, it _could_ mean that you have a nightly job running on saturday, which might need this index. avg_total_user_cost
This value does not have a unit, but it’s a value that tells you something about the estimated cost of the queries that needed the index. The higher the value is, the more resource intensive is the query that needs the index. If the value is very low (perhaps 0.01) then the query that needed the index was not very resource intensive. Think of a query that executes in 2ms compared to a query that takes 2000ms. You will probably gain more by adding an index to satisfy the 2000ms query than the 2ms one.
avg_user_impact
This is the expected improvement of the query if you build the query. The closer the value is to 100, the more the query will benefit from building the index.
database_id
This is the database where you need to build the index.
object_id
This is the object_id of the table that could need an index.
equality_columns
These columns are the ones that you should build the index on. Equality columns means that the predicate used the column like this: “Val2 = x”, which was exactly how I wrote my demo query.
inequality_columns
You should also add these columns to your index. Inequality columns are used for range predicates, like: “Val2 > x”. My missing index query returned NULL in this column, because my demo query had no such predicate.
included_columns
If columns are used in the SELECT part of your query, and not in the predicate, you can add these columns as included columns in the index. This way all needed data is available in the index, and no key lookup is needed on the clustered index. But beware! The more columns you add to your index, the more write IO and space you need. My rule of thumb says, that the total number of columns (equality_columns + inequality_columns + included_columns) should not exceed 5. If the total number of columns exceeds 5, that does not mean that you should not build the index – it just means that you need to think carefully about what you are doing!
So, normally I run this query, and returns the rows ordered by user_seeks desc. Then I look at the top 10-20 rows, and see if any of them have avg_user_impact in the 90-100 range. Then I look for the number of columns in the index, and then I decide whether or not the index should be build. This is NOT a query that gives you the answer to everything, but it can definately help you spot the indexes that you should add to your system.
Finally you can actually modify the query to give you the full CREATE INDEX statement, so you just need to copy that to a query window and execute it. The full query I usually use, looks like this:
view source
print?
01.DECLARE @DBName VARCHAR(100)
02.DECLARE @TableName VARCHAR(100)
03.--If this line is commented in, the missing index list will only contain
04.--missing indexes for the given database
05.--SET @DBName = 'MyDatabase'
06.--SET @TableName = 'MyTable'
07.
08.;WITH CTE
09.AS
10.(
11.SELECT
12. DB_NAME(d.database_id) AS DatabaseName,
13. user_seeks,
14. user_scans,
15. avg_total_user_cost,
16. avg_user_impact,
17. d.equality_columns,
18. d.inequality_columns,
19. d.included_columns,
20. 'USE ' + DB_NAME(d.database_id) + '; CREATE NONCLUSTERED INDEX IX_' +
21. replace(replace(replace(replace(isnull(equality_columns, '') +
22. isnull(inequality_columns, ''), ',', '_'), '[', ''),']', ''), ' ', '') +
23. CASE WHEN included_columns IS NOT NULL
24. THEN '_INC_' + replace(replace(replace(replace(included_columns, ',', '_'), '[', ''),']', ''), ' ', '')
25. ELSE '' END + ' ON ' + statement + ' (' +
26. CASE
27. WHEN equality_columns IS NOT NULL AND inequality_columns IS NOT NULL
28. THEN equality_columns + ', ' + inequality_columns
29. WHEN equality_columns IS NOT NULL AND inequality_columns IS NULL
30. THEN equality_columns
31. WHEN equality_columns IS NULL AND inequality_columns IS NOT NULL
32. THEN inequality_columns
33. END + ')' +
34. CASE WHEN included_columns IS NOT NULL THEN ' INCLUDE (' +
35. replace(replace(replace(included_columns, '[', ''),']', ''), ' ', '') + ')'
36. ELSE '' END +
37. CASE WHEN @@Version LIKE '%Enterprise%' THEN ' WITH (ONLINE = ON)'
38. ELSE '' END AS CreateIndex
39.FROM
40. sys.dm_db_missing_index_groups g
41. INNER JOIN sys.dm_db_missing_index_group_stats gs on gs.group_handle = g.index_group_handle
42. INNER JOIN sys.dm_db_missing_index_details d on g.index_handle = d.index_handle
43.WHERE
44. (DB_NAME(d.database_id) = @DBName
45. OR @DBName IS NULL)
46.)
47.SELECT * FROM CTE
48.WHERE CreateIndex LIKE '%'+@TableName+'%' OR @TableName IS NULL
49.ORDER BY user_seeks DESC
Feel free to modify to match your naming standards.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment