consolidating indexes

in most situations, we don’t need indexes that has the same columns in their definitions, it just add storage and update overhead, and slow down the systems, so dropping them is just about finding the definition characteristics other than its own definition like being unique, adding them to the staying index so we don’t miss on these features, and dropping the redundant one

Now, how could we do that:

use adventureworks2022
go
select 
      s.name + '.'+t.name as table_name,
      i1.index_id as ix1_id,
      i1.name as ix1_name ,
      duplicates.index_id as ix2_id,
      duplicates.name as ix2_name,
      left(i1_stuff.index_key,len(i1_stuff.index_key)-1) as ix1_key,
      left(i2_stuff.index_key,len(i2_stuff.index_key)-1) as ix2_key,
      i1.filter_definition as ix1_filter,
      left(i1_stuff.index_included,len(i1_stuff.index_included)-1) as ix1_included,
      left(i2_stuff.index_included,len(i2_stuff.index_included)-1) as ix2_included,
      duplicates.filter_definition as ix2_filter,
      IIF
          (
           CHARINDEX(i1_stuff.index_key,i2_stuff.index_key) = 1 or CHARINDEX(i2_stuff.index_key,i1_stuff.index_key ) = 1 , 'yes','no'
          ) redundant_index
from sys.tables t inner join 
              sys.indexes i1 on t.object_id =i1.object_id
                  inner join sys.index_columns ic1 on
                           i1.object_id = ic1.object_id
                           and i1.index_id = ic1.index_id
                           and ic1.key_ordinal = 1
                  inner join sys.columns c on 
                            ic1.object_id = c.object_id
                            and ic1.column_id = c.column_id
                  inner join sys.schemas s on 
                  s.schema_id = t.schema_id
                   cross apply (
                               select i2.index_id,i2.name,i2.filter_definition
                               from 
                                  sys.indexes i2 
                                      inner join  sys.index_columns ic2 on ic2.object_id =i2.object_id 
                                                                   and ic2.index_id = i2.index_id
                                                                   and ic2.key_ordinal = 1
                               where 
                                    i2.object_id =i1.object_id
                                 and i2.index_id > i1.index_id
                                 and ic2.column_id = ic1.column_id
                                 and i2.type IN (1,2) 
                                 and i2.is_disabled = 0 
                                 and i2.is_hypothetical = 0
                                 and(
                                     i1.has_filter = i2.has_filter 
                                 and isnull(i1.filter_definition,'') = ISNULL(i2.filter_definition,'')
                                    )
                               ) duplicates
                  cross apply 
                             (
                             select
                                   (
                                   select col.name as [text()], IIF(ic_data.is_descending_key = 1,'desc','') as [text()],',' as [text()]
                                   from 
                                       sys.index_columns ic_data
                                       inner join sys.columns col on 
                                           col.object_id = ic_data.object_id
                                           and col.column_id = ic_data.column_id
                                   where ic_data.object_id = i1.object_id 
                                   and ic_data.index_id = i1.index_id
                                   and ic_data.is_included_column = 0
                                   order by ic_data.key_ordinal
                                   for xml path ('')
                                   ) as index_key,
                             
                                   (
                                    select col.name as [text()], ',' as [text()]
                                    from 
                                        sys.index_columns ic_data 
                                        inner join sys.columns col on ic_data.object_id = col.object_id
                                                                  and ic_data.column_id = col.column_id
                                    where
                                         ic_data.object_id = i1.object_id
                                     and ic_data.index_id = i1.index_id
                                     and ic_data.is_included_column = 1
                                     order by col.name
                                     for xml path('')
                                   ) as index_included
                             )
                             as i1_stuff
                 cross apply
                           (
                            select (
                                   select col.name as [text()],IIF(ic_data.is_descending_key = 1,'desc','') as [text()], ',' as [text()]
                                   from sys.index_columns as ic_data inner join sys.columns col on col.object_id = ic_data.object_id and col.column_id = ic_data.column_id
                                   where ic_data.object_id = t.object_id
                                   and ic_data.index_id = duplicates.index_id
                                   and ic_data.is_included_column = 0
                                   order by ic_data.key_ordinal
                                   for xml path('')
                                   )index_key
                                   ,
                                  (
                                  select col.name as [text()], ',' as [text()]
                                  from sys.index_columns ic_data 
                                       inner join sys.columns col on col.object_id = ic_data.object_id and col.column_id = ic_data.column_id
                                  where 
                                       ic_data.object_id = i1.object_id
                                   and ic_data.index_id = duplicates.index_id
                                   and ic_data.is_included_column = 1
                                 order by 
                                     col.name
                                 for xml path('')
                                 ) as index_included
                            )i2_stuff
where i1.is_disabled = 0 and i1.is_hypothetical = 0 and i1.type in (1,2)
order by s.name,t.name,i1.index_id;

So there are about 100 lines in here, let’s break it down:

We are trying to find the table, index, and schema name where index 1 matches index 2 in the first key or included column. We use charindex to get them and compare each to the other one
So we join sys.tables with sys.indexes and sys.index_columns to get the index_id and the columns in it, so we could compare the keys
After that, we joined them with sys.columns so we could get the column they are comparing at
After that, we joined them with sys.schemas so we could get the schema they are comparing at
After that, we cross-applied so we could get the index 2 that meets the condition where everything matches except the index_id, where the second is bigger than the first and it is not a hypothetical or disabled index, and the indexes are either filtered or non-filtered
We called that duplicates
Then we tried to store the data from both into an XML file
Now we have two data sets, one for the included and one for the key columns
And we have two indexes
So we put each into one of the indexes
We made sure that the object_id matches the the sys.tables id, then we got the columns from the index_columns mother table or ic_data, and we got the index 1 data once for included and once for key columns
Then we did the same for two
Then after that in the select query, we looked at it if there is match in the prefix key either in the included or in the key columns, so it looks for the first key or included, if they match, then it is a duplicate
And we got our list of yes or no
Now if we run this in AdventureWorks we get:

So both have the same key_column, one is named uq_whatever, and the second looks like a good guy

We should verify what is the difference in use is between both, and what properties each retains, like uniqueness, so we could keep it in the remaining index

Now, if we run it in nothwind, we get this:

You see here, there are a lot of duplicates, and the thing is they all match in the first key

So if the second is the issue, then it is different leftmost, meaning different function, so no detection

meaning that it is an overlapping index, that has the same columns, but different key or included order

There, you have to decide whether you want to keep the index. Could you modify the coding styles in the application?

Otherwise, you might have to keep it

Like if pk_order_details had product_id as the first key, it would not be the containing index for ordered index, since the first key would be productid, but they would be overlapping since both could be covered by the first, if we query them together, but should we? We should look at the queries using them and decide there, there is no shortcut for this

Or if both had the same column like pk_order_details having the same keys but including customerid, would customerid index be redundant, or because it is an included one, we should create a different index, since it is requested individually a lot

Or if we had pk_order_details, which has the same definition but reversed, should we get rid of one of them

Or if they both had the same definition, but one had the customer as a key, and the other one had it as an included, they would be redundant, but not fully, since the prefix matches

But when the order is reversed, like:

create nonclustered index ix_reversed_pk_order_details on [dbo].[order details](productid,orderid)

As you can see, even if both contained the same columns, both are not fully redundant, they are overlapping

Now, the fully redundant one in our case is the ordered index, since it has the same columns

What about the overlapping ones? How could we detect them? Next blog.

consolidating indexes

Leave a Reply Cancel reply

Quick Links

Get In Touch