consolidating indexes
in most situations, we don’t need indexes that has the same columns in their definitions, it just add storage and update overhead, and slow down the systems, so dropping them is just about finding the definition characteristics other than its own definition like being unique, adding them to the staying index so we don’t miss on these features, and dropping the redundant one
Now, how could we do that:
use adventureworks2022
go
select
s.name + '.'+t.name as table_name,
i1.index_id as ix1_id,
i1.name as ix1_name ,
duplicates.index_id as ix2_id,
duplicates.name as ix2_name,
left(i1_stuff.index_key,len(i1_stuff.index_key)-1) as ix1_key,
left(i2_stuff.index_key,len(i2_stuff.index_key)-1) as ix2_key,
i1.filter_definition as ix1_filter,
left(i1_stuff.index_included,len(i1_stuff.index_included)-1) as ix1_included,
left(i2_stuff.index_included,len(i2_stuff.index_included)-1) as ix2_included,
duplicates.filter_definition as ix2_filter,
IIF
(
CHARINDEX(i1_stuff.index_key,i2_stuff.index_key) = 1 or CHARINDEX(i2_stuff.index_key,i1_stuff.index_key ) = 1 , 'yes','no'
) redundant_index
from sys.tables t inner join
sys.indexes i1 on t.object_id =i1.object_id
inner join sys.index_columns ic1 on
i1.object_id = ic1.object_id
and i1.index_id = ic1.index_id
and ic1.key_ordinal = 1
inner join sys.columns c on
ic1.object_id = c.object_id
and ic1.column_id = c.column_id
inner join sys.schemas s on
s.schema_id = t.schema_id
cross apply (
select i2.index_id,i2.name,i2.filter_definition
from
sys.indexes i2
inner join sys.index_columns ic2 on ic2.object_id =i2.object_id
and ic2.index_id = i2.index_id
and ic2.key_ordinal = 1
where
i2.object_id =i1.object_id
and i2.index_id > i1.index_id
and ic2.column_id = ic1.column_id
and i2.type IN (1,2)
and i2.is_disabled = 0
and i2.is_hypothetical = 0
and(
i1.has_filter = i2.has_filter
and isnull(i1.filter_definition,'') = ISNULL(i2.filter_definition,'')
)
) duplicates
cross apply
(
select
(
select col.name as [text()], IIF(ic_data.is_descending_key = 1,'desc','') as [text()],',' as [text()]
from
sys.index_columns ic_data
inner join sys.columns col on
col.object_id = ic_data.object_id
and col.column_id = ic_data.column_id
where ic_data.object_id = i1.object_id
and ic_data.index_id = i1.index_id
and ic_data.is_included_column = 0
order by ic_data.key_ordinal
for xml path ('')
) as index_key,
(
select col.name as [text()], ',' as [text()]
from
sys.index_columns ic_data
inner join sys.columns col on ic_data.object_id = col.object_id
and ic_data.column_id = col.column_id
where
ic_data.object_id = i1.object_id
and ic_data.index_id = i1.index_id
and ic_data.is_included_column = 1
order by col.name
for xml path('')
) as index_included
)
as i1_stuff
cross apply
(
select (
select col.name as [text()],IIF(ic_data.is_descending_key = 1,'desc','') as [text()], ',' as [text()]
from sys.index_columns as ic_data inner join sys.columns col on col.object_id = ic_data.object_id and col.column_id = ic_data.column_id
where ic_data.object_id = t.object_id
and ic_data.index_id = duplicates.index_id
and ic_data.is_included_column = 0
order by ic_data.key_ordinal
for xml path('')
)index_key
,
(
select col.name as [text()], ',' as [text()]
from sys.index_columns ic_data
inner join sys.columns col on col.object_id = ic_data.object_id and col.column_id = ic_data.column_id
where
ic_data.object_id = i1.object_id
and ic_data.index_id = duplicates.index_id
and ic_data.is_included_column = 1
order by
col.name
for xml path('')
) as index_included
)i2_stuff
where i1.is_disabled = 0 and i1.is_hypothetical = 0 and i1.type in (1,2)
order by s.name,t.name,i1.index_id;
So there are about 100 lines in here, let’s break it down:
- We are trying to find the table, index, and schema name where index 1 matches index 2 in the first key or included column. We use charindex to get them and compare each to the other one
- So we join sys.tables with sys.indexes and sys.index_columns to get the index_id and the columns in it, so we could compare the keys
- After that, we joined them with sys.columns so we could get the column they are comparing at
- After that, we joined them with sys.schemas so we could get the schema they are comparing at
- After that, we cross-applied so we could get the index 2 that meets the condition where everything matches except the index_id, where the second is bigger than the first and it is not a hypothetical or disabled index, and the indexes are either filtered or non-filtered
- We called that duplicates
- Then we tried to store the data from both into an XML file
- Now we have two data sets, one for the included and one for the key columns
- And we have two indexes
- So we put each into one of the indexes
- We made sure that the object_id matches the the sys.tables id, then we got the columns from the index_columns mother table or ic_data, and we got the index 1 data once for included and once for key columns
- Then we did the same for two
- Then after that in the select query, we looked at it if there is match in the prefix key either in the included or in the key columns, so it looks for the first key or included, if they match, then it is a duplicate
- And we got our list of yes or no
- Now if we run this in AdventureWorks we get:

So both have the same key_column, one is named uq_whatever, and the second looks like a good guy
We should verify what is the difference in use is between both, and what properties each retains, like uniqueness, so we could keep it in the remaining index
Now, if we run it in nothwind, we get this:

You see here, there are a lot of duplicates, and the thing is they all match in the first key
So if the second is the issue, then it is different leftmost, meaning different function, so no detection
meaning that it is an overlapping index, that has the same columns, but different key or included order
There, you have to decide whether you want to keep the index. Could you modify the coding styles in the application?
Otherwise, you might have to keep it
Like if pk_order_details had product_id as the first key, it would not be the containing index for ordered index, since the first key would be productid, but they would be overlapping since both could be covered by the first, if we query them together, but should we? We should look at the queries using them and decide there, there is no shortcut for this
Or if both had the same column like pk_order_details having the same keys but including customerid, would customerid index be redundant, or because it is an included one, we should create a different index, since it is requested individually a lot
Or if we had pk_order_details, which has the same definition but reversed, should we get rid of one of them
Or if they both had the same definition, but one had the customer as a key, and the other one had it as an included, they would be redundant, but not fully, since the prefix matches
But when the order is reversed, like:
create nonclustered index ix_reversed_pk_order_details on [dbo].[order details](productid,orderid)

As you can see, even if both contained the same columns, both are not fully redundant, they are overlapping
Now, the fully redundant one in our case is the ordered index, since it has the same columns
What about the overlapping ones? How could we detect them? Next blog.