now here we are going to start introducing the basics of the iterative process of index tuning

at the start of development, the application code is being modified too frequently, and the queries needed the most are always changing

so we introduce the basics of indexing at this stage

primary keys,

constraints like uniqueness

referential integrity etc.

then at later stages when we know the workload, we can estimate the most requested queries and tune accordingly

when we try to simulate data in development we must meet these two conditions:

enough representative data
simulating workload

now in an existing system(beside emergencies):

index analysis
remove redundant and inefficient
add new if necessary

Table of Contents

analysis of current indexes: detecting the unused

now we have reads and writes

every select just reads

but insert, update, delete, merge all read a certain amount of rows, then modify

for example, delete locates the row = reads it

then it deletes it

now if we are just reading, indexes are golden

but once we modify a row, we have to modify it in every index that contains it, at least clustered, plus any nonclustered that includes it

now to track that we have two DMVs

sys.dm_db_index_operation_stats and sys.dm_db_index_usage_stats

now let’s try both:

create table dbo.index_usage_demo(
id int not null,
col1 int,
col2 int,
col_str char(2000)
);
create unique clustered index ix_index_usage_demo_id_CI on dbo.index_usage_demo(id);
create unique nonclustered index ix_index_usage_demo_col1_nci on dbo.index_usage_demo(col1);
create unique nonclustered index ix_index_usage_demo_col2_nci on dbo.index_usage_demo(col2);
with
n1(c) as (select 0 union all select 0),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
ids(id) as (select row_number() over (order by (select null)) from n3)
insert into dbo.index_usage_demo(id,col1,col2)
select id,id*2,id*20
from ids;

now here we created a table with 1 id, 2 col1 and col2, and one empty string

then we created indexes on the id

and 2 on the integers

then we inserted 16 values each

now let’s see sys.dm_db_index_usage_stats:

sys.dm_db_index_usage_stats:

select*
from sys.dm_db_index_usage_stats
where database_id = db_id()

And we would get the following:

Now here we see:

database_id
object_id
index_id: only unique table-wise
user_seek:s how many seek operators happened in each execution plan
user_scans: how many scan operators happened per execution plan
user_lookup: the same here
user_updates: same here
last_….: it shows timestamps for the last of each
system…: for internal queries
then it has the same timestamps for system

Now we can filter the output just for our table like:

use testdatabase
select 
s.name +'.'+t.name as [schema.table],
i.name as index_name,
iu.user_seeks,
iu.user_scans,
iu.user_lookups,
iu.user_seeks + iu.user_scans+iu.user_lookups as total_user_reads,
iu.user_updates,
iu.last_user_seek,
iu.last_system_scan,
iu.last_user_lookup,
iu.last_user_update
from sys.tables t inner join sys.indexes i on 
                t.object_id = i.object_id
                  inner join sys.schemas s on
                t.schema_id = s.schema_id
                left outer join sys.dm_db_index_usage_stats iu on 
                iu.database_id = db_id()
                and 
                iu.object_id = i.object_id
                and 
                iu.index_id = i.index_id
where s.name = N'dbo' and t.name =N'index_usage_demo'
order by s.name,t.name,i.name;

now here we just added the schema, table, and index name so we could analyze in a more meaningful way, we excluded the system counters

now here we can evaluate the usage stats, how necessary each index is, why they were using it, is it seeks scans, whether we need a new one for key lookups, and how often they do happen

now here as you can see since we did one insert, we can see one update, and the timestamp for the last insertion

now this thing resets every time the server restarts, so we have to keep that in mind

plus, this DMV resets at every server restart

plus, since it only retains information since the last server restart, some indexes that are needed for monthly reports for example, can not be disregarded, since they are necessary but the DMV shows that they have not been used since the last month

now this index could be dropped every month for example

plus, it gets cleaned every time we detach or auto_close a database

plus, index rebuilds reset this DMV in SQL server 2012, 2014, this is a bug

plus,(paul white)it is per operator, not per operation

plus, generating an execution plan does not count

plus, even if the operator did not execute, but it was there during execution, it is counted

plus, index usage stats consider all seeks the same, which means singleton-lookup is counted the same as range-scan, even though they are totally different operations

also, range scans can scan the whole table except the first value, but that is still counted as one

now let’s give some examples:

Example 1: seeks, range-scan vs singleton-lookup

Now if we select like:

--- singleton-lookup
select col_str
from dbo.index_usage_demo
where id = 1

Now after this if we query the DMV:

Now if we do 5 singleton-lookups:

--- 5 singleton lookups counted as 1 
select col_str
from dbo.index_usage_demo
where id in ( 1,2,3,4,5)

Now in the properties of the execution plan:

Then DMV:

So as you can see here it only counts one operator, now if we wanted to see how many times it got executed per row, we need to use sys.dm_db_index_operational_stats more on it later

Now if we range-scan like:

--- range-scan
select col_str
from dbo.index_usage_demo
where id >1

so it is 3 now because it was a seek, even though the range-scan scanned the whole table except one value

Example 2: index scans

now we can:

select*
from dbo.index_usage_demo with (index = 1);

DMV:

we have our scan now

now here as you can see we only used the clustered index

Let’s force the others in the previous queries:

Example 3: singleton-lookup with Key-lookup

select col_str
from dbo.index_usage_demo with(index =ix_index_usage_demo_col1_nci)
where col1 in (2,4,6,8,10)

now here we used it once, it chose to scan the clustered index first so we had to use the hint

now if we DMV:

Example 5: updates, nonclustered, counting the updates even though the transaction did not commit


--- this one did not execute
update dbo.index_usage_demo
set col2 = -20 
--- now this one did
update dbo.index_usage_demo
set col2 = -20  where col2 = 20

Now as you can see it did show it twice even if we could not commit.

Now if an update statement does not execute, it would be counted as an update, and a read

this DMV shows an overview of indexes, if we want a more detailed outlook we should look at the other one:

sys.dm_db_index_operational_stats

Now first let’s select*:

select*
from sys.dm_db_index_operational_stats(db_id(),object_id(N'dbo.index_usage_demo'),NULL,NULL)

Now the last two parameters are for index_id and partition_id

so we have:

database_id
object_id
index_id
partition_number
hobt_id
leaf_insert_count how many inserts did happen on leaf level records row by row, unlike the other one usage stats since the last server restart
leaf_delete_count
leaf_update_count
leaf_ghost_count
nonleaf_……
leaf_allocation_count: how many pages have been allocated to an index or a heap, if a heap, how many pages were added, if an index, how many page splits
nonleaf…….

now we start seeing singleton lookups,range-scans, and forwarded fetch
lob_fetch, is how many times we retrieved
the lob_orphan stuff is for bulk inserts

now here we start to see the columns related to locks and latches

now here we see compression and row-versioning columns

now as you can see, the amount of data is overwhelming

so whenever we use it, we have to tailor it to our specific needs

now here we are going to create a table to demonstrate the seek operations:

Example: singleton-lookups and range scans

create table seeks_demo(
id int not null,
col1 int,
col2 int,
col3 char(7000)
);
create unique clustered index ix_seeks_demo_id_ci on dbo.seeks_demo(id);
create nonclustered index ix_seeks_demo_col1_nci on dbo.seeks_demo(col1);
create nonclustered index ix_seeks_demo_col2_nci on dbo.seeks_demo(col2);
with 
n1(c) as (select 0 union all select 0),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
ids(id) as (select row_number() over ( order by (select null)) from n3)
insert into dbo.seeks_demo(id,col1,col2)
select id, id*2,id*20
from ids

After that, we select with point-lookup and range-scan like:

--- range scan
select col1
from seeks_demo with ( index = ix_seeks_demo_col1_nci)
where col1>0
---- singleton-lookup with key-lookup
select *
from seeks_demo with(index = ix_seeks_demo_col1_nci)
where col1 = 2

Now if we index_operatinal_stats like:

select i.name as index_name,singleton_lookup_count,range_scan_count
from sys.dm_db_index_operational_stats(db_id(),object_id(N'dbo.seeks_demo'),NULL,NULL) op inner join 
     sys.indexes i on i.object_id = op.object_id
     and i.index_id = op.index_id

We would get:

But if we point-lookup only:

select col1
from col1 with(index = ix_seeks_demo_col1_nci)
where col1 = 1

We get:

even though it was a point-lookup

why is that? because it is not unique, it can return any number of values

but the clustered index is unique

So if we select like:

select id
from seeks_demo
where id = 2

So it records it when it is unique only.

And with that, we are done with DMV for usage and operational index stats.