Included Columns And Filtered Indexes Storage Engine Part 9

Indexes with included columns:

now before, we talked about creating composite indexes, there we provided two columns as keys for an index

we saw it had to store the keys and their ranges in the intermediate and root pages, plus the leaves

What if we just wanted to store them in the leaves, not in the upper levels?

we can use indexes with included columns, thus providing a new way to avoid key or RID-lookup

when we avoid key-lookup, by including some columns in the nonclustered definition,

we are making a query covering since it does not need to process any clustered index pages or levels

now if we need a key-lookup SQL server tends to favor an index scan of the clustered instead, since it has sequential i/o and would get all the rows, instead of running the results of the nonclustered with a nested loop join.

now if we implement the covering queries technique we are golden, only leaves need to be modified and we don’t need to read the intermediate and root pages again

but if we include as the rightmost key in composite indexes we have to include them there too

which means bigger storage, and heavier reads of intermediate and root pages

plus we have a size limitation in composite which was 1700 bytes

plus we can’t store LOB storage as a key

but in included columns, we can do both

plus if we modify the included column definition, we only have to add it the data pages, meaning we don’t need to modify the root and intermediate, meaning fewer page splits and fragmentation

let’s try it

use TestDatabase
go
create table inc_ix_ex(
col1 int,
col2 int,
col3 int);
with
n1(c) as (select 0 union all select 0),
n2(c) as (select 0 from n1  as f1 cross join n1 as f2),
n3(c) as ( select 0 from n2 as f1 cross join n2 as f2),
n4(c) as (select 0 from n3 as f1 cross join n3 as f3),
n5(c) as (select 0 from n4 as f1 cross join n4 as f2),
ids(id) as (select row_number()over(order by(select null)) from n5)
insert into inc_ix_ex 
select id,id*2,id*4
from ids 

Now let’s create our nonclustered:


create nonclustered index  ix_include_e on inc_ix_ex(col1)

Now if we select like:

select*
from inc_ix_ex 
where col1 = 20

we get a RID-lookup

now we have already tried the composite index before, now let’s try the include option

create nonclustered index ix_include_e on inc_ix_ex(col1) include(col2,col3) with(drop_existing = on)

Now let’s run the select again:

select*
from inc_ix_ex 
where col1 = 20

now no RID-LOOKUP

now if we dbcc ind the index with its id(we can get it from sys.indexes but the index_id for the first nonclustered index is always 2):


dbcc ind('TestDatabase','dbo.inc_ix_ex',2)

now as you can see we have one root page and no intermediate pages

now if we dbcc page the root:

dbcc traceon(3604)
dbcc page('TestDatabase,560728,1)

now you can see it only stores the first column or col1 in the root page unlike the composite

and since it is a heap it stores its physical location as RID or row id now if we check the output:


Allocation Status

GAM (1:511232) = ALLOCATED          SGAM (1:511233) = NOT ALLOCATED     PFS (1:558072) = 0x40 ALLOCATED   0_PCT_FULL
DIFF (1:511238) = CHANGED           ML (1:511239) = NOT MIN_LOGGED      

Slot 0 Offset 0x60 Length 22

Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP    Record Size = 22

Memory Dump @0x00000031E3376060

0000000000000000:   16010000 00984d08 00010000 00388e08 00010002  ......M......8Ž.....
0000000000000014:   0000                                          ..     

Slot 1 Offset 0x76 Length 22

Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP    Record Size = 22

Memory Dump @0x00000031E3376076

0000000000000000:   16380100 00984d08 00010037 01398e08 00010002  .8....M....7.9Ž.....
0000000000000014:   0000                                          ..     

Now we have two slots:

00088E38 = 388e0800= page id in the first and 00088E39 = 398e08 00 = page id for the second

the last are for the file id

0138 is the 2-byte presentation of 317 which is the second’s key and the first has the null with 16 for the first byte

rid should be stored which in hex was 0x984D080001003701 for the first which is equal to 00984d08 00010037 01 same for the second

and that is it, so it only stores the col1’s value and RID

now if we look into the first index page like:


Slot 0 Offset 0x60 Length 24

Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP    Record Size = 24

Memory Dump @0x00000031DADF6060

0000000000000000:   16010000 00984d08 00010000 00020000 00040000  ......M.............
0000000000000014:   00040000                                      ....   

Slot 1 Offset 0x78 Length 24

Record Type = INDEX_RECORD          Record Attributes =  NULL_BITMAP    Record Size = 24

Memory Dump @0x00000031DADF6078

0000000000000000:   16020000 00984d08 00010001 00040000 00080000  ......M.............
0000000000000014:   00040000                                      ....   

so we have the RID in the 4th to the 12th byte 0x984D080001000000 and 1

we have 4 bytes for col1 1 in the after 16 the first byte, col2 2, col3 4 as we expected

we have the 16 at the first

and I don’t know what 16 means

or 4 at the end

so it stored the whole thing, it stored the col1 and row id first then the columns as we expected

what about SARGability:

Example 1

SELECT *
FROM inc_ix_ex WITH
WHERE col2 = 4;

so it scans here as in the composite, so without the leftmost we can’t

we can prove that by hinting like this:

SELECT *
FROM inc_ix_ex WITH (INDEX (ix_include_e), FORCESEEK)
WHERE col2 = 4;

We would get the following error:

Query processor could not produce a query plan because of the hints defined in this query.
 Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Use it but:

Example 2

If we use inequality predicates in all of them in any order like:


select*
from inc_ix_ex with(index = ix_include_e)
where col2<>2 and col3<> 4 and col1<> 1
and col1> 4 

it still uses the clustered no need for RID lookups

so just like the composite in terms of SARGability just with fewer index pages

we can prove that by forcing the seek hint:

select*
from inc_ix_ex WITH (INDEX (ix_include_e) FORCESEEK)
where col2<>2 and col3<> 4 and col1<> 1
and col1 >4 

And we would get the following:

so it seeks in ranges, but it still seeks

for comparing performance hits between an index scan and the range scan in index seeks go to our blog( ways to read indexes )

However, there is a difference in predicates:

Difference in seeking between included and composite indexes

Now if we look at the properties of our previous seek:

you can see, it seeks on col1>4 operator first, then it sees in the output rows of the seek predicate if the predicate( the upper stuff, the other stuff in the where clause) if they fit that, then it outputs them

now if we set statistics i/o on

Now if we try another example like:

select*
from inc_ix_ex
where col1 = 1 and col2 = 2 

we can see the same problem

now if we set statistics i/o on:

go 
set statistics io on;
go
select*
from inc_ix_ex
where col1 = 1 and col2 = 2 
go
set statistics io off;
go
Table 'inc_ix_ex'. Scan count 1, 
logical reads 2, physical reads 2, 
page server reads 0, read-ahead reads 0,

Now let’s drop the index and create a composite one:


drop index ix_include_e on dbo.inc_ix_ex

Now let’s create a composite one:

create nonclustered index not_inc on dbo.inc_ix_ex(col1,col2,col3)

Then do the same select:


go 
set statistics io on;
go
select*
from inc_ix_ex
where col1 = 1 and col2 = 2 
and 1 = (select 1)
go
set statistics io off;
go
Table 'inc_ix_ex'. Scan count 1, logical reads 2, physical reads 0, 

now here we might not notice the difference, so we are going to do another select

a bigger table

, but before that, we can look into the seek predicates: Now as you can see here it seeks on both keys right away unlike the included columns which had to compare the predicates after seeking on the first, which make them better for a single query

now as you can see here it seeks on both keys right away unlike the included columns which had to compare the predicates after seeking on the first, which make them better for a single query

now if we select like this:

go 
set statistics io on;
go
select*
from inc_ix_ex with(index(not_inc) forceseek)
where col1 between  1 and 10000 
and col2 = 2 
and 1 = (select 1)
go
set statistics io off;
go

this is using the composite:

we get:

Table 'inc_ix_ex'. Scan count 1, logical reads 35, physical reads 0

the output was like:

drop index not_inc on inc_ix_ex

now recreate the including like:

create nonclustered index ix_include_e on inc_ix_ex(col1) 
include(col2,col3)

now reselect:


go 
set statistics io on;
go
select*
from inc_ix_ex with(index(ix_include_e) forceseek)
where col1 between  1 and 10000 
and col2 = 2 
and 1 = (select 1)
go
set statistics io off;
go

we get:

Table 'inc_ix_ex'. Scan count 1, logical reads 35,

now it is still the same, but the seek predicates differ in each, so if we had a value that had a col1 not selective enough it would have blown off the results since it can’t filter by col2, it just compares it afterward

Filtered indexes

now if we wanted to index a subset of column or columns related data, not the whole thing, we can use filtered indexes

for example:

create table fi_demo(
id int,
col1 varchar(50)
);
with 
n1(c) as (select 0 union all select 0),
n2(c) as ( select 0 from n1 as f1  cross join n1 as f2 ),
n3( c) as ( select 0 from n2  as f1 cross join n2 as f2),
n4( c) as ( select 0 from n3 as f1 cross join n3 as f2) ,
n5( c) as ( select 0 from n4 as f1 cross join n4 as f2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into fi_demo(id,col1)
select id, replicate('s',20)
from ids

and you wanted an index on customers that has an id between 50 000 and 60 000 since they were the ones with most of the money(😄)

you can create a filtered index like:

create nonclustered index ix_filtered_id_col1 on fi_demo(id)
include(col1)
where id>=50000 and id<=60000

now if we select like:


select*
from fi_demo 
where id = 50000
and 1 = (select 1)

we would get the following

now if the query got parameterized by the engine like:


select*
from fi_demo 
where id = 50000

we would get the unmatched indexes warning

as you can see in the statement, we have @1 parameter since the plan is simply parameterized by the engine,it is not supposed to use the index, but if we use (USEPLAN) hint or compare both plans, or run dbccs that shows us the optimizer’s work, we did not see a difference in plans anywhere in the process except in the warning now we feel here that the engine uses the index, but it warns us that this plan could not be cached so we have to recompile it each time we use it

otherwise, we don’t have an explanation for this warning here other than this, you could check our tests on this in methods used in troubleshooting queries

now when the plan is not a candidate for simple parameterization, we don’t see the warning, like we saw before,

but if we do it manually:


declare @idv int
set @idv = 50000
select*
from fi_demo 
where id= @idv

it had to go and scan and it shows us the same warning, but here, or in the traces we can see that it does not use the index seek, unlike in simple parameterization

why it does this?

let’s assume instead of 50000 we passed the value 2

would the index be used? the answer is no

so we can’t use this plan twice

so the optimizer chooses to ignore the index and see any other alternative, in this case it was the scan

how could we solve it? option(recompile):


declare @idv int
set @idv = 50000
select*
from fi_demo 
where id= @idv
option(recompile)

we can see the seek

now filtered statistics( we mentioned it in statistics blog)

does not get affected by updates

for example:


update fi_demo 
set id = 2
where id >=50000 and id <=60000

now if we query the previous query:


declare @idv int
set @idv = 50000
select*
from fi_demo 
where id= @idv
option(recompile)
--- with out recompile

declare @idv int
set @idv = 50000
select*
from fi_demo 
where id= @idv
option(recompile)

we would get the following:

for the second

and

which means SQL server still thinks we have the rows since the index has not been updated

even though the stats are:


select name, stats_id, has_filter,auto_created,auto_drop
from sys.stats
where object_id=OBJECT_ID('dbo.fi_demo')

we can see that the stats has been dropped once in auto_drop

now if we dbcc show statistics for the index and the last auto stats:

dbcc show_statistics('dbo.fi_demo',ix_filtered_id_col1)

we can see it still thinks we have the rows, even though we met the update threshold

but in the new one:

dbcc show_statistics('dbo.fi_demo',_WA_Sys_00000001_2610A626)

we can see it dropped our rows from distinct_range_rows

now we still have them, since they all are 2 as we can see from the first two ranges

but they have 2 = id not what the filtered shows, now if we update the filtered manually:


update statistics fi_demo ix_filtered_id_col1 

the dbcc again

dbcc show_statistics('dbo.fi_demo',ix_filtered_id_col1)

We get:

Now if we select with option recompile:

select id
from fi_demo 
where id= 50000
option(recompile)

now it knows there is nothing in the index so it does not read it, but still shows us the index

now if we had a clustered index like:


create clustered index ix_id on fi_demo(col1)

And we did the select like:


declare @idv int
set @idv = 50000
select*
from fi_demo 
where id= @idv
option(recompile)

It still favors the index seek, but noncosty, dummy operator, since the index has only one page, that has nothing in it,

so as you can see, we should manually update the stats on the filtered to get the desired result.

With that, we finish this blog, see you in the next one.

Leave a Reply

Your email address will not be published. Required fields are marked *