Operations That Cause Index Fragmentation: Storage Engine Part 11

Inserts

now, inserts might not always cause fragmentation. for instance, in our previous blog we showed an ever-increasing key example: since each key is inserted at the end of the last page, that pattern cannot lead to any fragmentation.

Clustered index fragmentation using NEWID()

the most common practice is to create a surrogate key: a synthetic key that provides a way of physically sorting the data without the risk of duplicates that a natural key like first + last name carries

now we can use an ever-increasing value like the following:


create table surrogate_key_test(
id int identity(1,1) not null primary key clustered,
col1 int)
-- now let's insert some values into col1
insert into surrogate_key_test(col1) values (20),(30),(40),(50)

Now if we select everything:

select *
from surrogate_key_test

now as you can see, the identity generates values from 1 up to N, so there is no way a duplicate can appear

and since you can't update an identity column, a key never changes or grows in size

so no fragmentation

you can delete rows, but since the keys on a page have to stay in order, new rows are still inserted at the end

that is why this is the most common best practice: no fragmentation
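to see that in action, a quick check (continuing with the surrogate_key_test table from above):

```sql
-- delete a middle key, then insert a new row
delete from surrogate_key_test where id = 2;
insert into surrogate_key_test(col1) values (60);
select * from surrogate_key_test;
-- the new row gets id = 5 and is appended at the end of the index;
-- the gap left by id = 2 is not reused
```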

but what if we use the NEWID() function?

NEWID() generates a uniqueidentifier value that is unique per instance

it guarantees uniqueness within the server, not worldwide

at the cost of generating a totally different, random value on every call

the new key could sort before the last key, or after it

which leads to random insert points

in no time some pages get full and split while others never fill up

which leads to both external and internal fragmentation

(for more information check the docs)

let’s try it:

use testdatabase
create table new_id_demo(
UserID uniqueidentifier not null default newid(),
purchase_amount int);
with 
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into new_id_demo(purchase_amount)
select id*20
from ids; 
create clustered index ix_userid on new_id_demo(userid)
with (fillfactor = 70, pad_index = on);

Now if we select everything:

select *
from new_id_demo

now these are the kind of values that are generated

now if we check sys.dm_db_index_physical_stats:


select page_count, avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.new_id_demo'),1,null,'detailed')

The output would be like:

now as you can see, we just created the index, so no issues there, no fragmentation

but what happens if we insert the same amount of values again?

in the last demo, we had to repeat the insert over 12 times to get some fragmentation

here we are going to insert the same amount of rows twice:


go
with 
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into new_id_demo(purchase_amount)
select id*20
from ids;
go 2

And query the DMF again:

select page_count, avg_page_space_used_in_percent,fragment_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.new_id_demo'),1,null,'detailed')

now as you can see, it started fragmenting. our int is 4 bytes and a uniqueidentifier is 16 bytes

the row count is:

SELECT COUNT(*)
FROM new_id_demo;
---199168

which used 1193 pages

so each row takes about 24 bytes, and each page can store about 335 rows. initially the 70 fill factor tolerated the inserts, but at a certain point pages had to split

normally, with a 100 percent fill factor, we would need about 595 pages (199,168 rows / 335 rows per page)

but here it is almost double that, plus the intermediate levels; since we had pad_index on, the non-leaf levels tolerated the inserts a little better

so even with a fill factor, the index still had to split pages and fragment

plus, if we wanted to scan a range, the leaf level would be full of logical pointers jumping between out-of-order pages, ramping up random I/O
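the page arithmetic above can be sanity-checked against the table itself, using the ~335 rows per page estimate derived from the row size:

```sql
-- rough minimum leaf page count if every page were packed full
-- (335 rows per page is the estimate from the arithmetic above)
select count(*) as total_rows,
       ceiling(count(*) / 335.0) as min_leaf_pages_at_100_percent
from dbo.new_id_demo;
-- with 199,168 rows this comes out to 595 pages,
-- versus the 1,193 pages we actually observed
```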

now what if we build the index with a full fill factor instead:

create table new_id_demo_2(
UserID uniqueidentifier not null default newid(),
purchase_amount int);
with 
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into new_id_demo_2(purchase_amount)
select id*20
from ids; 
create clustered index ix_userid2 on new_id_demo_2(userid)
with (fillfactor = 100);

Now if we look up the DMF:

select page_count, avg_page_space_used_in_percent,fragment_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.new_id_demo_2'),1,null,'detailed')

Now if we insert the same amount twice:

go
with 
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into new_id_demo_2(purchase_amount)
select id*20
from ids; 
go 2 

And query the DMF again:

select page_count, avg_page_space_used_in_percent,fragment_count
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.new_id_demo_2'),1,null,'detailed')

Now for the row count here:


SELECT COUNT(*)
FROM new_id_demo_2;
---196608

Now if we insert it like this:

go
with 
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
ids(id) as (select row_number() over (order by (select null)) from n4)
insert into new_id_demo_2(purchase_amount)
select id*20
from ids; 
go 10

Then we look up the DMF again:

now as you can see, the pages have had a lot of splits. the whole thing could have been stored in about 595 pages

the fill factor did help at first, but once past a certain threshold, the whole thing blew up

and we needed almost double the pages to store the values

the process is entirely unpredictable: we can't know where the next random unique value will land, so it is best to avoid these values, or at least avoid indexing them
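as a side note (not part of the demos here), if you need GUID keys anyway, one common mitigation is NEWSEQUENTIALID(), which generates ever-increasing GUIDs on a given machine, so inserts land at the end of the index like identity values (the sequence can restart after a reboot, creating one new insert point, but not random splits). a minimal sketch with a hypothetical table:

```sql
create table new_seq_id_demo(
UserID uniqueidentifier not null default newsequentialid(),
purchase_amount int);
create clustered index ix_userid_seq on new_seq_id_demo(UserID);
-- note: newsequentialid() can only be used in a default constraint;
-- it cannot be called directly in a select or insert
```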

now if we try inserting the whole thing again, fragmentation increases; and if we do it again, same thing

totally unpredictable

the same could happen if a nonclustered index key has this same total unpredictability

each insert can cause fragmentation

Updates

now updates can cause index fragmentation in two ways

the first is increasing the size of a record

if a record has a varchar column in it, and for some reason we update that value to a longer one, we can induce page splits, since the record may no longer fit on its page

for example:

create table dbo.updates_frag
(
id int not null,
col1 varchar(7000)
)
insert into dbo.updates_frag(id,col1) values
(1,REPLICATE('a',900)),
(2,REPLICATE('B',900)),
(3,REPLICATE('C',900))

Now let’s create a clustered index:

create clustered index ix_uf_id on dbo.updates_frag(id)

Now let’s see the DMF:

select index_type_desc,page_count, avg_page_space_used_in_percent,fragment_count,avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.updates_frag'),null,null,'detailed')

We get:

Now if we do the following:

update dbo.updates_frag
set col1 = replicate('x',7000)
where id = 2

Then DMF:

select index_type_desc,page_count, avg_page_space_used_in_percent,fragment_count,avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.updates_frag'),null,null,'detailed')

The output would be like:

so as we can see, an update can cause fragmentation in a clustered index; the same could happen if this column were a nonclustered index key, not just a regular column in the row

Deletes

now anything we delete leaves fragmentation behind until the space is reclaimed:


use testdatabase
create table delete_frag(
id int identity(1,1),
col1 int);
with
n1(c) as (select 0 union all select 0),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as(select row_number() over(order by(select null)) from n5)
insert into delete_frag(col1)
select id*4 from ids ;
create clustered index ix_delete on delete_frag(id)

Now after that, if we query the DMF:

select index_type_desc,page_count, avg_page_space_used_in_percent,fragment_count,avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.delete_frag'),null,null,'detailed')

Now if we delete half the rows like:

delete from dbo.delete_frag where id%2 = 0

Now we query the DMF again:

internal fragmentation doubled: the pages are now about half empty
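part of that deleted space may briefly show up as ghost records: rows marked as deleted but not yet reclaimed by the background ghost cleanup task. we can observe this through the same DMF (a side check; depending on timing, the cleanup may have already run and the count may be zero):

```sql
-- ghost_record_count is only populated in sampled/detailed mode;
-- it may already be 0 if the ghost cleanup task has run
select ghost_record_count, version_ghost_record_count
from sys.dm_db_index_physical_stats(db_id(), object_id(N'dbo.delete_frag'), 1, null, 'detailed');
```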

Now if we rebuild the index like:

alter index ix_delete 
on dbo.delete_frag
rebuild

Then DMF again like:

select index_type_desc,page_count, avg_page_space_used_in_percent,fragment_count,avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.delete_frag'),null,null,'detailed')

We would get the old results back:

we can also set the fill factor explicitly during the rebuild:

alter index ix_delete 
on dbo.delete_frag
rebuild with (fillfactor = 100, pad_index = on)

Now if we DMF again:

select index_type_desc,page_count, avg_page_space_used_in_percent,fragment_count,avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.delete_frag'),null,null,'detailed')

Variant of fragmentation: heap bloat (deleted space that is never reused, or ghost records yet to be cleaned)

here we have a heap with rows. rows get inserted; later somebody deletes some of them, and the freed space sits waiting to be reused. sometimes it does get reused, and sometimes it doesn’t; sometimes that space can never be reclaimed unless we rebuild the heap, or create a clustered index and then drop it.

the state where deleted rows still occupy space (the size stays the same but the data is less) is the bloat state, since the heap is bloated with empty space

to demonstrate the effects:

use testdatabase
create table heap_bloat(
id int not null,
col1 int,
col2 int);
with 
n1(c) as (select 0 union all select 0),
n2(C) as ( select 0 from n1 as t1 cross join n1 as t2 ),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as ( select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as( select row_number() over (order by (select null)) from n5)
insert into heap_bloat(id,col1,col2)
select id,id*2,id*4
from ids

Now if we DMF like:

select OBJECT_NAME(object_id),page_count,index_type_desc,index_level,avg_fragmentation_in_percent,avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID(N'dbo.heap_bloat'),null,null,'detailed')

We get:

Now if we count the rows with statistics io on:

set statistics io on;
go
select count(*)
from dbo.heap_bloat
go
set statistics io off;

We get the following:

--- row count 65536
---Table 'heap_bloat'. Scan count 1, logical reads 171, physical reads 0

Now if we delete:

delete from dbo.heap_bloat 
where id%2 = 0

Now if we check the DMF:

select OBJECT_NAME(object_id),page_count,index_type_desc,index_level,avg_fragmentation_in_percent,avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID(N'dbo.heap_bloat'),null,null,'detailed')

now as you can see, the page count is the same, but avg_page_space_used_in_percent is about 54 percent, so the heap is half empty now

now if we count with stats:

set statistics io on;
go
select count(*)
from dbo.heap_bloat
go
set statistics io off;

We get:

--32768
--Table 'heap_bloat'. Scan count 1, logical reads 171, physical reads 0, page server reads 0, read-ahead reads 0

now as you can see, even though we deleted half the rows, we still need to read 171 pages, even though half of that space is empty. why? heap bloat

now if we rebuild the table like this:


alter table dbo.heap_bloat rebuild

Now if we check the DMF:

select OBJECT_NAME(object_id),page_count,index_type_desc,index_level,avg_fragmentation_in_percent,avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID(N'dbo.heap_bloat'),null,null,'detailed')

now we have half the page count, most of the page space is used, and no fragmentation

and if we count again with statistics io:


set statistics io on;
go
select count(*)
from dbo.heap_bloat
go
set statistics io off;

We get:

--32768
--Table 'heap_bloat'. Scan count 1, logical reads 86, physical reads 0, page server reads 0, read-ahead reads 0, 

Now as you can see we have almost half the reads now.

Variant of fragmentation: forwarded records

now, in the Heaps post we talked about forwarding records. when SQL Server cannot fit an updated record back on its page, instead of moving the whole row to a new page and assigning it a new RID, it moves the record to another page that can fit it and leaves behind what you call a forwarding stub, which tells where the new record is and how to get to it. the relocated record becomes a forwarded record.

this serves the purpose of not having to update nonclustered indexes on heaps: since the RID stays the same, there is nothing to update. however, it introduces random I/O overhead, since SQL Server has to chase the pointer to another random page every time it encounters one.

Let’s see that in action:

use testdatabase
create table forwarding (
id int identity(1,1),
col1 varchar(6000)
)
create nonclustered index ix_idd on dbo.forwarding(id);
go
insert into dbo.forwarding(col1) 
values (replicate('x',500))
go 1000

Now we want to create a baseline for index scans, singleton lookups, and range seeks using set statistics io:


go
set statistics io on;
go
select *
from forwarding;
select *
from forwarding
where id = 200;
select *
from forwarding
where id between 200 and 210;
go
set statistics io off

We get the following:

(1000 rows affected)
Table 'forwarding'. Scan count 1, logical reads 85, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

(1 row affected)
Table 'forwarding'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

(11 rows affected)
Table 'forwarding'. Scan count 1, logical reads 13, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Now let’s update:


update dbo.forwarding
set col1 = replicate('x',3000)
where id % 3 = 1
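before re-running the reads, we can confirm the forwarding actually happened: the same DMF exposes a forwarded_record_count column for heaps (sampled or detailed mode only):

```sql
-- index_id 0 = the heap itself
select index_type_desc, page_count, forwarded_record_count
from sys.dm_db_index_physical_stats(db_id(), object_id(N'dbo.forwarding'), 0, null, 'detailed');
```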

Now if we do the same again:


go
set statistics io on;
go
select *
from forwarding;
select *
from forwarding
where id = 200;
select *
from forwarding
where id between 200 and 210;
go
set statistics io off

We get:

(1000 rows affected)
Table 'forwarding'. Scan count 1, logical reads 460, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

(1 row affected)
Table 'forwarding'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

(11 rows affected)
Table 'forwarding'. Scan count 1, logical reads 15, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

so the singleton lookup did not change: 200 % 3 = 2, so the row with id = 200 was never updated and never got forwarded

but the scan definitely increased, from 85 to 460 logical reads, more than 5 times

and the range seek increased from 13 to 15 reads, since the range 200 to 210 does contain updated rows (202, 205, and 208) whose records were forwarded

so it affects performance a lot

and with this, we are done with this one, see you in the next one.
