Now, in part 13, we stopped at comparing the performance hits of NEWID() against ever-increasing keys.
Sometimes developers decide to choose NEWID() anyway, so today we are going to discuss ways to enhance performance when using it as a clustered key.
After that, we are going to introduce some design considerations for nonclustered indexes.
NEWSEQUENTIALID(): Guaranteeing uniqueness across databases
Now, since developers use GUIDs to generate unique values across the application, the main issue was that NEWID() values were random. What happens if we generate non-random values instead? NEWSEQUENTIALID() generates the next value in line; then at some point, when we shut down Windows and restart, it may start generating from a lower range, but every value is still unique.
So it generates sequential values until the restart, and we are golden most of the time.
But this has a security concern: you can guess the next value most of the time. Later in this blog we will talk about a workaround.
For more information on it, check out the docs.
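To see why the values are guessable, here is a minimal sketch (object names are ours); note that NEWSEQUENTIALID() can only be called inside a default constraint, so we observe it through inserts:
create table seq_guess_demo(
id uniqueidentifier not null default newsequentialid()
);
go
insert into seq_guess_demo default values;
go 5
select id from seq_guess_demo order by id;
-- consecutive values typically differ by a small increment in the leading group,
-- which is why the next token is often easy to predict
drop table seq_guess_demo;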
now let’s compare the fragmentation of both:
USE TESTDATABASE
create table newid_demo(
UserID uniqueidentifier not null default newid(),
purchase_amount int);
----
create clustered index ix_userid on newid_demo(UserID)
with (fillfactor = 100, pad_index = on);
-----------------
create table newsequentialid_demo(
UserID uniqueidentifier not null default newsequentialid(),
purchase_amount int);
----
create clustered index ix_userid on newsequentialid_demo(UserID)
with (fillfactor = 100, pad_index = on);
So we create the tables with indexes. Now let's insert into both, then check the physical stats at the end:
----------------inserts
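-- the stacked CTEs square the row count at each level: n1 has 2 rows, n2 has 4,
-- n3 has 16, n4 has 256, and n5 has 65,536 (2^16), so each insert adds 65,536 rows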
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into newid_demo(purchase_amount)
select id*20
from ids;
-----
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into newsequentialid_demo(purchase_amount)
select id*20
from ids;
and the physical stats:
-----physical
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.newid_demo'),1,null,'Detailed')
---
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.newsequentialid_demo'),1,null,'Detailed')
The output was like:

Now if we insert again and check the physical stats:
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into newid_demo(purchase_amount)
select id*20
from ids;
-----
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into newsequentialid_demo(purchase_amount)
select id*20
from ids;
-----physical
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.newid_demo'),1,null,'Detailed')
---
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.newsequentialid_demo'),1,null,'Detailed')
The output was like:

Now, as you can see, the NEWSEQUENTIALID() table still shows almost no fragmentation, while the NEWID() one keeps fragmenting with every batch of random inserts.
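If you are stuck with plain NEWID() as the clustered key, one common mitigation (our suggestion, not part of the demo above) is rebuilding with a lower fill factor, so random inserts find free space on the pages before they have to split:
-- a sketch: leave ~30% free space per leaf page; the right number depends on your insert rate
alter index ix_userid on dbo.newid_demo
rebuild with (fillfactor = 70, pad_index = on);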
Using a composite key as an alternative to guarantee uniqueness
Now, instead of creating just one key column for the clustered index, we can add a second key column based on identity or a sequence, thus guaranteeing uniqueness (for more on composite keys, check out our blog).
for example:
use testdatabase
go
create table unique_demo_composite(
id_1 int not null,
id_2 int identity not null
);
create clustered index ix_unique_demo_composite_ci_id_1_id_2 on
dbo.unique_demo_composite(id_1, id_2);
go
insert into dbo.unique_demo_composite(id_1) values(1)
go 50
select *
from unique_demo_composite;
We can see:

Even though id_1 is always 1, each (id_1, id_2) pair is unique.
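Since a sequence works just as well as identity here, a minimal sketch of the same idea (object names are ours):
create sequence dbo.seq_unique_demo as int start with 1 increment by 1;
create table unique_demo_composite_seq(
id_1 int not null,
id_2 int not null default next value for dbo.seq_unique_demo
);
create clustered index ix_unique_demo_composite_seq_ci_id_1_id_2 on
dbo.unique_demo_composite_seq(id_1, id_2);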
Security concern solution: creating a computed column on NEWID() using CHECKSUM
Now, sometimes you have to use NEWID() as a security token,
but indexing such a column is pointless, or at least costly, and introduces a lot of blocking.
One way to get around it is to create a computed column using CHECKSUM.
Now this column will be:
fragmented
and non-unique
but it will be more compact (a 4-byte int instead of a 16-byte GUID)
and the second issue can be solved by creating a surrogate key like identity(1,1),
which is unique, and asking for both in the where clause.
For example:
use testdatabase
create table checksum_newid_demo(
id_surrogate int identity(1,1) not null,
id_newid uniqueidentifier not null
constraint default_checksum_newid_demo_id_newid
default newid(),
id_newid_checksum as checksum(id_newid),
purchase_amount int
);
create clustered index ix_checksum_newid_demo_id_surrogate_CI on
dbo.checksum_newid_demo(id_surrogate);
create nonclustered index ix_id_newid_checksum_nci on
dbo.checksum_newid_demo(id_newid_checksum);
Now let's insert some values, with the physical stats afterward:
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select ROW_NUMBER() over(order by (select null)) from n5)
insert into dbo.checksum_newid_demo(purchase_amount)
select id
from ids;
---- physical
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.checksum_newid_demo'),2,null,'Detailed')
The output was like:

Newly built, so barely fragmented. Now if we insert again and check the physical stats:
--- let's insert again
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select ROW_NUMBER() over(order by (select null)) from n5)
insert into dbo.checksum_newid_demo(purchase_amount)
select id
from ids;
--- physical again
select
index_level,
page_count,
avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(),OBJECT_ID(N'dbo.checksum_newid_demo'),2,null,'Detailed')
The output was like:

Now this index is heavily fragmented, but it is much smaller than a clustered index on newid() would be, and we effectively get uniqueness if we query like this:
select *
from dbo.checksum_newid_demo
where id_surrogate = 1
and id_newid_checksum = -1833764562
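A related pattern worth knowing (our sketch, not part of the demo): when the application holds the original GUID token, seek on the compact checksum and recheck the full GUID, since CHECKSUM values can collide:
declare @token uniqueidentifier = newid(); -- hypothetically, the token the application received earlier
select *
from dbo.checksum_newid_demo
where id_newid_checksum = checksum(@token) -- cheap seek on the small nonclustered index
and id_newid = @token; -- residual check weeds out checksum collisions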
and with that, we finish our small talk on clustered index design
Nonclustered index design: index intersection (Dmitri Korotkevitch)
The main question here: should I have multiple indexes that cover every query, or small single-column indexes that cover most of my needs and get slow sometimes?
There is no specific rule; we need to test.
If the index's selectivity is high and a small number of rows will be returned, a key lookup would be sufficient: since we don't need that many rows, going through the root and intermediate levels of the clustered index would not be that expensive.
But once the row count increases and the selectivity decreases, the optimizer would choose a scan, and at that point we would need a new covering nonclustered index.
Now let's try that:
we will create a table that has 4 different columns, each with a different amount of selectivity,
create 4 separate nonclustered indexes, one on each,
and watch what the optimizer does when we try each as a predicate:
use testdatabase
create table demo_index_int(
id int not null,
col1 int not null,
col2 int not null,
col3 int not null,
col4 int not null
);
create unique clustered index ix_demo_index_int_id on demo_index_int(id);
with
n1(c) as (select 0 union all select 0 ),
n2(c) as (select 0 from n1 as t1 cross join n1 as t2),
n3(c) as (select 0 from n2 as t1 cross join n2 as t2),
n4(c) as (select 0 from n3 as t1 cross join n3 as t2),
n5(c) as (select 0 from n4 as t1 cross join n4 as t2),
ids(id) as (select row_number() over (order by (select null)) from n5)
insert into dbo.demo_index_int(id,col1,col2,col3,col4)
select id,id%50,id%100,id%150,id%200
from ids;
Now the modulo generates repeating values: the remainder cycles every 50 rows in col1, every 100 in col2, every 150 in col3, and every 200 in col4.
now if we select like:
select *
from dbo.demo_index_int
We can see:

for col1, and:

for col2, and the same goes for the other two.
So there are a lot of duplicates in the first column, and the duplication decreases as we go up, since there are more distinct values.
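A quick sanity check (our own query, not part of the original walkthrough) of the distinct counts per column:
select
count(distinct col1) as col1_distinct, -- expect 50
count(distinct col2) as col2_distinct, -- expect 100
count(distinct col3) as col3_distinct, -- expect 150
count(distinct col4) as col4_distinct -- expect 200
from dbo.demo_index_int;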
Now if we create nonclustered indexes like:
create nonclustered index ix_demo_index_int_col1 on dbo.demo_index_int(col1);
create nonclustered index ix_demo_index_int_col2 on dbo.demo_index_int(col2);
create nonclustered index ix_demo_index_int_col3 on dbo.demo_index_int(col3);
create nonclustered index ix_demo_index_int_col4 on dbo.demo_index_int(col4);
Now if we check DBCC SHOW_STATISTICS for each, like:
dbcc show_statistics('dbo.demo_index_int',ix_demo_index_int_col1);
dbcc show_statistics('dbo.demo_index_int',ix_demo_index_int_col2);
dbcc show_statistics('dbo.demo_index_int',ix_demo_index_int_col3);
dbcc show_statistics('dbo.demo_index_int',ix_demo_index_int_col4);




Now the only one whose histogram used the whole 200 steps is the last one,
so the only column selective enough to fill the histogram is col4,
and from that alone we can tell the last index is the most selective.
Plus, its eq_rows of 328 is lower than the others'.
And all_density = 1/n, where n is the number of distinct values, so the more selective the index, the lower the value:
in the last case that is 1/200 = 0.005, since col4 has only 200 distinct values that keep repeating.
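We can verify that density by hand (our own check, not in the original):
select 1.0 / count(distinct col4) as all_density_col4 -- 0.005
from dbo.demo_index_int;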
Now all of these indexes are still unique at the leaf level, because the clustered key id is appended to every nonclustered row, and id itself is unique, ever-increasing, and follows the standards we talked about, like tight, static, etc.
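We can see that the clustered key really does travel with the nonclustered index: the following query (the hint is ours) returns id from col4's index alone, with no key lookup:
select id
from dbo.demo_index_int with (index(ix_demo_index_int_col4))
where col4 = 1;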
for more details on statistics check out our blog
Now if we query with all 4 predicates, so the optimizer can join the 4 indexes:
SELECT id
FROM demo_index_int
WHERE col1 = 1
AND col2 = 1
AND col3 = 1
AND col4 = 1

Now, how did it get this plan?
The estimates for the index seeks were accurate, since the optimizer just looked at the histogram steps and took the values; we did not have many distinct values, so it could get an accurate estimate, and it used it.
What about the others? The main thing is that a seek on col1's index would return 1311 rows and on col2's 656,
which is way more than the estimated 2 rows from intersecting the two most selective indexes.
So instead, it seeked col3's and col4's indexes and merge-joined them on the clustered key id (estimated 2 rows, 110 in our case: id % 150 = 1 and id % 200 = 1 together mean id % 600 = 1, and 65,536 / 600 rounds to 110).
Then, using a nested loop row by row, it went and looked up those rows to check whether they matched the first and second predicates, col1 = 1 and col2 = 1.
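To feel the difference, you can force the optimizer to intersect the two least selective indexes instead (the hint is ours, purely for experimenting):
select id
from dbo.demo_index_int with (index(ix_demo_index_int_col1, ix_demo_index_int_col2))
where col1 = 1 and col2 = 1 and col3 = 1 and col4 = 1;
-- now it has to merge 1311 rows against 656 before checking the rest, a visibly heavier plan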
Now if we set statistics io on, like:
go
set statistics io on
go
select id
from demo_index_int
where col1 = 1 and col2 = 1 and col3 = 1 and col4 = 1
go
set statistics io off
We would get:
(110 rows affected)
Table 'demo_index_int'. Scan count 2, logical reads 226,
and with statistics time on:
go
set statistics time on
go
select id
from demo_index_int
where col1 = 1 and col2 = 1 and col3 = 1 and col4 = 1
go
set statistics time off
We would get:
(1 row affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 46 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Now if we create an index that covers all of these, like:
create nonclustered index ix_demo_index_int_col4_inc_col2_col3_col1
on demo_index_int(col4)
include(col1, col2, col3);
Now if we run the same select:

We get a single index seek, and with statistics time on:
go
set statistics time on
go
select id
from demo_index_int
where col1 = 1 and col2 = 1 and col3 = 1 and col4 = 1
go
set statistics time off
The output was like:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 42 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
and with statistics io on:
Table 'demo_index_int'. Scan count 1, logical reads 3,
So the number of reads dropped dramatically, and we had some decrease in the elapsed time; part of it is possibly due to capturing the execution plan, so now I am going to turn the plan capture off and measure the time again:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
And drop the new index and select again:
drop index ix_demo_index_int_col4_inc_col2_col3_col1
on demo_index_int;
and the output was like:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
So the elapsed time roughly doubled without it, more or less, but the point is that it is lower with the covering index, plus we do almost no reads (3 logical reads versus 226) compared to the original plan,
so it is way more efficient.
The place where the optimizer joins indexes is called index intersection,
and we can rely on it sometimes,
but a covering index would increase performance further.
Whether to create one depends on the workload, since maintaining an index means update and storage overhead;
on the other hand, if this query is critical and the waits cannot be tolerated, we should create one.
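To ground that decision in data, a sketch (our own) using the index-usage DMV to see whether an index's seeks justify its update cost:
select i.name as index_name,
s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
from sys.dm_db_index_usage_stats as s
join sys.indexes as i
on i.object_id = s.object_id and i.index_id = s.index_id
where s.database_id = db_id()
and s.object_id = object_id(N'dbo.demo_index_int');
-- many user_updates with few user_seeks suggests the index costs more than it earns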
and with that, we finish our small talk in this blog, catch you in the next one