Clustered Index Storage Engine Part 5

Hi everyone, today we are going to explain:

  • the difference between a heap and a clustered index
  • b-trees
  • root level
  • intermediate level
  • double-linked pages
  • and leaf-level pages in a clustered index
  • dbcc page of all of them
  • and the difference in select* performance(basic)

Intro

in the previous blog, we talked about heaps, and we said that the data has no order, and is inserted as easy as it could be.

SQL Server can not guarantee any order

but what if we wanted order in our data, what if we wanted to reach a row more quickly, and modify it more easily

what if we wanted a single row, or a range of rows, how could we do that?

Clustered indexes would physically reorder the pages for us

now we had regular data pages for our heaps, or type 1

here in the leaf level( more on it later) you would have the same type but in the root or intermediate levels(more on them later) we would have index pages or type 2

now, we said that it would physically re-write the whole thing, so instead of having the rows put randomly in our pages we would maintain their order in our pages, this would improve performance significantly since we know where to look

for example, if we had the following names:

adrian, amer, asher, axel , mehmet, khalid, george, and melissa

now in the clustered index if we stored it based on the first name we would have it in its alphabetical order, however in a heap there is no order so we would have something like the following

so we get the order, and if we wanted the row we would get it immediately

so let’s create a table with some rows id and col1:

create table dbo.heapVSclustered
(id int not null,
col1 char(200) null
);
with N1(C) AS(select 0 union all select 0)
,N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2)
,N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2)
,N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2)
,N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2)
,ids(id) as (select row_number() over(order by(select null)) from n5)
INSERT INTO dbo.heapVSclustered(id,col1)
select id,'sssss' from ids

Now if we select id between 100 and 5000 like:

select*
from dbo.heapVsclustered 
where id >100 
and id < 5000

Now if we do the same but with a clustered index like:


create  clustered index  clus on  dbo.heapVSclustered(id)

and do the same select

as you can see the cost has been cut down a lot and it only seeked the needed rows, it did not scan the whole table so it is way more efficient to select with a clustered

plus we got the rows ordered as shown at the end of the properties

in the table scan, it just went to the table read all the rows randomly, then put the ones we need

in the index seek, it just seeked the rows we need by scanning the range( in the execution plan itself)(more on in an upcoming blog)

and gave it back only the necessary rows and needed page, and how did it do that

Structure of the clustered index

now, instead of storing the rows randomly, we could store them based on one column or more, and with that structure, we could reach more quickly,

now how does it do that?

first, it creates another copy of the data

then it sorts them based on the value provided in the syntax, in our example, it was the id

then it links each page to the other one using double links so each page could point to the next and previous one

so we would have a doubly linked list, this list of pages, data pages, regular old-fashioned data pages, but sorted ones, is called the leaf-level pages

now how does SQL server go to the leaf we want, i mean Here we just ordered them?

when we start having multiple pages like here

we can check with dbcc ind like:

dbcc ind('testdatabase','dbo.heapvsclustered',-1)

in our case, it has 1729

but we have a new page type = 2

which is the index page type

and its index level is = 2 or 1 unlike 0 which is the data level = leaf level pages

what does this mean?

it means we have pages other than the data pages

what do we call these two levels? root and intermediate levels

and what do we have there?

in intermediate levels, it stores one row per leaf-level page

it stores two pieces of information the physical address and the lowest key or row value for the clustered index

the only exception is the first row in the first page

there it stores null instead of the actual value

with that, it does not need to update the non-leaf when we insert in there since it does not need to be updated

these pages are double-linked too

and it keeps on adding intermediate levels until one page fits the very top level

and that page is called the root page

root level page

now if we dbcc page the root level like:(in our case)

-- dbcc page the root 
dbcc traceon(3604)
dbcc page('testdatabase',1,167514,3)

we would get these

and


PAGE: (1:167514)

PAGE HEADER:

Page @0x0000020B647A8000

m_pageId = (1:167514)               m_headerVersion = 1                 m_type = 2
m_typeFlagBits = 0x0                m_level = 2                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 284   m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594056540160
Metadata: PartitionId = 72057594049069056                                Metadata: IndexId = 1
Metadata: ObjectId = 2069582411     m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 11                        m_slotCnt = 3                       m_freeCnt = 8057
m_freeData = 129                    m_reservedCnt = 0                   m_lsn = (67:81320:8)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = -2109005478            DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = NOT ALLOCATED          PFS (1:161760) = 0x40 ALLOCATED   0_PCT_FULL
DIFF (1:6) = CHANGED                ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACCB7F6060

0000000000000000:   06010000 00588e02 000100                      .....XŽ....

Slot 1 Offset 0x6b Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACCB7F606B

0000000000000000:   062f5c00 00598e02 000100                      ./\\..YŽ....

Slot 2 Offset 0x76 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACCB7F6076

0000000000000000:   065db800 005b8e02 000100                      .]¸..[Ž....

so it shows what this root points to in terms of intermediate-level pages, so we have 3 pages like we saw before in dbcc ind,

it shows us the range that this intermediate-level page covers

the last 4 bytes shows us the file id = 000100

the middle 4 bytes shows us the page id

slot 2 = 005b8e02 swapped 00028E5B= 167515

slot 1 = 00598e02 swapped 00028E59 = 167513

slot 0= 00588e02 swapped 00028E58 = 167512

and the first stores the lowest value in the range

for slot 2 = 065db800 swapped 0000B85D now I don’t know what 65 means but the others mean = 47197

for slot 1 = 062f5c00 swapped 5C2F and I don’t know what 006 means but this means = 23599

for slot 0 = 06010000 swapped equals null in the hex editors

so we know the following structure

and the page is like this

intermediate pages

now if we dbcc page one of all of them like:

--
dbcc traceon(3604)
dbcc page('testdatabase',1,167515,3)
--
dbcc traceon(3604)
dbcc page('testdatabase',1,167513,3)
--
dbcc traceon(3604)
dbcc page('testdatabase',1,167512,3)

We will get the following:

so each parent shows its children from data pages and the minimum value for each except for the first one

it is null

Now for the dbcc output:

PAGE HEADER:

Page @0x0000020B63D88000

m_pageId = (1:167512)               m_headerVersion = 1                 m_type = 2
m_typeFlagBits = 0x0                m_level = 1                         m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 284   m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594056540160
Metadata: PartitionId = 72057594049069056                                Metadata: IndexId = 1
Metadata: ObjectId = 2069582411     m_prevPage = (0:0)                  m_nextPage = (1:167513)
pminlen = 11                        m_slotCnt = 621                     m_freeCnt = 23
m_freeData = 6927                   m_reservedCnt = 0                   m_lsn = (67:62360:9)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 142756749              DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = NOT ALLOCATED          PFS (1:161760) = 0x40 ALLOCATED   0_PCT_FULL
DIFF (1:6) = CHANGED                ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD9676060

0000000000000000:   06010000 00388e02 000100                      .....8Ž....

Slot 1 Offset 0x6b Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD967606B

0000000000000000:   06270000 00398e02 000100                      .'...9Ž....
Slot 2 Offset 0x76 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD9676076

0000000000000000:   064d0000 003a8e02 000100                      .M...:Ž....

Slot 3 Offset 0x81 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD9676081

0000000000000000:   06730000 003b8e02 000100                      .s...;Ž....

Slot 4 Offset 0x8c Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD967608C

0000000000000000:   06990000 003c8e02 000100                      .....<Ž....

Slot 5 Offset 0x97 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD9676097

0000000000000000:   06bf0000 003d8e02 000100                      .¿...=Ž....

Slot 620 Offset 0x1b04 Length 11

Record Type = INDEX_RECORD          Record Attributes =                 Record Size = 11

Memory Dump @0x000000ACD9677B04

0000000000000000:   06095c00 00c49002 000100                      .	\\..Ä.....

we can see the same here file at the end page number in the middle and the lowest key at the first and the page id in the middle

all the slots have the last 3 bytes set at one since they are all in the same file

the lowest key for:

  • slot 0 = 06010000 = null like before
  • slot 1 = 06270000 = 0027 with this 6 byte = 39
  • slot 2= 064d0000 =004D = 77
  • slot 3 = 06730000 = 0073 = 115
  • slot 4 = 06990000 = 0099 = 153
  • slot 5 = 06bf0000 = 00bf = 191
  • slot 620 = 06095c00 = 5C09 = 23561

now if we check the other output for this page

you can see that it matches it exactly

and we would have the same for the page numbers in the middle except for the first it would be null

and we could find the next page and the previous page information in the page header

and dump style 1 does not have the double-linked bytes

and dump style 2 here too hard to decipher

so we feel that finding the double-linked list in text is enough for us here

Now we can develop this:

Now these are the index pages in the clustered, in a non-clustered index the whole thing would be index pages, but since the clustered is the physical data itself, the leaf level would be the actual stuff =data pages, but the other levels would be index pages= type 2

Leaf pages

now let’s explore the dbcc page output for our first data page:

use TestDatabase
dbcc traceon(3604)
dbcc page('testdatabase',1,167480,3)
Slot 0 Offset 0x60 Length 211

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 211

Memory Dump @0x000000ACCD5F6060

0000000000000000:   1000d000 01000000 73737373 73202020 20202020  ..Ð.....sssss       
0000000000000014:   20202020 20202020 20202020 20202020 20202020                      
0000000000000028:   20202020 20202020 20202020 20202020 20202020                      
000000000000003C:   20202020 20202020 20202020 20202020 20202020                      
0000000000000050:   20202020 20202020 20202020 20202020 20202020                      
0000000000000064:   20202020 20202020 20202020 20202020 20202020                      
0000000000000078:   20202020 20202020 20202020 20202020 20202020                      
000000000000008C:   20202020 20202020 20202020 20202020 20202020                      
00000000000000A0:   20202020 20202020 20202020 20202020 20202020                      
00000000000000B4:   20202020 20202020 20202020 20202020 20202020                      
00000000000000C8:   20202020 20202020 030000                              ...

Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0

UNIQUIFIER = 0                      

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x8 Length 200 Length (physical) 200

col1 = sssss                                                                                                                                                                               
                                    

Slot 0 Offset 0x0 Length 0 Length (physical) 0

KeyHashValue = (de42f79bc795)       
Slot 1 Offset 0x133 Length 211

key hash value helps in logging and holding the row in memory, sometimes it could get duplicated in extremely large tables(Paul Randal)

we obviously did not create column 2 but since we had fixed data length and we did not fill it all just 5 bytes sql server created a placeholder for it and filled it for us, now if we modify the col1 in id 1 like:

update dbo.heapvsclustered
set col1 = replicate('s',200)
where id = 1 
--- dbcc page again 
use TestDatabase
dbcc traceon(3604)
dbcc page('testdatabase',1,167480,3)

The output would be like:

Slot 0 Offset 0x60 Length 211

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 211

Memory Dump @0x000000ACCEDF6060

0000000000000000:   1000d000 01000000 73737373 73737373 73737373  ..Ð.....ssssssssssss
0000000000000014:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000028:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000000000003C:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000050:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000064:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000078:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000000000008C:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
00000000000000A0:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
00000000000000B4:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
00000000000000C8:   73737373 73737373 030000                      ssssssss...

Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0

UNIQUIFIER = 0                      

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x8 Length 200 Length (physical) 200

so it is ghosted now, not there, and we only have 2 columns id and col1

now it is the regular row:(the first output would be explained):

status bits a = 10 = 00010000= it has a null bitmap

status bits b = 00 = ghosted

fsize = d000 =208 = out data row size

fdata first column= 01000000= 1 = id

fdata second column =73737373=col1

then we have the placeholder with nothing in it

ncol = 0300 = 3 columns one for the placeholder and 2 for ours it was still 3 in our second output

then we have the nullbits = 00 = we don’t have any nulls in both outputs

now we can construct this

And with this, we introduced the difference between a heap and a clustered index, and how each is designed, see in the next blog

Leave a Reply

Your email address will not be published. Required fields are marked *