Storage Engine In SQL Server Part 2

in the previous blog, we said we were going to explore other data type and their dbcc page output

  • datetime
  • after that, we are going to explore the slot array
  • large object storage
  • the impact of select* on i/O
  • page header information

Row structure of datetime datatype

so like we did before

we are going to create a simple table

then dbcc ind

dbcc page

and look at the data

USE TestDatabase
CREATE TABLE datetime_test(
Date_col datetime)
insert into datetime_test(Date_col) values ('2023-01-01 12:34:56')
dbcc ind(
'TestDatabase',
'dbo.datetime_test',
-1)
dbcc traceon(3604)
dbcc page(
'TestDatabase',
1,688,3)

The row was like

   10000c00 4059cf00 7daf0000 010000

Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8

Date_col = 2023-01-01 12:34:56.000  

status bits a = 10 = it has a null bitmap

status bits b = 00 = it is not ghosted

now we have two bytes for fixed data length size 0c00

then we have 8 bytes for our datetime

13588800 = seconds in decimal since it is the last precision in datettime

now if add that the first date in datetime = 1753-01-01 through 9999-12-31

it would end up in our year

after that ncol = 2 bytes = 0100 = 1 so we have one row

and one byte for the null bitmap = 00 meaning there are no nulls

Slot array

it represents the place where our data starts

for example in our previous row

it was 0x4 = 4 so count 4 bytes from the first row you will find your data

this is where the logical order is preserved since in the page data itself the rows are not sorted, they are just stored in the most convenient way the data engine would store them

but in the slot array, it stores them according to the index structure

no, if there is no index it stores them randomly(paul Randal)

but if there is then it is preserved

now as we saw in the preceding example our data was there indeed

PFS page free space

now between the slot array and the actual data we have our free space where there is nothing and the data is expected to be filled

like in the previous dbcc page output:

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

PAGE: (1:688)

BUFFER:

BUF @0x000001B62FCCE580

bpage = 0x000001AFF2884000          bPmmpage = 0x0000000000000000       bsort_r_nextbP = 0x0000000000000000
bsort_r_prevbP = 0x0000000000000000 bhash = 0x0000000000000000          bpageno = (1:688)
bpart = 4                           bstat = 0xb                         breferences = 0
berrcode = 0                        bUse1 = 35188                       bstat2 = 0x0
blog = 0x1cccc                      bsampleCount = 0                    bIoCount = 0
resPoolId = 0                       bcputicks = 0                       bReadMicroSec = 1764
bDirtyPendingCount = 0              bDirtyContext = 0x000001AFB1E9C6B0  bDbPageBroker = 0x0000000000000000
bdbid = 18                          bpru = 0x000001AFB0D18040           

PAGE HEADER:

Page @0x000001AFF2884000

m_pageId = (1:688)                  m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 241   m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594053722112
Metadata: PartitionId = 72057594046840832                                Metadata: IndexId = 0
Metadata: ObjectId = 1509580416     m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 12                        m_slotCnt = 1                       m_freeCnt = 8079
m_freeData = 111                    m_reservedCnt = 0                   m_lsn = (43:51080:26)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = NOT ALLOCATED          PFS (1:1) = 0x41 ALLOCATED  50_PCT_FULL
DIFF (1:6) = CHANGED                ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000159EFF6060

0000000000000000:   10000c00 4059cf00 7daf0000 010000             ....@YÏ.}¯.....

Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8

Date_col = 2023-01-01 12:34:56.000  

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Completion time: 2025-02-13T20:30:59.9587188+03:00

we could put 1000 more rows in here and have the same amount in the slot array

Large Object Storage

most of the data we need could be stored in 8kb pages

but sometimes we need it to store bigger objects that can’t fit into the page itself

how do we do that? row_overflow or large object pages

Row-Overflow

let’s create a table that has rows that can’t fit on the page

use TestDatabase
CREATE TABLE row_overflow_test
(id int,
COL1 VARCHAR(8000),
COL2 VARCHAR(8000)
)
insert into row_overflow_test(id,col1,COL2) values (1,replicate('s',8000),REPLICATE('f',8000))
DBCC IND (
'TestDatabase',
'dbo.row_overflow_test',
-1)

The output would be like:

now as you can see it has two index allocation maps now

one for the row_overflow_map and one for the in-row data

now what happened here

we had data that could fit in the row and one that could not

we know that from the type

and the type 1 is for normal data pages

and 3 is for the row_overflow_data

now let’s explore the in_row data page

dbcc traceon(3604)
dbcc page ('TestDatabase',1,704,3)

Now the slot array was like:

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x11 Length 8000 Length (physical) 8000

COL1 = sssssssssssssssssssssssssssssss........

COL2 = [BLOB Inline Root] Slot 0 Column 3 Offset 0x1f51 Length 24 Length (physical) 24

so it stored the first two in here, but then it moved the third to the row_overflow_data page since we have only 8k

now if look up the rows in the output


0000000000000000:   30000800 01000000 03007802 00511f69 9f737373  0.........x..Q.iŸsss
0000000000000014:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000028:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
 -- omitted since it is all 73 = s
0000000000001F40:   73737373 73737373 73737373 73737373 73020000  sssssssssssssssss...
0000000000001F54:   00010000 00be1800 00401f00 00b80200 00010000  .....¾...@...¸......
0000000000001F68:   00 

so this is varchar

so the first is status bits a = 30 =00110000 = so there is a null bitmap and there is a variable length column

status bits b = 00 = not ghosted

fsize = 2 bytes = 0800 = int size

after that we have our 01000000 = 4 bytes like expected

then we have our ncol = 2 bytes = 0300 about what we expected

then we have the null bitmap= 78 =01111000 = the first 3 rows are not null

then we have the varcount = 2 bytes =02 00 = 2 like we did create

then we have varoffset =4 bytes since we have two var count = 511f69 9f

then we have all 73s which 8000 to be exact and each one of them is an s

then we have 24 bytes totally new stuff

so as you can see in the graph:

020000  00010000 00be1800 00401f00 00b80200 00010000 00

the first byte shows us what is the type of the lob =02 = row_overflow

the level in the b tree = always 0 in blob = 0000

then 00 = unused byte

then 0000 00be = xsn

then 1800 0040 = dbcc checktable value

then the length of the object = 1f00 00b = 8000 thousand(through the hex editor) = decimal value

then we have in the last 2 byte 0000 = the first slot in the row_overflow_data page

then before that, we have 0100 = 1 = file number is 1 as indicated in dbcc ind

then before that, we have 4 bytes pointing toward page number

then we have b802 = swapped equals our page number with a lot of zeros

now if we dbcc page the row_overflow_page

dbcc page('TestDatabase',1,696,3)

We get


Blob row at: Page (1:696) Slot 0 Length: 8014 Type: 3 (DATA)

Blob Id:415105024

000000ACD0FF606E:  66666666  66666666  66666666  66666666 ffffffffffffffff

000000ACD0FF607E:  66666666  66666666  66666666  66666666 ffffffffffffffff

it shows us the length and just lists the data, not m

LOB storage

now we have data types that can’t fit into the 8k page like text, ntext, and image

now let’s create a table and examine

use TestDatabase
CREATE TABLE LOB_EXAMINE(
ID INT,
SomeText TEXT)
insert into LOB_EXAMINE(ID,SOMETEXT)VALUES(1,replicate(convert(varchar(max),'s'),20000))
dbcc ind
('TestDatabase',
 'LOB_EXAMINE',
 -1)
 dbcc traceon(3604)

the output was like

we have lob data as iam_chain_type and a separate IAM for the lob data

we have one parent of the b tree of the blob storage(more on that in an upcoming blog)

now let’s examine in_row_data first then move on to the lob

dbcc page('TestDatabase',1,720,3)

The output:


0000000000000000:   30000800 01000000 02007001 001f8000 0012a400  0.........p.......¤.
0000000000000014:   000000c8 02000001 000100                      ...È.......

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

ID = 1                              

SomeText = [Textpointer] Slot 0 Column 2 Offset 0xf Length 16 Length (physical) 16

TextTimeStamp = 2752643072          RowId = (1:712:1)                   

so if we look into status bits a 30 =00110000 there is a null bitmap and variable column which is text since there is text datatype

00 not ghosted = status bits b

0800= fixed data length

the next 4 bytes equals the id column value

ncol has two bytes = 0200 = 2 = 2 columns

null bitmap = 1 byte = 70 = the first binaries here are 0 meaning the first 3 columns are not null the others are for internal stuff

then we have the varcount = 01 = 1 columns is variable

then we have varoffset = 2 bytes 001f the same as 0xf = 15 after it

so the 16th byte and the rest 16 bytes point toward the text

now we know the file number which is 1

so 0001 = file number

the type of lob = 00

b tree level = now we have 3 b tree pages and the level is 0001 I guess so this could be the 1

we could have the unused byte 00 or not

we could or could not have it since it could be stored in each part of the object according to the docs

we could or could not have dbccchecktable information timestamp

we could or could not have the length which should be 4E20

we could or could not have the page number

we could or could not have the slot number

if we shave the 0001 and the other for the slot we get rid of the last 4 bytes

so we end up with 0012a400 000000c8 020000

we should have the type 1 byte it was 02 for the overflow

now let’s explore the other pages:

 dbcc page('TestDatabase',1,712,3)

When we explored this page:


Blob row at: Page (1:712) Slot 1 Length: 84 Type: 5 (LARGE_ROOT_YUKON)

Blob Id: 2752643072 Level: 0 MaxLinks: 5 CurLinks: 3

	Child 0 at Page (1:713) Slot 0 Size: 8040 Offset: 8040

	Child 1 at Page (1:714) Slot 0 Size: 8040 Offset: 16080

	Child 2 at Page (1:712) Slot 0 Size: 3920 Offset: 20000

so as you can see we have 3 child pages like a b tree each representing part of the data

and the rows extend until half of the third

there is no interesting output here

these data types are deprecated

SELECT* effects on I/o

Now we all know that we should not use this since it introduces unnecessary hits on the system while the resources that should have been used in another query, this is highly significant for LOB

let’s examine it:

CREATE TABLE sel_star_lob(
ID INT,
some_row text);
with S1(Col) as (select 0 union all select 0) -- 2 zeros 
, S2(col)as (select 0 from S1 as f1 cross join S1 as f2) -- more zeros 
,S3(col) as(select 0 from s2 as f1 cross join  s2 as f2)
,s4(col) as (select 0 from s3 as f1 cross join s3 as f2)
,s5(col) as (select 0 from s4 as f1 cross join s4 as f2)
,ID(ID) as (select row_number()over (order by (select null)) from s5)
insert into dbo.sel_star_lob(ID,some_row)
select id ,convert(nvarchar(max),replicate(convert(varchar(max),'s'),20000))
from id

Now let’s stats on and select*

set statistics io on;
go
select*
from sel_star_lob
go
select id 
from sel_star_lob
go
set statistics io off 

Now this was the output


Table 'sel_star_lob'. 
Scan count 1,
 logical reads 268, 
  lob logical reads 648241, 

  lob read-ahead reads 131072, 

Table 'sel_star_lob'. 
Scan count 1, 
logical reads 268,  
lob logical reads 0, 
lob read-ahead reads 0, 

now as you can see it had to read 3/4 of a million more pages

so we should be more specific about what we want

Exploring Page Header

now before we explored most aspects of the 8kb page except for the page header

so let’s do it (Paul Randal )

Use TestDatabase
CREATE TABLE PAGE_HEADER_DETAILS(
COL INT)
INSERT INTO PAGE_HEADER_DETAILS(COL)VALUES(25)
DBCC IND(
'TestDatabase',
'dbo.PAGE_HEADER_DETAILS',
-1)
DBCC TRACEON(3604);
DBCC PAGE(
'TestDatabase',
1,165616,3)

the output was like:


PAGE: (1:165616)

PAGE HEADER:

Page @0x0000020BE4E54000

m_pageId = (1:165616)               m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 255   m_indexId (AllocUnitId.idInd) = 256 Metadata: AllocUnitId = 72057594054639616
Metadata: PartitionId = 72057594047365120                                Metadata: IndexId = 0
Metadata: ObjectId = 1701581100     m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 8                         m_slotCnt = 1                       m_freeCnt = 8083
m_freeData = 107                    m_reservedCnt = 0                   m_lsn = (66:111336:27)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                      

so:

  • m_pageid=(file:page number in the file) as you can see our file is 1 and the number is our page’s number
  • m_headerversion= always 1
  • m_pagetype:
    • 1- data type = holds data
    • 2-index page =upper levels other than the leaf in the clustered and all levels in the nonclustered since it is not storing the actual data, just pointing to some more on it in upcoming blogs
    • 3-Text mix page = so we have and index or a heap that has TEXT lob as data, and it has other stuff with it
    • 4-Text tree page= only text from a single column
    • 5-sort page = worktable page = tempdb page = page that has intermediate results for a sort operator
    • 8-GAM: holds allocation information about extents in a GAM interval( 4gb intervals )this page tells us if this file part has been filled to the 4gbs or not, global allocation map, it is reserved for one database
    • 9-SGAM: shared GAM so it is shared between database
    • 10-IAM: It tells the gam what is going on in the interval
    • 11-PFS: page free space = which spaces are used, ghosted etc. its interval is 64 mb
    • 13- boot page: database information page
    • 15- file header page: holds information about the file
    • 16-diff map page: holds information about changes in extents since the last full or differential backup
    • 17- ML map page: bitmap for minimally logged pages like bulk insert
    • 18- page are going to be removed by dbcc checkdb
    • 19- alter index work pages
    • 20 - bulk page that is to be permanent

we will talk them about them in more detail in future blogs

  • m_typeflagbits: holds some information about the page, for example pfs page that has a 1 value means there is at least one ghost record in that page
  • m_level: is it a leaf or a higher-up page, 0 means leaf, N means at the N level
  • m_flagbits: some information, for example 0x200 means = we have a checksum on the page
  • m_objid
  • m_indexid = IAM id (Paul Randal)
  • m_prev and m_nextpage: points to the next page in an index more on it in indexes
  • pminlen: the size of the fixed data
  • m_slotcnt: count of records
  • m_freecnt: free bytes
  • m_freedata: the space from the start of the page to the first free space
  • m_resrevedcnt: how many rows has been deleted but is yet to be committed
  • m_xdesid: the internal transaction that added reservedcnt
  • m_ghostreccnt: how many ghosts
  • m_torebits: which rows where corrected by internal checks checksum or torn page

and here we are done, see you in the next blog

Leave a Reply

Your email address will not be published. Required fields are marked *