Heaps Storage Engine Part 4

today we are going to introduce

  • Heap Table
  • sys.dm_db_index_physical_stats
  • Forwarding pointer
  • Forwarded stub
  • Forwarded record
  • updates in heaps
  • deletes in heaps

and how does SQL server organizes data

Heap table

Table that does not have any sort order

when we create a table, without any primary keys or indexes = heap table

there is no guarantee that you would get the same order between different selects

and when inserting SQL server does not even look at the pages

it just asks PFS(page free space, check out the previous blog for more details) page, who is free

according to the bits provided above and hangs on the safe side so if it is 001 so 1-50 it assumes that there are 50 percent full pages

so it does not look into the actual data while inserting, just show me where I can go, and boom!

each page has 8,060 bytes

let’s do an example:

sys.dm_db_index_physical_stats

CREATE TABLE heap_insertion(
    id INT,
    col1 VARCHAR(8000)
);

INSERT INTO heap_insertion(id, col1) VALUES(1, REPLICATE('s', 4023));
--- new dmf
SELECT *
FROM sys.dm_db_index_physical_stats(
    DB_ID(),
    OBJECT_ID(N'dbo.heap_insertion'),
    NULL,
    NULL,
    'DETAILED'
);

now before diving into the example let’s walk through this DMF

so:

So db_id, object_id, and index_id are self-explanatory now since we don’t have an index it is 0

partition number = 1, SQL server treats every table as a partitioned one internally even nonpartitioned so this one has 1 partition

it is an in_row_data allocation unit

index_depth = would show us how many levels we have from the root to the leaf here it is just one since we only have regular data pages

avg_fragmentation_in_percent = how many page splits We will dive into this in a future blog

page_count= how many pages for this table do we have

avg_page_space_used_in_percent = how much of the page is used using the PFS page data that we explored in the previous blog now here it is almost 50 percent since we have the int 4 and variable + 2= almost 50

record_count = how many rows

ghost_record_count= how many records are ghosted, meant to be deleted but yet to be

version_ghost_record_count= in snapshot isolation and RCSI

then we have the size

forwarded_record_count= more on later in this blog

compressed_page_count= is page compression applied and how many pages do have it

hobt_id = the id of the heap or the b-tree

now that we are done with the DMF as you can see 50 percent is full

so SQL server opts on the safe side and picks the highest number between 1- 50 which is 50

so if we insert a row that has 4031 bytes like:

insert into heap_insertion(id,Col1)values(2,replicate('s',4033))

And then sys.dm_db_index_physical_stats:

SELECT page_count,avg_record_size_in_bytes,avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(),
    OBJECT_ID(N'dbo.heap_insertion'),
    NULL,
    NULL,
    'DETAILED'
);

The output would be like:

So as you can see here it matched the 50 percent and since we just filled it up to 50 we have no problem it is an accurate estimate

But what if we had just 2 or 20 percent full like:

create table heap_insertion_2(
id int,
col1 varchar(8000)
)
insert into heap_insetion_2(id,col1)values(1,replicate('s',200))
--then dmf
select page_count,avg_record_size_in_bytes,avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(),
    OBJECT_ID(N'dbo.heap_insertion_2'),
    NULL,
    NULL,
    'DETAILED'
);

The output would be like:

So it is 2 percent, so it obviously can accommodate 4033 bytes row but:

insert into heap_insertion_2(id,col1) values(2,replicate('s',4033))
select page_count,avg_record_size_in_bytes,avg_page_space_used_in_percent
sys.dm_db_index_physical_stats(db_id(),
object_id(N'dbo.heap_insertion_2'),
null,
null,
'detailed')

so as you can see it just opted on the safe side and assumed that 50 percent is full even though nothing is full

now what updates on heaps if we had a row that was small but then got bigger,

what would SQL server do in that case =

Forwarded pointer

let’s explore it:

create table forwarded_pointer(
id int,
col1 varchar(8000));
INSERT INTO forwarded_pointer(id, col1) VALUES
(1, REPLICATE('s', 1500)),
(2, REPLICATE('s', 1500)),
(3, REPLICATE('s', 1500)),
(4, REPLICATE('s', 1500));

select page_count,avg_record_size_in_bytes,avg_page_space_used_in_percent
from
sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.forwarded_pointer'),null,null,'detailed')

the output was like

if we dbcc ind and page the following:


dbcc ind('TestDatabase','dbo.forwarded_pointer',-1)
dbcc traceon (3604)
dbcc page('TestDatabase',1,165728--INSERT YOUR OWN VALUE FOR MORE DETAILS CHECK OUT THE PREVIOUS BLOG
,3)

The first row was like:

0000000000000000:   30000800 01000000 02000001 00eb0573 73737373  0............ë.sssss
0000000000000014:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0xf Length 1500 Length (physical) 1500

status bits a = 30 = 00110000 = there is a null bitmap and a variable column in this row

status bits b = 00 = not ghosted

fsize = 0800

fdata= 01000000

ncol = 0200 = 2 columns like we created

nullbits = 00= zeros for our values indicating that there are no nulls

varcount = 01 00 = how many variable columns are in this row

4 bytes for varoffset

then we have our 73*1500 = s *1500

perfect

now if we update the same row like:

update dbo.forwarded_pointer
set col1 = replicate('s',4000)
where id = 1 

Then the DMF:

select page_count,avg_record_size_in_bytes,avg_page_space_used_in_percent
from
sys.dm_db_index_physical_stats(db_id(),object_id(N'dbo.forwarded_pointer'),null,null,'detailed')

as you can see since the row could not be accommodated on the same page SQL server created a new page for it and just put it there, but at the same time it kept a pointer toward the new page

we call that the forwarded pointer

now how could we see that

if we dbcc ind again,

as you can see, 165729 is the new page but if we dbcc page the old one like:

dbcc traceon (3604)
dbcc page('TestDatabase',1,165728--INSERT YOUR OWN VALUE FOR MORE DETAILS CHECK OUT THE PREVIOUS BLOG
,3)

The output of the same row would be like


Slot 0 Offset 0x60 Length 9

Record Type = FORWARDING_STUB       Record Attributes =                 Record Size = 9

Memory Dump @0x000000ACDCC76060

0000000000000000:   04618702 00010000 00                          .a‡......
Forwarding to  =  file 1 page 165729 slot 0                              

boom now we have a forwarded pointer

the record itself on the page is called forwarding_stub since the row itself is not here, it is in another page and here we have just the stub as a pointer to the moved row

now if we try to decipher the bytes:

so 04 = means it is a forwarded pointer

page number = 61870200 =165729 (hex editor)

slot 0 file 1 = 01000 0000 = 0000 0001 = slot 0 file 1

Now let’s check out the new page:

dbcc page('TestDatabase',1,165729,3)

The output was like


Slot 0 Offset 0x60 Length 4027

Record Type = FORWARDED_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 4027                  
Memory Dump @0x000000ACDCC76060

0000000000000000:   32000800 01000000 02000002 00b10fbb 8f737373  2............±.».sss
0000000000000014:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
0000000000000FB4:   87020001 000000                               ‡......

Forwarded from  =  file 1 page 165728 slot 0                             

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x11 Length 4000 Length (physical) 4000

as you can see we have a forwarded record

now let’s breakdown the thing

now until we reach the 73 part it is the same as before the update

except for the status bits now include the fact that this is a forwarded record

but then at the end, I can’t decipher it i don’t have a guide

now this arrangement can help us change the place of the rows without making drastic changes to the non-clustered index

if we move it

it would point only to the new location, not the old one, so no forward pointer pointing to another forward pointer

and if the row gets small again it will be returned to the original page

Deleting rows in heaps

SQL server does not reclaim the space from deleted heaps immediately, it does not happen until the rows need a contagious page to add a row

to see that let’s do the following:

use testdatabase
create table heap_deletes(
id int ,
col1 char(4))
insert into heap_deletes(id,col1)values(1,'ss'),(2,'kk'),(3,'aa')

dbcc ind('testdatabase','dbo.heap_deletes',-1)
dbcc traceon(3604)
dbcc page('testdatabase',1,165736,3)

The output was like:

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000ACCB3F6060

0000000000000000:   10000c00 01000000 73732020 020000             ........ss  ...

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4

col1 = ss                           

Slot 1 Offset 0x6f Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000ACCB3F606F

0000000000000000:   10000c00 02000000 6b6b2020 020000             ........kk  ...

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 2                              

Slot 1 Column 2 Offset 0x8 Length 4 Length (physical) 4

col1 = kk                           

Slot 2 Offset 0x7e Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000ACCB3F607E

0000000000000000:   10000c00 03000000 61612020 020000             ........aa  ...

Slot 2 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 3                              

Slot 2 Column 2 Offset 0x8 Length 4 Length (physical) 4

col1 = aa                           

Now as you can see we have 3 rows with 2 columns now if we delete the middle one like:

delete from heap_deletes
where id = 2

And dbcc page like:

dbcc page('testdatabase',1,165736,3)

The output would be like:

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000ACCB3F6060

0000000000000000:   10000c00 01000000 73732020 020000             ........ss  ...

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4

col1 = ss                           

Slot 2 Offset 0x7e Length 15

Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 15

Memory Dump @0x000000ACCB3F607E

0000000000000000:   10000c00 03000000 61612020 020000             ........aa  ...

Slot 2 Column 1 Offset 0x4 Length 4 Length (physical) 4

id = 3                              

Slot 2 Column 2 Offset 0x8 Length 4 Length (physical) 4

col1 = aa                           

Now the row is not here not even in the offset array in my test now if we dump style 1 like:

dbcc page ('testdatabase',1,165736,1)
OFFSET TABLE:

Row - Offset                        
2 (0x2) - 126 (0x7e)                
1 (0x1) - 0 (0x0)                   
0 (0x0) - 96 (0x60)                 

as you can see slot 1 is there but has no values meaning the row is not there anymore

now if we change the dump style to 2 :

dbcc page ('testdatabase',1,165736,2)

The output:


000000ACDE876000:   01010000 04000001 00000000 00000c00 00000000  ....................
000000ACDE876014:   00000000 16010000 a01f6000 68870200 01000000  ........ .`.h‡......
000000ACDE876028:   43000000 381f0000 25000000 00000000 00000000  C...8...%...........
000000ACDE87603C:   00000000 00000000 00000000 00000000 00000000  ....................
000000ACDE876050:   00000000 00000000 00000000 00000000 00000c00  ....................
000000ACDE876064:   01000000 73732020 02000010 000c0002 0000006b  ....ss  ...........k
000000ACDE876078:   6b202002 00001000 0c000300 00006161 20200200  k  ...........aa  ..
000000ACDE87608C:   00000073 73737373 73737373 73737373 73737373  ...sssssssssssssssss
000000ACDE8760A0:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE8760B4:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE8760C8:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE8760DC:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE8760F0:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876104:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876118:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE87612C:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876140:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876154:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876168:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE87617C:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss
000000ACDE876190:   73737373 73737373 73737373 73737373 73737373  ssssssssssssssssssss

so even if we did not take a look at the offset we can see that these, a lot of ghosted records from other tables that we did not delete and we can see our kk row right there

so just because we deleted a row and it is not showing up in dbcc page, it does not make it deleted

we might not even reclaim the space even if we dropped the table

so how could we reclaim the space

alter table rebuild

so if I do it like:

ALTER TABLE heap_deletes REBUILD;

and we dbcc page the same page

in my case, it was still the same we could not reclaim the space

now if we create a clustered on the id like:

create clustered index h_d_cl on heap_deletes(id)

Then dbcc ind like:

dbcc ind('testdatabase','dbo.heap_deletes',-1)

So it is the same page, it still has the ghosted rows however if index rebuild


alter index h_d_cl on heap_deletes rebuild

then dbcc ind shows that it did create a new page, and the old page does not have the s anymore

now heaps could have updates in place when the row could be updated on the same page

or it could fit without a forwarded pointer

with that, you don’t need to update the non-clustered index

or it needs a forwarded pointer, and still, the data in the index does not need to be moved

since the id in our example fits there but the bigger column data needs to be in another place

so the id = id but the col1 was not so it points to the id with no problem and forwards us to the page that contains the data we need so the index does not need to be updated

unless it needs to move the whole thing to another page

then it might need to update the index

and with that, we finished part 4 of the storage engine series, see you in the next blog

Leave a Reply

Your email address will not be published. Required fields are marked *