Here we introduced few catalog views so we could do some schema analysis, to know if our index strategy lacks any immediate threats.
We started with heaps
sys.tables:
Now, if we select* like:
use testing
go
select*
from sys.tables
We would get:

A lot of these columns are inherited from sys.objects:
- name
- object_id
- principal_id: who is the owner
- schema_id
- parent_object_id: 0 not a child object
- type: U means user table this
- type_desc: same here
- creation and modify dates
- is_ms_shipped: created by ms = Microsoft = is an internal component
- is_published: It is about replicas
- is_schema_published: same, but for the schemas
Those are the inherited ones
Now, what is the use of sys.objects?
Some examples:
select
OBJECT_SCHEMA_NAME(object_id,db_id()) + '.' + OBJECT_NAME(object_id) as object_name,
type_desc,
modify_date
from sys.objects
where modify_date >='2025-03-23 15:35:41.497'
order by modify_date

Here we have the objects modified since March 23rd, and what they are in this database
Now if we go back to sys.tables:

- lob_data_space_id: the id of the filegroup that contains the LOB data same goes for filestream
- max_columns_id_used: the id of the biggest columns
- lock_On_bulk_load: whether we used minimally logged operations or not
- ansi thing
- is_replicated
- has_replication_filter
- Is_merge_published something that relates to replication
- is_trans …… replication
- has_uncheked_assembly_data something that is so internal that it relates to actual assembly code
- text_in_row_limit now we don’t have limits, so no limits

- large_value_types_out_of_row overflow or lob
- is_tracked_by_cdc when you want to track extra stuff using cdc
- lock_escalation
- What type of lock escalation
- had filestream enabled
- Is it a memory table
- schema or schema and data, the later is 0
- same here
- Temporal tables are tables stored to track data, so here it tracks that information
- Is_remote_data_archive_enabled is this a stretch database

- External tables store metadata about tables outside SQL Server
- history stuff is for temporal tables
- is_node edge is for Graph SQL in SQL Server
- After that, until ledger table is about Azure SQL edge
- The ledger stuff is for a new feature in SQL Server 2022 for auditing
And with that, this ends
sys.indexes
If we select* like:
select*
from sys.indexes
We get:

- object_id: as you can see there are a lot of internal tables and their indexes for example the 3 is syscols etc.
- index_id: like we said before 1 is always clustered 0 is always heap and nonclustered increments from 2 to the end
- type: 0 heap, 1 clustered, 2 nonclustered, 3 XML ……
- type_desc: same
- is_unique: unique
- data_space_id: filegroup or partition schema
- is_primary_key
- is_unique_constraint has a unique constraint
- fill_factor internal fragmentation, check out our fragmentation blogs in this series
- is_padded internal fragmentation for intermediate index levels
- is_disabled when you put the index out of use, disable it
- is_hypothetical holds only column-level statistics and not actual data, DTA creates them so as we

- is_ignored_in_optimization is not considered in tree generation
- allow_row_locks and page: you can hold these types of locks on i
- has_filter: filtered
- compression_delay How long would a columnstore index hold its breath until compression
- suppress_dup_key_messages . Paul White, this option ignores duplicates inserted at a unique or primary key, the constraint does not allow duplicate messages would be ignored Paul white
- auto_created did use a tool to create it, or not like DTA
- optimize_for sequential_key: we talked about before in the ascending_key_issue that was addressed first by a trace flag, then by incorporating it to the whole engine
And with that, this is over
sys.columns
Now if we select* like:
select*
from sys.columns
where OBJECT_ID(N'dbo.index_operational_stats_copies') = object_id
and we got:

- object_id
- name
- column_id according to creation
- system_type_id and user_type_id datatype id for example 56 int
- max_length in bytes
- precision if numeric otherwise 0
- scale means how many digits after the decimal point
- collation_name if it is character based otherwise 0
- is_nullable
- ansi
- is_roguidcol
- is_identity like our identity(1,1)
- is_computed
- is_filestream
- is_replicated

- is_non_sql_subscribed replicated in another server that does not use SQL server
- is_merge_published
- is_dts(SSIS)_replicated
- is_xml_document
- xml_collection_id
- defualt_object_id when we use default in the column and that default constraint fills the space it references the column, unless you bind it then it references itself
- rule_object_id if you create your own way of data integrity this would be the object_id for it if you use it in a column
- is_sparse
- is_column_set a way to store data in sparse columns
- generated_always_type is the one generated by the engine not the user and the types differ by edition
- encryption stuff

- then graph SQL stuff
- is_data_deletion…. Azure Edge stuff
- after that all ledger stuff for auditing
And with that, this catalog view is done
sys.index_columns
if we select*
select*
from sys.index_columns
where OBJECT_ID(N'dbo.index_operational_stats_copies') = object_id
we get:

- object_id
- index_id
- index_column_id is index-specific while the column_id is table-specific
- key_ordinal is the first key or not, if not, which key is it
- partition_ordinal if you partition by multiple columns, this would tell at which key is it if it is at all
- is_descending_key now the default is ascending so when it is descending this lights up
- column_store_order_ordinal same here
and with that, we finish this one
sys.foreign_keys
go
use ContosoRetailDW
go
select*
from sys.foreign_keys
And we got:

so:
- name
- object_id
- principal_id = owner id
- schema_id
- parent_object_id: the table that has the foreign key declared in it, the child table itself since the constraint is an object
- Type F means foreign
- type_desc
- create and modify dates
- is_ms_shipped
- is_published
- is_schema_published

- referenced_object_id, where the key is the primary key
- key_index_id
- is_disabled for the foreign key, not the index
- is_not_for_replication
- is_not_trusted
- delete_referential_action_desc: how should we perform the delete 4 options we will talk about them in another blog
- same for the update
- is_system_name
sys.foreign_key_columns
select *
from sys.foreign_key_columns
We get:

so object_id for the constraints tables and their columns
nothing much
obviously we can’t just read it like this, but if we join it with the sys.foreign_keys like they did in the docs, we would get:

Now that we have introduced them, we are going to use them in schema analysis:
Schema Analysis
Using the above-mentioned catalog views, we can identify multiple issues like:
- Heaps
- Uniqueidentifier
- Wide and non-unique indexes
- Untrusted foreign keys
- nonindexed foreign keys
- redundant indexes
- high identity values
Heaps
According to this Microsoft whitepaper
We could use heaps in:
- very small table
- RID is smaller than the clustered index key More on that in our previous blogs in this series on heaps and clustered indexes, but basically it is a physical locator if we don’t have a physical design in mind( clustered index) that the engine creates for us so we could reach it, sometimes it is smaller the clustered index key in mind
- Inserts are not a good idea for heaps since clustered beat it in the above URL
- Same with updates and deletes
- selects as well
So, most of the time, you should not use heaps, and they are a problem
So, how could we identify them using the explained catalog views:
go
use testing
go
select
t.object_id as object_id,
s.name + '.' + t.name as table_name,
tp.sum_rows
from sys.tables t inner join sys.schemas s
on t.schema_id = s.schema_id
cross apply
(select sum(p.rows) as sum_rows
from sys.tables t inner join sys.partitions p
on t.object_id = p.object_id
and p.index_id = 0
) tp
where t.is_memory_optimized = 0
and t.is_ms_shipped = 0
and
exists
(
select*
from sys.indexes i
where t.object_id = i.object_id
and i.index_id = 0
)
order by tp.sum_rows desc
So here we wanted the heaps, so in the exists clause we asked for indexes that have the id of 0
Since we want the table name to do any analysis, we asked for sys.tables , sys.schemas
And the total_rows matters to know if the modifications are worth the effort
Other that we don’t have much here
And we got:

As you can see here, both small tables are so we could just ignore them
But let’s assume we are about to do something about them
So, what we could do is use the methodology described in our PerfMon counters blogs
Specifically, the blog on fullscans, and there we identifiedforwarded records per second
There, we used the operational, usage, and physical stats to identify how much of an effect these heaps have on the system
And we observed the effects of forwarding records in this blog
But basically, forwarding records introduces extra i/o since you can’t use the space at the moment
But you have a forwarding stub, a pointer, and a forwarded record, each has extra space and the record has its own page
So a lot of overhead for nothing
And internal fragmentation, since a heap can never reclaim the space which we talked about in this blog
Alternatively, we can use the cache, like we initially do:
go
select
t.object_id,
s.name + '.' + t.name as table_name ,
sum(ps.record_count) as heap_records_sum,
sum(ps.forwarded_record_count) as forwarded_record_sum,
sum(ps.avg_page_space_used_in_percent* ps.page_count)/nullif(sum(ps.page_count),0) as internal_fragmentation_in_percent
from sys.tables t inner join sys.schemas s on s.schema_id = t.schema_id
cross apply sys.dm_db_index_physical_stats(db_id(),t.object_id,0,null,'detailed') ps
where
t.is_memory_optimized = 0
and t.is_ms_shipped = 0
and
exists (
select*
from sys.indexes i where t.object_id = i.object_id and i.index_id = 0
)
group by t.object_id,s.name,t.name
order by forwarded_record_sum desc ;
Now what we did here is fairly simple, so we wanted to get heap,s so we looked for indexes that have the index_id = 0 in the exists clause
And we wanted the table and schema name, so if we wanted to generate some scripts, we know exactly where to strike
Now we are looking to monitor forwarding pointers, so we cross-applied to the index_physical_stats
And we grouped them by their names
Order them by forwarded records, since that is what we are monitoring
Other than that, we don’t have much in this query
And we got this:

So here we can see the high percentages of forwarded in both. If we knew how many forwarded records have been requested by the blogs mentioned, we could know if we want to do the modification
or we can just go ahead and do it if the structure is good for an index
and with that, we finish our intro blog to schema analysis, see you in the next one.