Hi everyone, Today we are going to talk about the way SQL server executes a query
- How it understands the request.
- How does it recognize syntax mistakes?
- how it generates the logical operators of a query, meaning how it transfers the SQL syntax to relational terms
- how it evaluates the plan for optimization and the optimization process
- how it generates the physical operators and the execution plan
we are going to use undocumented features here so don’t try in a production environment since you can get Microsoft’s support in case of an error or data corruption
Introduction
when you submit a query to SQL Server it goes through multiple phases to get executed
the component that does that is called the query optimizer
it does it in multiple steps:
- first, it Parses the query: it checks if the syntax that was provided was correct on before doing any operation so if you code SLECET instead of SELECT the query gets canceled immediately with no intention of correcting it to save cost
- then the Binding process starts: it checks if the objects you requested exist in the database you are using so if you ask for AnyTable in ContosoRetailDW database and that table does not exist in it, the query gets canceled since there is no such table in there, while doing that it converts the syntax to logical operators, meaning it converts to terms that the engine understands and can find physical operators for, it sees what steps you want so we could use the program to make it for you, meaning we will generate a tree representation of your query, what do you want do these and how do you want them, do you want a join? okay, how many joins? are there any duplicates in there? do you have these tables in this database? do you have access to these tables? etc. so it converts your correct terms to relational algebra terms, all the steps until here were done with the Algebrizer
- after that, the query gets submitted to the query optimizer
- after that it looks if your query can be reduced to less terms, simplifies your query=simplifcation, for example, it could remove unnecessary joins
- after that it includes cardinalty estimation and statistics to your logical operators
- then, it checks whether this query needs to be optimized: meaning is it a trivial plan, meaning are there too many ways to do this statement or is it so simple that there is only one way, is it a complex plan
- if it is then there is no need for optimization you just pass it along and the query gets executed
- but if it is a complex query then we start optimizing, meaning we will start the full optimization process and its steps
- here it starts using its pragmatic approach, heuristic algorithms of execution, it estimates how complex the query is should it be in phase 0, 1, or 2 and starts optimizing
- if it finds a plan in enough time it will generate and pass it to the execution engine
- but if it does not it moves to the next phase
- until the time ends then it passes at the end

sys.dm_exec_query_optimizer_info
This DMV shows us how the query optimizer works:
SELECT*
FROM **sys.dm_exec_query_optimizer_info**
and 3 columns would pop up:

- counter: name of the optimizer event
- occurrence: how many times did this event happen
- value: the average value per event occurrence, the average search task 0 per occurrence of it was 1198
there are 39 counters

- optimizations: how many times did a query get submitted to the optimizer( could be trivial so there is no optimization)
- elapsed_time: how long did it take
- final cost: in internal cost unit which was based on 1997 systems so for example here we had 1019 events on average each cost 7……. in internal units
- trivial_Plan: how many plans where submitted directly( detail later)
- tasks: how many actual optimization tasks that went through the phases so from 653 queries that got optimized each had on average 895 tasks
- no plan: nobody went without a plan
- search0: the initial phase of optimization( details later) : we had 139 queries that got optimized easily in search0 tasks each used about 2272 tasks
- the same for the other search phases
- gain stage 0 to stage 1 or 2: how many of the queries had so many tasks that it needed to move to another phase either it was too complex so right away to 2 or it had some degree 1.

- timeout: when the query optimizer decided that there was not enough time to get good enough plan( property in the execution plan details later)
- memory limit exceeded: when the optimization process for one query exceeded the limit for the plan memory
- …. stmt: how many of that specific statement
- contains subquery
- unnest failed: when we could not unnest the subquery meaning making it some sort of join
- tables: how many tables referenced per query
- hints:
- …. hint: that specific hint per query
- View reference: how many times that a specific view was referenced
- remote query
- maximum DOP: how many times we used parallelism per query
- max_recursion_level: what is the average of it when it was specified
- indexed views loaded, updated, dynamic cursor request fast forward request
we can filter this however we want.
this DMV shows cumulative information since the last server restart, so if we need to capture information for a specific query we need to:
- free the DMV by restarting the server
- , select our DMV so it gets optimized
- then keep the information to get a baseline,
- then execute the query in question
- then select the DMV again to see what kind of optimization went for this specific query
- or we could create snapshots before and after since we don’t just have our dmv there are always system queries going on
non of this should be done in a production environment
for example:
then we will make two snapshots before and after
we have to execute and drop so they get optimized
GO
SELECT*
INTO AFTER_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO
SELECT*
INTO BEFORE_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO
then we have to drop them so their and internal optimization don’t get recorded, make sure that the queries will match exactly the ones we will record after so they don’t get optimized twice, match in everything comments spaces all of it
GO
DROP TABLE AFTER_OPTIMIZER;
DROP TABLE BEFORE_OPTIMIZER;
GO
then we have to record the optimization information before
GO
SELECT*
INTO BEFORE_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO
then we input our query that we want to benchmark
GO
SELECT AddressLine1,AddressLine2
FROM DimCustomer
WHERE CustomerKey >= 10
AND CustomerKey <=200
OPTION (RECOMPILE)
GO
then we will input the optimization information after
GO
SELECT*
INTO AFTER_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO
now we will see our information
GO
SELECT
AFTER_OPTIMIZER.COUNTER,
(AFTER_OPTIMIZER.OCCURRENCE - BEFORE_OPTIMIZER.OCCURRENCE) AS OCCURRENCE,
(AFTER_OPTIMIZER.OCCURRENCE * AFTER_OPTIMIZER.VALUE - BEFORE_OPTIMIZER.OCCURRENCE * BEFORE_OPTIMIZER.VALUE) AS VALUE
FROM
BEFORE_OPTIMIZER
JOIN
AFTER_OPTIMIZER ON BEFORE_OPTIMIZER.COUNTER = AFTER_OPTIMIZER.COUNTER
WHERE
BEFORE_OPTIMIZER.OCCURRENCE <> AFTER_OPTIMIZER.OCCURRENCE;
GO
and this will pop out

NOTE: (if you did not match the insert statements before and after optimization you might see phase 1 optimization and an insert statement but this is output that you should see if you did exactly like I said)
so it happened once, it was trivial, with no need for optimization, only one way to execute, It was simple, used maxdop once, the cost in terms of SQL Server units, and one table was accessed the time that was so insignificant it did not show in terms of milliseconds.
Parsing And Binding
- parsing = checking correct syntax + converting it to a basic tree of T-SQL
- a tree = the logical presentation of the code
- logical presentation = a collection of logical operators
- logical operator = relational algebra presentation of the code = some terms that are more applicable than the code itself for the engine sometimes it is very similar, sometimes it is really different
- like in select get data, or in join is join depending on the type like left join is left join
- after that, we get the parse tree= corrected syntax in terms of a tree that we can work on
so here in the Algerbrizer, we parse the tree, and present it in the basic format
for example, if you run:
slecet*
from dimcurrency

here right away the parser stops and never consumes any resources
same feature could be enabled in ssms by clicking on

here you can use just the parser
or by
SET PARSEONLY ON;
-- Your SQL query here
SELECT * FROM DimCurrency;
SET PARSEONLY OFF;
and you will get the same functionality
then, we start:
- binding: name resolution = finding that the tree we want exists In the database, table we want so here we check what objects do we have, are they there, how could we find them, if not it gets canceled, does the user have the permission to this specific table, etc., do we have a view are we using it or going right away to the table if that is okay then pass it to the optimizer
now even though these trees and how they are created is not documented we can check this extended events DMV to see for ourselves:
SELECT*
FROM sys.dm_xe_map_values
WHERE name = 'query_optimizer_tree_id';

we will talk about the map_value output in a bit
for example this query:
USE ContosoRetailDW;
GO
BEGIN TRAN
SELECT DimAccount.AccountName
FROM DimAccount
JOIN DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey
WHERE DimCustomer.CustomerKey <20
GROUP BY DimAccount.AccountName;
COMMIT TRAN
GO
it would be something like:

Some undocumented trace flags could show us the logical trees used
we have to enable 3604
Converted tree
DBCC TRACEON(3604)
After that, we could try trace flag 8605:
USE ContosoRetailDW
GO
BEGIN TRAN
SELECT AccountKey
from DimAccount
WHERE AccountKey <20
OPTION(RECOMPILE,QUERYTRACEON 8605)--- THIS ENABLES THIS TRACE FOR THIS QUERY ONLY
COMMIT TRAN
GO
and this is the output
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
LogOp_Select
LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpLt
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
AncOp_PrjList
*******************
so first
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
- here it defines the logical operation,
- it projects on these specific columns=these are the names of the tables
- where is the column? Contoso in dbo scheam in dimaccount table
- what is the column = accountkey
then
LogOp_Select
logical operation filters the operator and tells us where we could take the data from it is more like a where clause than a select
LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0
IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpLt
- logical operation get: it will get information means it will select
- TBL: table
- what is the table and its ID and the reference inside the query this reference helps in differentiating between tables in one logical tree
- then it says it is retrieving rows not other types
- from columns
- isbaserow1000: it is from the base row directly there is no intermediate thing to go through
- ScaOp_Comp: it is a scalar operation that retrieves one data at a time
- what kind of scalar operation? Comp= comparison meaning it contains a comparison operator like >< or =
- x_cmplt: x is an internal specifier
- cmp= comparison operator
- lt= less than = the operator <
- the whole thing means this is a scalar operation of comparison
then
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
AncOp_PrjList
- ScaOp: scalar operator that works with single values
- identifier: it identifies specific column in the table
- QCOL: qualified column= it shows which col it identified
- ScaOp_Const= it is a constant meaning it equals
- TI(int,ML=4): type_id = it is an integer and ML=maximum length in bytes = 4
- XVAR: extended variable =what kind of variable that is constant and we are comparing against, paul holmes says it is like sql_variant meaning a column that can contain multiple data types
- it is an INT and it is not owned by any entity since this 20 does not come from anywhere we wrote it
- and it is 20
- AncOp_PrjList: ancillary operator projectlist= this operator would list computed columns if they existed otherwise it is for internal patching and might get deleted in future logical plans(Connor Cunningham DBA exchange)
these are the logical operators that were generated for this specific plan and this is what we mean by it
there is no documented information for this output so if we want to understand it we have to use relational algebra books and some database textbooks
we relied on Paul Holmes, Paul White, and Benjamin Nevarez while writing this blog
this is only a converted one it only passed through the algebrizer that does some initial preparations like
View Expansion
it would expand the whole view and include the whole definition
for example
let’s create this view
GO
CREATE VIEW CustomerAccountView AS
SELECT
DimAccount.AccountName,
DimAccount.AccountKey,
DimCustomer.CustomerKey
FROM
DimAccount
JOIN
DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey;
GO
then let’s query it with some traces we will only be concerned with the converted tree now
GO
BEGIN TRAN
SELECT
AccountName,
AccountKey,
CustomerKey
FROM
CustomerAccountView
OPTION (RECOMPILE,QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607)
COMMIT TRAN
GO
and this would be the output
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountName QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey
**LogOp_ViewAnchor**
LogOp_Project
LogOp_Join
LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCustomer DimCustomer TableID=469576711 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey
AncOp_PrjList
AncOp_PrjList
**LogOp_ViewAnchor shows that we had the view then beneath it
we had the definition and what kind of join it did**
then we have two AncOp_PrjList that would have contained computed columns for the request and the view definition itself but they did not
what is the optimizer telling us at this point?
i will project(select) 3 qualified columns where the view in use is based that join and those conditions in it satisfy what we requested
this is the initial request
SQL server uses the base table directly at a later stage and gets rid of the view
Calculated Columns are expanded (regardless of use since we are just binding now)
let’s re-create our view
-- Drop the existing view if it exists
IF OBJECT_ID('CustomerAccountView', 'V') IS NOT NULL
BEGIN
DROP VIEW CustomerAccountView;
END;
GO
-- Create the new view with the calculated column as a mathematical operation
CREATE VIEW CustomerAccountView
AS
SELECT
AccountName,
AccountKey,
CustomerKey,
AccountKey + CustomerKey AS AccountCustomerKey
FROM
DimAccount
JOIN
DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey;
GO
here as you can see we added a computed column
let’s see the converted tree in some query
GO
BEGIN TRAN
SELECT
AccountName,
AccountKey,
CustomerKey,
AccountCustomerKey
FROM
CustomerAccountView
OPTION (RECOMPILE,QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607)
COMMIT TRAN
GO
and the output:
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountName QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey COL: Expr1004
LogOp_ViewAnchor
LogOp_Project
LogOp_Join
LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCustomer DimCustomer TableID=469576711 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey
AncOp_PrjList
AncOp_PrjEl COL: Expr1004
ScaOp_Arithmetic x_aopAdd
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey
AncOp_PrjList
*******************
so let’s get to it:
- first and the biggest parent operator: LogOp_Project it has the columns it would output 3 QCOL = columns with their definition provided meaning with their database schema table and their name, meaning their whole namespace and one Col that has only it is referenced name not the whole name could be used inside the context of the tree if we have the same column definition elsewhere or it is a computed expression etc it has two child operators
- LogOp_ViewAnchor: this operator as it sounds shows us the view and lists the definition of it it has
- LogOp_Join: it is the join operator
- LogOp_Get TBL: two that define where we should get(select) our data from has two table columnsScaOp_Comp x_cmpEq: Scalar operator that deals with singular values, it is a comparison operator that uses <>=etc. the x is something internal for the code as far as we can tell cmp is comparison again and Eq is equality means we get this specific LogOp_Get when the scalar operator equals the following child operator conditions
- ScaOp_Identifier QCOL: columns used in the join predicate and what should they be equal to so we have two columns that should be equal to each other for the join to happen
- AncOp_PrjEl COL: Expr1004: what is the projection element of the parent operator here it says it is the column named Expr1004, and then we see the definition of it in its child operators
- ScaOp_Arithmetic x_aopAdd: scalar operator that does an arithmetic operation what kind? addition so we have an addition operation that has two child operators that define the addition
- ScaOp_Identifier QCOL: scalar operator that identifies the columns used in this addition operation, customerkey and accountkey
- ScaOp_Arithmetic x_aopAdd: scalar operator that does an arithmetic operation what kind? addition so we have an addition operation that has two child operators that define the addition
- LogOp_Get TBL: two that define where we should get(select) our data from has two table columnsScaOp_Comp x_cmpEq: Scalar operator that deals with singular values, it is a comparison operator that uses <>=etc. the x is something internal for the code as far as we can tell cmp is comparison again and Eq is equality means we get this specific LogOp_Get when the scalar operator equals the following child operator conditions
- LogOp_Join: it is the join operator
- LogOp_ViewAnchor: this operator as it sounds shows us the view and lists the definition of it it has
Basic Constant folding:
converting some of the constants to simpler ones like 12+2 to 14
for example:
DBCC TRACEON (3604);
GO
USE ContosoRetailDW;
SELECT 10 + 2 AS Result
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
and the output:
*** Converted Tree: ***
LogOp_Project COL: Expr1000
LogOp_ConstTableGet (1) [empty]
AncOp_PrjList
AncOp_PrjEl COL: Expr1000
ScaOp_Const TI(int,Null,ML=4) XVAR(int,Not Owned,Value=12)
ScaOp_Const TI: in there you can see the value was 12 even
though we had 10+2 so two operators and one for the arithmetic
operation so 4 were folded to one
those were some of the preparations that the algebrizer has done now we will move to the optimizer, this list was not meant to be exhaustive for more details check out (Microsoft )or(Paul Holmes )
Input-tree
one of the things done here is
converting regular joins to cartesian products with filters(they will be rewritten afterward)
but the output of the cartesian join in there will eventually return the join in question so even though the query optimizer rewrote it as a cartesian, it filters it on the join conditions provided unlike in an actual cartesian where all the products will be produced if the query is to be outputted
for example, let’s cross-join some tables in Contoso
DBCC TRACEON (3604);
use ContosoRetailDW
go
BEGIN TRAN
SELECT
dimcurrency.CurrencyKey,
SalesKey,
StoreKey
FROM FactSales cross JOIN DimCurrency
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
commit tran
and the output of the converted tree would be like
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey
LogOp_Join
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
AncOp_PrjList
*******************
in the LogOp_Join: we can see two Get TBL and one ScaOp_Const TI that equals any possible value
then in the input tree
*** Input Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey
LogOp_Join
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
AncOp_PrjList
*******************
we see the same output no conversion to another join since it is already a cartesian join but if we use a regular old fashion inner join with ON clause in it like
go
use ContosoRetailDW
go
DBCC TRACEON (3604);
go
BEGIN TRAN
SELECT
dimcurrency.CurrencyKey,
SalesKey,
StoreKey
FROM FactSales inner JOIN DimCurrency
ON DimCurrency.CurrencyKey=FactSales.CurrencyKey
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
commit tran
go
we will get in the converted tree
** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey
LogOp_Join
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey
AncOp_PrjList
in the join operator, we can see the two GETs but we also see a filter which is the join condition in ScaOp_Comp X_cmpEQ so there are some equality filters that we should suffice, the optimizer would not include other values
now in the input tree
*******************
*** Input Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey
LogOp_Select
LogOp_Join
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
**ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)**
**ScaOp_Comp x_cmpEq**
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey
AncOp_PrjList
*******************
now we can see that the join has been rewritten as a cartesian one in ScaOp_Const TI so every output would be included but after the join has been done in the LogOp_Select the ScaOp_Comp x_cmpEq will filter the output for us
this one of the behaviors done by the engine in the input tree phase another one is
Unexpanding the view if we don’t need it
for example
-- Create a view on DimAccount with AccountKey
CREATE VIEW dbo.vDimAccount AS
SELECT
AccountKey
FROM
dbo.DimAccount;
GO
then
BEGIN TRAN;
SELECT
AccountKey
FROM
dbo.vDimAccount
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
COMMIT TRAN
the converted tree would have the view definition like
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
**LogOp_ViewAnchor**
LogOp_Project
LogOp_Get TBL: dbo.DimAccount dbo.DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjList
*******************
but the input would delete it since there is no need for it
*** Input Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey
LogOp_Project
LogOp_Get TBL: dbo.DimAccount dbo.DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjList
no view anchor.
multiple other behaviors are done at this stage but we chose to omit it.
Simplification
so we had a query that had one logical tree
but most of the time we can represent the query in too many forms
some of them could be more efficient than others and easier to understand
this is what you call simplification and the whole purpose is to rewrite the query to the simplest form possible logically this happens in the optimizer.
so we are in the second phase of the optimizer’s rewrites of the query the first one was the input-tree phase.
before that all we did to the logical trees was write and prepare the original syntax
now we are rewriting the original tree entirely but we end up in the same result, it could it in a a lot of forms for example
Subqueries to left semi-joins
for example, if you run this subquery
begin tran
SELECT
FactSales.SalesKey,
FactSales.StoreKey,
FactSales.CurrencyKey
FROM FactSales
where CurrencyKey in
(SELECT CurrencyKey
FROM DimCurrency
)
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
COMMIT TRAN;
GO
the converted tree would be like
*** Converted Tree: ***
LogOp_Project QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey
LogOp_Select
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_SomeComp 2
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey
LogOp_Project
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
AncOp_PrjList
AncOp_PrjList
so first it would get all the information in the factsales
then afterwards if would filter it with LoGop_Project of the subquery that Gets the CurrencyKey from DimCurrency if there is a match it would record it and present it
in the simplified tree
*** Simplified Tree: ***
LogOp_LeftSemiJoin
LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey
ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey
*******************
it converted it to a left semi-join so it would not get the additional information unless the DimCurrency.CurrencyKey=FactSales.CurrencyKey
removing redundant joins left joins:
so each step has some rules that it could apply that we could see with trace flag 8621
for example, in simplification we have : RedundantLOJN = redundant left join removal
so in AdventureWorks2016
in Purchasing Schema
as you can see PurchaseOrderDetail has a foreign key on PurchaseOrderID from PurchaseOrderHeader, so if we use this constraint
like
BEGIN TRAN
SELECT POD.PurchaseOrderDetailID,POD.PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail POD LEFT JOIN Purchasing.PurchaseOrderHeader POH
ON POD.PurchaseOrderID=POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
COMMIT TRAN
here we don’t even need the join because we already have the column in our table so the optimizer opted to use the following rule
*******************
***** Rule applied: RedundantLOJN - A LOJ B -> A
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1170103209 TableReferenceID=0 IsRow: COL: IsBaseRow1000
*** Simplified Tree: ***
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1170103209 TableReferenceID=0 IsRow: COL: IsBaseRow1000
*******************
and the join removed and now we just have PurchaseOrderDetail as our table the plan gets simplified
and if we look up the physical execution plan itself

Contradiction detection:
so in PurchaseOrderDetail, if we try to see what check constraints we have
SELECT
t.name AS TableName,
c.name AS ConstraintName,
c.definition AS ConstraintDefinition
FROM
sys.check_constraints AS c
INNER JOIN
sys.tables AS t ON c.parent_object_id = t.object_id
INNER JOIN
sys.schemas AS s ON t.schema_id = s.schema_id
INNER JOIN
sys.objects AS o ON c.parent_object_id = o.object_id
WHERE
s.name = 'Purchasing'
AND t.name = 'PurchaseOrderDetail';
GO
we will get

so logically we are not to have OrderQty equal or less than 0
so it is a contradiction to query that
so if we just do that
begin tran
select OrderQty
from Purchasing.PurchaseOrderDetail
where OrderQty<0
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
commit tran
we would get
*******************
***** Rule applied: SelectOnEmpty - Sel(Empty) -> Empty
LogOp_ConstTableGet (0) COL: IsBaseRow1000 QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].UnitPrice QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].LineTotal QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty COL: KeyCo5
*** Simplified Tree: ***
LogOp_ConstTableGet (0) COL: IsBaseRow1000 QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].UnitPrice QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].LineTotal QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty COL: KeyCo5
*******************
and we would get LogOp_ConstTableGet(0) Col: means we are getting zero columns from the name definition of the table we asked for meaning no effort since there is a check constraint on our OrderQty
this logical operator is equavilant to constant scan physical operator in the execution plan and it means here and there that sql server would not get any rows from the table since there are no matching values in the table hence the check constraint

now if we drop that
ALTER TABLE Purchasing.PurchaseOrderDetail
DROP CONSTRAINT CK_PurchaseOrderDetail_OrderQty
we would get
*******************
*** Simplified Tree: ***
LogOp_Select
LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpLt
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty
ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=0)
*******************
and in the physical plan

so it had to scan to make sure that the orderqty<0 is not there now let’s get our constraint back
ALTER TABLE Purchasing.PurchaseOrderDetail
ADD CONSTRAINT CK_PurchaseOrderDetail_OrderQty
CHECK (OrderQty > 0);
Predicate Pushdown: pushing the selects to joins
so if we had join
and in the where clause we had a filter after the join
and this was in the selects
the query optimizer might push the select filter down to the join clause before it happens
so it could only get the rows that is needed
for example
SELECT POD.PurchaseOrderDetailID,POH.RevisionNumber
FROM Purchasing.PurchaseOrderDetail POD JOIN Purchasing.PurchaseOrderHeader POH
ON POD.PurchaseOrderID=POH.PurchaseOrderID
WHERE POD.PurchaseOrderDetailID = 20
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
here we want the PurchaseOrderDetailID
now in the input tree:
*** Input Tree: ***
LogOp_Project QCOL: [POD].PurchaseOrderDetailID QCOL: [POH].RevisionNumber
LogOp_Select
LogOp_Join
LogOp_Project
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList ...
LogOp_Project
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001
AncOp_PrjList ....
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POD].PurchaseOrderID
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POD].PurchaseOrderDetailID
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
AncOp_PrjList
*******************
we chose to omit the computed columns like the optimizer did afterward
here as you can see it joins the tables then it joins them again where [POD].PurchaseOrderDetailID=20
after the prediacte pushdown in the SELonJN - SEL JN -> JN rule meaning move the select to the join meaning it is a predicate pushdown meaning adding another join condition to eliminate excessive resource consumption we see in the simplified tree
***** Rule applied: SELonJN - SEL JN -> JN
LogOp_Join
LogOp_Select
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POD].PurchaseOrderDetailID
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
ScaOp_Identifier QCOL: [POD].PurchaseOrderID
so here in the LogOp-Join:
- instead of getting the whole PurchaseOrderDetail and joining them
- we first filter PurchaseOrderDetail ON PurchaseOrderId=20
- then we join it with PurchaseOrderHeader when PurchaseOrderID = PurchaseOrderID
same rules applied if we used Common Table Expression:
WITH FilteredPOD AS (
SELECT PurchaseOrderDetailID, PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail
WHERE PurchaseOrderDetailID = 20
)
SELECT POD.PurchaseOrderDetailID, POH.RevisionNumber
FROM FilteredPOD AS POD
JOIN Purchasing.PurchaseOrderHeader AS POH
ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
the rule applied
***** Rule applied: SELonJN - SEL JN -> JN
LogOp_Join
LogOp_Select
LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID
or a subquery
SELECT POD.PurchaseOrderDetailID, POH.RevisionNumber
FROM (SELECT PurchaseOrderDetailID, PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail
WHERE PurchaseOrderDetailID = 20) AS POD
JOIN Purchasing.PurchaseOrderHeader AS POH
ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
and the rule:
***** Rule applied: SELonJN - SEL JN -> JN
LogOp_Join
LogOp_Select
LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID
so even though the syntax differs at these values they are all the same
Cardinality Estimation and Statistics
we are going to talk about it in another blog
but basically in order to do heuristic join reordering, meaning choosing in an effective and non-time-consuming way, the good enough plan for the query SQL server makes an assumption based on some sampling from the tables.
now before that it just re-wrote the query without much thinking about the table contents, just basic logical rewriting, this is limited
but now based on the costs of each of the alternative logical plans that it would generate after this phase, it would choose one
so if some join costs less than the other one based on the sample statistics we have on the table
the lesser one would be chosen.
we will discuss this in another blog
but in simplification, the statistics binding happen
and here we start the cost-based optimization methods
now the first thing the optimizer does after it joins statistics
is join-collapsed trees
so we are in still in the logical territory
we are not in the physical operator stuff yet
then it normalizes it
then it might decide that this query needs some optimization (fully optimized plan) or not (trivial)
and after that, we see the physical operators
Join Collapsed Tree (heuristic join reordering)
so what could it do here?
it might eliminate some redundant joins not eliminated before
for example
BEGIN TRAN
SELECT POD.PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail POD INNER JOIN Purchasing.PurchaseOrderHeader POH
ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
COMMIT TRAN
in the simplified
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
ScaOp_Identifier QCOL: [POD].PurchaseOrderID
*******************
so here we have a join but since we already have a foreign key on PurchaseOrderID, we don’t need any joins since the output is the same
that is why the join-collapsed is:
*** Join-collapsed Tree: ***
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
just the PurchaseOrderDetail
now if we remove the Foreign key:
ALTER TABLE Purchasing.PurchaseOrderDetail
DROP CONSTRAINT FK_PurchaseOrderDetail_PurchaseOrderHeader_PurchaseOrderID;
we would get this join-collapsed tree:
*** Join-collapsed Tree: ***
LogOp_Join
LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [POH].PurchaseOrderID
so no join-collapse and this execution plan:

now let’s clean up
ALTER TABLE Purchasing.PurchaseOrderDetail
ADD CONSTRAINT FK_PurchaseOrderDetail_PurchaseOrderHeader_PurchaseOrderID
FOREIGN KEY (PurchaseOrderID) REFERENCES Purchasing.PurchaseOrderHeader(PurchaseOrderID);
what else could it do?
LogOp_NAryJoin: new join structure that combines multiple joins
for example:
select E.BusinessEntityID,PH.Rate,DH.DepartmentID
FROM HumanResources.Employee E INNER JOIN HumanResources.EmployeePayHistory PH
ON E.BusinessEntityID = PH.BusinessEntityID
INNER JOIN HumanResources.EmployeeDepartmentHistory DH
ON PH.BusinessEntityID = DH.BusinessEntityID
OPTION (RECOMPILE,QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
in the simplified tree:
*** Simplified Tree: ***
LogOp_Join
LogOp_Join
LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: HumanResources.EmployeePayHistory(alias TBL: PH) HumanResources.EmployeePayHistory TableID=1829581556 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [PH].BusinessEntityID
ScaOp_Identifier QCOL: [E].BusinessEntityID
LogOp_Get TBL: HumanResources.EmployeeDepartmentHistory(alias TBL: DH) HumanResources.EmployeeDepartmentHistory TableID=1781581385 TableReferenceID=0 IsRow: COL: IsBaseRow1002
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [DH].BusinessEntityID
ScaOp_Identifier QCOL: [PH].BusinessEntityID
*******************
you see the join conditions we want and each of them and how to filter each one here after you get the first join you filter by the second
in the join-collapsed:
*** Join-collapsed Tree: ***
LogOp_NAryJoin
LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: HumanResources.EmployeeDepartmentHistory(alias TBL: DH) HumanResources.EmployeeDepartmentHistory TableID=1781581385 TableReferenceID=0 IsRow: COL: IsBaseRow1002
LogOp_Get TBL: HumanResources.EmployeePayHistory(alias TBL: PH) HumanResources.EmployeePayHistory TableID=1829581556 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [PH].BusinessEntityID
ScaOp_Identifier QCOL: [DH].BusinessEntityID
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [E].BusinessEntityID
ScaOp_Identifier QCOL: [DH].BusinessEntityID
*******************
here you get the 3 joins together instead of joining the first then joining the result to the second join them all at once
what else could it do?
Eliminates redundant Columns in Table-Valued Constructor (TVC):
like in the following:
SELECT E.BusinessEntityID, MyTVC.ID
FROM HumanResources.Employee E
CROSS JOIN
(
VALUES
(1, 'SOME'),
(2, 'SOME'),
(3, 'SOME')
) AS MyTVC (ID, Col1)
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
in the simplified:
*** Simplified Tree: ***
LogOp_Join
LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_ConstTableGet (3) COL: Union1007 COL: Union1008
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=3)
ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))
ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
*******************
here it shows each ‘SOME’ we had but we did not need them in the select so we could just ignore them, so in the join-collapsed:
*** Join-collapsed Tree: ***
LogOp_Join
LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_ConstTableGet (3) COL: Union1007
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=3)
ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
*******************
only IDs
Before Project Normalization Tree:
nothing happens to the tree Paul White(https://dba.stackexchange.com/questions/273342/logical-trees-difference-between-before-project-normalization-and-join-collapse)
After Project Normalization Tree:
here it could do the following
Remove the expression that matches existing columns and reference the columns instead
for example
SELECT (isnull([ReceivedQty]-[RejectedQty],(0.00))) AS EXPRESSION_BOUT_TO_BE_REMOVED
FROM Purchasing.PurchaseOrderDetail
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
in the before project Normalization :
*** Tree Before Project Normalization ***
LogOp_Project
LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_Intrinsic isnull
ScaOp_Arithmetic x_aopSub
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty
ScaOp_Const TI(decimal,ML=5,PRC=9,SCL=2) XVAR(decimal,Not Owned,Value=(len,prec,scale)=(5,9,2))
*****************************************
in the AncOp_PrjEl COL: Expr1002 we find the entire expression definition
but in the after project normalization tree:
*****************************************
*** Tree After Project Normalization ***
LogOp_Project
LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty
****************************************
we only see it referencing the StockedQty column which matches the exact definition of the expression
what else?
Pre-computing common ranges in scalar operators(expr1002)
for example:
SELECT HumanResources.Department.Name
FROM HumanResources.Department
WHERE RIGHT(HumanResources.Department.Name,2)>='HR'
AND RIGHT(HumanResources.Department.Name,2)>='HT'
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
in before project normalization tree:
*** Tree Before Project Normalization ***
LogOp_Select
LogOp_Get TBL: HumanResources.Department HumanResources.Department TableID=1381579960 TableReferenceID=0 IsRow: COL: IsBaseRow1000
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpGe
ScaOp_Intrinsic right
ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7282))
ScaOp_Comp x_cmpGe
ScaOp_Intrinsic right
ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7284))
*****************************************
here you can see for each range it included H but in after project normalization:
*** Tree After Project Normalization ***
LogOp_Select
LogOp_Project
LogOp_Get TBL: HumanResources.Department HumanResources.Department TableID=1381579960 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_Intrinsic right
ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpGe
ScaOp_Identifier COL: Expr1002
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7282))
ScaOp_Comp x_cmpGe
ScaOp_Identifier COL: Expr1002
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7284))
the optimizer precomputed the H in Expr1002 then it references in the and
what else?
Pushing down derived column expression to the joins:
for example:
SELECT P.MiddleName +'_suffix_'+ P.Suffix AS WHATEVER
FROM Person.Person P INNER JOIN Person.Password S ON
P.BusinessEntityID = S.BusinessEntityID
OPTION (RECOMPILE,querytraceon 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
In before:
*** Tree Before Project Normalization ***
LogOp_Project
LogOp_Join
LogOp_Get TBL: Person.Person(alias TBL: P) Person.Person TableID=2101582525 TableReferenceID=0 IsRow: COL: IsBaseRow1000
LogOp_Get TBL: Person.Password(alias TBL: S) Person.Password TableID=2053582354 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [S].BusinessEntityID
ScaOp_Identifier QCOL: [P].BusinessEntityID
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_Arithmetic x_aopAdd
ScaOp_Arithmetic x_aopAdd
ScaOp_Identifier QCOL: [P].MiddleName
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=16) XVAR(nvarchar,Owned,Value=Len,Data = (16,9511511710210210512095))
ScaOp_Identifier QCOL: [P].Suffix
*****************************************
you can see that the LogOp_Projection has two child operators one for the join and one for the expression
but in after:
*** Tree After Project Normalization ***
LogOp_Join
LogOp_Project
LogOp_Get TBL: Person.Person(alias TBL: P) Person.Person TableID=2101582525 TableReferenceID=0 IsRow: COL: IsBaseRow1000
AncOp_PrjList
AncOp_PrjEl COL: Expr1002
ScaOp_Arithmetic x_aopAdd
ScaOp_Arithmetic x_aopAdd
ScaOp_Identifier QCOL: [P].MiddleName
ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=16) XVAR(nvarchar,Owned,Value=Len,Data = (16,9511511710210210512095))
ScaOp_Identifier QCOL: [P].Suffix
LogOp_Get TBL: Person.Password(alias TBL: S) Person.Password TableID=2053582354 TableReferenceID=0 IsRow: COL: IsBaseRow1001
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [S].BusinessEntityID
ScaOp_Identifier QCOL: [P].BusinessEntityID
****************************************
now the LogOp_join pushed down the expression to the get of the Person.Person finished the expression and then joined.
and now we are done with rewriting the query logically and we are about to start generating physical operators.
see you in Part 2
[…] in logical operator generation(check out our blog on the query optimizer part 1) it […]
[…] could put some traces( for more details check out our blog on logical tree generation) to see what the query optimizer […]
[…] Now the next section is a little bit advanced, so you can skip it if you like we introduced it here […]