Query Optimizer In SQL Server: Part 1 Logical Tree Generation

Hi everyone, Today we are going to talk about the way SQL server executes a query

  • How it understands the request.
  • How does it recognize syntax mistakes?
  • how it generates the logical operators of a query, meaning how it transfers the SQL syntax to relational terms
  • how it evaluates the plan for optimization and the optimization process
  • how it generates the physical operators and the execution plan

we are going to use undocumented features here so don’t try in a production environment since you can get Microsoft’s support in case of an error or data corruption

Introduction

when you submit a query to SQL Server it goes through multiple phases to get executed

the component that does that is called the query optimizer

it does it in multiple steps:

  • first, it Parses the query: it checks if the syntax that was provided was correct on before doing any operation so if you code SLECET instead of SELECT the query gets canceled immediately with no intention of correcting it to save cost
  • then the Binding process starts: it checks if the objects you requested exist in the database you are using so if you ask for AnyTable in ContosoRetailDW database and that table does not exist in it, the query gets canceled since there is no such table in there, while doing that it converts the syntax to logical operators, meaning it converts to terms that the engine understands and can find physical operators for, it sees what steps you want so we could use the program to make it for you, meaning we will generate a tree representation of your query, what do you want do these and how do you want them, do you want a join? okay, how many joins? are there any duplicates in there? do you have these tables in this database? do you have access to these tables? etc. so it converts your correct terms to relational algebra terms, all the steps until here were done with the Algebrizer
  • after that, the query gets submitted to the query optimizer
  • after that it looks if your query can be reduced to less terms, simplifies your query=simplifcation, for example, it could remove unnecessary joins
  • after that it includes cardinalty estimation and statistics to your logical operators
  • then, it checks whether this query needs to be optimized: meaning is it a trivial plan, meaning are there too many ways to do this statement or is it so simple that there is only one way, is it a complex plan
  • if it is then there is no need for optimization you just pass it along and the query gets executed
  • but if it is a complex query then we start optimizing, meaning we will start the full optimization process and its steps
  • here it starts using its pragmatic approach, heuristic algorithms of execution, it estimates how complex the query is should it be in phase 0, 1, or 2 and starts optimizing
  • if it finds a plan in enough time it will generate and pass it to the execution engine
  • but if it does not it moves to the next phase
  • until the time ends then it passes at the end

sys.dm_exec_query_optimizer_info

This DMV shows us how the query optimizer works:

SELECT*
FROM **sys.dm_exec_query_optimizer_info**

and 3 columns would pop up:

  • counter: name of the optimizer event
  • occurrence: how many times did this event happen
  • value: the average value per event occurrence, the average search task 0 per occurrence of it was 1198

there are 39 counters

  • optimizations: how many times did a query get submitted to the optimizer( could be trivial so there is no optimization)
  • elapsed_time: how long did it take
  • final cost: in internal cost unit which was based on 1997 systems so for example here we had 1019 events on average each cost 7……. in internal units
  • trivial_Plan: how many plans where submitted directly( detail later)
  • tasks: how many actual optimization tasks that went through the phases so from 653 queries that got optimized each had on average 895 tasks
  • no plan: nobody went without a plan
  • search0: the initial phase of optimization( details later) : we had 139 queries that got optimized easily in search0 tasks each used about 2272 tasks
  • the same for the other search phases
  • gain stage 0 to stage 1 or 2: how many of the queries had so many tasks that it needed to move to another phase either it was too complex so right away to 2 or it had some degree 1.
  • timeout: when the query optimizer decided that there was not enough time to get good enough plan( property in the execution plan details later)
  • memory limit exceeded: when the optimization process for one query exceeded the limit for the plan memory
  • …. stmt: how many of that specific statement
  • contains subquery
  • unnest failed: when we could not unnest the subquery meaning making it some sort of join
  • tables: how many tables referenced per query
  • hints:
  • …. hint: that specific hint per query
  • View reference: how many times that a specific view was referenced
  • remote query
  • maximum DOP: how many times we used parallelism per query
  • max_recursion_level: what is the average of it when it was specified
  • indexed views loaded, updated, dynamic cursor request fast forward request

we can filter this however we want.

this DMV shows cumulative information since the last server restart, so if we need to capture information for a specific query we need to:

  • free the DMV by restarting the server
  • , select our DMV so it gets optimized
  • then keep the information to get a baseline,
  • then execute the query in question
  • then select the DMV again to see what kind of optimization went for this specific query
  • or we could create snapshots before and after since we don’t just have our dmv there are always system queries going on

non of this should be done in a production environment

for example:

then we will make two snapshots before and after

we have to execute and drop so they get optimized

GO
SELECT*
INTO AFTER_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO
SELECT*
INTO BEFORE_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO

then we have to drop them so their and internal optimization don’t get recorded, make sure that the queries will match exactly the ones we will record after so they don’t get optimized twice, match in everything comments spaces all of it

GO
DROP TABLE AFTER_OPTIMIZER;
DROP TABLE BEFORE_OPTIMIZER;
GO

then we have to record the optimization information before

GO
SELECT*
INTO BEFORE_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO

then we input our query that we want to benchmark

GO
SELECT AddressLine1,AddressLine2
FROM DimCustomer
WHERE CustomerKey >= 10 
      AND CustomerKey <=200
OPTION (RECOMPILE)
GO

then we will input the optimization information after

GO
SELECT*
INTO AFTER_OPTIMIZER
FROM sys.dm_exec_query_optimizer_info
GO

now we will see our information

GO
SELECT 
    AFTER_OPTIMIZER.COUNTER,
    (AFTER_OPTIMIZER.OCCURRENCE - BEFORE_OPTIMIZER.OCCURRENCE) AS OCCURRENCE,
    (AFTER_OPTIMIZER.OCCURRENCE * AFTER_OPTIMIZER.VALUE - BEFORE_OPTIMIZER.OCCURRENCE * BEFORE_OPTIMIZER.VALUE) AS VALUE
FROM 
    BEFORE_OPTIMIZER
JOIN 
    AFTER_OPTIMIZER ON BEFORE_OPTIMIZER.COUNTER = AFTER_OPTIMIZER.COUNTER
WHERE 
    BEFORE_OPTIMIZER.OCCURRENCE <> AFTER_OPTIMIZER.OCCURRENCE;
GO

and this will pop out

NOTE: (if you did not match the insert statements before and after optimization you might see phase 1 optimization and an insert statement but this is output that you should see if you did exactly like I said)

so it happened once, it was trivial, with no need for optimization, only one way to execute, It was simple, used maxdop once, the cost in terms of SQL Server units, and one table was accessed the time that was so insignificant it did not show in terms of milliseconds.

Parsing And Binding

  • parsing = checking correct syntax + converting it to a basic tree of T-SQL
  • a tree = the logical presentation of the code
  • logical presentation = a collection of logical operators
  • logical operator = relational algebra presentation of the code = some terms that are more applicable than the code itself for the engine sometimes it is very similar, sometimes it is really different
  • like in select get data, or in join is join depending on the type like left join is left join
  • after that, we get the parse tree= corrected syntax in terms of a tree that we can work on

so here in the Algerbrizer, we parse the tree, and present it in the basic format

for example, if you run:

slecet*
from dimcurrency 

here right away the parser stops and never consumes any resources

same feature could be enabled in ssms by clicking on

here you can use just the parser

or by

SET PARSEONLY ON;
-- Your SQL query here
SELECT * FROM DimCurrency;
SET PARSEONLY OFF;

and you will get the same functionality

then, we start:

  • binding: name resolution = finding that the tree we want exists In the database, table we want so here we check what objects do we have, are they there, how could we find them, if not it gets canceled, does the user have the permission to this specific table, etc., do we have a view are we using it or going right away to the table if that is okay then pass it to the optimizer

now even though these trees and how they are created is not documented we can check this extended events DMV to see for ourselves:

SELECT*
FROM sys.dm_xe_map_values 
WHERE name = 'query_optimizer_tree_id';

we will talk about the map_value output in a bit

for example this query:

USE ContosoRetailDW;
GO
BEGIN TRAN
SELECT DimAccount.AccountName
FROM DimAccount 
JOIN DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey
WHERE DimCustomer.CustomerKey <20 
GROUP BY DimAccount.AccountName;
COMMIT TRAN
GO 

it would be something like:

Some undocumented trace flags could show us the logical trees used

we have to enable 3604

Converted tree

DBCC TRACEON(3604)

After that, we could try trace flag 8605:

USE ContosoRetailDW
GO
BEGIN TRAN 
SELECT AccountKey
from DimAccount
WHERE AccountKey <20 
OPTION(RECOMPILE,QUERYTRACEON 8605)--- THIS ENABLES THIS TRACE FOR THIS QUERY ONLY 
COMMIT TRAN 
GO 

and this is the output

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

        LogOp_Select

            LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            ScaOp_Comp x_cmpLt

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

        AncOp_PrjList 

*******************

so first

LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

  • here it defines the logical operation,
  • it projects on these specific columns=these are the names of the tables
  • where is the column? Contoso in dbo scheam in dimaccount table
  • what is the column = accountkey

then

LogOp_Select

logical operation filters the operator and tells us where we could take the data from it is more like a where clause than a select

 LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 
 IsRow: COL: IsBaseRow1000 

            ScaOp_Comp x_cmpLt
  • logical operation get: it will get information means it will select
  • TBL: table
  • what is the table and its ID and the reference inside the query this reference helps in differentiating between tables in one logical tree
  • then it says it is retrieving rows not other types
  • from columns
  • isbaserow1000: it is from the base row directly there is no intermediate thing to go through
  • ScaOp_Comp: it is a scalar operation that retrieves one data at a time
  • what kind of scalar operation? Comp= comparison meaning it contains a comparison operator like >< or =
  • x_cmplt: x is an internal specifier
  • cmp= comparison operator
  • lt= less than = the operator <
  • the whole thing means this is a scalar operation of comparison

then

 ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

        AncOp_PrjList 

  • ScaOp: scalar operator that works with single values
  • identifier: it identifies specific column in the table
  • QCOL: qualified column= it shows which col it identified
  • ScaOp_Const= it is a constant meaning it equals
  • TI(int,ML=4): type_id = it is an integer and ML=maximum length in bytes = 4
  • XVAR: extended variable =what kind of variable that is constant and we are comparing against, paul holmes says it is like sql_variant meaning a column that can contain multiple data types
  • it is an INT and it is not owned by any entity since this 20 does not come from anywhere we wrote it
  • and it is 20
  • AncOp_PrjList: ancillary operator projectlist= this operator would list computed columns if they existed otherwise it is for internal patching and might get deleted in future logical plans(Connor Cunningham DBA exchange)

these are the logical operators that were generated for this specific plan and this is what we mean by it

there is no documented information for this output so if we want to understand it we have to use relational algebra books and some database textbooks

we relied on Paul Holmes, Paul White, and Benjamin Nevarez while writing this blog

this is only a converted one it only passed through the algebrizer that does some initial preparations like

View Expansion

it would expand the whole view and include the whole definition

for example

let’s create this view

GO
CREATE VIEW CustomerAccountView AS
SELECT 
    DimAccount.AccountName,
    DimAccount.AccountKey,
    DimCustomer.CustomerKey
FROM 
    DimAccount 
JOIN 
    DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey;
GO

then let’s query it with some traces we will only be concerned with the converted tree now

GO
BEGIN TRAN
SELECT 
      AccountName,
      AccountKey,
      CustomerKey 
FROM 
    CustomerAccountView
    OPTION (RECOMPILE,QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607)
COMMIT TRAN
GO

and this would be the output

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountName QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey

        **LogOp_ViewAnchor**

            LogOp_Project

                LogOp_Join

                    LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                    LogOp_Get TBL: DimCustomer DimCustomer TableID=469576711 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                    ScaOp_Comp x_cmpEq

                        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

                        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey

                AncOp_PrjList 

        AncOp_PrjList 

    **LogOp_ViewAnchor shows that we had the view then  beneath it 
we had the definition and what kind of join it did** 

then we have two AncOp_PrjList that would have contained computed columns for the request and the view definition itself but they did not

what is the optimizer telling us at this point?

i will project(select) 3 qualified columns where the view in use is based that join and those conditions in it satisfy what we requested

this is the initial request

SQL server uses the base table directly at a later stage and gets rid of the view

Calculated Columns are expanded (regardless of use since we are just binding now)

let’s re-create our view

-- Drop the existing view if it exists
IF OBJECT_ID('CustomerAccountView', 'V') IS NOT NULL
BEGIN
    DROP VIEW CustomerAccountView;
END;
GO

-- Create the new view with the calculated column as a mathematical operation
CREATE VIEW CustomerAccountView
AS
SELECT 
    AccountName,
    AccountKey,
    CustomerKey,
    AccountKey + CustomerKey AS AccountCustomerKey
FROM 
    DimAccount 
JOIN 
    DimCustomer ON DimAccount.AccountKey = DimCustomer.CustomerKey;
GO

here as you can see we added a computed column

let’s see the converted tree in some query

GO
BEGIN TRAN
SELECT 
      AccountName,
      AccountKey,
      CustomerKey,
      AccountCustomerKey
FROM 
    CustomerAccountView
    OPTION (RECOMPILE,QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607)
COMMIT TRAN
GO

and the output:

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountName QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey COL: Expr1004 

        LogOp_ViewAnchor

            LogOp_Project

                LogOp_Join

                    LogOp_Get TBL: DimAccount DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                    LogOp_Get TBL: DimCustomer DimCustomer TableID=469576711 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                    ScaOp_Comp x_cmpEq

                        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

                        ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey

                AncOp_PrjList 

                    AncOp_PrjEl COL: Expr1004 

                        ScaOp_Arithmetic x_aopAdd

                            ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

                            ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCustomer].CustomerKey

        AncOp_PrjList 

*******************

so let’s get to it:

  • first and the biggest parent operator: LogOp_Project it has the columns it would output 3 QCOL = columns with their definition provided meaning with their database schema table and their name, meaning their whole namespace and one Col that has only it is referenced name not the whole name could be used inside the context of the tree if we have the same column definition elsewhere or it is a computed expression etc it has two child operators
    • LogOp_ViewAnchor: this operator as it sounds shows us the view and lists the definition of it it has
      • LogOp_Join: it is the join operator
        • LogOp_Get TBL: two that define where we should get(select) our data from has two table columnsScaOp_Comp x_cmpEq: Scalar operator that deals with singular values, it is a comparison operator that uses <>=etc. the x is something internal for the code as far as we can tell cmp is comparison again and Eq is equality means we get this specific LogOp_Get when the scalar operator equals the following child operator conditions
          • ScaOp_Identifier QCOL: columns used in the join predicate and what should they be equal to so we have two columns that should be equal to each other for the join to happen
        AncOp_PrjList : the second child operator for the view it shows the computed columns if they exist otherwise it is either for patching the inputs for the algorithm(Connor Cunningham) or it is for the control of the output of the parent operator
        • AncOp_PrjEl COL: Expr1004: what is the projection element of the parent operator here it says it is the column named Expr1004, and then we see the definition of it in its child operators
          • ScaOp_Arithmetic x_aopAdd: scalar operator that does an arithmetic operation what kind? addition so we have an addition operation that has two child operators that define the addition
            • ScaOp_Identifier QCOL: scalar operator that identifies the columns used in this addition operation, customerkey and accountkey
      AncOp_PrjList : the second child operator of the first LogOp_Project first parent operator.

Basic Constant folding:

converting some of the constants to simpler ones like 12+2 to 14

for example:

DBCC TRACEON (3604);
GO
USE ContosoRetailDW;
SELECT 10 + 2 AS Result 
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);

and the output:

*** Converted Tree: ***
    LogOp_Project COL: Expr1000 

        LogOp_ConstTableGet (1) [empty]

        AncOp_PrjList 

            AncOp_PrjEl COL: Expr1000 

                ScaOp_Const TI(int,Null,ML=4) XVAR(int,Not Owned,Value=12)

    ScaOp_Const TI: in there you can see the value was 12 even
 though  we had 10+2  so two operators and one for the arithmetic
 operation so 4 were folded to one 

those were some of the preparations that the algebrizer has done now we will move to the optimizer, this list was not meant to be exhaustive for more details check out (Microsoft )or(Paul Holmes )

Input-tree

one of the things done here is

converting regular joins to cartesian products with filters(they will be rewritten afterward)

but the output of the cartesian join in there will eventually return the join in question so even though the query optimizer rewrote it as a cartesian, it filters it on the join conditions provided unlike in an actual cartesian where all the products will be produced if the query is to be outputted

for example, let’s cross-join some tables in Contoso

DBCC TRACEON (3604);

use ContosoRetailDW
go
BEGIN TRAN 
SELECT
dimcurrency.CurrencyKey,
SalesKey,
StoreKey
FROM FactSales cross JOIN DimCurrency 
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
commit tran

and the output of the converted tree would be like

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey

        LogOp_Join

            LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

            ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)

        AncOp_PrjList 

*******************

in the LogOp_Join: we can see two Get TBL and one ScaOp_Const TI that equals any possible value

then in the input tree

*** Input Tree: ***
        LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey

            LogOp_Join

                LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)

            AncOp_PrjList 

*******************

we see the same output no conversion to another join since it is already a cartesian join but if we use a regular old fashion inner join with ON clause in it like

go
use ContosoRetailDW
go
DBCC TRACEON (3604);
go
BEGIN TRAN 
SELECT
dimcurrency.CurrencyKey,
SalesKey,
StoreKey
FROM FactSales inner JOIN DimCurrency 
               ON DimCurrency.CurrencyKey=FactSales.CurrencyKey
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
commit tran 
go

we will get in the converted tree

** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey

        LogOp_Join

            LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey

        AncOp_PrjList 

in the join operator, we can see the two GETs but we also see a filter which is the join condition in ScaOp_Comp X_cmpEQ so there are some equality filters that we should suffice, the optimizer would not include other values

now in the input tree

*******************

*** Input Tree: ***
        LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey

            LogOp_Select

                LogOp_Join

                    LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                    LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                    **ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)**

                **ScaOp_Comp x_cmpEq**

                    ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey

                    ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey

            AncOp_PrjList 

*******************

now we can see that the join has been rewritten as a cartesian one in ScaOp_Const TI so every output would be included but after the join has been done in the LogOp_Select the ScaOp_Comp x_cmpEq will filter the output for us

this one of the behaviors done by the engine in the input tree phase another one is

Unexpanding the view if we don’t need it

for example

-- Create a view on DimAccount with AccountKey
CREATE VIEW dbo.vDimAccount AS
SELECT 
    AccountKey
FROM 
    dbo.DimAccount;
GO

then

BEGIN TRAN;
SELECT 
    AccountKey
FROM 
    dbo.vDimAccount
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);

COMMIT TRAN

the converted tree would have the view definition like

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

        **LogOp_ViewAnchor**

            LogOp_Project

                LogOp_Get TBL: dbo.DimAccount dbo.DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                AncOp_PrjList 

        AncOp_PrjList 

*******************

but the input would delete it since there is no need for it


*** Input Tree: ***
        LogOp_Project QCOL: [ContosoRetailDW].[dbo].[DimAccount].AccountKey

            LogOp_Project

                LogOp_Get TBL: dbo.DimAccount dbo.DimAccount TableID=405576483 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                AncOp_PrjList 

            AncOp_PrjList 

no view anchor.

multiple other behaviors are done at this stage but we chose to omit it.

Simplification

so we had a query that had one logical tree

but most of the time we can represent the query in too many forms

some of them could be more efficient than others and easier to understand

this is what you call simplification and the whole purpose is to rewrite the query to the simplest form possible logically this happens in the optimizer.

so we are in the second phase of the optimizer’s rewrites of the query the first one was the input-tree phase.

before that all we did to the logical trees was write and prepare the original syntax

now we are rewriting the original tree entirely but we end up in the same result, it could it in a a lot of forms for example

Subqueries to left semi-joins

for example, if you run this subquery

begin tran
SELECT
    FactSales.SalesKey,
    FactSales.StoreKey,
    FactSales.CurrencyKey
FROM FactSales
where CurrencyKey in 
    (SELECT CurrencyKey
     FROM DimCurrency
     
    )
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607);
COMMIT TRAN;
GO

the converted tree would be like

*** Converted Tree: ***
    LogOp_Project QCOL: [ContosoRetailDW].[dbo].[FactSales].SalesKey QCOL: [ContosoRetailDW].[dbo].[FactSales].StoreKey QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey

        LogOp_Select

            LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            ScaOp_SomeComp 2

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey

                LogOp_Project

                    LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

                    AncOp_PrjList 

        AncOp_PrjList 

so first it would get all the information in the factsales

then afterwards if would filter it with LoGop_Project of the subquery that Gets the CurrencyKey from DimCurrency if there is a match it would record it and present it

in the simplified tree

*** Simplified Tree: ***
        LogOp_LeftSemiJoin

            LogOp_Get TBL: FactSales FactSales TableID=1602104748 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: DimCurrency DimCurrency TableID=245575913 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[FactSales].CurrencyKey

                ScaOp_Identifier QCOL: [ContosoRetailDW].[dbo].[DimCurrency].CurrencyKey

*******************

it converted it to a left semi-join so it would not get the additional information unless the DimCurrency.CurrencyKey=FactSales.CurrencyKey

removing redundant joins left joins:

so each step has some rules that it could apply that we could see with trace flag 8621

for example, in simplification we have : RedundantLOJN = redundant left join removal

so in AdventureWorks2016

in Purchasing Schema

as you can see PurchaseOrderDetail has a foreign key on PurchaseOrderID from PurchaseOrderHeader, so if we use this constraint

like


BEGIN TRAN 
SELECT POD.PurchaseOrderDetailID,POD.PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail POD LEFT JOIN Purchasing.PurchaseOrderHeader POH 
                                         ON POD.PurchaseOrderID=POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
COMMIT TRAN 

here we don’t even need the join because we already have the column in our table so the optimizer opted to use the following rule

*******************

***** Rule applied: RedundantLOJN - A LOJ B -> A

    LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1170103209 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

*** Simplified Tree: ***
        LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1170103209 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

*******************

and the join removed and now we just have PurchaseOrderDetail as our table the plan gets simplified

and if we look up the physical execution plan itself

Contradiction detection:

so in PurchaseOrderDetail, if we try to see what check constraints we have

SELECT 
    t.name AS TableName,
    c.name AS ConstraintName,
    c.definition AS ConstraintDefinition
FROM 
    sys.check_constraints AS c
INNER JOIN 
    sys.tables AS t ON c.parent_object_id = t.object_id
INNER JOIN 
    sys.schemas AS s ON t.schema_id = s.schema_id
INNER JOIN 
    sys.objects AS o ON c.parent_object_id = o.object_id
WHERE 
    s.name = 'Purchasing' 
    AND t.name = 'PurchaseOrderDetail';
GO

we will get

so logically we are not to have OrderQty equal or less than 0

so it is a contradiction to query that

so if we just do that

begin tran 
select OrderQty
from Purchasing.PurchaseOrderDetail
where OrderQty<0
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
commit tran 

we would get

*******************

***** Rule applied: SelectOnEmpty - Sel(Empty) -> Empty

    LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].UnitPrice QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].LineTotal QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty COL: KeyCo5 

*** Simplified Tree: ***
        LogOp_ConstTableGet (0) COL: IsBaseRow1000  QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].UnitPrice QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].LineTotal QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty COL: KeyCo5 

*******************

and we would get LogOp_ConstTableGet(0) Col: means we are getting zero columns from the name definition of the table we asked for meaning no effort since there is a check constraint on our OrderQty

this logical operator is equavilant to constant scan physical operator in the execution plan and it means here and there that sql server would not get any rows from the table since there are no matching values in the table hence the check constraint

now if we drop that


ALTER TABLE Purchasing.PurchaseOrderDetail
	DROP CONSTRAINT CK_PurchaseOrderDetail_OrderQty

we would get

*******************

*** Simplified Tree: ***
        LogOp_Select

            LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            ScaOp_Comp x_cmpLt

                ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].OrderQty

                ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=0)

*******************

and in the physical plan

so it had to scan to make sure that the orderqty<0 is not there now let’s get our constraint back


    ALTER TABLE Purchasing.PurchaseOrderDetail
ADD CONSTRAINT CK_PurchaseOrderDetail_OrderQty
CHECK (OrderQty > 0);

Predicate Pushdown: pushing the selects to joins

so if we had join

and in the where clause we had a filter after the join

and this was in the selects

the query optimizer might push the select filter down to the join clause before it happens

so it could only get the rows that is needed

for example

SELECT POD.PurchaseOrderDetailID,POH.RevisionNumber
FROM Purchasing.PurchaseOrderDetail POD JOIN Purchasing.PurchaseOrderHeader POH
                                         ON POD.PurchaseOrderID=POH.PurchaseOrderID
WHERE POD.PurchaseOrderDetailID = 20 
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

here we want the PurchaseOrderDetailID

now in the input tree:

*** Input Tree: ***
        LogOp_Project QCOL: [POD].PurchaseOrderDetailID QCOL: [POH].RevisionNumber

            LogOp_Select

                LogOp_Join

                    LogOp_Project

                        LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                        AncOp_PrjList ...
                    LogOp_Project

                        LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                        AncOp_PrjList ....
                ScaOp_Logical x_lopAnd

                    ScaOp_Comp x_cmpEq

                        ScaOp_Identifier QCOL: [POD].PurchaseOrderID

                        ScaOp_Identifier QCOL: [POH].PurchaseOrderID

                    ScaOp_Comp x_cmpEq

                        ScaOp_Identifier QCOL: [POD].PurchaseOrderDetailID

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

            AncOp_PrjList 

*******************

we chose to omit the computed columns like the optimizer did afterward

here as you can see it joins the tables then it joins them again where [POD].PurchaseOrderDetailID=20

after the prediacte pushdown in the SELonJN - SEL JN -> JN rule meaning move the select to the join meaning it is a predicate pushdown meaning adding another join condition to eliminate excessive resource consumption we see in the simplified tree

***** Rule applied: SELonJN - SEL JN -> JN

    LogOp_Join

      LogOp_Select

        LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

        ScaOp_Comp x_cmpEq

          ScaOp_Identifier QCOL: [POD].PurchaseOrderDetailID

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

      LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

      ScaOp_Comp x_cmpEq

        ScaOp_Identifier QCOL: [POH].PurchaseOrderID

        ScaOp_Identifier QCOL: [POD].PurchaseOrderID

so here in the LogOp-Join:

  • instead of getting the whole PurchaseOrderDetail and joining them
  • we first filter PurchaseOrderDetail ON PurchaseOrderId=20
  • then we join it with PurchaseOrderHeader when PurchaseOrderID = PurchaseOrderID

same rules applied if we used Common Table Expression:

WITH FilteredPOD AS (
    SELECT PurchaseOrderDetailID, PurchaseOrderID
    FROM Purchasing.PurchaseOrderDetail
    WHERE PurchaseOrderDetailID = 20
)
SELECT POD.PurchaseOrderDetailID, POH.RevisionNumber
FROM FilteredPOD AS POD
JOIN Purchasing.PurchaseOrderHeader AS POH
    ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

the rule applied

***** Rule applied: SELonJN - SEL JN -> JN

    LogOp_Join

      LogOp_Select

        LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

        ScaOp_Comp x_cmpEq

          ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

      LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

      ScaOp_Comp x_cmpEq

        ScaOp_Identifier QCOL: [POH].PurchaseOrderID

        ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID

or a subquery

SELECT POD.PurchaseOrderDetailID, POH.RevisionNumber
FROM (SELECT PurchaseOrderDetailID, PurchaseOrderID
      FROM Purchasing.PurchaseOrderDetail
      WHERE PurchaseOrderDetailID = 20) AS POD
JOIN Purchasing.PurchaseOrderHeader AS POH
    ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

and the rule:


***** Rule applied: SELonJN - SEL JN -> JN

    LogOp_Join

      LogOp_Select

        LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

        ScaOp_Comp x_cmpEq

          ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderDetailID

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=20)

      LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

      ScaOp_Comp x_cmpEq

        ScaOp_Identifier QCOL: [POH].PurchaseOrderID

        ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].PurchaseOrderID

so even though the syntax differs at these values they are all the same

Cardinality Estimation and Statistics

we are going to talk about it in another blog

but basically in order to do heuristic join reordering, meaning choosing in an effective and non-time-consuming way, the good enough plan for the query SQL server makes an assumption based on some sampling from the tables.

now before that it just re-wrote the query without much thinking about the table contents, just basic logical rewriting, this is limited

but now based on the costs of each of the alternative logical plans that it would generate after this phase, it would choose one

so if some join costs less than the other one based on the sample statistics we have on the table

the lesser one would be chosen.

we will discuss this in another blog

but in simplification, the statistics binding happen

and here we start the cost-based optimization methods

now the first thing the optimizer does after it joins statistics

is join-collapsed trees

so we are in still in the logical territory

we are not in the physical operator stuff yet

then it normalizes it

then it might decide that this query needs some optimization (fully optimized plan) or not (trivial)

and after that, we see the physical operators

Join Collapsed Tree (heuristic join reordering)

so what could it do here?

it might eliminate some redundant joins not eliminated before

for example

BEGIN TRAN 
SELECT POD.PurchaseOrderID
FROM Purchasing.PurchaseOrderDetail POD INNER JOIN Purchasing.PurchaseOrderHeader POH 
                                       ON POD.PurchaseOrderID = POH.PurchaseOrderID
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);
COMMIT TRAN

in the simplified

*** Simplified Tree: ***
        LogOp_Join

            LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [POH].PurchaseOrderID

                ScaOp_Identifier QCOL: [POD].PurchaseOrderID

*******************

so here we have a join but since we already have a foreign key on PurchaseOrderID, we don’t need any joins since the output is the same

that is why the join-collapsed is:

*** Join-collapsed Tree: ***
        LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

just the PurchaseOrderDetail

now if we remove the Foreign key:

ALTER TABLE Purchasing.PurchaseOrderDetail
DROP CONSTRAINT FK_PurchaseOrderDetail_PurchaseOrderHeader_PurchaseOrderID;

we would get this join-collapsed tree:

*** Join-collapsed Tree: ***
        LogOp_Join

            LogOp_Get TBL: Purchasing.PurchaseOrderDetail(alias TBL: POD) Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: Purchasing.PurchaseOrderHeader(alias TBL: POH) Purchasing.PurchaseOrderHeader TableID=1282103608 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [POH].PurchaseOrderID

so no join-collapse and this execution plan:

now let’s clean up

ALTER TABLE Purchasing.PurchaseOrderDetail
ADD CONSTRAINT FK_PurchaseOrderDetail_PurchaseOrderHeader_PurchaseOrderID
FOREIGN KEY (PurchaseOrderID) REFERENCES Purchasing.PurchaseOrderHeader(PurchaseOrderID);

what else could it do?

LogOp_NAryJoin: new join structure that combines multiple joins

for example:

select E.BusinessEntityID,PH.Rate,DH.DepartmentID
FROM  HumanResources.Employee E INNER JOIN HumanResources.EmployeePayHistory PH
                               ON E.BusinessEntityID = PH.BusinessEntityID
                               INNER JOIN HumanResources.EmployeeDepartmentHistory DH
                               ON PH.BusinessEntityID = DH.BusinessEntityID
  OPTION (RECOMPILE,QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

in the simplified tree:

*** Simplified Tree: ***
        LogOp_Join

            LogOp_Join

                LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                LogOp_Get TBL: HumanResources.EmployeePayHistory(alias TBL: PH) HumanResources.EmployeePayHistory TableID=1829581556 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [PH].BusinessEntityID

                    ScaOp_Identifier QCOL: [E].BusinessEntityID

            LogOp_Get TBL: HumanResources.EmployeeDepartmentHistory(alias TBL: DH) HumanResources.EmployeeDepartmentHistory TableID=1781581385 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [DH].BusinessEntityID

                ScaOp_Identifier QCOL: [PH].BusinessEntityID

*******************

you see the join conditions we want and each of them and how to filter each one here after you get the first join you filter by the second

in the join-collapsed:


*** Join-collapsed Tree: ***
        LogOp_NAryJoin

            LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_Get TBL: HumanResources.EmployeeDepartmentHistory(alias TBL: DH) HumanResources.EmployeeDepartmentHistory TableID=1781581385 TableReferenceID=0 IsRow: COL: IsBaseRow1002 

            LogOp_Get TBL: HumanResources.EmployeePayHistory(alias TBL: PH) HumanResources.EmployeePayHistory TableID=1829581556 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

            ScaOp_Logical x_lopAnd

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [PH].BusinessEntityID

                    ScaOp_Identifier QCOL: [DH].BusinessEntityID

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [E].BusinessEntityID

                    ScaOp_Identifier QCOL: [DH].BusinessEntityID

*******************

here you get the 3 joins together instead of joining the first then joining the result to the second join them all at once

what else could it do?

Eliminates redundant Columns in Table-Valued Constructor (TVC):

like in the following:


SELECT E.BusinessEntityID, MyTVC.ID
FROM HumanResources.Employee E
CROSS JOIN 
(
    VALUES 
        (1, 'SOME'),
        (2, 'SOME'),
        (3, 'SOME')
) AS MyTVC (ID, Col1)
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

in the simplified:


*** Simplified Tree: ***
        LogOp_Join

            LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_ConstTableGet (3) COL: Union1007  COL: Union1008 

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)

                ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

                ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=3)

                ScaOp_Const TI(varchar collate 872468488,Var,Trim,ML=4) XVAR(varchar,Not Owned,Value=Len,Data = (4,SOME))

            ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)

*******************

here it shows each ‘SOME’ we had but we did not need them in the select so we could just ignore them, so in the join-collapsed:

*** Join-collapsed Tree: ***
        LogOp_Join

            LogOp_Get TBL: HumanResources.Employee(alias TBL: E) HumanResources.Employee TableID=1573580644 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            LogOp_ConstTableGet (3) COL: Union1007 

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=3)

            ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)

*******************

only IDs

Before Project Normalization Tree:

nothing happens to the tree Paul White(https://dba.stackexchange.com/questions/273342/logical-trees-difference-between-before-project-normalization-and-join-collapse)

After Project Normalization Tree:

here it could do the following

Remove the expression that matches existing columns and reference the columns instead

for example

SELECT (isnull([ReceivedQty]-[RejectedQty],(0.00))) AS  EXPRESSION_BOUT_TO_BE_REMOVED
FROM Purchasing.PurchaseOrderDetail
OPTION (RECOMPILE, QUERYTRACEON 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

in the before project Normalization :

*** Tree Before Project Normalization ***
        LogOp_Project

            LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1002 

                    ScaOp_Intrinsic isnull

                        ScaOp_Arithmetic x_aopSub

                            ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].ReceivedQty

                            ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].RejectedQty

                        ScaOp_Const TI(decimal,ML=5,PRC=9,SCL=2) XVAR(decimal,Not Owned,Value=(len,prec,scale)=(5,9,2))

*****************************************

in the AncOp_PrjEl COL: Expr1002 we find the entire expression definition

but in the after project normalization tree:

*****************************************

*** Tree After Project Normalization ***
        LogOp_Project

            LogOp_Get TBL: Purchasing.PurchaseOrderDetail Purchasing.PurchaseOrderDetail TableID=1186103266 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1002 

                    ScaOp_Identifier QCOL: [AdventureWorks2022].[Purchasing].[PurchaseOrderDetail].StockedQty

****************************************

we only see it referencing the StockedQty column which matches the exact definition of the expression

what else?

Pre-computing common ranges in scalar operators(expr1002)

for example:


SELECT HumanResources.Department.Name
FROM HumanResources.Department 
WHERE RIGHT(HumanResources.Department.Name,2)>='HR'
      AND RIGHT(HumanResources.Department.Name,2)>='HT'
      OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

in before project normalization tree:

*** Tree Before Project Normalization ***
        LogOp_Select

            LogOp_Get TBL: HumanResources.Department HumanResources.Department TableID=1381579960 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

            ScaOp_Logical x_lopAnd

                ScaOp_Comp x_cmpGe

                    ScaOp_Intrinsic right

                        ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7282))

                ScaOp_Comp x_cmpGe

                    ScaOp_Intrinsic right

                        ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name

                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7284))

*****************************************

here you can see for each range it included H but in after project normalization:


*** Tree After Project Normalization ***
        LogOp_Select

            LogOp_Project

                LogOp_Get TBL: HumanResources.Department HumanResources.Department TableID=1381579960 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                AncOp_PrjList 

                    AncOp_PrjEl COL: Expr1002 

                        ScaOp_Intrinsic right

                            ScaOp_Identifier QCOL: [AdventureWorks2022].[HumanResources].[Department].Name

                            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=2)

            ScaOp_Logical x_lopAnd

                ScaOp_Comp x_cmpGe

                    ScaOp_Identifier COL: Expr1002 

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7282))

                ScaOp_Comp x_cmpGe

                    ScaOp_Identifier COL: Expr1002 

                    ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=4) XVAR(nvarchar,Owned,Value=Len,Data = (4,7284))

the optimizer precomputed the H in Expr1002 then it references in the and

what else?

Pushing down derived column expression to the joins:

for example:

SELECT P.MiddleName +'_suffix_'+ P.Suffix AS WHATEVER
FROM Person.Person P INNER JOIN Person.Password S ON
                     P.BusinessEntityID = S.BusinessEntityID
OPTION (RECOMPILE,querytraceon 8621, QUERYTRACEON 8605, QUERYTRACEON 8606, QUERYTRACEON 8607, QUERYTRACEON 3604);

In before:

*** Tree Before Project Normalization ***
        LogOp_Project

            LogOp_Join

                LogOp_Get TBL: Person.Person(alias TBL: P) Person.Person TableID=2101582525 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                LogOp_Get TBL: Person.Password(alias TBL: S) Person.Password TableID=2053582354 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [S].BusinessEntityID

                    ScaOp_Identifier QCOL: [P].BusinessEntityID

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1002 

                    ScaOp_Arithmetic x_aopAdd

                        ScaOp_Arithmetic x_aopAdd

                            ScaOp_Identifier QCOL: [P].MiddleName

                            ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=16) XVAR(nvarchar,Owned,Value=Len,Data = (16,9511511710210210512095))

                        ScaOp_Identifier QCOL: [P].Suffix

*****************************************

you can see that the LogOp_Projection has two child operators one for the join and one for the expression

but in after:


*** Tree After Project Normalization ***
        LogOp_Join

            LogOp_Project

                LogOp_Get TBL: Person.Person(alias TBL: P) Person.Person TableID=2101582525 TableReferenceID=0 IsRow: COL: IsBaseRow1000 

                AncOp_PrjList 

                    AncOp_PrjEl COL: Expr1002 

                        ScaOp_Arithmetic x_aopAdd

                            ScaOp_Arithmetic x_aopAdd

                                ScaOp_Identifier QCOL: [P].MiddleName

                                ScaOp_Const TI(nvarchar collate 872468488,Var,Trim,ML=16) XVAR(nvarchar,Owned,Value=Len,Data = (16,9511511710210210512095))

                            ScaOp_Identifier QCOL: [P].Suffix

            LogOp_Get TBL: Person.Password(alias TBL: S) Person.Password TableID=2053582354 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

            ScaOp_Comp x_cmpEq

                ScaOp_Identifier QCOL: [S].BusinessEntityID

                ScaOp_Identifier QCOL: [P].BusinessEntityID

****************************************

now the LogOp_join pushed down the expression to the get of the Person.Person finished the expression and then joined.

and now we are done with rewriting the query logically and we are about to start generating physical operators.

see you in Part 2

3 responses to “Query Optimizer In SQL Server: Part 1 Logical Tree Generation”

Leave a Reply

Your email address will not be published. Required fields are marked *