Analysis of joins with 10053 CBO trace
This post continues from Part 1, diving deeper into the challenges posed by a suboptimal SQL query that led the optimizer to select a Cartesian Join combined with an overly expensive Hash Join.
But do not treat its content as limited to just a Cartesian Join.
It will also help you to better understand the internals of the 10053 trace, especially the costing of joins and the decision paths taken. It also explains the join permutations and gives an interesting example of cost pruning.
To recap from the first post, the main issue can be summarized as follows:
When statistics are gathered directly after a table truncation, they obviously reflect zero rows and blocks. If these tables are subsequently populated without refreshing the statistics, the Cost-Based Optimizer (CBO) assumes a row count of 1 (rounded up from 0).
Here’s where the problem arises: with three tables involved in two joins, the first join usually defaults to a Cartesian Join because the optimizer assumes a cardinality of 1 for the two selected tables.
In reality, the actual row count is usually much larger, which – due to the way a Cartesian Join processes the participating rows – leads to a massive intermediate rowset.
But that’s just one side of the issue.
This large rowset then becomes the basis for the Hash Join in the second step. The resulting hash table is far too large to fit in memory, spilling over to TEMP. And that’s the other half of the problem—TEMP usage skyrockets, significantly degrading performance and wasting resources.
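As a quick illustration of how this situation arises – a minimal sketch using one of the test tables from this post (DELME.STAG01 with columns ID, PAD and PADDIE; the INSERT is purely illustrative):

truncate table DELME.STAG01;
exec dbms_stats.gather_table_stats('DELME', 'STAG01')
-- the dictionary now reflects the empty table:
select num_rows, blocks from dba_tab_statistics
where  owner = 'DELME' and table_name = 'STAG01';
-- the table gets repopulated, but the statistics are NOT refreshed:
insert into DELME.STAG01 (id, pad, paddie)
select level, rpad('x', 512, 'x'), rpad('y', 32, 'y')
from   dual connect by level <= 1000;
commit;
-- from now on the CBO still "sees" 0 rows (rounded up to 1) and 0 blocks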
This naturally leads to an important question: what caused the optimizer to choose such a suboptimal SQL plan involving a Cartesian Join, especially when a set of straightforward hints enforcing two hash joins resulted in significantly better performance? See the previous post for details.
To be fair, the root issue lies in the optimizer receiving highly inaccurate data, which leads to incorrect cost calculations in the first place.
What’s striking (and fortunately not typical) is that in such a case the cost comparison is carried out on tiny numbers differing at the second or even third decimal place. To give you an example: 4.000380 “won” against 4.016386, as you’ll spot further down.
On the other hand, one might (somewhat maliciously) raise a doubt – it could just as well deviate in the other direction and “wrongly make the right decision” ;- )
In the first place, what stands out is the striking similarity of the cost estimates for the two plans.
When comparing the SQL plans – specifically the critical operations (Plan Id = 2) in both the Cartesian Join plan and the two-Hash-Join plan – they both report the same cost: 4. This uniformity in cost is particularly perplexing given the dramatic difference in execution time.
Even more: as you can see, the final cost for both queries amounts to 6. So at first glance, just looking at the “Cost (%CPU)” column of both SQL Plans, there is no way to explain it.
The SQL Plan does not display exact values but rounds floating-point numbers to integers, so only a deeper investigation into the internals of the 10053 trace can uncover the underlying causes.
Why, then, was the Cartesian join selected over the very efficient Hash-Join?
These details will be explored now.
Let’s start from both SQL Plans.
As concluded in Part 1 of this two-part post, Plan Id = 2 is the root cause, although it did not itself take much time.
SQL Plan with the Cartesian Join and the Hash Join (ela time: ~7 minutes):
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
SQL Plan with two Hash Joins (ela time: ~1 sec):
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Taking the opportunity, I can't refrain from emphasizing again (and again) how helpful the 10053 trace is when you need to explore the decisions of the Oracle optimizer in depth.
There are many publications explaining the internals of the 10053 trace, yet I am afraid none of them so far is a complete compendium.
Not a big surprise: after all, it's an internal Oracle event whose content is constantly updated to conform to new CBO features, so it keeps maturing and its content keeps changing.
I have made an attempt to explain an excerpt of it in this post. You'll also find there some references to other useful articles and a YouTube video.
Examining selected sections of the 10053 trace file
Let us dive deeper into the internals of 10053 CBO trace.
To accomplish that I will force a hard parse of my test SQL (you’ll find it below) and will dump the CBO’s decisions to a text file.
Here it is :
20:09:31 SYS@CDBTST1->PDBPS> @tr10053
20:09:38 SYS@CDBTST1->PDBPS> ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier';
Enter value for file_identifier: cartj_hj
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='cartj_hj'
Session altered.
-- invoking of 10053 tracing :
20:09:44 SYS@CDBTST1->PDBPS> alter session set events '10053 trace name context forever, level 1';
Session altered.
20:09:46 SYS@CDBTST1->PDBPS> @sett
20:09:49 SYS@CDBTST1->PDBPS> set autotr trace expl
20:09:49 SYS@CDBTST1->PDBPS>
-- forcing hard parse (not really running the query) :
20:09:50 SYS@CDBTST1->PDBPS> @testcartnew
20:09:54 SYS@CDBTST1->PDBPS> select /* &comment */
20:09:54 2 /*+
20:09:54 3 monitor
20:09:54 11 */
20:09:54 12 a.pad, b.pad, c.pad
20:09:54 13 --a.paddie, b.paddie, c.paddie
20:09:54 14 from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c
20:09:54 15 where a.id = b.id and b.id = c.id
20:09:54 16 /
Enter value for comment: 10053_ev
old 1: select /* &comment */
new 1: select /* 10053_ev */
Execution Plan
----------------------------------------------------------
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."ID"="B"."ID" AND "B"."ID"="C"."ID")
In the diag folder (formerly: bdump) where all the *.trc files and the alert.log reside (generally, the diagnostic destination) there will be the 10053 trace file CDBTST1_ora_346481_cartj_hj.trc, being the dump of the CBO’s decisions.
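For completeness – the exact trace file location can also be queried from within the session, and the event should be switched off once the hard parse has been captured. For example:

alter session set events '10053 trace name context off';

select name, value
from   v$diag_info
where  name in ('Diag Trace', 'Default Trace File');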
For quick analysis I handle it with the usual vi and set up the following while in the vi editor :
:set ic (ignore case while searching for some well-known anchor strings – well known, but not well enough to remember whether they appear in lower or upper case 😉)
:set number (number the lines for easier navigating).
Having mentioned that, the numbers on the left-hand side in all subsequent excerpts of the trace file are added by vi after :set number.
BASE STATISTICAL INFORMATION
Let’s start from the “BASE STATISTICAL INFORMATION” :
1041 ***************************************
1042 BASE STATISTICAL INFORMATION
1043 ***********************
1044 Table Stats::
1045 Table: STAG03 Alias: C
1046 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1047 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1048 Column (#1): ID(NUMBER)
1049 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
1050 ***********************
1051 Table Stats::
1052 Table: STAG02 Alias: B
1053 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1054 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1055 Column (#1): ID(NUMBER)
1056 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
1057 ***********************
1058 Table Stats::
1059 Table: STAG01 Alias: A
1060 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1061 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1062 Column (#1): ID(NUMBER)
1063 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
Several things are worth noticing.
First, even though BLOCKS in dba_tables reports 0 (zero), the CBO assumes 1 (#Blks: 1).
It “knows” (has such false knowledge) that the table is empty, since #Rows: 0; nonetheless it applies such rounding up in several places (e.g. see “Card: Original: 0.000000 Rounded: 1” in subsequent sections of the trace). Perhaps it’s for the sake of avoiding divisions by 0 (just my theory, though).
SINGLE TABLE ACCESS PATH
Then, let’s navigate to the “SINGLE TABLE ACCESS PATH” sections.
These sections give us the cost of the full table scan for each table participating in the query.
As you can see, every participating table gets the same “Cost: 2.000190”, so this is not a factor which differs while searching for any oddities.
Furthermore, focusing on Plan Id = 3: whichever join is chosen as the first, most nested join (Cartesian or Hash Join), the outer/driving table in both cases is STAG01 (A), accessed with a full table scan and assigned the same cost (and now we know it’s precisely the same cost, as we can see the exact numbers without rounding).
So we are still not close to the true reason for such a disastrous outcome, since the cost of the FTS in all cases is the same: Cost: 2.000190.
Note, that’ll be Plan Id = 3 in both SQL Plans. You can find that number in the “Cost (%CPU)” column, obviously rounded to the integer value 2.
1077 SINGLE TABLE ACCESS PATH
1078 Single Table Cardinality Estimation for STAG01[A]
1079 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1080 Table: STAG01 Alias: A
1081 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1082 Scan IO Cost (Disk) = 2.000000
1083 Scan CPU Cost (Disk) = 7121.440000
1084 Total Scan IO Cost = 2.000000 (scan (Disk))
1085 = 2.000000
1086 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1087 = 7121.440000
1088 Access Path: TableScan
1089 Cost: 2.000190 Resp: 2.000190 Degree: 0
1090 Cost_io: 2.000000 Cost_cpu: 7121
1091 Resp_io: 2.000000 Resp_cpu: 7121
1092 Best:: AccessPath: TableScan
1093 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
1094
1095 Access path analysis for STAG02
1096 ***************************************
1097 SINGLE TABLE ACCESS PATH
1098 Single Table Cardinality Estimation for STAG02[B]
1099 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1100 Table: STAG02 Alias: B
1101 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1102 Scan IO Cost (Disk) = 2.000000
1103 Scan CPU Cost (Disk) = 7121.440000
1104 Total Scan IO Cost = 2.000000 (scan (Disk))
1105 = 2.000000
1106 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1107 = 7121.440000
1108 Access Path: TableScan
1109 Cost: 2.000190 Resp: 2.000190 Degree: 0
1110 Cost_io: 2.000000 Cost_cpu: 7121
1111 Resp_io: 2.000000 Resp_cpu: 7121
1112 Best:: AccessPath: TableScan
1113 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
1114
1115 Access path analysis for STAG03
1116 ***************************************
1117 SINGLE TABLE ACCESS PATH
1118 Single Table Cardinality Estimation for STAG03[C]
1119 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1120 Table: STAG03 Alias: C
1121 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1122 Scan IO Cost (Disk) = 2.000000
1123 Scan CPU Cost (Disk) = 7121.440000
1124 Total Scan IO Cost = 2.000000 (scan (Disk))
1125 = 2.000000
1126 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1127 = 7121.440000
1128 Access Path: TableScan
1129 Cost: 2.000190 Resp: 2.000190 Degree: 0
1130 Cost_io: 2.000000 Cost_cpu: 7121
1131 Resp_io: 2.000000 Resp_cpu: 7121
1132 Best:: AccessPath: TableScan
1133 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
Side note:
If you’re curious how this cost is calculated, visit here and search for the string “First, let’s jump to the formula of costing of the full table scan” – this will bring you directly to the relevant formula and the example.
For that you’ll need the following data, also reported in the trace. To be fair, after applying the relevant values you’ll get just the I/O cost = 2.000000 (Scan IO Cost (Disk) = 2.000000). The tiny number, 0.000190, comes from the costing of the CPU (a quick sketch of that calculation follows the system statistics below).
-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
Using NOWORKLOAD Stats
CPUSPEEDNW: 3125 millions instructions/sec (default is 100)
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
MBRC: 128 blocks (default is 8)
CPUSPEED - CPU Speed
IOTFRSPEED - I/O transfer speed
IOSEEKTIM - I/O seek time
SREADTIM - average single block read time
MREADTIM - average multiblock read time
MBRC - average multiblock read count
MAXTHR - maximum I/O system throughput
SLAVETHR - average slave I/O throughput
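As for that tiny CPU component – a quick back-of-the-envelope sketch (assuming the default 8 KB block size, which is not shown in this excerpt): with NOWORKLOAD statistics SREADTIM is derived as IOSEEKTIM + DB_BLOCK_SIZE/IOTFRSPEED = 10 + 8192/4096 = 12 ms, and the CPU cost is simply the CPU time expressed in units of single-block reads:

select round( 7121.44            /* Scan CPU Cost (Disk), in CPU instructions             */
              / (3125 * 1e6)     /* CPUSPEEDNW: 3125 million instructions/sec -> seconds  */
              * 1000             /* seconds -> milliseconds                               */
              / 12               /* SREADTIM = 12 ms, assuming db_block_size = 8192       */
            , 6) as cpu_cost_component
from dual;
-- => 0.00019, which together with the I/O cost gives 2.000000 + 0.000190 = 2.000190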
The root cause comes with the costing of the join permutations.
First, some theory.
If you have 3 tables A, B and C, the optimizer considers the following join orders:
(A joined to B) then joined to C
(A joined to C) then joined to B
(B joined to A) then joined to C
(B joined to C) then joined to A
(C joined to A) then joined to B
(C joined to B) then joined to A
The number of permutations analyzed during query optimization is, in general, N! (N factorial).
Let’s pause here and consider just how computationally expensive a hard parse can become, especially when dealing with a larger number of tables.
The process involves a significant amount of floating-point math. For each permutation, the optimizer evaluates three types of joins – Nested Loop (NL), Sort Merge (SM) and Hash Join (HA) – each consuming CPU cycles.
Thankfully, the CBO makes life easier for itself and doesn’t aim to analyze every possible scenario. It employs a few strategies to simplify the process:
- Cost Pruning:
If the cost of a join at a specific level of nesting exceeds the best plan cost computed so far, the CBO abandons further calculations for that permutation (see the example below).
- Permutation Limiting:
The optimizer caps the number of permutations at 2000 per query block, controlled by the hidden parameter _optimizer_max_permutations (see the query below). This acts as a safety mechanism. Realistically, even approaching 2000 permutations per query block is quite “generous”, isn’t it..?
- Constraint-Based Elimination:
Constraints like primary/foreign keys or join dependencies help filter out invalid or redundant join orders.
- Query Rewrite:
The CBO can leverage query rewrites to reduce the number of costed joins.
However, none of these optimizations eliminates the fact that hard parsing is an extremely CPU-intensive operation, especially when a query involves numerous tables.
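If you’d like to check that permutation cap on your own system, the usual (undocumented, SYS-only) query against the x$ fixed tables does the job – use at your own risk:

select p.ksppinm  as parameter,
       v.ksppstvl as value
from   x$ksppi  p,
       x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm = '_optimizer_max_permutations';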
In our case with three tables the number of permutations is indeed: 3! = 6 (see “Join order [1-6]” below).
Even so, we observed cost pruning in action – evidenced by the message:
“Join order aborted: cost > best plan cost” – so effectively, just two permutation scenarios have been fully computed.
This confirms that calculations were skipped once the estimated cost exceeded the best cost computed so far. Note, “skipped” does not mean “free ride” – before the CBO could make such a decision it still had to spend CPU cycles to reach that point, i.e. to carry out certain calculations.
Marked in red is the one which was selected as having the lowest computed cost (we will see it soon).
# cat -n CDBTST1_ora_346481_cartj_hj.trc | grep "Join order"
1149 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1289 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1390 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1523 Join order aborted: cost > best plan cost
1525 Join order[4]: STAG02[B]#1 STAG03[C]#2 STAG01[A]#0
1658 Join order aborted: cost > best plan cost
1660 Join order[5]: STAG03[C]#2 STAG01[A]#0 STAG02[B]#1
1754 Join order aborted: cost > best plan cost
1756 Join order[6]: STAG03[C]#2 STAG02[B]#1 STAG01[A]#0
1889 Join order aborted: cost > best plan cost
Details of the join costing based on the 10053 trace
As mentioned, the root cause of the suboptimal performance can be explained with the costing of the join permutations.
Let’s take a closer look at some excerpts from the 10053 trace, which reports the cost of joins. The numbered lines are introduced to help navigate the trace file more easily. This way, you can quickly see how many lines have been skipped.
FYI: The second permutation, which was ultimately selected, is included in its entirety.
In our case, the CBO began with the most straightforward join order – see below.
I’ve added inline comments directly into the trace text (bold, italics). The rest of the text remains as it appears in the original trace dump.
1149 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1150
1151 ***************
1152 Now joining: STAG02[B]#1 --> from here CBO calculates the cost of join of STAG01 to STAG02 going through several join scenarios of which the Hash Join has been evaluated as the one with the lowest cost (other two evaluations are skipped)
1211 HA Join
1212 HA cost: 4.016386
1213 resc: 4.016386 resc_io: 4.000000 resc_cpu: 614493
1214 resp: 4.016386 resp_io: 4.000000 resp_cpu: 614493
1215 Best:: JoinMethod: Hash
1216 Cost: 4.016386 Degree: 1 Resp: 4.016386 Card: 0.000000 Bytes:
Then, once it has calculated the cost of the join scenario with the lowest cost (it happens to be HA = Hash Join), it jumps to evaluating the cost of joining that join product to the STAG03 table.
See “Outer table: Card: 0.000000 Cost: 4.016386 Resp: 4.016386 Degree: 1”, which in fact is the product of the join between STAG01 and STAG02, considered the outer table for the NL (“nested loop”) join.
1218 ***************
1219 Now joining: STAG03[C]#2 --> evaluating of the cost of the join between: (A & B - evaluated above) with C
1220 ***************
1221 NL Join
1222 Outer table: Card: 0.000000 Cost: 4.016386 Resp: 4.016386 Degree: 1 Bytes:
1234 Inner table: STAG03 Alias: C
1235 Access Path: TableScan
1236 NL Join: Cost: 6.016576 Resp: 6.016576 Degree: 1
1237 Cost_io: 6.000000 Cost_cpu: 621614
1238 Resp_io: 6.000000 Resp_cpu: 621614
(…)
(other join scenarios probed)
(…)
Then finally it ends up with the one which yields the lowest cost (Hash Join) :
1278 HA Join
1279 HA cost: 6.032583
1280 resc: 6.032583 resc_io: 6.000000 resc_cpu: 1221864
1281 resp: 6.032583 resp_io: 6.000000 resp_cpu: 1221864
1282 Best:: JoinMethod: Hash
1283 Cost: 6.032583 Degree: 1 Resp: 6.032583 Card: 0.000000 Bytes:
Then at the end of that subsection :
1284 ***********************
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
1288 ***********************
Conveniently, the 10053 trace provides a cost summary for all evaluated joins, or (in the first line) the cost of accessing the first table.
It’s worth noting that the calculated (join) cardinality for each phase is shown as 0, whereas the SQL Plan reports it as 1.
So, how exactly should the above summary be interpreted?
Here’s the summary with highlights :
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
represents the cost of accessing (with FTS) the first table, STAG01 (A), in the probed join order. Refer back to the SINGLE TABLE ACCESS PATH section, line 1093.
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
Represents the cost of joining STAG01 to STAG02 (with a Hash Join, not mentioned in the summary). So Table#: 1 is in fact a join product.
Finally :
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
represents the cost of joining the product of the join of (STAG01 and STAG02) to STAG03, also with a Hash Join. Again, Table#: 2 stands for the product of the join with table STAG03 (C).
Malicious comment – had the CBO kept the above join scenario (FTS – HJ – HJ) it would have done much better. It obviously moved on to further evaluations, though, which unfortunately led it to the scenario FTS – Cartesian Join – HJ (which turned out to be the disaster).
Let’s see the root causes below – they will only become clearly visible once we examine the permutation which was ultimately chosen (the one with the lowest cost).
This time I kept the entire excerpt (see the line numbering), putting some comments between the lines, as before (bold, italics).
For better visibility I included the SQL Plan for reference, with marked Ids corresponding to the 10053 sections immediately below.
First, as a reminder:
Execution Plan
----------------------------------------------------------
SQL_ID: 1mmtvzb7u2nts
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 | -> at first, this join will be evaluated below in 10053
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
The first join evaluation (STAG01 to STAG03) :
1289 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1290
1291 ***************
1292 Now joining: STAG03[C]#2 -> from here CBO calculates the cost of join of STAG01 to STAG03 :
1293 ***************
1294 NL Join
1295 Outer table: Card: 0.000000 Cost: 2.000190 Resp: 2.000190 Degree: 1 Bytes: -> outer table STAG01 accessed with FTS, costed before: 2.000190
1296 Access path analysis for STAG03
1297 Scan IO Cost (Disk) = 2.000000
1298 Scan CPU Cost (Disk) = 7121.440000
1299 Total Scan IO Cost = 2.000000 (scan (Disk))
1300 = 2.000000
1301 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1302 = 7121.440000
1303 Inner table: STAG03 Alias: C
1304 Access Path: TableScan
1305 NL Join: Cost: 4.000380 Resp: 4.000380 Degree: 1
1306 Cost_io: 4.000000 Cost_cpu: 14243
1307 Resp_io: 4.000000 Resp_cpu: 14243
1308
1309 Best NL cost: 4.000380
1310 resc: 4.000380 resc_io: 4.000000 resc_cpu: 14243
1311 resp: 4.000380 resp_io: 4.000000 resc_cpu: 14243
1312 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1313 Join Card: 0.000000 = outer (0.000000) * inner (0.000000) * sel (1.000000) -> calc. join cardinality: 0 as multiplication with (at least) one factor = 0.
1314 Join Card - Rounded: 1 Computed: 0.000000 -> rounded to 1 and this is the value displayed in the SQL Plan ID = 2
1315 Best:: JoinMethod: NestedLoop -> In reality, Cartesian Join!
1316 Cost: 4.000380 Degree: 1 Resp: 4.000380 Card: 0.000000 Bytes:
-> to be continued in line 1317 below. Some important comments first.
Having calculated the cost of the join of STAG01 to STAG03 (4.000380), the CBO starts evaluating the cost of joining this join product to STAG02 (see below).
So, in terms of aliases and join ordering: having calculated the cost of the join (A to C) in the previous section, it now evaluates the cost of joining the product of (A to C) with B.
Some important notes.
1.
First, the “Cost” column of (A to C) in the SQL Plan has been rounded to 4 from 4.000380 (Plan Id = 2) – but that’s already known, just mentioning it for the record.
2.
Secondly, neither the HJ nor the SM scenario has been evaluated in this section.
The reason for that is unknown to me. I’d risk the theory that the CBO simply skipped them, as the resulting numbers for a join between (STAG01 and STAG03) vs. (STAG01 and STAG02) wouldn’t differ as far as SM and HJ are concerned, since the BASE STATISTICAL INFORMATION for all three tables is identical (the state of the CBO’s knowledge: they are empty); therefore the numbers wouldn’t change by simply swapping the tables.
3.
Admittedly though, the change of the join order from:
STAG01 -> STAG02 (Cost: 4.016386)
to
STAG01 -> STAG03 (Cost: 4.000380)
decreased the cost (just slightly), but surely had a negative influence on elapsed time and resource utilization.
Why would that change of the join order matter?
Because the cardinality of the Cartesian Join (it would surely still be a Cartesian Join) between STAG01 with 1’000 rows and STAG03 with 100’000 rows amounts to 100’000’000 (1’000 x 100’000), whereas the cardinality of the Cartesian Join between STAG01 with 1’000 and STAG02 with 10’000 would yield “just” 10’000’000 (1’000 x 10’000) – an order of magnitude lower.
Recall, the product of such a Cartesian Join then served as the input for building the Hash-Table (Plan Id = 2) of the Hash-Join (Plan Id = 1).
Thus, the size of the Hash-Table and the effort of building it from 100’000’000 rows as opposed to 10’000’000 rows makes an order-of-magnitude difference.
But this is just “what if” speculation – ultimately it didn’t happen.
4.
Most importantly: although the costing of the first, most nested join was the critical moment, it was not the Cartesian Join itself that took the longest time.
It was the Hash-Join higher up in the SQL Plan which received a huge rowset for building its Hash-Table (manifested by a large number of “direct path write temp” waits in the SQL Monitor – see the previous post for clarity) and which it then probed against STAG02 (accompanied by “direct path read temp”).
5.
As briefly mentioned before, that alone still does not represent the entire problem.
As is usual with Hash-Joins, the Hash-Table contains not only the join predicates (so, IDs of type NUMBER) but also all other columns required to satisfy the query (block). This is why I introduced two test columns, PADDIE of VARCHAR2(32) and PAD of VARCHAR2(512), and will show later (in the next post) that the size (and build time) of the Hash-Table differs depending on the columns selected “somewhere” in the query.
This can be seen e.g. within the “Column Projection Information (identified by operation id)” section of the SQL Plan.
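If you want that section for a cursor already present in the shared pool, DBMS_XPLAN can be asked to include it explicitly – for example (substitute your own SQL_ID):

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL +PROJECTION'));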
This is also a good opportunity to assess the space required in the TEMP tablespace, provided you are able to estimate the join cardinality of the Hash-Join of interest fairly accurately.
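Such an estimate can later be cross-checked against what the work areas actually consumed – a simple sketch against V$SQL_WORKAREA (substitute your own SQL_ID):

select operation_type,
       operation_id,
       round(last_memory_used  / 1024 / 1024, 1) as last_mem_mb,
       round(last_tempseg_size / 1024 / 1024, 1) as last_temp_mb
from   v$sql_workarea
where  sql_id = '&sql_id'
order  by operation_id;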
Coming back to the main topic though.
SQL Plan for quick reminder.
As before – marked in red is the Plan Id which will be evaluated in the subsequent section of the 10053 trace. Bold font is used to mark its children :
Execution Plan
----------------------------------------------------------
SQL_ID: 1mmtvzb7u2nts
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 | -> this join will be evaluated below
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
So, now Plan Id = 1 will be evaluated together with its children: Plan Id = 2 (the Cartesian Join which serves as the input for the Hash-Table) and Plan Id = 6 (the probed table of the Hash-Join). Referencing the join order, we are now analyzing the join between the already costed join (A-C) and B. Note the three red-marked scenarios: NL (Nested Loop), SM (Sort Merge) and HA (Hash Join).
Continuing from line 1316 :
1317
1318 ***************
1319 Now joining: STAG02[B]#1
1320 ***************
1321 NL Join
1322 Outer table: Card: 0.000000 Cost: 4.000380 Resp: 4.000380 Degree: 1 Bytes: -> “Outer table” here in the NL join should be understood as the join product of STAG01 with STAG03.
1323 Access path analysis for STAG02
1324 Scan IO Cost (Disk) = 2.000000
1325 Scan CPU Cost (Disk) = 7121.440000
1326 Cost of predicates:
1327 io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048 ("A"."ID"="B"."ID")
1328 io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048 ("B"."ID"="C"."ID")
1329 Total Scan IO Cost = 2.000000 (scan (Disk))
1330 + 0.000000 (io filter eval) (= 0.000000 (per row) * 0.000000 (#rows))
1331 = 2.000000
1332 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1333 + 0.000000 (cpu filter eval) (= 50.000000 (per row) * 0.000000 (#rows))
1334 = 7121.440000
1335 Inner table: STAG02 Alias: B
1336 Access Path: TableScan
1337 NL Join: Cost: 6.000570 Resp: 6.000570 Degree: 1
1338 Cost_io: 6.000000 Cost_cpu: 21364
1339 Resp_io: 6.000000 Resp_cpu: 21364
1340
1341 Best NL cost: 6.000570
1342 resc: 6.000570 resc_io: 6.000000 resc_cpu: 21364
1343 resp: 6.000570 resp_io: 6.000000 resc_cpu: 21364
1344 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1345 Join Card: 0.000000 = outer (0.000000) * inner (0.000000) * sel (0.000000)
1346 Join Card - Rounded: 1 Computed: 0.000000
1347 Outer table: STAG03 Alias: C
1348 resc: 4.000380 card 0.000000 bytes: deg: 1 resp: 4.000380
1349 Inner table: STAG02 Alias: B
1350 resc: 2.000190 card: 0.000000 bytes: deg: 1 resp: 2.000190
1351 using dmeth: 2 #groups: 1
1352 SORT ressource Sort statistics
1353 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1354 Degree: 1
1355 Blocks to Sort: 1 Row size: 607 Total Rows: 1
1356 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1357 Total IO sort cost: 0.000000 Total CPU sort cost: 37500000
1358 Total Temp space used: 0
1359 SORT ressource Sort statistics
1360 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1361 Degree: 1
1362 Blocks to Sort: 1 Row size: 309 Total Rows: 1
1363 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1364 Total IO sort cost: 0.000000 Total CPU sort cost: 37500000
1365 Total Temp space used: 0
1366 SM join: Resc: 8.000570 Resp: 8.000570 [multiMatchCost=0.000000]
1367 SM Join
1368 SM cost: 8.000570
1369 resc: 8.000570 resc_io: 6.000000 resc_cpu: 75021364
1370 resp: 8.000570 resp_io: 6.000000 resp_cpu: 75021364
1371 Outer table: STAG03 Alias: C
1372 resc: 4.000380 card 0.000000 bytes: deg: 1 resp: 4.000380
1373 Inner table: STAG02 Alias: B
1374 resc: 2.000190 card: 0.000000 bytes: deg: 1 resp: 2.000190
1375 using dmeth: 2 #groups: 1
1376 Cost per ptn: 0.016007 #ptns: 1
1377 hash_area: 508 (max=76800) buildfrag: 1 probefrag: 1 hmbio: 127 fanout: 8 ppasses = 1.0000
1378 Hash join: Resc: 6.016576 Resp: 6.016576 [multiMatchCost=0.000000]
1379 HA Join
1380 HA cost: 6.016576
1381 resc: 6.016576 resc_io: 6.000000 resc_cpu: 621614
1382 resp: 6.016576 resp_io: 6.000000 resp_cpu: 621614
1383 Best:: JoinMethod: Hash
1384 Cost: 6.016576 Degree: 1 Resp: 6.016576 Card: 0.000000 Bytes:
1385 ***********************
1386 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1387 Table#: 2 cost: 4.000380 card: 0.000000 bytes: 542.000000
1388 Table#: 1 cost: 6.016576 card: 0.000000 bytes: 813.000000
1389 ***********************
Considering that the remaining 4 permutations were discarded, these two were eventually in play:
Previously analyzed permutation (A-B)-C :
1284 ***********************
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
1288 ***********************
Currently analyzed permutation (A-C)-B :
1385 ***********************
1386 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1387 Table#: 2 cost: 4.000380 card: 0.000000 bytes: 542.000000
1388 Table#: 1 cost: 6.016576 card: 0.000000 bytes: 813.000000
1389 ***********************
If you compare these values you’ll come to the conclusion that the cost of the second (and selected) permutation is just slightly lower – arguably, almost within rounding error. Yet it’s obviously the winner.
What I am trying to point out is that typically the differences in costing are much more visible – see the example below.
This time I will dump the outcome under “healthy” conditions, that is, after gathering the statistics. Under such conditions the optimizer (obviously) made the best possible decision: two Hash-Joins in the best possible order.
I forced skipping the creation and use of the Bloom Filter (nothing against it, just to simplify the dump).
Having mentioned that, I will show the SQL Plan accompanied by the runtime stats (“TRACE EXPLAIN STATISTICS”), the output of the SQL Monitor and, last but not least, the relevant excerpt of the 10053 trace (just the joining section).
SQL Plan with runtime stats:
The important lines are marked red or put in bold.
19:26:24 SYS@CDBTST1->PDBPS> @tr10053
Enter value for file_identifier: HJ
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='HJ'
Session altered.
Elapsed: 00:00:00.00
Session altered.
Elapsed: 00:00:00.00
19:26:34 SYS@CDBTST1->PDBPS> @setts
19:26:36 SYS@CDBTST1->PDBPS> SET AUTOTRACE TRACE EXPLAIN STATISTICS
19:26:38 SYS@CDBTST1->PDBPS> @ssqlid
19:26:43 SYS@CDBTST1->PDBPS> set echo on
19:26:43 SYS@CDBTST1->PDBPS> set feedback on sql_id
19:26:43 SYS@CDBTST1->PDBPS> set echo off
19:26:45 SYS@CDBTST1->PDBPS> @setts
19:26:46 SYS@CDBTST1->PDBPS> SET AUTOTRACE TRACE EXPLAIN STATISTICS
19:26:46 SYS@CDBTST1->PDBPS> @testcartnew
19:26:51 SYS@CDBTST1->PDBPS> alter session set "_bloom_serial_filter" = off ;
Session altered.
19:26:51 SYS@CDBTST1->PDBPS> select /* &comment */
19:26:51 2 /*+
19:26:51 3 monitor
19:26:51 11 */
19:26:51 12 a.pad, b.pad, c.pad
19:26:51 13 --a.paddie, b.paddie, c.paddie
19:26:51 14 from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c
19:26:51 15 where a.id = b.id and b.id = c.id
19:26:51 16 /
Enter value for comment: HJ
old 1: select /* &comment */
new 1: select /* HJ */
1000 rows selected.
SQL_ID: 83h92yz13r012
Elapsed: 00:00:00.10
Execution Plan
----------------------------------------------------------
Plan hash value: 2925396405
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 2 | HASH JOIN | | 1000 | 1009K| 168 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1000 | 504K| 15 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 10000 | 5048K| 153 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K | 49M| 1355 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."ID"="C"."ID")
2 - access("A"."ID"="B"."ID")
Statistics
----------------------------------------------------------
9 recursive calls
0 db block gets
8690 consistent gets
0 physical reads
0 redo size
21479 bytes sent via SQL*Net to client
1442 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
Output of SQL Monitor :
Note that the actual cardinalities in every Plan Id below confirm the CBO estimations from above (Rows).
SQL Monitoring Report
SQL Text
------------------------------
select /* HJ */ /*+ monitor */ a.pad, b.pad, c.pad --a.paddie, b.paddie, c.paddie from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c where a.id = b.id and
b.id = c.id
Global Information
------------------------------
Status : DONE (ALL ROWS)
Instance ID : 1
Session : SYS (7395:61645)
SQL ID : 83h92yz13r012
SQL Execution ID : 16777216
Duration : .030109s
Fetch Calls : 68
Global Stats
=================================================
| Elapsed | Cpu | Other | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls | Gets |
=================================================
| 0.03 | 0.03 | 0.00 | 68 | 8690 |
=================================================
SQL Plan Monitoring Details (Plan Hash Value=2925396405)
=============================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Mem | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (Max) | (%) | (# samples) |
=============================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 1000 | . | | |
| 1 | HASH JOIN | | 1000 | 1524 | 1 | +0 | 1 | 1000 | 5MB | | |
| 2 | HASH JOIN | | 1000 | 168 | 1 | +0 | 1 | 1000 | 5MB | | |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1000 | 15 | 1 | +0 | 1 | 1000 | . | | |
| 4 | TABLE ACCESS STORAGE FULL | STAG02 | 10000 | 153 | 1 | +0 | 1 | 10000 | . | | |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K | 1355 | 1 | +0 | 1 | 100K | . | | |
=============================================================================================================================================
Finally, the relevant section from the 10053 trace :
First, grepping for “Join order”.
Just the first permutation, which turned out the cheapest, was fully evaluated – but this does not tell the entire truth either – see below.
Five of all 6 (= 3!) permutations have been discarded. I will include an example of one of them – it’s quite interesting how much work hides behind even such an abandoned permutation.
# cat -n CDBTST1_ora_91326_HJ.trc | grep "Join order"
selected, turned out the cheapest :
1142 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
discarded :
1282 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1308 Join order aborted: cost > best plan cost
1310 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1451 Join order aborted: cost > best plan cost
1453 Join order[4]: STAG02[B]#1 STAG03[C]#2 STAG01[A]#0
1519 Join order aborted: cost > best plan cost
1521 Join order[5]: STAG03[C]#2 STAG01[A]#0 STAG02[B]#1
1547 Join order aborted: cost > best plan cost
1549 Join order[6]: STAG03[C]#2 STAG02[B]#1 STAG01[A]#0
1623 Join order aborted: cost > best plan cost
The full dump of the first, winning permutation – see how it relates to the SQL Plan above under “Cost (%CPU)” :
Relevant sections are marked red or bold, or just commented inline (-> italics bold).
1136 OPTIMIZER STATISTICS AND COMPUTATIONS
1137 ***************************************
1138 GENERAL PLANS
1139 ***************************************
1140 Considering cardinality-based initial join order.
1141 Permutations for Starting Table :0
1142 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1143
1144 ***************
1145 Now joining: STAG02[B]#1 -> meaning: join of the pair: (A with B)
1146 ***************
1147 NL Join -> costing of nested loop join of A with B (in this order)
1148 Outer table: Card: 1000.000000 Cost: 15.020259 Resp: 15.020259 Degree: 1 Bytes:
1149 Access path analysis for STAG02
1150 Scan IO Cost (Disk) = 151.358000
1151 Scan CPU Cost (Disk) = 8124138.560000
1152 Cost of predicates:
1153 io = NOCOST, cpu = 50.000000, sel = 0.000100 flag = 2048 ("A"."ID"="B"."ID")
1154 Total Scan IO Cost = 151.358000 (scan (Disk))
1155 + 0.000000 (io filter eval) (= 0.000000 (per row) * 10000.000000 (#rows))
1156 = 151.358000
1157 Total Scan CPU Cost = 8124138.560000 (scan (Disk))
1158 + 500000.000000 (cpu filter eval) (= 50.000000 (per row) * 10000.000000 (#rows))
1159 = 8624138.560000
1160 Inner table: STAG02 Alias: B
1161 Access Path: TableScan
1162 NL Join: Cost: 151602.997287 Resp: 151602.997287 Degree: 1
1163 Cost_io: 151373.000000 Cost_cpu: 8624898275
1164 Resp_io: 151373.000000 Resp_cpu: 8624898275
1165
1166 Best NL cost: 151602.997287 -> cost of the nested loop (A -> B). Very high
1167 resc: 151602.997287 resc_io: 151373.000000 resc_cpu: 8624898275
1168 resp: 151602.997287 resp_io: 151373.000000 resc_cpu: 8624898275
1169 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1170 Join Card: 1000.000000 = outer (1000.000000) * inner (10000.000000) * sel (1.0000e-04)
1171 Join Card - Rounded: 1000 Computed: 1000.000000
1172 Outer table: STAG01 Alias: A
1173 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1174 Inner table: STAG02 Alias: B
1175 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1176 using dmeth: 2 #groups: 1
1177 SORT ressource Sort statistics
1178 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1179 Degree: 1
1180 Blocks to Sort: 71 Row size: 579 Total Rows: 1000
1181 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1182 Total IO sort cost: 0.000000 Total CPU sort cost: 37949004
1183 Total Temp space used: 0
1184 SORT ressource Sort statistics
1185 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1186 Degree: 1
1187 Blocks to Sort: 709 Row size: 579 Total Rows: 10000
1188 Initial runs: 2 Merge passes: 1 IO Cost / pass: 312
1189 Total IO sort cost: 1021.000000 Total CPU sort cost: 60928830
1190 Total Temp space used: 10937000
1191 SM join: Resc: 1191.873645 Resp: 1191.873645 [multiMatchCost=0.000000]
1192 SM Join
1193 SM cost: 1191.873645
1194 resc: 1191.873645 resc_io: 1189.000000 resc_cpu: 107761688
1195 resp: 1191.873645 resp_io: 1189.000000 resp_cpu: 107761688
1196 Outer table: STAG01 Alias: A
1197 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1198 Inner table: STAG02 Alias: B
1199 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1200 using dmeth: 2 #groups: 1
1201 Cost per ptn: 0.046667 #ptns: 1
1202 hash_area: 508 (max=76800) buildfrag: 65 probefrag: 646 hmbio: 127 fanout: 8 ppasses = 1.0000
1203 Hash join: Resc: 168.283569 Resp: 168.283569 [multiMatchCost=0.000000]
1204 HA Join
1205 HA cost: 168.283569
1206 resc: 168.283569 resc_io: 168.000000 resc_cpu: 10633854
1207 resp: 168.283569 resp_io: 168.000000 resp_cpu: 10633854
1208 Best:: JoinMethod: Hash
1209 Cost: 168.283569 Degree: 1 Resp: 168.283569 Card: 1000.000000 Bytes:
-> costing of the three join types for A->B, i.e. NL, SM and HJ (in 10053: HA), completed with the Hash-Join as the lowest (SQL Plan Id: 2)
-> This does not complete the processing of the entire permutation.
Now the cost of the join of (A->B) with C will be computed, again with the three types NL, SM and HJ evaluated :
1210
1211 ***************
1212 Now joining: STAG03[C]#2
1213 ***************
1214 NL Join
1215 Outer table: Card: 1000.000000 Cost: 168.283569 Resp: 168.283569 Degree: 1 Bytes: -> note the Cost: 168.283569 of the "outer table" - it's clearly the outcome of the previously selected "winner", so the Hash-Join of A-B.
1216 Access path analysis for STAG03
1217 Scan IO Cost (Disk) = 1351.475000
1218 Scan CPU Cost (Disk) = 74575717.760000
1219 Cost of predicates:
1220 io = NOCOST, cpu = 50.000000, sel = 0.000010 flag = 2048 ("B"."ID"="C"."ID")
1221 Total Scan IO Cost = 1351.475000 (scan (Disk))
1222 + 0.000000 (io filter eval) (= 0.000000 (per row) * 100000.000000 (#rows))
1223 = 1351.475000
1224 Total Scan CPU Cost = 74575717.760000 (scan (Disk))
1225 + 5000000.000000 (cpu filter eval) (= 50.000000 (per row) * 100000.000000 (#rows))
1226 = 79575717.760000
1227 Inner table: STAG03 Alias: C
1228 Access Path: TableScan
1229 NL Join: Cost: 1353765.302710 Resp: 1353765.302710 Degree: 1
1230 Cost_io: 1351643.000000 Cost_cpu: 79586351614
1231 Resp_io: 1351643.000000 Resp_cpu: 79586351614
1232
1233 Best NL cost: 1353765.302710
1234 resc: 1353765.302710 resc_io: 1351643.000000 resc_cpu: 79586351614
1235 resp: 1353765.302710 resp_io: 1351643.000000 resc_cpu: 79586351614
1236 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1237 Join Card: 1000.000000 = outer (1000.000000) * inner (100000.000000) * sel (1.0000e-05)
1238 Join Card - Rounded: 1000 Computed: 1000.000000
1239 Outer table: STAG02 Alias: B
1240 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1241 Inner table: STAG03 Alias: C
1242 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1243 using dmeth: 2 #groups: 1
1244 SORT ressource Sort statistics
1245 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1246 Degree: 1
1247 Blocks to Sort: 141 Row size: 1148 Total Rows: 1000
1248 Initial runs: 2 Merge passes: 1 IO Cost / pass: 64
1249 Total IO sort cost: 205.000000 Total CPU sort cost: 41417745
1250 Total Temp space used: 2352000
1251 SORT ressource Sort statistics
1252 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1253 Degree: 1
1254 Blocks to Sort: 7101 Row size: 580 Total Rows: 100000
1255 Initial runs: 2 Merge passes: 1 IO Cost / pass: 3108
1256 Total IO sort cost: 10209.000000 Total CPU sort cost: 287025717
1257 Total Temp space used: 109241000
1258 SM join: Resc: 11946.030748 Resp: 11946.030748 [multiMatchCost=0.000000]
1259 SM Join
1260 SM cost: 11946.030748
1261 resc: 11946.030748 resc_io: 11935.000000 resc_cpu: 413653033
1262 resp: 11946.030748 resp_io: 11935.000000 resp_cpu: 413653033
1263 Outer table: STAG02 Alias: B
1264 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1265 Inner table: STAG03 Alias: C
1266 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1267 using dmeth: 2 #groups: 1
1268 Cost per ptn: 0.286667 #ptns: 1
1269 hash_area: 508 (max=76800) buildfrag: 128 probefrag: 6470 hmbio: 127 fanout: 8 ppasses = 1.0000
1270 Hash join: Resc: 1523.558922 Resp: 1523.558922 [multiMatchCost=0.000000]
1271 HA Join
1272 HA cost: 1523.558922
1273 resc: 1523.558922 resc_io: 1521.000000 resc_cpu: 95959572
1274 resp: 1523.558922 resp_io: 1521.000000 resp_cpu: 95959572
1275 Best:: JoinMethod: Hash
1276 Cost: 1523.558922 Degree: 1 Resp: 1523.558922 Card: 1000.000000 Bytes:
1277 ***********************
1278 Best so far: Table#: 0 cost: 15.020259 card: 1000.000000 bytes: 517000.000000
1279 Table#: 1 cost: 168.283569 card: 1000.000000 bytes: 1034000.000000
1280 Table#: 2 cost: 1523.558922 card: 1000.000000 bytes: 1552000.000000
1281 ***********************
Again, a very useful summary at the end of the “join section” :
1278 Best so far: Table#: 0 cost: 15.020259 card: 1000.000000 bytes: 517000.000000
1279 Table#: 1 cost: 168.283569 card: 1000.000000 bytes: 1034000.000000
1280 Table#: 2 cost: 1523.558922 card: 1000.000000 bytes: 1552000.000000
1281 ***********************
You can easily match the numbers from “cost:” above with the corresponding “Cost (%CPU)” from the SQL Plan (rounded) :
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 2 | HASH JOIN | | 1000 | 1009K| 168 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1000 | 504K| 15 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 10000 | 5048K| 153 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K| 49M| 1355 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
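A side note on the “Join Card” lines above: the computed join cardinality follows what is commonly described as the standard join selectivity formula, where (absent histograms and NULLs) the selectivity is 1 over the greater number of distinct values of the two join columns. The numbers from this trace check out – a sketch:

-- sel(A.ID = B.ID) = 1 / greatest(NDV(A.ID)=1000,  NDV(B.ID)=10000)  = 1.0e-04
-- sel(B.ID = C.ID) = 1 / greatest(NDV(B.ID)=10000, NDV(C.ID)=100000) = 1.0e-05
select 1000 * 10000  * (1/10000)  as join_card_a_b,
       1000 * 100000 * (1/100000) as join_card_ab_c
from   dual;
-- both come out as 1000, exactly the "Join Card - Rounded: 1000" reported by the trace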
One more surely interesting thing: Cost Pruning (marked with “Join order aborted: cost > best plan cost” in the 10053 trace) also consumes CPU cycles.
Typically it’s a very efficient technique and brings quite a saving in CPU cycles, but I will show a rather extreme example of Cost Pruning.
“Extreme”, since in this case the effort incurred “to save” seems even to exceed that of the fully calculated (and chosen) permutation. Admittedly, that’s rarely the case. Besides, it’s another opportunity to show several interesting spots.
Below is the example of such Cost Pruning: the cost evaluation for the permutation Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2 (so: (B-A) – C), with some interesting observations. As before, pay attention to the inline comments marked with (-> bold).
I deliberately kept the entire dump of this discarded permutation to show the full effort (see the consecutive line numbers on the left-hand side).
1310 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1311
1312 ***************
1313 Now joining: STAG01[A]#0
1314 ***************
1315 NL Join
1316 Outer table: Card: 10000.000000 Cost: 153.216644 Resp: 153.216644 Degree: 1 Bytes:
1317 Access path analysis for STAG01
1318 Scan IO Cost (Disk) = 13.854300
1319 Scan CPU Cost (Disk) = 759715.200000
1320 Cost of predicates:
1321 io = NOCOST, cpu = 50.000000, sel = 0.000100 flag = 2048 ("A"."ID"="B"."ID")
1322 Total Scan IO Cost = 13.854300 (scan (Disk))
1323 + 0.000000 (io filter eval) (= 0.000000 (per row) * 1000.000000 (#rows))
1324 = 13.854300
1325 Total Scan CPU Cost = 759715.200000 (scan (Disk))
1326 + 50000.000000 (cpu filter eval) (= 50.000000 (per row) * 1000.000000 (#rows))
1327 = 809715.200000
1328 Inner table: STAG01 Alias: A
1329 Access Path: TableScan
1330 NL Join: Cost: 138912.140697 Resp: 138912.140697 Degree: 1
1331 Cost_io: 138696.000000 Cost_cpu: 8105276139
1332 Resp_io: 138696.000000 Resp_cpu: 8105276139
1333
1334 Best NL cost: 138912.140697 -> lower than the previously calculated NL for (A-B), but the other join types will be calculated below: SM and HJ (HA)
1335 resc: 138912.140697 resc_io: 138696.000000 resc_cpu: 8105276139
1336 resp: 138912.140697 resp_io: 138696.000000 resc_cpu: 8105276139
1337 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1338 Join Card: 1000.000000 = outer (10000.000000) * inner (1000.000000) * sel (1.0000e-04)
1339 Join Card - Rounded: 1000 Computed: 1000.000000
1340 Outer table: STAG02 Alias: B
1341 resc: 153.216644 card 10000.000000 bytes: deg: 1 resp: 153.216644
1342 Inner table: STAG01 Alias: A
1343 resc: 15.020259 card: 1000.000000 bytes: deg: 1 resp: 15.020259
1344 using dmeth: 2 #groups: 1
1345 SORT ressource Sort statistics
1346 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1347 Degree: 1
1348 Blocks to Sort: 709 Row size: 579 Total Rows: 10000
1349 Initial runs: 2 Merge passes: 1 IO Cost / pass: 312
1350 Total IO sort cost: 1021.000000 Total CPU sort cost: 60928830
1351 Total Temp space used: 10937000
1352 SORT ressource Sort statistics
1353 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1354 Degree: 1
1355 Blocks to Sort: 71 Row size: 579 Total Rows: 1000
1356 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1357 Total IO sort cost: 0.000000 Total CPU sort cost: 37949004
1358 Total Temp space used: 0
1359 SM join: Resc: 1191.873645 Resp: 1191.873645 [multiMatchCost=0.000000]
1360 SM Join
1361 SM cost: 1191.873645 -> the same as previously. No wonder, in both cases both tables need to be sorted, order does not matter
1362 resc: 1191.873645 resc_io: 1189.000000 resc_cpu: 107761688
1363 resp: 1191.873645 resp_io: 1189.000000 resp_cpu: 107761688
-> data for evaluating of the Hash Join (B-A) starts here :
1364 Outer table: STAG02 Alias: B
1365 resc: 153.216644 card 10000.000000 bytes: deg: 1 resp: 153.216644
1366 Inner table: STAG01 Alias: A
1367 resc: 15.020259 card: 1000.000000 bytes: deg: 1 resp: 15.020259
1368 using dmeth: 2 #groups: 1
1369 Cost per ptn: 247.139023 #ptns: 1
1370 hash_area: 508 (max=76800) buildfrag: 646 probefrag: 65 hmbio: 127 fanout: 8 ppasses = 1.0000 -> clearly order is: B-A. Building of the Hash-Table 646 (10x bigger), probed 65 (10x smaller, clearly the table A is meant)
1371 Hash join: Resc: 415.375926 Resp: 415.375926
[multiMatchCost=0.000000]
-> now it's getting interesting. The cost of the Hash-Join (B-A) got calculated as 415, so higher than the best computed previously (HJ: A-B): 168. The CBO clearly reverses the order and calculates again (or at best copies) the cost of A-B :
1372 Outer table: STAG01 Alias: A
1373 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1374 Inner table: STAG02 Alias: B
1375 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1376 using dmeth: 2 #groups: 1
1377 Cost per ptn: 0.046667 #ptns: 1
1378 hash_area: 508 (max=76800) buildfrag: 65 probefrag: 646 hmbio: 127 fanout: 8 ppasses = 1.0000
1379 Hash join: Resc: 168.283569 Resp: 168.283569 [multiMatchCost=0.000000] -> Cost of the Hash-Join (A-B): 168. No surprise, it calculated it previously.
-> Finally, it reports swapping! So from now on, it will continue the calculations of the join scenarios with C based on the Hash-Join with A as Hash-Table and B as Probed-Table :
1380 HA Join
1381 HA cost: 168.283569 swapped
1382 resc: 168.283569 resc_io: 168.000000 resc_cpu: 10633854
1383 resp: 168.283569 resp_io: 168.000000 resp_cpu: 10633854
1384 Best:: JoinMethod: Hash
1385 Cost: 168.283569 Degree: 1 Resp: 168.283569 Card: 1000.000000 Bytes:
-> now the product of the swapped(!) Hash-Join will serve as the input for ALL three scenarios (NL, SM, HJ) of joining it with C.
-> Surprisingly, all three scenarios will be evaluated again (or perhaps copied?) with obviously the same outcome – compare the marked costs of NL, SM and HA with the previous, best permutation. No wonder: all in all we are joining the (swapped) A-B with C, and we did that before..
1386
1387 ***************
1388 Now joining: STAG03[C]#2
1389 ***************
1390 NL Join
1391 Outer table: Card: 1000.000000 Cost: 168.283569 Resp: 168.283569 Degree: 1 Bytes:
1392 Access path analysis for STAG03
1393 Scan IO Cost (Disk) = 1351.475000
1394 Scan CPU Cost (Disk) = 74575717.760000
1395 Cost of predicates:
1396 io = NOCOST, cpu = 50.000000, sel = 0.000010 flag = 2048 ("B"."ID"="C"."ID")
1397 Total Scan IO Cost = 1351.475000 (scan (Disk))
1398 + 0.000000 (io filter eval) (= 0.000000 (per row) * 100000.000000 (#rows))
1399 = 1351.475000
1400 Total Scan CPU Cost = 74575717.760000 (scan (Disk))
1401 + 5000000.000000 (cpu filter eval) (= 50.000000 (per row) * 100000.000000 (#rows))
1402 = 79575717.760000
1403 Inner table: STAG03 Alias: C
1404 Access Path: TableScan
1405 NL Join: Cost: 1353765.302710 Resp: 1353765.302710 Degree: 1
1406 Cost_io: 1351643.000000 Cost_cpu: 79586351614
1407 Resp_io: 1351643.000000 Resp_cpu: 79586351614
1408
1409 Best NL cost: 1353765.302710 -> NL cost
1410 resc: 1353765.302710 resc_io: 1351643.000000 resc_cpu: 79586351614
1411 resp: 1353765.302710 resp_io: 1351643.000000 resc_cpu: 79586351614
1412 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1413 Join Card: 1000.000000 = outer (1000.000000) * inner (100000.000000) * sel (1.0000e-05)
1414 Join Card - Rounded: 1000 Computed: 1000.000000
1415 Outer table: STAG01 Alias: A
1416 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1417 Inner table: STAG03 Alias: C
1418 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1419 using dmeth: 2 #groups: 1
1420 SORT ressource Sort statistics
1421 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1422 Degree: 1
1423 Blocks to Sort: 141 Row size: 1148 Total Rows: 1000
1424 Initial runs: 2 Merge passes: 1 IO Cost / pass: 64
1425 Total IO sort cost: 205.000000 Total CPU sort cost: 41417745
1426 Total Temp space used: 2352000
1427 SORT ressource Sort statistics
1428 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1429 Degree: 1
1430 Blocks to Sort: 7101 Row size: 580 Total Rows: 100000
1431 Initial runs: 2 Merge passes: 1 IO Cost / pass: 3108
1432 Total IO sort cost: 10209.000000 Total CPU sort cost: 287025717
1433 Total Temp space used: 109241000
1434 SM join: Resc: 11946.030748 Resp: 11946.030748 [multiMatchCost=0.000000]
1435 SM Join
1436 SM cost: 11946.030748 -> SM cost
1437 resc: 11946.030748 resc_io: 11935.000000 resc_cpu: 413653033
1438 resp: 11946.030748 resp_io: 11935.000000 resp_cpu: 413653033
1439 Outer table: STAG01 Alias: A
1440 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1441 Inner table: STAG03 Alias: C
1442 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1443 using dmeth: 2 #groups: 1
1444 Cost per ptn: 0.286667 #ptns: 1
1445 hash_area: 508 (max=76800) buildfrag: 128 probefrag: 6470 hmbio: 127 fanout: 8 ppasses = 1.0000
1446 Hash join: Resc: 1523.558922 Resp: 1523.558922 [multiMatchCost=0.000000]
1447 HA Join
1448 HA cost: 1523.558922 -> Hash-Join cost
1449 resc: 1523.558922 resc_io: 1521.000000 resc_cpu: 95959572
1450 resp: 1523.558922 resp_io: 1521.000000 resp_cpu: 95959572
-> Finally, the trace reports that the calculated total cost for the third permutation exceeds the best cost computed so far and the join order gets aborted. To be precise, it is equal, not greater.
1451 Join order aborted: cost > best plan cost
1452 ***********************
Let me recap.
The example above is certainly somewhat extreme, as usually the Cost Pruning mechanism is able to discard such a permutation much earlier, therefore not bearing such a burden of the associated math operations (which obviously hammer the CPU).
So, in the outlined example the trace reports that the calculated total cost for the third permutation exceeds the best cost computed so far (“Join order aborted: cost > best plan cost”) and the permutation gets aborted.
To be precise: it is equal, not greater, and most of all it gets aborted only after having gone through the entire calculation process.
So, in that case the effort incurred is arguably even bigger than the one associated with the fully calculated first (and selected) permutation.
Summing up, it’s hard to blame the optimizer – it was working with incorrect information and simply applied the math based on the data it was given.
What warrants attention, however, is that with the state of its false knowledge about the participating tables being empty, and with only subtle differences in the calculated cost values (extending to the third decimal place), the execution times may differ by several orders of magnitude.