Analysis of joins with 10053 CBO trace
This post continues from Part 1, diving deeper into the challenges posed by a suboptimal SQL query that led the optimizer to select a Cartesian Join combined with an overly expensive Hash Join.
But do not treat its content as limited to just a Cartesian Join.
It will also help you to better understand the internals of the 10053 trace, especially the costing of joins and the decision paths taken. It also explains the join permutations and gives an interesting example of cost pruning.
To recap from the first post, the main issue can be summarized as follows:
When statistics are gathered directly after a table truncation, they obviously reflect zero rows and blocks. If these tables are subsequently populated without refreshing the statistics, the Cost-Based Optimizer (CBO) assumes a row count of 1 (rounded up from 0).
Here’s where the problem arises: with three tables involved in two joins, the first join usually defaults to a Cartesian Join because the optimizer assumes a cardinality of 1 for the two selected tables.
In reality, the actual row count is usually much larger, which – due to the way a Cartesian Join processes the participating rows – leads to a massive intermediate rowset.
But that’s just one side of the issue.
This large rowset then becomes the basis for the Hash Join in the second step. The resulting hash table is far too large to fit in memory, spilling over to TEMP. And that’s the other half of the problem—TEMP usage skyrockets, significantly degrading performance and wasting resources.
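As a quick illustration of how this situation arises – a minimal sketch using one of the test tables from this post (DELME.STAG01 with columns ID, PAD and PADDIE; the INSERT is purely illustrative):

truncate table DELME.STAG01;
exec dbms_stats.gather_table_stats('DELME', 'STAG01')
-- the dictionary now reflects the empty table:
select num_rows, blocks from dba_tab_statistics
where  owner = 'DELME' and table_name = 'STAG01';
-- the table gets repopulated, but the statistics are NOT refreshed:
insert into DELME.STAG01 (id, pad, paddie)
select level, rpad('x', 512, 'x'), rpad('y', 32, 'y')
from   dual connect by level <= 1000;
commit;
-- from now on the CBO still "sees" 0 rows (rounded up to 1) and 0 blocks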
This naturally leads to an important question: what caused the optimizer to choose such a suboptimal SQL plan involving a Cartesian Join, especially when a set of straightforward hints enforcing two hash joins resulted in significantly better performance? See the previous post for details.
To be fair, the root issue lies in the optimizer receiving highly inaccurate data, which leads to incorrect cost calculations in the first place.
What’s striking (and fortunately not typical) is that in such a case the cost comparison is carried out on tiny numbers differing at the second or even third decimal place. To give you an example: 4.000380 “won” against 4.016386, as you’ll spot further down.
On the other hand, one might (somewhat maliciously) raise a doubt – it could just as well deviate in the other direction and “wrongly make the right decision” ;- )
In the first place, what stands out is the striking similarity of the cost estimates for the two plans.
When comparing the SQL plans – specifically the critical operations (Plan Id = 2) in both the Cartesian Join plan and the two-Hash-Join plan – they both report the same cost: 4. This uniformity in cost is particularly perplexing given the dramatic difference in execution time.
Even more: as you can see, the final cost for both queries amounts to 6. So at first glance, just looking at the “Cost (%CPU)” column of both SQL Plans, there is no way to explain it.
The SQL Plan does not display exact values but rounds floating-point numbers to integers, so only a deeper investigation into the internals of the 10053 trace can uncover the underlying causes.
Why, then, was the Cartesian join selected over the very efficient Hash-Join?
These details will be explored now.
Let’s start from both SQL Plans.
As concluded in Part 1 of this two-part post, Plan Id = 2 is the root cause, although it did not itself take much time.
SQL Plan with the Cartesian Join and the Hash Join (ela time: ~7 minutes):
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
SQL Plan with two Hash Joins (ela time: ~1 sec):
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
|* 2 | HASH JOIN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Taking the opportunity, I can't refrain from emphasizing again (and again) how helpful the 10053 trace is when you need to explore the decisions of the Oracle optimizer in depth.
There are many publications explaining the internals of the 10053 trace, yet I am afraid none of them so far is a complete compendium.
Not a big surprise: after all, it's an internal Oracle event whose content is constantly updated to conform to new CBO features, so it keeps maturing and its content keeps changing.
I have made an attempt to explain an excerpt of it in this post. You'll also find there some references to other useful articles and a YouTube video.
Examining selected sections of the 10053 trace file
Let us dive deeper into the internals of 10053 CBO trace.
To accomplish that I will force a hard parse of my test SQL (you’ll find it below) and will dump the CBO’s decisions to a text file.
Here it is :
20:09:31 SYS@CDBTST1->PDBPS> @tr10053
20:09:38 SYS@CDBTST1->PDBPS> ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier';
Enter value for file_identifier: cartj_hj
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='cartj_hj'
Session altered.
-- invoking of 10053 tracing :
20:09:44 SYS@CDBTST1->PDBPS> alter session set events '10053 trace name context forever, level 1';
Session altered.
20:09:46 SYS@CDBTST1->PDBPS> @sett
20:09:49 SYS@CDBTST1->PDBPS> set autotr trace expl
20:09:49 SYS@CDBTST1->PDBPS>
-- forcing hard parse (not really running the query) :
20:09:50 SYS@CDBTST1->PDBPS> @testcartnew
20:09:54 SYS@CDBTST1->PDBPS> select /* &comment */
20:09:54 2 /*+
20:09:54 3 monitor
20:09:54 11 */
20:09:54 12 a.pad, b.pad, c.pad
20:09:54 13 --a.paddie, b.paddie, c.paddie
20:09:54 14 from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c
20:09:54 15 where a.id = b.id and b.id = c.id
20:09:54 16 /
Enter value for comment: 10053_ev
old 1: select /* &comment */
new 1: select /* 10053_ev */
Execution Plan
----------------------------------------------------------
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."ID"="B"."ID" AND "B"."ID"="C"."ID")
In the diag folder (formerly: bdump) where all the *.trc files and the alert.log reside (generally, the diagnostic destination) there will be the 10053 trace file CDBTST1_ora_346481_cartj_hj.trc, being the dump of the CBO’s decisions.
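For completeness – the exact trace file location can also be queried from within the session, and the event should be switched off once the hard parse has been captured. For example:

alter session set events '10053 trace name context off';

select name, value
from   v$diag_info
where  name in ('Diag Trace', 'Default Trace File');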
For quick analysis I handle it with the usual vi and set up the following while in the vi editor :
:set ic (ignore case while searching for some well-known anchor strings – well known, but not well enough to remember whether they appear in lower or upper case 😉)
:set number (number the lines for easier navigating).
Having mentioned that, the numbers on the left-hand side in all subsequent excerpts of the trace file are added by vi after :set number.
BASE STATISTICAL INFORMATION
Let’s start from the “BASE STATISTICAL INFORMATION” :
1041 ***************************************
1042 BASE STATISTICAL INFORMATION
1043 ***********************
1044 Table Stats::
1045 Table: STAG03 Alias: C
1046 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1047 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1048 Column (#1): ID(NUMBER)
1049 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
1050 ***********************
1051 Table Stats::
1052 Table: STAG02 Alias: B
1053 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1054 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1055 Column (#1): ID(NUMBER)
1056 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
1057 ***********************
1058 Table Stats::
1059 Table: STAG01 Alias: A
1060 #Rows: 0 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 0.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
1061 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
1062 Column (#1): ID(NUMBER)
1063 AvgLen: 22 NDV: 0 Nulls: 0 Density: 0.000000
Several things are worth noticing.
First, even though BLOCKS in dba_tables reports 0 (zero), the CBO assumes 1 (#Blks: 1).
It “knows” (has such false knowledge) that the table is empty, since #Rows: 0; nonetheless it applies such rounding up in several places (e.g. see “Card: Original: 0.000000 Rounded: 1” in subsequent sections of the trace). Perhaps it’s for the sake of avoiding divisions by 0 (just my theory, though).
SINGLE TABLE ACCESS PATH
Then, let’s navigate to the “SINGLE TABLE ACCESS PATH” sections.
These sections give us the cost of the full table scan for each table participating in the query.
As you can see, every participating table gets the same “Cost: 2.000190”, so this is not a factor which differs while searching for any oddities.
Furthermore, focusing on Plan Id = 3: whichever join is chosen as the first, most nested join (Cartesian or Hash Join), the outer/driving table in both cases is STAG01 (A), accessed with a full table scan and assigned the same cost (and now we know it’s precisely the same cost, as we can see the exact numbers without rounding).
So we are still not close to the true reason for such a disastrous outcome, since the cost of the FTS in all cases is the same: Cost: 2.000190.
Note, that’ll be Plan Id = 3 in both SQL Plans. You can find that number in the “Cost (%CPU)” column, obviously rounded to the integer value 2.
1077 SINGLE TABLE ACCESS PATH
1078 Single Table Cardinality Estimation for STAG01[A]
1079 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1080 Table: STAG01 Alias: A
1081 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1082 Scan IO Cost (Disk) = 2.000000
1083 Scan CPU Cost (Disk) = 7121.440000
1084 Total Scan IO Cost = 2.000000 (scan (Disk))
1085 = 2.000000
1086 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1087 = 7121.440000
1088 Access Path: TableScan
1089 Cost: 2.000190 Resp: 2.000190 Degree: 0
1090 Cost_io: 2.000000 Cost_cpu: 7121
1091 Resp_io: 2.000000 Resp_cpu: 7121
1092 Best:: AccessPath: TableScan
1093 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
1094
1095 Access path analysis for STAG02
1096 ***************************************
1097 SINGLE TABLE ACCESS PATH
1098 Single Table Cardinality Estimation for STAG02[B]
1099 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1100 Table: STAG02 Alias: B
1101 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1102 Scan IO Cost (Disk) = 2.000000
1103 Scan CPU Cost (Disk) = 7121.440000
1104 Total Scan IO Cost = 2.000000 (scan (Disk))
1105 = 2.000000
1106 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1107 = 7121.440000
1108 Access Path: TableScan
1109 Cost: 2.000190 Resp: 2.000190 Degree: 0
1110 Cost_io: 2.000000 Cost_cpu: 7121
1111 Resp_io: 2.000000 Resp_cpu: 7121
1112 Best:: AccessPath: TableScan
1113 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
1114
1115 Access path analysis for STAG03
1116 ***************************************
1117 SINGLE TABLE ACCESS PATH
1118 Single Table Cardinality Estimation for STAG03[C]
1119 SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
1120 Table: STAG03 Alias: C
1121 Card: Original: 0.000000 Rounded: 1 Computed: 0.000000 Non Adjusted: 0.000000
1122 Scan IO Cost (Disk) = 2.000000
1123 Scan CPU Cost (Disk) = 7121.440000
1124 Total Scan IO Cost = 2.000000 (scan (Disk))
1125 = 2.000000
1126 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1127 = 7121.440000
1128 Access Path: TableScan
1129 Cost: 2.000190 Resp: 2.000190 Degree: 0
1130 Cost_io: 2.000000 Cost_cpu: 7121
1131 Resp_io: 2.000000 Resp_cpu: 7121
1132 Best:: AccessPath: TableScan
1133 Cost: 2.000190 Degree: 1 Resp: 2.000190 Card: 0.000000 Bytes: 0.000000
Side note:
If you’re curious how this cost is calculated, visit here and search for the string “First, let’s jump to the formula of costing of the full table scan” – this will bring you directly to the relevant formula and the example.
For that you’ll need the following data, also reported in the trace. To be fair, after applying the relevant values you’ll get just the I/O cost = 2.000000 (Scan IO Cost (Disk) = 2.000000). The tiny number, 0.000190, comes from the costing of the CPU (a quick sketch of that calculation follows the system statistics below).
-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
Using NOWORKLOAD Stats
CPUSPEEDNW: 3125 millions instructions/sec (default is 100)
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
MBRC: 128 blocks (default is 8)
CPUSPEED - CPU Speed
IOTFRSPEED - I/O transfer speed
IOSEEKTIM - I/O seek time
SREADTIM - average single block read time
MREADTIM - average multiblock read time
MBRC - average multiblock read count
MAXTHR - maximum I/O system throughput
SLAVETHR - average slave I/O throughput
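As for that tiny CPU component – a quick back-of-the-envelope sketch (assuming the default 8 KB block size, which is not shown in this excerpt): with NOWORKLOAD statistics SREADTIM is derived as IOSEEKTIM + DB_BLOCK_SIZE/IOTFRSPEED = 10 + 8192/4096 = 12 ms, and the CPU cost is simply the CPU time expressed in units of single-block reads:

select round( 7121.44            /* Scan CPU Cost (Disk), in CPU instructions             */
              / (3125 * 1e6)     /* CPUSPEEDNW: 3125 million instructions/sec -> seconds  */
              * 1000             /* seconds -> milliseconds                               */
              / 12               /* SREADTIM = 12 ms, assuming db_block_size = 8192       */
            , 6) as cpu_cost_component
from dual;
-- => 0.00019, which together with the I/O cost gives 2.000000 + 0.000190 = 2.000190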
The root cause comes with the costing of the join permutations.
First, some theory.
If you have 3 tables A, B and C, the optimizer considers the following join orders:
(A joined to B) then joined to C
(A joined to C) then joined to B
(B joined to A) then joined to C
(B joined to C) then joined to A
(C joined to A) then joined to B
(C joined to B) then joined to A
The number of permutations analyzed during query optimization is, in general, N! (N factorial).
Let’s pause here and consider just how computationally expensive a hard parse can become, especially when dealing with a larger number of tables.
The process involves a significant amount of floating-point math. For each permutation, the optimizer evaluates three types of joins – Nested Loop (NL), Sort Merge (SM) and Hash Join (HA) – each consuming CPU cycles.
Thankfully, the CBO makes life easier for itself and doesn’t aim to analyze every possible scenario. It employs a few strategies to simplify the process:
- Cost Pruning:
If the cost of a join at a specific level of nesting exceeds the best plan cost computed so far, the CBO abandons further calculations for that permutation (see the example below).
- Permutation Limiting:
The optimizer caps the number of permutations at 2000 per query block, controlled by the hidden parameter _optimizer_max_permutations (see the query below). This acts as a safety mechanism. Realistically, even approaching 2000 permutations per query block is quite “generous”, isn’t it..?
- Constraint-Based Elimination:
Constraints like primary/foreign keys or join dependencies help filter out invalid or redundant join orders.
- Query Rewrite:
The CBO can leverage query rewrites to reduce the number of costed joins.
However, none of these optimizations eliminates the fact that hard parsing is an extremely CPU-intensive operation, especially when a query involves numerous tables.
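If you’d like to check that permutation cap on your own system, the usual (undocumented, SYS-only) query against the x$ fixed tables does the job – use at your own risk:

select p.ksppinm  as parameter,
       v.ksppstvl as value
from   x$ksppi  p,
       x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm = '_optimizer_max_permutations';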
In our case with three tables the number of permutations is indeed: 3! = 6 (see “Join order [1-6]” below).
Even so, we observed cost pruning in action – evidenced by the message:
“Join order aborted: cost > best plan cost” – so effectively, just two permutation scenarios have been fully computed.
This confirms that calculations were skipped once the estimated cost exceeded the best cost computed so far. Note, “skipped” does not mean “free ride” – before the CBO could make such a decision it still had to spend CPU cycles to reach that point, i.e. to carry out certain calculations.
Marked in red is the one which was selected as having the lowest computed cost (we will see it soon).
# cat -n CDBTST1_ora_346481_cartj_hj.trc | grep "Join order"
1149 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1289 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1390 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1523 Join order aborted: cost > best plan cost
1525 Join order[4]: STAG02[B]#1 STAG03[C]#2 STAG01[A]#0
1658 Join order aborted: cost > best plan cost
1660 Join order[5]: STAG03[C]#2 STAG01[A]#0 STAG02[B]#1
1754 Join order aborted: cost > best plan cost
1756 Join order[6]: STAG03[C]#2 STAG02[B]#1 STAG01[A]#0
1889 Join order aborted: cost > best plan cost
Details of the join costing based on the 10053 trace
As mentioned, the root cause of the suboptimal performance can be explained with the costing of the join permutations.
Let’s take a closer look at some excerpts from the 10053 trace, which reports the cost of joins. The numbered lines are introduced to help navigate the trace file more easily. This way, you can quickly see how many lines have been skipped.
FYI: The second permutation, which was ultimately selected, is included in its entirety.
In our case, the CBO began with the most straightforward join order – see below.
I’ve added inline comments directly into the trace text (bold, italics). The rest of the text remains as it appears in the original trace dump.
1149 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1150
1151 ***************
1152 Now joining: STAG02[B]#1 --> from here CBO calculates the cost of join of STAG01 to STAG02 going through several join scenarios of which the Hash Join has been evaluated as the one with the lowest cost (other two evaluations are skipped)
1211 HA Join
1212 HA cost: 4.016386
1213 resc: 4.016386 resc_io: 4.000000 resc_cpu: 614493
1214 resp: 4.016386 resp_io: 4.000000 resp_cpu: 614493
1215 Best:: JoinMethod: Hash
1216 Cost: 4.016386 Degree: 1 Resp: 4.016386 Card: 0.000000 Bytes:
Then, once it has calculated the cost of the join scenario with the lowest cost (it happens to be HA = Hash Join), it jumps to evaluating the cost of joining that join product to the STAG03 table.
See “Outer table: Card: 0.000000 Cost: 4.016386 Resp: 4.016386 Degree: 1”, which in fact is the product of the join between STAG01 and STAG02, considered the outer table for the NL (“nested loop”) join.
1218 ***************
1219 Now joining: STAG03[C]#2 --> evaluating of the cost of the join between: (A & B - evaluated above) with C
1220 ***************
1221 NL Join
1222 Outer table: Card: 0.000000 Cost: 4.016386 Resp: 4.016386 Degree: 1 Bytes:
1234 Inner table: STAG03 Alias: C
1235 Access Path: TableScan
1236 NL Join: Cost: 6.016576 Resp: 6.016576 Degree: 1
1237 Cost_io: 6.000000 Cost_cpu: 621614
1238 Resp_io: 6.000000 Resp_cpu: 621614
(…)
(other join scenarios probed)
(…)
Then finally it ends up with the one which yields the lowest cost (Hash Join) :
1278 HA Join
1279 HA cost: 6.032583
1280 resc: 6.032583 resc_io: 6.000000 resc_cpu: 1221864
1281 resp: 6.032583 resp_io: 6.000000 resp_cpu: 1221864
1282 Best:: JoinMethod: Hash
1283 Cost: 6.032583 Degree: 1 Resp: 6.032583 Card: 0.000000 Bytes:
Then at the end of that subsection :
1284 ***********************
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
1288 ***********************
Conveniently, the 10053 trace provides a cost summary for all evaluated joins, or (in the first line) the cost of accessing the first table.
It’s worth noting that the calculated (join) cardinality for each phase is shown as 0, whereas the SQL Plan reports it as 1.
So, how exactly should the above summary be interpreted?
Here’s the summary with highlights :
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
represents the cost of accessing (with FTS) the first table, STAG01 (A), in the probed join order. Refer back to the SINGLE TABLE ACCESS PATH section, line 1093.
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
Represents the cost of joining STAG01 to STAG02 (with a Hash Join, not mentioned in the summary). So Table#: 1 is in fact a join product.
Finally :
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
represents the cost of joining the product of the join of (STAG01 and STAG02) to STAG03, also with a Hash Join. Again, Table#: 2 stands for the product of the join with table STAG03 (C).
Malicious comment – had the CBO kept the above join scenario (FTS – HJ – HJ) it would have done much better. It obviously moved on to further evaluations, though, which unfortunately led it to the scenario FTS – Cartesian Join – HJ (which turned out to be the disaster).
Let’s see the root causes below – they will only become clearly visible once we examine the permutation which was ultimately chosen (the one with the lowest cost).
This time I kept the entire excerpt (see the line numbering), putting some comments between the lines, as before (bold, italics).
For better visibility I included the SQL Plan for reference, with marked Ids corresponding to the 10053 sections immediately below.
First, as a reminder:
Execution Plan
----------------------------------------------------------
SQL_ID: 1mmtvzb7u2nts
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 | -> at first, this join will be evaluated below in 10053
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
The first join evaluation (STAG01 to STAG03) :
1289 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1290
1291 ***************
1292 Now joining: STAG03[C]#2 -> from here CBO calculates the cost of join of STAG01 to STAG03 :
1293 ***************
1294 NL Join
1295 Outer table: Card: 0.000000 Cost: 2.000190 Resp: 2.000190 Degree: 1 Bytes: -> outer table STAG01 accessed with FTS, costed before: 2.000190
1296 Access path analysis for STAG03
1297 Scan IO Cost (Disk) = 2.000000
1298 Scan CPU Cost (Disk) = 7121.440000
1299 Total Scan IO Cost = 2.000000 (scan (Disk))
1300 = 2.000000
1301 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1302 = 7121.440000
1303 Inner table: STAG03 Alias: C
1304 Access Path: TableScan
1305 NL Join: Cost: 4.000380 Resp: 4.000380 Degree: 1
1306 Cost_io: 4.000000 Cost_cpu: 14243
1307 Resp_io: 4.000000 Resp_cpu: 14243
1308
1309 Best NL cost: 4.000380
1310 resc: 4.000380 resc_io: 4.000000 resc_cpu: 14243
1311 resp: 4.000380 resp_io: 4.000000 resc_cpu: 14243
1312 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1313 Join Card: 0.000000 = outer (0.000000) * inner (0.000000) * sel (1.000000) -> calc. join cardinality: 0 as multiplication with (at least) one factor = 0.
1314 Join Card - Rounded: 1 Computed: 0.000000 -> rounded to 1 and this is the value displayed in the SQL Plan ID = 2
1315 Best:: JoinMethod: NestedLoop -> In reality, Cartesian Join!
1316 Cost: 4.000380 Degree: 1 Resp: 4.000380 Card: 0.000000 Bytes:
-> to be continued in line 1317 below. Some important comments first.
Having calculated the cost of the join of STAG01 to STAG03 (4.000380), the CBO starts evaluating the cost of joining this join product to STAG02 (see below).
So, in terms of aliases and join ordering: having calculated the cost of the join (A to C) in the previous section, it now evaluates the cost of joining the product of (A to C) with B.
Some important notes.
1.
First, the “Cost” column of (A to C) in the SQL Plan has been rounded to 4 from 4.000380 (Plan Id = 2) – but that’s already known, just mentioning it for the record.
2.
Secondly, neither the HJ nor the SM scenario has been evaluated in this section.
The reason for that is unknown to me. I’d risk the theory that the CBO simply skipped them, as the resulting numbers for a join between (STAG01 and STAG03) vs. (STAG01 and STAG02) wouldn’t differ as far as SM and HJ are concerned, since the BASE STATISTICAL INFORMATION for all three tables is identical (the state of the CBO’s knowledge: they are empty); therefore the numbers wouldn’t change by simply swapping the tables.
3.
Admittedly though, the change of the join order from:
STAG01 -> STAG02 (Cost: 4.016386)
to
STAG01 -> STAG03 (Cost: 4.000380)
decreased the cost (just slightly), but surely had a negative influence on elapsed time and resource utilization.
Why would that change of the join order matter?
Because the cardinality of the Cartesian Join (it would surely still be a Cartesian Join) between STAG01 with 1’000 rows and STAG03 with 100’000 rows amounts to 100’000’000 (1’000 x 100’000), whereas the cardinality of the Cartesian Join between STAG01 with 1’000 and STAG02 with 10’000 would yield “just” 10’000’000 (1’000 x 10’000) – an order of magnitude lower.
Recall, the product of such a Cartesian Join then served as the input for building the Hash-Table (Plan Id = 2) of the Hash-Join (Plan Id = 1).
Thus, the size of the Hash-Table and the effort of building it from 100’000’000 rows as opposed to 10’000’000 rows makes an order-of-magnitude difference.
But this is just “what if” speculation – ultimately it didn’t happen.
4.
Most importantly: although the costing of the first, most nested join was the critical moment, it was not the Cartesian Join itself that took the longest time.
It was the Hash-Join higher up in the SQL Plan which received a huge rowset for building its Hash-Table (manifested by a large number of “direct path write temp” waits in the SQL Monitor – see the previous post for clarity) and which it then probed against STAG02 (accompanied by “direct path read temp”).
5.
As briefly mentioned before, that alone still does not represent the entire problem.
As is usual with Hash-Joins, the Hash-Table contains not only the join predicates (so, IDs of type NUMBER) but also all other columns required to satisfy the query (block). This is why I introduced two test columns, PADDIE of VARCHAR2(32) and PAD of VARCHAR2(512), and will show later (in the next post) that the size (and build time) of the Hash-Table differs depending on the columns selected “somewhere” in the query.
This can be seen e.g. within the “Column Projection Information (identified by operation id)” section of the SQL Plan.
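If you want that section for a cursor already present in the shared pool, DBMS_XPLAN can be asked to include it explicitly – for example (substitute your own SQL_ID):

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL +PROJECTION'));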
This is also a good opportunity to assess the space required in the TEMP tablespace, provided you are able to estimate the join cardinality of the Hash-Join of interest fairly accurately.
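Such an estimate can later be cross-checked against what the work areas actually consumed – a simple sketch against V$SQL_WORKAREA (substitute your own SQL_ID):

select operation_type,
       operation_id,
       round(last_memory_used  / 1024 / 1024, 1) as last_mem_mb,
       round(last_tempseg_size / 1024 / 1024, 1) as last_temp_mb
from   v$sql_workarea
where  sql_id = '&sql_id'
order  by operation_id;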
Coming back to the main topic though.
SQL Plan for quick reminder.
As before – marked in red is the Plan Id which will be evaluated in the subsequent section of the 10053 trace. Bold font is used to mark its children :
Execution Plan
----------------------------------------------------------
SQL_ID: 1mmtvzb7u2nts
Plan hash value: 3256225140
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 813 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 813 | 6 (0)| 00:00:01 | -> this join will be evaluated below
| 2 | MERGE JOIN CARTESIAN | | 1 | 542 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1 | 271 | 2 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 271 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| STAG03 | 1 | 271 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS STORAGE FULL | STAG02 | 1 | 271 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
So, now Plan Id = 1 will be evaluated together with its children: Plan Id = 2 (the Cartesian Join which serves as the input for the Hash-Table) and Plan Id = 6 (the probed table of the Hash-Join). Referencing the join order, we are now analyzing the join between the already costed join (A-C) and B. Note the three red-marked scenarios: NL (Nested Loop), SM (Sort Merge) and HA (Hash Join).
Continuing from line 1316 :
1317
1318 ***************
1319 Now joining: STAG02[B]#1
1320 ***************
1321 NL Join
1322 Outer table: Card: 0.000000 Cost: 4.000380 Resp: 4.000380 Degree: 1 Bytes: -> “Outer table” here in the NL join should be understood as the join product of STAG01 with STAG03.
1323 Access path analysis for STAG02
1324 Scan IO Cost (Disk) = 2.000000
1325 Scan CPU Cost (Disk) = 7121.440000
1326 Cost of predicates:
1327 io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048 ("A"."ID"="B"."ID")
1328 io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048 ("B"."ID"="C"."ID")
1329 Total Scan IO Cost = 2.000000 (scan (Disk))
1330 + 0.000000 (io filter eval) (= 0.000000 (per row) * 0.000000 (#rows))
1331 = 2.000000
1332 Total Scan CPU Cost = 7121.440000 (scan (Disk))
1333 + 0.000000 (cpu filter eval) (= 50.000000 (per row) * 0.000000 (#rows))
1334 = 7121.440000
1335 Inner table: STAG02 Alias: B
1336 Access Path: TableScan
1337 NL Join: Cost: 6.000570 Resp: 6.000570 Degree: 1
1338 Cost_io: 6.000000 Cost_cpu: 21364
1339 Resp_io: 6.000000 Resp_cpu: 21364
1340
1341 Best NL cost: 6.000570
1342 resc: 6.000570 resc_io: 6.000000 resc_cpu: 21364
1343 resp: 6.000570 resp_io: 6.000000 resc_cpu: 21364
1344 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1345 Join Card: 0.000000 = outer (0.000000) * inner (0.000000) * sel (0.000000)
1346 Join Card - Rounded: 1 Computed: 0.000000
1347 Outer table: STAG03 Alias: C
1348 resc: 4.000380 card 0.000000 bytes: deg: 1 resp: 4.000380
1349 Inner table: STAG02 Alias: B
1350 resc: 2.000190 card: 0.000000 bytes: deg: 1 resp: 2.000190
1351 using dmeth: 2 #groups: 1
1352 SORT ressource Sort statistics
1353 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1354 Degree: 1
1355 Blocks to Sort: 1 Row size: 607 Total Rows: 1
1356 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1357 Total IO sort cost: 0.000000 Total CPU sort cost: 37500000
1358 Total Temp space used: 0
1359 SORT ressource Sort statistics
1360 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1361 Degree: 1
1362 Blocks to Sort: 1 Row size: 309 Total Rows: 1
1363 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1364 Total IO sort cost: 0.000000 Total CPU sort cost: 37500000
1365 Total Temp space used: 0
1366 SM join: Resc: 8.000570 Resp: 8.000570 [multiMatchCost=0.000000]
1367 SM Join
1368 SM cost: 8.000570
1369 resc: 8.000570 resc_io: 6.000000 resc_cpu: 75021364
1370 resp: 8.000570 resp_io: 6.000000 resp_cpu: 75021364
1371 Outer table: STAG03 Alias: C
1372 resc: 4.000380 card 0.000000 bytes: deg: 1 resp: 4.000380
1373 Inner table: STAG02 Alias: B
1374 resc: 2.000190 card: 0.000000 bytes: deg: 1 resp: 2.000190
1375 using dmeth: 2 #groups: 1
1376 Cost per ptn: 0.016007 #ptns: 1
1377 hash_area: 508 (max=76800) buildfrag: 1 probefrag: 1 hmbio: 127 fanout: 8 ppasses = 1.0000
1378 Hash join: Resc: 6.016576 Resp: 6.016576 [multiMatchCost=0.000000]
1379 HA Join
1380 HA cost: 6.016576
1381 resc: 6.016576 resc_io: 6.000000 resc_cpu: 621614
1382 resp: 6.016576 resp_io: 6.000000 resp_cpu: 621614
1383 Best:: JoinMethod: Hash
1384 Cost: 6.016576 Degree: 1 Resp: 6.016576 Card: 0.000000 Bytes:
1385 ***********************
1386 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1387 Table#: 2 cost: 4.000380 card: 0.000000 bytes: 542.000000
1388 Table#: 1 cost: 6.016576 card: 0.000000 bytes: 813.000000
1389 ***********************
Considering that the remaining 4 permutations were discarded, these two were eventually in play:
Previously analyzed permutation (A-B)-C :
1284 ***********************
1285 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1286 Table#: 1 cost: 4.016386 card: 0.000000 bytes: 542.000000
1287 Table#: 2 cost: 6.032583 card: 0.000000 bytes: 813.000000
1288 ***********************
Currently analyzed permutation (A-C)-B :
1385 ***********************
1386 Best so far: Table#: 0 cost: 2.000190 card: 0.000000 bytes: 271.000000
1387 Table#: 2 cost: 4.000380 card: 0.000000 bytes: 542.000000
1388 Table#: 1 cost: 6.016576 card: 0.000000 bytes: 813.000000
1389 ***********************
If you compare these values you’ll come to the conclusion that the cost of the second (and selected) permutation is just slightly lower – arguably, almost within rounding error. Yet it’s obviously the winner.
What I am trying to point out is that typically the differences in costing are much more visible – see the example below.
This time I will dump the outcome under “healthy” conditions, that is, after gathering the statistics. Under such conditions the optimizer (obviously) made the best possible decision: two Hash-Joins in the best possible order.
I forced skipping the creation and use of the Bloom Filter (nothing against it, just to simplify the dump).
Having mentioned that, I will show the SQL Plan accompanied by the runtime stats (“TRACE EXPLAIN STATISTICS”), the output of the SQL Monitor and, last but not least, the relevant excerpt of the 10053 trace (just the joining section).
SQL Plan with runtime stats:
The important lines are marked red or put in bold.
19:26:24 SYS@CDBTST1->PDBPS> @tr10053
Enter value for file_identifier: HJ
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&file_identifier'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='HJ'
Session altered.
Elapsed: 00:00:00.00
Session altered.
Elapsed: 00:00:00.00
19:26:34 SYS@CDBTST1->PDBPS> @setts
19:26:36 SYS@CDBTST1->PDBPS> SET AUTOTRACE TRACE EXPLAIN STATISTICS
19:26:38 SYS@CDBTST1->PDBPS> @ssqlid
19:26:43 SYS@CDBTST1->PDBPS> set echo on
19:26:43 SYS@CDBTST1->PDBPS> set feedback on sql_id
19:26:43 SYS@CDBTST1->PDBPS> set echo off
19:26:45 SYS@CDBTST1->PDBPS> @setts
19:26:46 SYS@CDBTST1->PDBPS> SET AUTOTRACE TRACE EXPLAIN STATISTICS
19:26:46 SYS@CDBTST1->PDBPS> @testcartnew
19:26:51 SYS@CDBTST1->PDBPS> alter session set "_bloom_serial_filter" = off ;
Session altered.
19:26:51 SYS@CDBTST1->PDBPS> select /* &comment */
19:26:51 2 /*+
19:26:51 3 monitor
19:26:51 11 */
19:26:51 12 a.pad, b.pad, c.pad
19:26:51 13 --a.paddie, b.paddie, c.paddie
19:26:51 14 from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c
19:26:51 15 where a.id = b.id and b.id = c.id
19:26:51 16 /
Enter value for comment: HJ
old 1: select /* &comment */
new 1: select /* HJ */
1000 rows selected.
SQL_ID: 83h92yz13r012
Elapsed: 00:00:00.10
Execution Plan
----------------------------------------------------------
Plan hash value: 2925396405
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 2 | HASH JOIN | | 1000 | 1009K| 168 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1000 | 504K| 15 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 10000 | 5048K| 153 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K | 49M| 1355 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."ID"="C"."ID")
2 - access("A"."ID"="B"."ID")
Statistics
----------------------------------------------------------
9 recursive calls
0 db block gets
8690 consistent gets
0 physical reads
0 redo size
21479 bytes sent via SQL*Net to client
1442 bytes received via SQL*Net from client
68 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
Output of SQL Monitor :
Note that the actual cardinalities in every Plan Id below confirm the CBO estimations from above (Rows).
SQL Monitoring Report
SQL Text
------------------------------
select /* HJ */ /*+ monitor */ a.pad, b.pad, c.pad --a.paddie, b.paddie, c.paddie from DELME.STAG01 a, DELME.STAG02 b, DELME.STAG03 c where a.id = b.id and
b.id = c.id
Global Information
------------------------------
Status : DONE (ALL ROWS)
Instance ID : 1
Session : SYS (7395:61645)
SQL ID : 83h92yz13r012
SQL Execution ID : 16777216
Duration : .030109s
Fetch Calls : 68
Global Stats
=================================================
| Elapsed | Cpu | Other | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls | Gets |
=================================================
| 0.03 | 0.03 | 0.00 | 68 | 8690 |
=================================================
SQL Plan Monitoring Details (Plan Hash Value=2925396405)
=============================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Mem | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | (Max) | (%) | (# samples) |
=============================================================================================================================================
| 0 | SELECT STATEMENT | | | | 1 | +0 | 1 | 1000 | . | | |
| 1 | HASH JOIN | | 1000 | 1524 | 1 | +0 | 1 | 1000 | 5MB | | |
| 2 | HASH JOIN | | 1000 | 168 | 1 | +0 | 1 | 1000 | 5MB | | |
| 3 | TABLE ACCESS STORAGE FULL | STAG01 | 1000 | 15 | 1 | +0 | 1 | 1000 | . | | |
| 4 | TABLE ACCESS STORAGE FULL | STAG02 | 10000 | 153 | 1 | +0 | 1 | 10000 | . | | |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K | 1355 | 1 | +0 | 1 | 100K | . | | |
=============================================================================================================================================
Finally, the relevant section from the 10053 trace :
First, grepping for “Join order”.
Just the first permutation, which turned out the cheapest, was fully evaluated – but this does not tell the entire truth either – see below.
Five of all 6 (= 3!) permutations have been discarded. I will include an example of one of them – it’s quite interesting how much work hides behind even such an abandoned permutation.
# cat -n CDBTST1_ora_91326_HJ.trc | grep "Join order"
selected, turned out the cheapest :
1142 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
discarded :
1282 Join order[2]: STAG01[A]#0 STAG03[C]#2 STAG02[B]#1
1308 Join order aborted: cost > best plan cost
1310 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1451 Join order aborted: cost > best plan cost
1453 Join order[4]: STAG02[B]#1 STAG03[C]#2 STAG01[A]#0
1519 Join order aborted: cost > best plan cost
1521 Join order[5]: STAG03[C]#2 STAG01[A]#0 STAG02[B]#1
1547 Join order aborted: cost > best plan cost
1549 Join order[6]: STAG03[C]#2 STAG02[B]#1 STAG01[A]#0
1623 Join order aborted: cost > best plan cost
The full dump of the first, winning permutation – see how it relates to the SQL Plan above under “Cost (%CPU)” :
Relevant sections are marked red or bold, or just commented inline (-> italics bold).
1136 OPTIMIZER STATISTICS AND COMPUTATIONS
1137 ***************************************
1138 GENERAL PLANS
1139 ***************************************
1140 Considering cardinality-based initial join order.
1141 Permutations for Starting Table :0
1142 Join order[1]: STAG01[A]#0 STAG02[B]#1 STAG03[C]#2
1143
1144 ***************
1145 Now joining: STAG02[B]#1 -> meaning: join of the pair: (A with B)
1146 ***************
1147 NL Join -> costing of nested loop join of A with B (in this order)
1148 Outer table: Card: 1000.000000 Cost: 15.020259 Resp: 15.020259 Degree: 1 Bytes:
1149 Access path analysis for STAG02
1150 Scan IO Cost (Disk) = 151.358000
1151 Scan CPU Cost (Disk) = 8124138.560000
1152 Cost of predicates:
1153 io = NOCOST, cpu = 50.000000, sel = 0.000100 flag = 2048 ("A"."ID"="B"."ID")
1154 Total Scan IO Cost = 151.358000 (scan (Disk))
1155 + 0.000000 (io filter eval) (= 0.000000 (per row) * 10000.000000 (#rows))
1156 = 151.358000
1157 Total Scan CPU Cost = 8124138.560000 (scan (Disk))
1158 + 500000.000000 (cpu filter eval) (= 50.000000 (per row) * 10000.000000 (#rows))
1159 = 8624138.560000
1160 Inner table: STAG02 Alias: B
1161 Access Path: TableScan
1162 NL Join: Cost: 151602.997287 Resp: 151602.997287 Degree: 1
1163 Cost_io: 151373.000000 Cost_cpu: 8624898275
1164 Resp_io: 151373.000000 Resp_cpu: 8624898275
1165
1166 Best NL cost: 151602.997287 -> cost of the nested loop (A -> B). Very high
1167 resc: 151602.997287 resc_io: 151373.000000 resc_cpu: 8624898275
1168 resp: 151602.997287 resp_io: 151373.000000 resc_cpu: 8624898275
1169 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1170 Join Card: 1000.000000 = outer (1000.000000) * inner (10000.000000) * sel (1.0000e-04)
1171 Join Card - Rounded: 1000 Computed: 1000.000000
1172 Outer table: STAG01 Alias: A
1173 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1174 Inner table: STAG02 Alias: B
1175 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1176 using dmeth: 2 #groups: 1
1177 SORT ressource Sort statistics
1178 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1179 Degree: 1
1180 Blocks to Sort: 71 Row size: 579 Total Rows: 1000
1181 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1182 Total IO sort cost: 0.000000 Total CPU sort cost: 37949004
1183 Total Temp space used: 0
1184 SORT ressource Sort statistics
1185 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1186 Degree: 1
1187 Blocks to Sort: 709 Row size: 579 Total Rows: 10000
1188 Initial runs: 2 Merge passes: 1 IO Cost / pass: 312
1189 Total IO sort cost: 1021.000000 Total CPU sort cost: 60928830
1190 Total Temp space used: 10937000
1191 SM join: Resc: 1191.873645 Resp: 1191.873645 [multiMatchCost=0.000000]
1192 SM Join
1193 SM cost: 1191.873645
1194 resc: 1191.873645 resc_io: 1189.000000 resc_cpu: 107761688
1195 resp: 1191.873645 resp_io: 1189.000000 resp_cpu: 107761688
1196 Outer table: STAG01 Alias: A
1197 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1198 Inner table: STAG02 Alias: B
1199 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1200 using dmeth: 2 #groups: 1
1201 Cost per ptn: 0.046667 #ptns: 1
1202 hash_area: 508 (max=76800) buildfrag: 65 probefrag: 646 hmbio: 127 fanout: 8 ppasses = 1.0000
1203 Hash join: Resc: 168.283569 Resp: 168.283569 [multiMatchCost=0.000000]
1204 HA Join
1205 HA cost: 168.283569
1206 resc: 168.283569 resc_io: 168.000000 resc_cpu: 10633854
1207 resp: 168.283569 resp_io: 168.000000 resp_cpu: 10633854
1208 Best:: JoinMethod: Hash
1209 Cost: 168.283569 Degree: 1 Resp: 168.283569 Card: 1000.000000 Bytes:
-> costing of the three join types for A->B, i.e. NL, SM and HJ (in 10053: HA), completed with the Hash-Join as the lowest (SQL Plan Id: 2)
-> This does not complete the processing of the entire permutation.
Now the cost of the join of (A->B) with C will be computed, again with the three types NL, SM and HJ evaluated :
1210
1211 ***************
1212 Now joining: STAG03[C]#2
1213 ***************
1214 NL Join
1215 Outer table: Card: 1000.000000 Cost: 168.283569 Resp: 168.283569 Degree: 1 Bytes: -> note the Cost: 168.283569 of the "outer table" - it's clearly the outcome of the previously selected "winner", so the Hash-Join of A-B.
1216 Access path analysis for STAG03
1217 Scan IO Cost (Disk) = 1351.475000
1218 Scan CPU Cost (Disk) = 74575717.760000
1219 Cost of predicates:
1220 io = NOCOST, cpu = 50.000000, sel = 0.000010 flag = 2048 ("B"."ID"="C"."ID")
1221 Total Scan IO Cost = 1351.475000 (scan (Disk))
1222 + 0.000000 (io filter eval) (= 0.000000 (per row) * 100000.000000 (#rows))
1223 = 1351.475000
1224 Total Scan CPU Cost = 74575717.760000 (scan (Disk))
1225 + 5000000.000000 (cpu filter eval) (= 50.000000 (per row) * 100000.000000 (#rows))
1226 = 79575717.760000
1227 Inner table: STAG03 Alias: C
1228 Access Path: TableScan
1229 NL Join: Cost: 1353765.302710 Resp: 1353765.302710 Degree: 1
1230 Cost_io: 1351643.000000 Cost_cpu: 79586351614
1231 Resp_io: 1351643.000000 Resp_cpu: 79586351614
1232
1233 Best NL cost: 1353765.302710
1234 resc: 1353765.302710 resc_io: 1351643.000000 resc_cpu: 79586351614
1235 resp: 1353765.302710 resp_io: 1351643.000000 resc_cpu: 79586351614
1236 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1237 Join Card: 1000.000000 = outer (1000.000000) * inner (100000.000000) * sel (1.0000e-05)
1238 Join Card - Rounded: 1000 Computed: 1000.000000
1239 Outer table: STAG02 Alias: B
1240 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1241 Inner table: STAG03 Alias: C
1242 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1243 using dmeth: 2 #groups: 1
1244 SORT ressource Sort statistics
1245 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1246 Degree: 1
1247 Blocks to Sort: 141 Row size: 1148 Total Rows: 1000
1248 Initial runs: 2 Merge passes: 1 IO Cost / pass: 64
1249 Total IO sort cost: 205.000000 Total CPU sort cost: 41417745
1250 Total Temp space used: 2352000
1251 SORT ressource Sort statistics
1252 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1253 Degree: 1
1254 Blocks to Sort: 7101 Row size: 580 Total Rows: 100000
1255 Initial runs: 2 Merge passes: 1 IO Cost / pass: 3108
1256 Total IO sort cost: 10209.000000 Total CPU sort cost: 287025717
1257 Total Temp space used: 109241000
1258 SM join: Resc: 11946.030748 Resp: 11946.030748 [multiMatchCost=0.000000]
1259 SM Join
1260 SM cost: 11946.030748
1261 resc: 11946.030748 resc_io: 11935.000000 resc_cpu: 413653033
1262 resp: 11946.030748 resp_io: 11935.000000 resp_cpu: 413653033
1263 Outer table: STAG02 Alias: B
1264 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1265 Inner table: STAG03 Alias: C
1266 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1267 using dmeth: 2 #groups: 1
1268 Cost per ptn: 0.286667 #ptns: 1
1269 hash_area: 508 (max=76800) buildfrag: 128 probefrag: 6470 hmbio: 127 fanout: 8 ppasses = 1.0000
1270 Hash join: Resc: 1523.558922 Resp: 1523.558922 [multiMatchCost=0.000000]
1271 HA Join
1272 HA cost: 1523.558922
1273 resc: 1523.558922 resc_io: 1521.000000 resc_cpu: 95959572
1274 resp: 1523.558922 resp_io: 1521.000000 resp_cpu: 95959572
1275 Best:: JoinMethod: Hash
1276 Cost: 1523.558922 Degree: 1 Resp: 1523.558922 Card: 1000.000000 Bytes:
1277 ***********************
1278 Best so far: Table#: 0 cost: 15.020259 card: 1000.000000 bytes: 517000.000000
1279 Table#: 1 cost: 168.283569 card: 1000.000000 bytes: 1034000.000000
1280 Table#: 2 cost: 1523.558922 card: 1000.000000 bytes: 1552000.000000
1281 ***********************
Again, a very useful summary at the end of the “join section” :
1278 Best so far: Table#: 0 cost: 15.020259 card: 1000.000000 bytes: 517000.000000
1279 Table#: 1 cost: 168.283569 card: 1000.000000 bytes: 1034000.000000
1280 Table#: 2 cost: 1523.558922 card: 1000.000000 bytes: 1552000.000000
1281 ***********************
You can easily match the numbers from “cost:” above with the corresponding “Cost (%CPU)” from the SQL Plan (rounded) :
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 1 | HASH JOIN | | 1000 | 1515K| 1524 (1)| 00:00:01 |
|* 2 | HASH JOIN | | 1000 | 1009K| 168 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL| STAG01 | 1000 | 504K| 15 (0)| 00:00:01 |
| 4 | TABLE ACCESS STORAGE FULL| STAG02 | 10000 | 5048K| 153 (0)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL | STAG03 | 100K| 49M| 1355 (1)| 00:00:01 |
--------------------------------------------------------------------------------------
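A side note on the “Join Card” lines above: the computed join cardinality follows what is commonly described as the standard join selectivity formula, where (absent histograms and NULLs) the selectivity is 1 over the greater number of distinct values of the two join columns. The numbers from this trace check out – a sketch:

-- sel(A.ID = B.ID) = 1 / greatest(NDV(A.ID)=1000,  NDV(B.ID)=10000)  = 1.0e-04
-- sel(B.ID = C.ID) = 1 / greatest(NDV(B.ID)=10000, NDV(C.ID)=100000) = 1.0e-05
select 1000 * 10000  * (1/10000)  as join_card_a_b,
       1000 * 100000 * (1/100000) as join_card_ab_c
from   dual;
-- both come out as 1000, exactly the "Join Card - Rounded: 1000" reported by the trace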
One more surely interesting thing: Cost Pruning (marked with “Join order aborted: cost > best plan cost” in the 10053 trace) also consumes CPU cycles.
Typically it’s a very efficient technique and brings quite a saving in CPU cycles, but I will show a rather extreme example of Cost Pruning.
“Extreme”, since in this case the effort incurred “to save” seems even to exceed that of the fully calculated (and chosen) permutation. Admittedly, that’s rarely the case. Besides, it’s another opportunity to show several interesting spots.
Below is the example of such Cost Pruning: the cost evaluation for the permutation Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2 (so: (B-A) – C), with some interesting observations. As before, pay attention to the inline comments marked with (-> bold).
I deliberately kept the entire dump of this discarded permutation to show the full effort (see the consecutive line numbers on the left-hand side).
1310 Join order[3]: STAG02[B]#1 STAG01[A]#0 STAG03[C]#2
1311
1312 ***************
1313 Now joining: STAG01[A]#0
1314 ***************
1315 NL Join
1316 Outer table: Card: 10000.000000 Cost: 153.216644 Resp: 153.216644 Degree: 1 Bytes:
1317 Access path analysis for STAG01
1318 Scan IO Cost (Disk) = 13.854300
1319 Scan CPU Cost (Disk) = 759715.200000
1320 Cost of predicates:
1321 io = NOCOST, cpu = 50.000000, sel = 0.000100 flag = 2048 ("A"."ID"="B"."ID")
1322 Total Scan IO Cost = 13.854300 (scan (Disk))
1323 + 0.000000 (io filter eval) (= 0.000000 (per row) * 1000.000000 (#rows))
1324 = 13.854300
1325 Total Scan CPU Cost = 759715.200000 (scan (Disk))
1326 + 50000.000000 (cpu filter eval) (= 50.000000 (per row) * 1000.000000 (#rows))
1327 = 809715.200000
1328 Inner table: STAG01 Alias: A
1329 Access Path: TableScan
1330 NL Join: Cost: 138912.140697 Resp: 138912.140697 Degree: 1
1331 Cost_io: 138696.000000 Cost_cpu: 8105276139
1332 Resp_io: 138696.000000 Resp_cpu: 8105276139
1333
1334 Best NL cost: 138912.140697 -> lower than the previously calculated NL for (A-B), but the other join types will be calculated below: SM and HJ (HA)
1335 resc: 138912.140697 resc_io: 138696.000000 resc_cpu: 8105276139
1336 resp: 138912.140697 resp_io: 138696.000000 resc_cpu: 8105276139
1337 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1338 Join Card: 1000.000000 = outer (10000.000000) * inner (1000.000000) * sel (1.0000e-04)
1339 Join Card - Rounded: 1000 Computed: 1000.000000
1340 Outer table: STAG02 Alias: B
1341 resc: 153.216644 card 10000.000000 bytes: deg: 1 resp: 153.216644
1342 Inner table: STAG01 Alias: A
1343 resc: 15.020259 card: 1000.000000 bytes: deg: 1 resp: 15.020259
1344 using dmeth: 2 #groups: 1
1345 SORT ressource Sort statistics
1346 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1347 Degree: 1
1348 Blocks to Sort: 709 Row size: 579 Total Rows: 10000
1349 Initial runs: 2 Merge passes: 1 IO Cost / pass: 312
1350 Total IO sort cost: 1021.000000 Total CPU sort cost: 60928830
1351 Total Temp space used: 10937000
1352 SORT ressource Sort statistics
1353 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1354 Degree: 1
1355 Blocks to Sort: 71 Row size: 579 Total Rows: 1000
1356 Initial runs: 1 Merge passes: 0 IO Cost / pass: 0
1357 Total IO sort cost: 0.000000 Total CPU sort cost: 37949004
1358 Total Temp space used: 0
1359 SM join: Resc: 1191.873645 Resp: 1191.873645 [multiMatchCost=0.000000]
1360 SM Join
1361 SM cost: 1191.873645 -> the same as previously. No wonder, in both cases both tables need to be sorted, order does not matter
1362 resc: 1191.873645 resc_io: 1189.000000 resc_cpu: 107761688
1363 resp: 1191.873645 resp_io: 1189.000000 resp_cpu: 107761688
-> data for evaluating of the Hash Join (B-A) starts here :
1364 Outer table: STAG02 Alias: B
1365 resc: 153.216644 card 10000.000000 bytes: deg: 1 resp: 153.216644
1366 Inner table: STAG01 Alias: A
1367 resc: 15.020259 card: 1000.000000 bytes: deg: 1 resp: 15.020259
1368 using dmeth: 2 #groups: 1
1369 Cost per ptn: 247.139023 #ptns: 1
1370 hash_area: 508 (max=76800) buildfrag: 646 probefrag: 65 hmbio: 127 fanout: 8 ppasses = 1.0000 -> clearly order is: B-A. Building of the Hash-Table 646 (10x bigger), probed 65 (10x smaller, clearly the table A is meant)
1371 Hash join: Resc: 415.375926 Resp: 415.375926
[multiMatchCost=0.000000]
-> now it's getting interesting. The cost of the Hash-Join (B-A) got calculated as 415, so higher than the best computed previously (HJ: A-B): 168. The CBO clearly reverses the order and calculates again (or at best copies) the cost of A-B :
1372 Outer table: STAG01 Alias: A
1373 resc: 15.020259 card 1000.000000 bytes: deg: 1 resp: 15.020259
1374 Inner table: STAG02 Alias: B
1375 resc: 153.216644 card: 10000.000000 bytes: deg: 1 resp: 153.216644
1376 using dmeth: 2 #groups: 1
1377 Cost per ptn: 0.046667 #ptns: 1
1378 hash_area: 508 (max=76800) buildfrag: 65 probefrag: 646 hmbio: 127 fanout: 8 ppasses = 1.0000
1379 Hash join: Resc: 168.283569 Resp: 168.283569 [multiMatchCost=0.000000] -> Cost of the Hash-Join (A-B): 168. No surprise, it calculated it previously.
-> Finally, it reports swapping! So from now on, it will continue the calculations of the join scenarios with C based on the Hash-Join with A as Hash-Table and B as Probed-Table :
1380 HA Join
1381 HA cost: 168.283569 swapped
1382 resc: 168.283569 resc_io: 168.000000 resc_cpu: 10633854
1383 resp: 168.283569 resp_io: 168.000000 resp_cpu: 10633854
1384 Best:: JoinMethod: Hash
1385 Cost: 168.283569 Degree: 1 Resp: 168.283569 Card: 1000.000000 Bytes:
-> now the product of the swapped(!) Hash-Join will serve as the input for ALL three scenarios (NL, SM, HJ) of joining it with C.
-> Surprisingly, all three scenarios will be evaluated again (or perhaps copied?) with obviously the same outcome – compare the marked costs of NL, SM and HA with the previous, best permutation. No wonder: all in all we are joining the (swapped) A-B with C, and we did that before..
1386
1387 ***************
1388 Now joining: STAG03[C]#2
1389 ***************
1390 NL Join
1391 Outer table: Card: 1000.000000 Cost: 168.283569 Resp: 168.283569 Degree: 1 Bytes:
1392 Access path analysis for STAG03
1393 Scan IO Cost (Disk) = 1351.475000
1394 Scan CPU Cost (Disk) = 74575717.760000
1395 Cost of predicates:
1396 io = NOCOST, cpu = 50.000000, sel = 0.000010 flag = 2048 ("B"."ID"="C"."ID")
1397 Total Scan IO Cost = 1351.475000 (scan (Disk))
1398 + 0.000000 (io filter eval) (= 0.000000 (per row) * 100000.000000 (#rows))
1399 = 1351.475000
1400 Total Scan CPU Cost = 74575717.760000 (scan (Disk))
1401 + 5000000.000000 (cpu filter eval) (= 50.000000 (per row) * 100000.000000 (#rows))
1402 = 79575717.760000
1403 Inner table: STAG03 Alias: C
1404 Access Path: TableScan
1405 NL Join: Cost: 1353765.302710 Resp: 1353765.302710 Degree: 1
1406 Cost_io: 1351643.000000 Cost_cpu: 79586351614
1407 Resp_io: 1351643.000000 Resp_cpu: 79586351614
1408
1409 Best NL cost: 1353765.302710 -> NL cost
1410 resc: 1353765.302710 resc_io: 1351643.000000 resc_cpu: 79586351614
1411 resp: 1353765.302710 resp_io: 1351643.000000 resc_cpu: 79586351614
1412 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
1413 Join Card: 1000.000000 = outer (1000.000000) * inner (100000.000000) * sel (1.0000e-05)
1414 Join Card - Rounded: 1000 Computed: 1000.000000
1415 Outer table: STAG01 Alias: A
1416 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1417 Inner table: STAG03 Alias: C
1418 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1419 using dmeth: 2 #groups: 1
1420 SORT ressource Sort statistics
1421 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1422 Degree: 1
1423 Blocks to Sort: 141 Row size: 1148 Total Rows: 1000
1424 Initial runs: 2 Merge passes: 1 IO Cost / pass: 64
1425 Total IO sort cost: 205.000000 Total CPU sort cost: 41417745
1426 Total Temp space used: 2352000
1427 SORT ressource Sort statistics
1428 Sort width: 898 Area size: 1048576 Max Area size: 314572800
1429 Degree: 1
1430 Blocks to Sort: 7101 Row size: 580 Total Rows: 100000
1431 Initial runs: 2 Merge passes: 1 IO Cost / pass: 3108
1432 Total IO sort cost: 10209.000000 Total CPU sort cost: 287025717
1433 Total Temp space used: 109241000
1434 SM join: Resc: 11946.030748 Resp: 11946.030748 [multiMatchCost=0.000000]
1435 SM Join
1436 SM cost: 11946.030748 -> SM cost
1437 resc: 11946.030748 resc_io: 11935.000000 resc_cpu: 413653033
1438 resp: 11946.030748 resp_io: 11935.000000 resp_cpu: 413653033
1439 Outer table: STAG01 Alias: A
1440 resc: 168.283569 card 1000.000000 bytes: deg: 1 resp: 168.283569
1441 Inner table: STAG03 Alias: C
1442 resc: 1354.988686 card: 100000.000000 bytes: deg: 1 resp: 1354.988686
1443 using dmeth: 2 #groups: 1
1444 Cost per ptn: 0.286667 #ptns: 1
1445 hash_area: 508 (max=76800) buildfrag: 128 probefrag: 6470 hmbio: 127 fanout: 8 ppasses = 1.0000
1446 Hash join: Resc: 1523.558922 Resp: 1523.558922 [multiMatchCost=0.000000]
1447 HA Join
1448 HA cost: 1523.558922 -> Hash-Join cost
1449 resc: 1523.558922 resc_io: 1521.000000 resc_cpu: 95959572
1450 resp: 1523.558922 resp_io: 1521.000000 resp_cpu: 95959572
-> Finally, the trace reports that the calculated total cost for the third permutation exceeds the best cost computed so far and the join order gets aborted. To be precise, it is equal, not greater.
1451 Join order aborted: cost > best plan cost
1452 ***********************
Let me recap.
The example above is certainly somewhat extreme, as usually the Cost Pruning mechanism is able to discard such a permutation much earlier, therefore not bearing such a burden of the associated math operations (which obviously hammer the CPU).
So, in the outlined example the trace reports that the calculated total cost for the third permutation exceeds the best cost computed so far (“Join order aborted: cost > best plan cost”) and the permutation gets aborted.
To be precise: it is equal, not greater, and most of all it gets aborted only after having gone through the entire calculation process.
So, in that case the effort incurred is arguably even bigger than the one associated with the fully calculated first (and selected) permutation.
Summing up, it’s hard to blame the optimizer – it was working with incorrect information and simply applied the math based on the data it was given.
What warrants attention, however, is that with the state of its false knowledge about the participating tables being empty, and with only subtle differences in the calculated cost values (extending to the third decimal place), the execution times may differ by several orders of magnitude.