One day, while browsing publicly available resources (blogs, white papers), I realized there’s a lack of easily accessible information on the following topic:
How to map the cost-based optimizer (CBO) formulas and numbers to the SQL Plan for the most basic scenario — comparing index range scan access vs. full table scan.

As we know, when the optimizer has a choice (e.g., there’s a suitable index for the predicate), it compares these two scenarios and makes a decision — whether to use index access or perform a full table scan. This decision is especially important in Exadata environment, where full table scans open up opportunities for additional performance optimizations.

In my attempt to address it, I’ll run a few test scenarios, referencing relevant sections of the 10053 trace, use and explain simple cost formulas and will match them with the corresponding lines from the SQL Plan. I’ll explain how these pieces fit together.
I will skip the CPU costing for two reasons: typically the CPU costing constitutes rather tiny part of the total cost. Secondly: to keep it simple.
Whenever applicable though, I will put the comments.

I’ll also show how, in certain situations, conditions that theoretically should degrade performance (i.e., increase cost) can lead to a positive surprise — although, of course, we don’t always want such surprises to happen.

A few words about the 10053 trace:
The content of the 10053 trace is a topic in itself, and I can’t stress enough how much a patient performance tuning expert can extract from it.
Arguably, it’s the second most common trace after 10046, as it reflects the decisions made by the CBO while building the final SQL Plan.
Of course, I’ll skip many (majority) of the 10053 details and focus on the key fragments that relate to the topic of this article.
At the end of this post though I recommend several blog posts and the video definitely worth reading and/or watching which touch the internals of 10053 event and CBO costing.


Have fun!


Here’s my test environment

I have created a test table TSTCBO with three columns. One of the columns, ID_SORTED, contains perfectly ordered numbers that align with the index entries. The column ID_MESS behaves in the opposite way — it stores values that are almost perfectly scattered around the table’s blocks.
As you may guess, the clustering factor (CLUF) for the ID_SORTED will be very low (close to number of table’s blocks), whereas the CLUF for the ID_MESS will show very high number being almost equal to the number of rows.
Finally, I created two indexes, one on each of these columns and gathered the optimizer statistics. The column PAD is simply a filler.

Here are my preparations and some data which is important for the CBO while calculating the cost :

20:40:39 SYS@CDBTST1->PDBTST> create table TSTUS.TSTCBO as 
select rownum as ID_SORTED, 
trunc( dbms_random.value(1,10e6) ) as ID_MESS, 
rpad('X',256,'X') as pad 
from dual connect by level <= 10e6 ;

Table created.

20:41:29 SYS@CDBTST1->PDBTST> create index TSTUS.IDX_ID_SORTED on TSTUS.TSTCBO ( ID_SORTED ) ;

Index created.

20:42:27 SYS@CDBTST1->PDBTST> create index TSTUS.IDX_ID_MESS on TSTUS.TSTCBO ( ID_MESS ) ;

Index created.


20:42:45 SYS@CDBTST1->PDBTST> alter session set workarea_size_policy=MANUAL ;

Session altered.

20:42:45 SYS@CDBTST1->PDBTST> alter session set sort_area_size=2047483648 ;

Session altered.

20:42:45 SYS@CDBTST1->PDBTST> alter session set hash_area_size=2047483648 ;

Session altered.

20:42:45 SYS@CDBTST1->PDBTST> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => '&OWNER' , tabname => '&tablename',cascade => true, estimate_percent => &estimate_percent, granularity => 'ALL', degree => &parallel_slaves);
Enter value for owner: TSTUS
Enter value for tablename: TSTCBO
Enter value for estimate_percent: 100
Enter value for parallel_slaves: 4

PL/SQL procedure successfully completed.

13:46:25 SYS@CDBTST1->PDBTST> select count( distinct(id_mess) ) cnt  from TSTUS.TSTCBO ;

             CNT
----------------
       6,321,696

OWNER                     TABLE_NAME                     COL_ID COLUMN_NAME                    DATA_TYPE             NDV     DENSITY  NUM_NULLS AVG_COL_LEN LOW_VAL                             HI_VAL                              HISTOGRAM       NUM_BUCKETS LAST_ANALYZED         SAMPLE_SIZE
------------------------- ------------------------------ ------ ------------------------------ ------------ ------------ ----------- ---------- ----------- ----------------------------------- ----------------------------------- --------------- ----------- -------------------- ------------
TSTUS                     TSTCBO                              2 ID_MESS                        NUMBER          6,321,696   .00000016          0           6 1                                   9999997                             NONE                      1 23-09-2024 20:42:59    10,000,000
TSTUS                                                         1 ID_SORTED                      NUMBER         10,000,000   .00000010          0           6 1                                   10000000                            NONE                      1 23-09-2024 20:42:59    10,000,000
TSTUS                                                         3 PAD                            VARCHAR2                1  1.00000000          0         257 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX    NONE                      1 23-09-2024 20:42:59    10,000,000


SEGMENT_NAME         SEGMENT_TYPE       TABLESPACE_NAME                      SEG_MB   LOBSIZE_MB     TOTAL_MB
-------------------- ------------------ ------------------------------ ------------ ------------ ------------
TSTUS.TSTCBO         TABLE              PSTS                                  3,012                     3,012
TSTUS.IDX_ID_SORTED  INDEX              PSTS                                    192                       192
TSTUS.IDX_ID_MESS    INDEX              PSTS                                    192                       192
OWNER              TABLENAME                      INDEX_NAME                     INDEX_TYPE      UNIQUENES COMPRESSION   IDX_PX         CLUSTF          TabBlks          TabRows       NDV LEAF_BLOCKS COLS PAR BLEVEL LEAF_BL_KEY DATA_BL_KEY ORPHANED_E STATUS     MYLAST_ANALYZED      VISIBILIT           MB
------------------ ------------------------------ ------------------------------ --------------- --------- ------------- -------- ------------ ---------------- ---------------- --------- ----------- ---- --- ------ ----------- ----------- ---------- ---------- -------------------- --------- ------------
TSTUS              TSTCBO                         IDX_ID_MESS                    NORMAL          NONUNIQUE DISABLED      1           9,999,973          385,478       10,000,000   6355456       23371    1 NO       2           1           1 NO         VALID      24-09-24 06:00       VISIBLE            192
TSTUS                                             IDX_ID_SORTED                  NORMAL          NONUNIQUE DISABLED      1             384,616          385,478       10,000,000   9914368       24002    1 NO       2           1           1 NO         VALID      24-09-24 06:00       VISIBLE            192
                   ******************************                                                                                                                                                                                                                                                   ------------
                   sum                                                                                                                                                                                                                                                                                       384

TEST with ordered ID

I will run the very simple SELECT statement which reaches our for 1/10 of the total table’s volume (1e6 rows out of 10e6 rows in total).
As you can see, I am flushing the buffer cache to gather the fresh runtime statistics (for that, I set “SET AUTOTRACE TRACE EXPLAIN”). I also set the 10053 event in order to intercept the CBO decisions.
The lines of interest are highlighted.

Here it is:

21:10:39 SYS@CDBTST1->PDBTST> @setts
21:10:41 SYS@CDBTST1->PDBTST> SET AUTOTRACE TRACE EXPLAIN STATISTICS
21:10:41 SYS@CDBTST1->PDBTST>
21:10:41 SYS@CDBTST1->PDBTST> @ssqlid
21:10:44 SYS@CDBTST1->PDBTST> set  echo on
21:10:44 SYS@CDBTST1->PDBTST>  set feedback on sql_id
21:10:44 SYS@CDBTST1->PDBTST> set echo off
21:10:44 SYS@CDBTST1->PDBTST> alter system flush buffer_cache ;
 
System altered.
 
21:10:45 SYS@CDBTST1->PDBTST> @tr10053
Enter value for file_identifier: ordered_1e6
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&amp;file_identifier'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='ordered_1e6'
21:10:45 SYS@CDBTST1->PDBTST> alter session set events ‘10053 trace name context forever, level 1’;
 
Session altered.
 
SQL_ID: 9vgruanb5369q
 
Session altered.
 
SQL_ID: 2smju82y9ky53
21:11:05 SYS@CDBTST1->PDBTST> select /* &amp;comment */ * from TSTUS.TSTCBO t where ID_SORTED &lt;= 1e6 ;
Enter value for comment: ordered_1e6
old   1: select /* &amp;comment */ * from TSTUS.TSTCBO t where ID_SORTED &lt;= 1e6
new   1: select /* ordered_1e6 */ * from TSTUS.TSTCBO t where ID_SORTED &lt;= 1e6
 
1000000 rows selected.
 
SQL_ID: fyvna3khjxwu9
Elapsed: 00:00:16.63
SQL_ID: g72kdvcacxvtf
 
Execution Plan
----------------------------------------------------------
SQL_ID: bb8ruw2zx8qww
Plan hash value: 4196823974
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSTCBO        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID_SORTED"&lt;=1e6)
SQL_ID: 3s1hh8cvfan6w
 
SQL_ID: g72kdvcacxvtf
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
      40690  physical reads
          0  redo size
  295802945  bytes sent via SQL*Net to client
     733762  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

The relevant section of 10053 trace file are below, whereby not all lines will be useful for us.
You can equally skip the content of that window as I will highlight the lines of interest along with the analysis which will follow – here is just the consistent fragment to give you the notion how it looks like.
Tip: when examining the 10053 trace with the intention to analyze the cost of the FTS and possible INDEX SCAN look for the string “SINGLE TABLE ACCESS PATH” and follow down until you’ll reach the section “****** Costing Index”.
Both sections contain the analysis paths and accompanied costs calculations based on CBO statistics exclusively or SYSTEM STATISTICS with an aid of CBO derived stats. Note, I used the “or” on purpose and will explain the reason later.

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TSTCBO  Alias:  T
  #Rows: 10000000  SSZ: 0  LGR: 0  #Blks:  385478  AvgRowLen:  269.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
Index Stats::
  Index: IDX_ID_MESS  Col#: 2
  LVLS: 2  #LB: 23371  #DK: 6355456  LB/K: 1.00  DB/K: 1.00  CLUF: 9999973.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1
  Index: IDX_ID_SORTED  Col#: 1
  LVLS: 2  #LB: 24002  #DK: 9914368  LB/K: 1.00  DB/K: 1.00  CLUF: 384616.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1
try to generate single-table filter predicates from ORs for query block SEL$1 (#0)
finally: "T"."ID_SORTED"<=1e6

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TSTCBO
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TSTCBO[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

kkecdn: Single Table Predicate:"T"."ID_SORTED"<=1e6
  Column (#1): ID_SORTED(NUMBER)
    AvgLen: 6 NDV: 9914368 Nulls: 0 Density: 0.000000 Min: 1.000000 Max: 10000000.000000
  Estimated selectivity: 0.100000 , col: #1
  Table: TSTCBO  Alias: T
    Card: Original: 10000000.000000  Rounded: 1000000  Computed: 1000000.108637  Non Adjusted: 1000000.108637
  Scan IO  Cost (Disk) =   66758.000000
  Scan CPU Cost (Disk) =   4285158448.320000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.100000 flag = 2048  ("T"."ID_SORTED"<=1e6)
  Total Scan IO  Cost  =   66758.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 10000000.000000 (#rows))
                       =   66758.000000
  Total Scan CPU  Cost =   4285158448.320000 (scan (Disk))
                         + 500000000.000000 (cpu filter eval) (= 50.000000 (per row) * 10000000.000000 (#rows))
                       =   4785158448.320000
  Access Path: TableScan
    Cost:  66885.604225  Resp: 66885.604225  Degree: 0
      Cost_io: 66758.000000  Cost_cpu: 4785158448
      Resp_io: 66758.000000  Resp_cpu: 4785158448
****** Costing Index IDX_ID_SORTED
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #1
  Access Path: index (RangeScan)
    Index: IDX_ID_SORTED
    resc_io: 40865.000000  resc_cpu: 681018036
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 40883.160481  Resp: 40883.160481  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IDX_ID_SORTED
         Cost: 40883.160481  Degree: 1  Resp: 40883.160481  Card: 1000000.108637  Bytes: 0.000000

***************************************

As you can see, the CBO opted for an Index Range Scan after comparing two key figures: the cost of a Full Table Scan (FTS) and the cost of an Index Range Scan. It chose the path with the lower cost — the option with the smaller value.
For the time being, I will leave the costing of FTS aside and will focus on the one which was chosen, i.e. INDEX RANGE SCAN.
As you can see, I have used the ID_SORTED column to retrieve the data from the table. ID_SORTED has very low CLUF.

Let us start from the basics and bring the general formula for the Index Range Scan cost calculations taking into account the corresponding lines of the SQL Plan.
So, in general, it can be put as follows :

I/O Cost of Index Range Scan = ( I/O cost of index access ) + ( I/O cost of table access based on this index )

So, it comprises two parts:

Reading of the index leaf blocks (let’s call it “index part”)
AND
Reading of the table (i.e. reaching out to the corresponding blocks of the table based on the ROWIDs following accessing the index leaf-blocks) – further referred to as “table part”.

Getting deeper and turning the above into the formula.
Total I/O Cost of Index Range Index Scan is sum of:

cost of index access: Index_LVL + CEIL( #LB * ix_sel )
AND
cost of the table access: CLUF * ix_sel

These terms are explained below along with the relevant entries from 10053 trace.

Index Part
===========
Let’s focus on the part of costing of accessing just the index segment.
For that we need to reach out to two sections of 10053 trace file:

1.
***************************************
BASE STATISTICAL INFORMATION
***********************
Index Stats::
  Index: IDX_ID_SORTED  Col#: 1
  LVLS: 2  #LB: 24002  #DK: 9914368  LB/K: 1.00  DB/K: 1.00  CLUF: 384616.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1

2.
****** Costing Index IDX_ID_SORTED
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #1
  Access Path: index (RangeScan)
    Index: IDX_ID_SORTED
    resc_io: 40865.000000  resc_cpu: 681018036
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 40883.160481  Resp: 40883.160481  Degree: 1

Let’s follow the CBO’s calculations :

Level of the index, LVLS : 2
Number of leaf blocks of that index, #LB : 24002
ix_sel: 0.100000 (calculated selectivity based on the predicate: 1e6 out of total 10e6 rows, therefore the selectivity: 0.1).

After replacing the variables with the values in the formula we get :

2 + ceil( 24002 * 0.1 )  = 2 + ceil( 2400.2 ) = 2 + 2401 = 2403

As you can see, 2403 is roughly the value which is reported in the line with “INDEX RANGE SCAN” in the SQL Plan, namely 2409.

Note, this value representing the cost of accessing the index (still w/o table part!) is nowhere mentioned in the 10053 trace file, yet it will be incorporated in further calculations.

Btw. if you feel like experimenting with the hint /*+ NO_CPU_COSTING */ you’ll likely get the matching values, I mean the one being the result of the above formula and the one which will be reported in the SQL plan.
Let’s jump to the second component now.

Table Part
=============

Now, let’s calculate the “table part”, the second component.

As you can see, the value which is shown in the SQL plan in the line which seems to be representing just the “table part” (TABLE ACCESS BY INDEX ROWID BATCHED) is 40883.
Anticipating, be careful though. It represents the total cost, so comprising both, the cost of accessing the index and the table.
Here is how it’s calculated.

So let’s calculate the cost of accessing the “table part” alone.
The formula is: CLUF * ix_sel

Sticking to the way Oracle handles the costing of the index access, the above formula (so I believe) comes from the : (#TblBlks * ix_sel) * (CLUF / #TblBlks).
Meaning, the amount of table’s blocks satisfying the filtering criteria corrected by (multiplied by) the effort of accessing them (CLUF/#TblBlks).
If you take a closer look, you’ll notice that the total cost of the range index scan (both index and table parts) is purely driven by the number of single I/Os associated with accessing the index and the table.
What I am trying to point out by that, it is not influenced by factors like system statistics—unlike the cost of a full table scan, which heavily depends on those stats (we’ll cover that in more detail later).
Well, to be completely accurate, it is indirectly dependent on system statistics. This is because the cost of index access is always compared to the cost of an FTS, and as mentioned, the FTS cost largely relies on system statistics.

Now we need the following sections of 10053 trace (this time also the data related to the table itself):

1.
***************************************
BASE STATISTICAL INFORMATION
***********************
Index Stats::
  Index: IDX_ID_SORTED  Col#: 1
  LVLS: 2  #LB: 24002  #DK: 9914368  LB/K: 1.00  DB/K: 1.00  CLUF: 384616.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1


and (again) :
2.
****** Costing Index IDX_ID_SORTED
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #1
  Access Path: index (RangeScan)
    Index: IDX_ID_SORTED
    resc_io: 40865.000000  resc_cpu: 681018036
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 40883.160481  Resp: 40883.160481  Degree: 1

Short remainder, we are at the piece of costing of the table access “table part” of the entire formula :

The entire index range scan cost =  Index_LVL + CEIL( #LB * ix_sel ) + ceil( cost of table access ), where the “cost of table access” is CLUF * ix_sel

Terms :
CLUF: 384616.00 (Clustering factor)
ix_sel: 0.100000  (explained above)

Quick math after replacing the variables with the values from the 10053 trace:

ceil( CLUF * ix_sel ) = ceil( 384616 * 0.100000 ) = ceil( 38461.6 ) = 38462

Well, now in turn the figure 38462 can’t be found in any of the lines of the SQL Plan.

Why is that?

Because the cost of TABLE ACCESS BY INDEX ROWID (batched): 40883 in the SQL Plan represents the accumulated value derived from the bare INDEX RANGE SCAN (“index part”) and the value which I refer to as “table part”.

So: 2409 (index part) + 38462 (table part) = 40871 (total I/O cost)

The above calculated value of 40871, which now represents the entire cost comprising the “index part” and “table part” is again very near to what 10053 and the SQL plan show : 40883.


Comments to the test from above

  • If you switch off the batching (alter session set “_optimizer_batch_table_access_by_rowid” = false), the cost will stay the same, although one might argue that the effort in general will be (should be) lower.
    Apparently, CBO costing does not take this optimization into account. Probably because of too deep dependency of how the I/O batching is realized. That in turn surely depends on the underlying OS.
  • The bare resc_io: 40865, so the cost without the cpu factoring in 10053 trace is also slightly different from what I came with (my figure: 40871).
    I will be fair, I can’t explain that without getting deeper.
    The cost figure present in the SQL Plan does comprise the CPU factoring, yet even applying the /*+ NO_CPU_COSTING */  makes the values shown in the 10053 and SQL Plan only partially matching the outcome of the formula.I decided not to explore it. Surely people like Jonathan Lewis would have good explanation, or perhaps they already put it somewhere in their blogs.
  • You might be wondering why, when calculating the cost that clearly involves the table segment, there’s no direct value from the 10053 trace which would be withdrawn from any section referring to that table (all these values come from the index relevant data).
    Well, it does. Note, the clustering factor (CLUF) represents the correlation between the index and the table. So, the “table part” is encapsulated in the second part of the main formula and is used in the best possible way, that is, it expresses the correlation between the index and the table (I can’t resist referring to it as “scale of mess between the indexed value and the table”).
  • if you pay little attention you’ll surely notice that the cost associated with index scan is purely based on the number of single I/Os which are associated with index+table access. To support that, take a look at the runtime “Statistics” section following the SQL Plan with the 40690 physical reads, being very close to resc_io: 40865 from 10053. Keep in mind that 40883 (visible in SQL Plan and in the 10053 trace) is the resc_io corrected by the CPU factoring.
-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSTCBO        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
      40690  physical reads
  • The last thing to be spotted: the duration of the query which used range index scan with perfectly pre-sorted data within the table’s blocks needed ~16 seconds.
    This post is not so much about the times elapsed & tuning. It is rather to emphasize how to match the lines of SQL Plan with the corresponding sections of the 10053 trace file.
    Nonetheless, I can’t resist devoting some comments on that topic in the subsequent parts of this post, especially while comparing the results of the next tests (FTS).


TEST with messed up ID (MESS_ID) and high clustering factor (high CLUF)

Now, I will essentially repeat that test, this time though I will use the ID_MESS in the predicate section (the one whose values are everything but sorted within the table’s blocks).
So, the test query will look like : select /* &comment */ * from TSTUS.TSTCBO t where ID_MESS <= 1e6 ;

Let’s run it :

20:30:08 SYS@CDBTST1->PDBTST> @ssqlid
20:30:11 SYS@CDBTST1->PDBTST> set  echo on
20:30:11 SYS@CDBTST1->PDBTST>  set feedback on sql_id
20:30:11 SYS@CDBTST1->PDBTST> set echo off
20:30:11 SYS@CDBTST1->PDBTST>
20:30:11 SYS@CDBTST1->PDBTST> @fb
20:30:11 SYS@CDBTST1->PDBTST> alter system flush buffer_cache ;

System altered.

20:30:12 SYS@CDBTST1->PDBTST>
20:30:12 SYS@CDBTST1->PDBTST> @tr10053
Enter value for file_identifier: messy_1e6
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&amp;file_identifier'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='messy_1e6'

Session altered.

SQL_ID: guudpgw2jaba0
Elapsed: 00:00:00.00

Session altered.

SQL_ID: 2smju82y9ky53
Elapsed: 00:00:00.00
13:01:19 SYS@CDBTST1->PDBTST>
13:01:19 SYS@CDBTST1->PDBTST> select /* &amp;comment */ * from TSTUS.TSTCBO t where ID_MESS &lt;= 1e6 ;
Enter value for comment: messy_1e6
old   1: select /* &amp;comment */ * from TSTUS.TSTCBO t where ID_MESS &lt;= 1e6
new   1: select /* messy_1e6 */ * from TSTUS.TSTCBO t where ID_MESS &lt;= 1e6

1000329 rows selected.

SQL_ID: b3y68t4bz5ckh
Elapsed: 00:00:03.89
SQL_ID: g72kdvcacxvtf

Execution Plan
----------------------------------------------------------
SQL_ID: 0f3mqmwtw9fsv
Plan hash value: 3826654347

------------------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |  1000K|   256M| 66890   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS STORAGE FULL| TSTCBO |  1000K|   256M| 66890   (1)| 00:00:03 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ID_MESS"&lt;=1e6)
       filter("ID_MESS"&lt;=1e6)
SQL_ID: 3s1hh8cvfan6w

SQL_ID: g72kdvcacxvtf

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     384622  consistent gets
     384617  physical reads
          0  redo size
   28612899  bytes sent via SQL*Net to client
     734000  bytes received via SQL*Net from client
      66690  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000329  rows processed


The corresponding sections of 10053 trace which will be useful for our analysis are below.
As before, I will retrieve the strictly relevant parts later, here’s the entire section :

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 3125 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM:  10 milliseconds (default is 10)
  MBRC:       128 blocks (default is 8)

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TSTCBO  Alias:  T
  #Rows: 10000000  SSZ: 0  LGR: 0  #Blks:  385478  AvgRowLen:  269.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
Index Stats::
  Index: IDX_ID_MESS  Col#: 2
  LVLS: 2  #LB: 23371  #DK: 6355456  LB/K: 1.00  DB/K: 1.00  CLUF: 9999973.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1
  Index: IDX_ID_SORTED  Col#: 1
  LVLS: 2  #LB: 24002  #DK: 9914368  LB/K: 1.00  DB/K: 1.00  CLUF: 384616.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1
try to generate single-table filter predicates from ORs for query block SEL$1 (#0)
finally: "T"."ID_MESS"<=1e6

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TSTCBO
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TSTCBO[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

kkecdn: Single Table Predicate:"T"."ID_MESS"<=1e6
  Column (#2): ID_MESS(NUMBER)
    AvgLen: 6 NDV: 6355456 Nulls: 0 Density: 0.000000 Min: 1.000000 Max: 9999997.000000
  Estimated selectivity: 0.100000 , col: #2
  Table: TSTCBO  Alias: T
    Card: Original: 10000000.000000  Rounded: 1000001  Computed: 1000000.973451  Non Adjusted: 1000000.973451
  Scan IO  Cost (Disk) =   66758.000000
  Scan CPU Cost (Disk) =   4465158468.320000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.100000 flag = 2048  ("T"."ID_MESS"<=1e6)
  Total Scan IO  Cost  =   66758.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 10000000.000000 (#rows))
                       =   66758.000000
  Total Scan CPU  Cost =   4465158468.320000 (scan (Disk))
                         + 500000000.000000 (cpu filter eval) (= 50.000000 (per row) * 10000000.000000 (#rows))
                       =   4965158468.320000
  Access Path: TableScan
    Cost:  66890.404226  Resp: 66890.404226  Degree: 0
      Cost_io: 66758.000000  Cost_cpu: 4965158468
      Resp_io: 66758.000000  Resp_cpu: 4965158468
****** Costing Index IDX_ID_MESS
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #2
  Access Path: index (RangeScan)
    Index: IDX_ID_MESS
    resc_io: 1002339.000000  resc_cpu: 7528097438
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 1002539.749265  Resp: 1002539.749265  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 66890.404226  Degree: 1  Resp: 66890.404226  Card: 1000000.973451  Bytes: 0.000000

***************************************

Clearly, following comparing the costs of index based access and full table scan, the CBO chose (following the calculations) to apply FTS. See entry: “Best:: AccessPath: TableScan Cost: 66890.404226

The numbers it compared are :
Cost of index range scan (accumulated: index part + table part) : 1002539 
Cost of full table scan (FTS) : 66758
Both incorporate CPU factoring, yet this add-on can almost be ignored.

CBO had no doubts, the cost of FTS is by far cheaper and we are talking here about the order of magnitude.
Undoubtedly (and I will come to that), the extraordinarily high cost of index based access is caused by the high CLUF (clustering factor).

First, let’s jump to the formula of costing of the full table scan :

Full Table Scan (FTS) I/O Cost   =   1 + CEIL( #MRds * (MREADTIM / SREADTIM))

Where :
#MRds     =      #Blks / MBRC
SREADTIM =      IOSEEKTIM + db_block_size / IOTFRSPEED
MREADTIM =      IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED

For the accurate meaning of those terms I recommend that you reach out to Neil Chandler’s blog under this link. In addition, this one is also worth to take a look at.
Here’s just the simplified summary for our purpose :
#Blks – number of blocks of the table in question (the only one from the CBO statistics!).
MBRC – Multi Block Read Count (typically derived from db_file_multiblock_read_count)
SREADTIM – Single Block Read Time (calculated as above)
MREADTIM – Multi Block Read Time (calculated as above)

Let’s list the factors from the 10053 trace which we will soon use in the formula :
Firstly and foremost, this time the calculated cost will largely depend on the data derived from the system statistics (there will be just one exception which comes from the CBO stats: namely, number of blocks of the table below the HWM).

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 3125 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM:  10 milliseconds (default is 10)
  MBRC:       128 blocks (default is 8)

Further on, we will need the db_block_size (instance parameter – 8192) and just only one value which comes directly from the CBO statistics, that is #Blks (number of blocks of the table below the high water mark (HWM)).

Let’s do the math.
First, let’s calculate the number of multi block reads which are expected to be carried out while reading the entire table (FTS):

#MRds = #Blks / MBRC  
#MRds = 385478 / 128
#MRds = 3011.55

Now, estimated Single Block Read Time (SREADTIM):
SREADTIM     =     IOSEEKTIM + db_block_size / IOTFRSPEED
SREADTIM     =     10 + 8192 / 4096
SREADTIM      =      12

Now, Multiblock Read Time (MREADTIM):
MREADTIM     =      IOSEEKTIM + db_block_size * MBRC / IOTFRSPEED
MREADTIM     =      10  + 8192 * 128 / 4096
MREADTIM     =      266

Finally, having these three factors calculated we can put them all under the formula :
FTS I/O Cost = 1 + CEIL( #MRds * (MREADTIM / SREADTIM))
FTS I/O Cost = 1 + CEIL( 3011.55 * (266 / 12))
FTS I/O Cost = 66757

If we reach out to the corresponding section of 10053 trace file as well as to SQL Plan, we will spot the (closely) matching values :

10053 trace file :

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TSTCBO  Alias:  T
  #Rows: 10000000  SSZ: 0  LGR: 0  #Blks:  385478  AvgRowLen:  269.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TSTCBO[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

kkecdn: Single Table Predicate:"T"."ID_MESS"<=1e6
  Column (#2): ID_MESS(NUMBER)
    AvgLen: 6 NDV: 6355456 Nulls: 0 Density: 0.000000 Min: 1.000000 Max: 9999997.000000
  Estimated selectivity: 0.100000 , col: #2
  Table: TSTCBO  Alias: T
    Card: Original: 10000000.000000  Rounded: 1000001  Computed: 1000000.973451  Non Adjusted: 1000000.973451
  Scan IO  Cost (Disk) =   66758.000000
  Scan CPU Cost (Disk) =   4465158468.320000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.100000 flag = 2048  ("T"."ID_MESS"<=1e6)
  Total Scan IO  Cost  =   66758.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 10000000.000000 (#rows))
                       =   66758.000000 
  Total Scan CPU  Cost =   4465158468.320000 (scan (Disk))
                         + 500000000.000000 (cpu filter eval) (= 50.000000 (per row) * 10000000.000000 (#rows))
                       =   4965158468.320000
  Access Path: TableScan
    Cost:  66890.404226  Resp: 66890.404226  Degree: 0

SQL Plan :


------------------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |  1000K|   256M| 66890   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS STORAGE FULL| TSTCBO |  1000K|   256M| 66890   (1)| 00:00:03 |
------------------------------------------------------------------------------------

As you can see, the cost depicted in the SQL Plan is the final cost from 10053 incorporating the CPU factoring.


Comments to the test from above

Comment #1.
That below should be emphasized aloud and I haven’t mentioned that deliberately waiting till this moment.
If you take a look at the elapsed time of the test query it took ~4 sec. as oppose to the previous run which took ~16 sec.

So, a query which in theory should take much longer runs 4 time faster.

Let’s dive a bit deeper.
This SQL was executed with full table scan.
Why? Because the CBO calculated and compared two numbers: cost of index based access (pretty high: 1002539) and the cost of FTS (much lower: 66890).
Obviously, it decided to go for FTS as its cost is significantly lower.
Now, why is the cost of index based access so high?
Because the clustering factor (CLUF) grew to extreme. If you analyze the formula of costing the index access you’ll spot that – comparing the previous test scenario – this is the only factor which drastically changed.

Short reminder, the entire cost of index range scan (index and table part) is calculated based on this formula –
it’s the sum of :
cost of index access = Index_LVL + CEIL( #LB * ix_sel )
AND
cost of the table access = CLUF * ix_sel

If you go through the corresponding sections of 10053 (or just apply logical thinking..) you’ll surely spot that all the factors except CLUF remain or are very close to the previously used values when the CBO decided to use the index IDX_ID_SORTED.
If you don’t wanna waste your time, scroll some lines below, I have carried out that analysis for you.
Here’s the corresponding excerpt from 10053 trace :

****** Costing Index IDX_ID_MESS
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #2
  Access Path: index (RangeScan)
    Index: IDX_ID_MESS
    resc_io: 1002339.000000  resc_cpu: 7528097438
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 1002539.749265  Resp: 1002539.749265  Degree: 1

In the examined case the only index which could be used is the IDX_ID_MESS.
There is no reason that the physical characteristics (so, also the CBO stats) of that index might vastly change when comparing it to the previously used IDX_ID_SORTED.
This is also a good opportunity to defeat fairly common misunderstanding that rebuilding of an index may improve (lower) the clustering factor.
All in all, the new index needs similar number of leaf-blocks, it has the same level (btw. the least important in the formula) and the CBO calculated the same selectivity for the predicate which I used for testing, that is <= 1e6.
Referring to the above, in fact, the number of rows returned is slightly higher as ( trunc( dbms_random.value(1,1e6) ) didn’t manage to produce distinct values within the specified range).
Ultimately, it has no significance at all, since the CBO calculated the selectivity as 0.1 (slightly wrong) and at the end this value was used in the formula.

Now – don’t take it seriously, but the above can be considered a rare case of quite efficient tuning which relies on making the situation worse, and thanks to that the final outcome is much better : )
To be specific, because we made the clustering factor extremely suboptimal, the CBO after applying the math, decided to choose the full table scan which in theory should yell longer elapsed times than the theoretically (and mathematically!) better choice of using quite optimal range index scan over the perfectly pre-sorted range of values (I am talking now about the previous test case with 16 seconds elapsed time and theoretically efficient range index scan based on ID_SORTED).
Now, with full table scan which was previously discarded due to higher cost than the “efficient” index based access with the elapsed time close to ~16 seconds, it runs just ~4 seconds (!)

13:01:19 SYS@CDBTST1->PDBTST> select /* &comment */ * from TSTUS.TSTCBO t where ID_MESS <= 1e6 ;
Enter value for comment: messy_1e6
old   1: select /* &comment */ * from TSTUS.TSTCBO t where ID_MESS <= 1e6
new   1: select /* messy_1e6 */ * from TSTUS.TSTCBO t where ID_MESS <= 1e6
 
1000329 rows selected.
 
SQL_ID: b3y68t4bz5ckh
Elapsed: 00:00:03.89


This CBO decision would surely be correct in the non-Exadata world, but the examined case benefited from very decent offload with 88.5 % efficiency. The number is obvious if you realize that the query returned slightly more than the 10% of the entire data volume- see the predicate ID_MESS <= 1e6 from the table with 10e6 rows).
It also reached to flash cache, but let’s be fair – all queries read from the flash cache, regardless to whether they used FTS or index access.

Let’s verify the statistics though.

Below is the outcome of my script (honestly, not all is mine, there are several entries which I stole from several other blogs, mainly from Tanel Poder’s, I guess).
The outcome line is rather long and it shows many useful information regarding , let’s just focus on some of them :

Offl[%] = 88.52  (explained above. IO_CELL_OFFLOAD_ELIGIBLE_BYTES and IO_INTERCONNECT_BYTES combined, see the script)
Flash[%] = 100%  (nice, I/O latency surely close to 100 microseconds, but see the comments below)
avgIO[KB] = 926  (ah, definitely FTS, considering the size of single I/O request)
phyreqs/exec = 3323. This requires short explanation supported with quick math:
3323 (PIOs) * 926 (size of single I/O) = 3’077’098 [KB], which makes the entire segment size. If you reach out to the script you’ll find the source of that: as the name of the source column suggests, PHYSICAL_READ_REQUESTS is the number of I/O requests.
So, it’s not the value which was returned by offload (IO_INTERCONNECT_BYTES). It’s merely the value which was requested to be read from the storage.
avg[ms]/exec 1677 ms. Note, my SQL session reported nearly 4sec (4000 ms). The difference should be accounted for the relatively long lasting fetch phase of ~1 mln rows, not incorporated within the ELAPSED_TIME from v$sql).

If you fancy it, you’ll find the SQL script which produced that output is below at the end of that post.

                                 Bind  Bind           Adapt 4reoptim              SQL  SQL
INS TYPE    SQL_ID           CHLD Sens  Aware?   sha   Plan  cardfeed #inval Binds Prof Patch PLAN_HASHVAL CBO_HASHVAL PARS_SCHEMA                    Exec-from:      #pars  TOTAL_EXECS     cpu/exec[us] avg[ms]/exec avg[s]/exec  AVGPX Offl[%] Flash[%]    rows/exec buffgets/exec phyreqs/exec avgIO[KB] FIRST_SEEN           LAST_ACTIVE
---- ------- --------------- ----- ----- -------- ----- ----- -------- ------ ----- ---- ----- ------------ ----------- ------------------------------ ------------ -------- ------------ ---------------- ------------ ----------- ------ ------- -------- ------------ ------------- ------------ --------- -------------------- --------------------
   1 SELECT  b3y68t4bz5ckh       0 N     N        Y           N             0 No    No   No      3826654347  2071179281 SYS                            top-lvl-sql         1            1        1,107,990        1,677           2      0   88.52   100.00    1,000,329        384622         3323       926 25-09-2024 21:02:14  25-09-2024 21:02:17

Comment #2.
Careful reader might have spotted another oddity here (perhaps there are more of them, I have just that one on my mind :).
If you take a close look at the summary of SYSTEM STATISTICS INFORMATION from the 10053 trace section, you’ll notice that the MBRC equals to 128, whereas the default value is 8.
This is because I have experimented with that factor and forgot to set it back to 8.
Now, I encourage you to experiment with the combination of the numbers participating in that formula by the means of an xls sheet (Excel is pretty good with such simple simulations “what if”).
So, for example, keeping all others factors intact, provided you’d change just the MBRC back to 8, you’ll receive the cost of FTS equal to 104’402, so still away from the 1’002’539 which is the total index range cost.
Btw. you can easily manipulate the SYSTEM STATISTICS with dbms_stats.set_system_stats, so for example dbms_stats.set_system_stats(‘MBRC’, 128) would set the MBRC to 128 system wide(!) .





-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 3125 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM:  10 milliseconds (default is 10)
  MBRC:       128 blocks (default is 8)

I cannot emphasize more how powerful this feature (offload) is. I dare to state that I am not overly optimistic here.
Pity though, the currently utilized optimization approach does not fully take that into account while calculating the SQL Plan.
It’s obvious that the CBO utilizes its optimization algorithms somewhat in separation from the fact that it operates in the Exadata environment.
I have witnessed situations when even though the CBO has calculated the FTS to be a better choice than index scan and there were appropriate conditions for offload to kick in, the second phase (runtime) decision failed, because there were too many blocks of the target segment cached in the buffer cache.
I have published another post describing that caveat.

To be fair, I am presenting somewhat extreme situations and – now being extreme in the opposite direction – for example it’s fairly obvious that index unique access will always be faster than full table scan with an offload over a large segment, even if accompanied by storage index.
In such obvious cases the math carried by the CBO will be correct and won’t stay in contradiction to the expected elapsed times.
The problems may appear when the CBO needs to deal with SQLs e.g. applying relatively wide range scans, which in addition tend to present non-uniform data distribution which isn’t (or cannot be) properly handled by histograms.

TEST with messy ID with forced index access

A Promise is a promise – I am referring to the following which I mentioned above:
If you go through the corresponding sections of 10053 (or just are in the mood to apply logical thinking), all the factors except CLUF remain or are very close to the previously used values when the CBO decided to use the index IDX_ID_SORTED.

If you don’t, scroll some lines below, I have carried out that analysis for you.

Here it is.

20:17:09 SYS@CDBTST1->PDBTST> alter system flush buffer_cache ;

System altered.

20:17:17 SYS@CDBTST1->PDBTST>  set feedback on sql_id
20:17:21 SYS@CDBTST1->PDBTST> SET AUTOTRACE TRACE EXPLAIN STATISTICS
20:17:22 SYS@CDBTST1->PDBTST>
20:17:22 SYS@CDBTST1->PDBTST> @tr10053
20:17:27 SYS@CDBTST1->PDBTST>
20:17:27 SYS@CDBTST1->PDBTST> set timing on
20:17:27 SYS@CDBTST1->PDBTST>
20:17:27 SYS@CDBTST1->PDBTST> ALTER SESSION SET TRACEFILE_IDENTIFIER='&amp;file_identifier';
Enter value for file_identifier: badcluf_hinted
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&amp;file_identifier'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='badcluf_hinted'

Session altered.

SQL_ID: ct5j3q4zkk4pu
Elapsed: 00:00:00.00
20:17:40 SYS@CDBTST1->PDBTST> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL_ID: 2smju82y9ky53
Elapsed: 00:00:00.00
20:17:40 SYS@CDBTST1->PDBTST>
20:17:41 SYS@CDBTST1->PDBTST> select /* &amp;comment */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSTCBO T  where ID_MESS &lt;= 1e6
15:19:15   2  /
Enter value for comment: badcluf_hinted
old   1: select /* &amp;comment */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSTCBO T  where ID_MESS &lt;= 1e6
new   1: select /* badcluf_hinted */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSTCBO T  where ID_MESS &lt;= 1e6

1000329 rows selected.

SQL_ID: 88kwk2u0zyp4m
Elapsed: 00:00:32.63
SQL_ID: g72kdvcacxvtf

Execution Plan
----------------------------------------------------------
SQL_ID: 4ja279tkwxv31
Plan hash value: 3542504988

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  1000K|   256M|  1002K  (1)| 00:00:40 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSTCBO      |  1000K|   256M|  1002K  (1)| 00:00:40 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_MESS |  1000K|       |  2346   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_MESS"&lt;=1e6)
SQL_ID: 3s1hh8cvfan6w

SQL_ID: g72kdvcacxvtf

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1069104  consistent gets
     361957  physical reads
          0  redo size
  295900289  bytes sent via SQL*Net to client
     734033  bytes received via SQL*Net from client
      66690  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000329  rows processed

For completion, the output of the script I have mentioned above :

                                                           Candid.
                                 Bind  Bind           Adapt 4reoptim              SQL  SQL
INS TYPE    SQL_ID         CHLD Sens  Aware?   sha   Plan  cardfeed #inval Binds Prof Patch PLAN_HASHVAL CBO_HASHVAL PARS_SCHEMA                    Exec-from:      #pars  TOTAL_EXECS     cpu/exec[us] avg[ms]/exec avg[s]/exec  AVGPX Offl[%] Flash[%]    rows/exec buffgets/exec phyreqs/exec avgIO[KB] FIRST_SEEN           LAST_ACTIVE
---- ------- ------------- ----- ----- -------- ----- ----- -------- ------ ----- ---- ----- ------------ ----------- ------------------------------ ------------ -------- ------------ ---------------- ------------ ----------- ------ ------- -------- ------------ ------------- ------------ --------- -------------------- --------------------
   1 SELECT  88kwk2u0zyp4m     0 N     N        Y           N             0 No    No   No      3542504988  2071179281 SYS                            top-lvl-sql         1            1       14,492,766       26,141          26      0     .00   100.00    1,000,329       1069104       361957         8 26-09-2024 15:19:31  26-09-2024 15:20:04


The relevant sections of 10053 trace file :

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TSTCBO  Alias:  T
  #Rows: 10000000  SSZ: 0  LGR: 0  #Blks:  385478  AvgRowLen:  269.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
Index Stats::
  Index: IDX_ID_MESS  Col#: 2
  LVLS: 2  #LB: 23371  #DK: 6355456  LB/K: 1.00  DB/K: 1.00  CLUF: 9999973.00  NRW: 10000000.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1
    User hint to use this index


=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TSTCBO
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TSTCBO[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

kkecdn: Single Table Predicate:"T"."ID_MESS"<=1e6
  Column (#2): ID_MESS(NUMBER)
    AvgLen: 6 NDV: 6355456 Nulls: 0 Density: 0.000000 Min: 1.000000 Max: 9999997.000000
  Estimated selectivity: 0.100000 , col: #2
  Table: TSTCBO  Alias: T
    Card: Original: 10000000.000000  Rounded: 1000001  Computed: 1000000.973451  Non Adjusted: 1000000.973451
****** Costing Index IDX_ID_MESS
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Estimated selectivity: 0.100000 , col: #2
  Access Path: index (RangeScan)
    Index: IDX_ID_MESS
    resc_io: 1002339.000000  resc_cpu: 7528097438
    ix_sel: 0.100000  ix_sel_with_filters: 0.100000
    Cost: 1002539.749265  Resp: 1002539.749265  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IDX_ID_MESS
         Cost: 1002539.749265  Degree: 1  Resp: 1002539.749265  Card: 1000000.973451  Bytes: 0.000000

***************************************

Again, boring reminder: the total I/O cost of index range scan is the sum of:
cost of index access = Index_LVL + CEIL( #LB * ix_sel )
AND
cost of the table access = ceil ( CLUF * ix_sel )

The first part, which is the “index part” is obviously comparable with the costing of IDX_ID_SORTED.
Let’s calculate it :

Index_LVL + CEIL( #LB * ix_sel ) = 2 + ceil( 23371 * 0.1 ) = 2 + 2338 = 2340.

As before, that value is not visible in the 10053 trace, but it is to be spotted in the SQL Plan (2346) – the value very close to the result of our calculation.

Now, cost of accessing the “table part” through that index based on the estimated selectivity = ceil ( CLUF * ix_se ) :

ceil( CLUF * ix_sel ) = ceil ( 9999973 * 0.1 ) = 999998
The total cost = 2340 + 999998  =  1002338
Which again, is very close to the I/O cost reported in the 10053 trace: resc_io: 1002339.000000

As before, the value present in SQL Plan : 1002K (K is the multiplicator and stands for 1000) in the line with “TABLE ACCESS BY INDEX ROWID BATCHED” is calculated as the sum of the cost of index access + cost of table access (rounded).

Worth to mention, the CBO didn’t bother costing the FTS as we used hint – this is visible close to the top of the show section of the 10053 trace: User hint to use this index.

You might be wondering again.

Previously (while performing the index range scan with IDX_ID_SORTED), the number of physical reads reported within the Statistics right underneath the SQL Plan almost perfectly matched the figure representing the I/O cost.

Previously :

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSTCBO        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
      40690  physical reads

As we already know, for the index range scan the I/O cost lies very near the number of physical reads (8KB size), provided all the blocks were read from the disk, so following e.g. flushing of the buffer cache.

So, why is it so different in that case ?
Take a close look :


---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  1000K|   256M|  1002K  (1)| 00:00:40 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSTCBO      |  1000K|   256M|  1002K  (1)| 00:00:40 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_MESS |  1000K|       |  2346   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1069104  consistent gets
     361957  physical reads

So, the 1002K (the accurate number in the 10053 trace file: 1’002’339) is rather away from 361’957, isn’t it ?

I will repeat myself (slightly rephrasing) from my other post :
The reason becomes obvious if you realize that the ordering of ID_MESS is.. messy, i.e. the ID_MESS is scattered around the table’s blocks – it’s everything but being adjacent to the ordered data in the index.
Meaning, we deal with very high clustering factor, being close to number of rows and away from the number of blocks in the table.
Having that in mind you’ll get to the following conclusion:
The first several batch reads brought back the portion of rows which may not be used on the immediate basis, but will surely be needed later.
In the meantime they got cached in the SGA. This means, later such batch size (simplifying: number of physical reads) will get reduced as some rows will be satisfied from the buffer cache.
At first, this sounds quite promising, but in reality, it’s problematic.
I am explaining this behavior in that post: AVG_DATA_BLOCKS_PER_KEY and AVG_LEAF_BLOCKS_PER_KEY demystified. If you need immediate explanation, jump to the TEST #1 within this post, skipping the intro.

There’s another side effect of having such a high CLUF, and it’s tied to other potential issues — all negative (as I’ve covered it in the above mentioned post).
With a high CLUF, a lot of the data brought into the buffer cache ends up being redundant. To put it in perspective, imagine a full table scan satisfying your query but reading far too many blocks than necessary. It’s a similar situation taking e.g. a 8KB block into account – the number of rows within such single block satisfying your predicate is very low (perhaps just 1).
As a result, each single I/O fetch only brings very few rows matching the predicate. Unfortunately, these blocks end up at the cold end of the LRU list, so they don’t get flushed as quickly as blocks from a full table scan. And yet, for the purpose of this predicate, most of those blocks are essentially useless.

I hope it helps.


Blogs and the video definitely worth reading and/or watching which touch the internals of 10053 event and CBO costing.

There are plenty of good articles, but I feel obliged to recommend these two which are pretty complementary:

  1. Immortal “Wolfgang Breitling: A Look under the Hood of CBO – the 10053 Event” – google it and don’t get discouraged that it reaches back to Oracle 9i.
  2. Another one which I really like: “Event 10053 Tracing the CBO” by Carlos Sierra on YB: here. Not very fresh either (11 yo as of the date of publishing this post), but I am sure you’ll appreciate the comprehensive way Carlos explains it.

Some others definitely worth reading:

Ah, the body of the SQL script which I mentioned some lines above :

set lines 430

col first_seen form a20
col type form a7
col LAST_ACTIVE form a20
col "cpu/exec[us]" form 999,999,999,999
col CBO_HASHVAL form 9999999999
col ins form 999
col chld form 9999
col PARS_SCHEMA form a30
col "bi-sens" form a5
col "bi-aware" form a8
col  "sha" form a5
col "#inval" form 9999
col "#pars" form 999,999
col "Binds" form a5
col "SQLProf" form a4  head "SQL|Prof"
col "SQLPatch" form a5  head "SQL|Patch"
col "avgIO[KB]" form 9999
col avgpx form 99999
col "Exec-from:" form a12
col "conc/exec" form 999,999
col "avg[s]/exec" form 999,999
col "avg[ms]/exec" form 999,999
col "I/O_svd[%]" form 99.99
col PLAN_HASHVAL form 99999999999
col  "Offl[%]"  form 999.99
col  "Flash[%]"  form 999.99
col "Line#" form 9999999
col "Exe-from:line#" form a28
col "Reoptim?" form A8 head "Candid.|4reoptim.|cardfeed"
col "bi-aware" head "Bind|Aware?"
col "Adap-sql" form a5 head "Adapt|Plan"
col "bi-sens" head "Bind|Sens"
col FORCE_MATCHING_SIGNATURE form 99999999999999999999999999999
col total_execs form 999,999,999
col "rows/exec" form 999,999,999



SELECT
inst_id ins, decode(command_type,2,'INSERT',3,'SELECT',6,'UPDATE',7,'DELETE') type,
sql_id, CHILD_NUMBER chld,
IS_BIND_SENSITIVE "bi-sens"  ,
IS_BIND_AWARE  "bi-aware"  ,
IS_SHAREABLE   "sha",
case when
 ( select to_number( substr( banner , ( select instr(banner,'Release ') + length('Release ') from v$version where banner like 'Oracle Database%'), 2 ) )  from v$version  where banner like 'Oracle Database%' ) >= 12
 then
 IS_RESOLVED_ADAPTIVE_PLAN
 else 'N/A'
end "Adap-sql",
case when
 ( select to_number( substr( banner , ( select instr(banner,'Release ') + length('Release ') from v$version where banner like 'Oracle Database%'), 2 ) )  from v$version  where banner like 'Oracle Database%' ) >= 12
 then
 IS_REOPTIMIZABLE
 else 'N/A'
end "Reoptim?",
INVALIDATIONS "#inval",
case when BIND_DATA  is null then 'No'  else 'Yes' end "Binds",
case when SQL_PROFILE is null then 'No' else 'Yes' end  "SQLProf",
case when SQL_PATCH is null then 'No' else 'Yes' end  "SQLPatch",
-- hash_value,
PLAN_HASH_VALUE PLAN_HASHVAL, OPTIMIZER_ENV_HASH_VALUE CBO_HASHVAL,
-- FORCE_MATCHING_SIGNATURE,
PARSING_SCHEMA_NAME pars_schema,
-- case when PROGRAM_ID is null or PROGRAM_ID = 0 then 'no-dep' else 'exec from PL/SQL' end "Exe-from:line#",
-- case when PROGRAM_ID is null then 'no-dep' else (select  object_name  from dba_objects o where PROGRAM_ID = o.OBJECT_ID)||':'||PROGRAM_LINE# end "Exe-from:line#",
case when PROGRAM_ID is null OR  PROGRAM_ID = 0 then 'top-lvl-sql' else (select  substr(object_name,1,10)  from dba_objects o where PROGRAM_ID = o.OBJECT_ID) end "Exec-from:",
-- PROGRAM_LINE# "Line#" ,
PARSE_CALLS "#pars",
executions total_execs,
CPU_TIME/decode(nvl(executions,0),0,1,executions) "cpu/exec[us]",
round( (elapsed_time/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) , 2) "avg[ms]/exec",
-- CONCURRENCY_WAIT_TIME/decode(nvl(executions,0),0,1,executions) "conc/exec",
round( (elapsed_time/1000/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) , 2) "avg[s]/exec",
px_servers_executions/decode(nvl(executions,0),0,1,executions) avgpx,
case when IO_CELL_OFFLOAD_ELIGIBLE_BYTES > 0 then
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,0,100 * (IO_CELL_OFFLOAD_ELIGIBLE_BYTES - IO_INTERCONNECT_BYTES)/decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,1,IO_CELL_OFFLOAD_ELIGIBLE_BYTES))
else 0 end "Offl[%]",
case when OPTIMIZED_PHY_READ_REQUESTS > 0 then
-- decode(PHYSICAL_READ_REQUESTS,0,0,100 * (OPTIMIZED_PHY_READ_REQUESTS-PHYSICAL_READ_REQUESTS) / decode(PHYSICAL_READ_REQUESTS,0,1,PHYSICAL_READ_REQUESTS))
 OPTIMIZED_PHY_READ_REQUESTS/PHYSICAL_READ_REQUESTS*100
else 0 end "Flash[%]",
-- PHYSICAL_READ_REQUESTS, OPTIMIZED_PHY_READ_REQUESTS,
round(ROWS_PROCESSED/ decode(nvl(executions,0),0,1,executions) ,2) "rows/exec",
round(BUFFER_GETS/ decode(nvl(executions,0),0,1,executions)  ,0) "buffgets/exec",
round(PHYSICAL_READ_REQUESTS/decode(nvl(executions,0),0,1,executions)  ,0) "phyreqs/exec",
-- round( (DISK_READS/ decode(nvl(executions,0),0,1,executions)  )  /   (decode(PHYSICAL_READ_REQUESTS,0,1)/  decode(executions,0,1) )   ,1 ) "avgIO[KB]",
round( ( (PHYSICAL_READ_BYTES/1024)/ decode(nvl(executions,0),0,1,executions)  )  /   (decode(nvl(PHYSICAL_READ_REQUESTS,0),0,1,PHYSICAL_READ_REQUESTS)/  decode(nvl(executions,0),0,1,executions)    )   ,1 ) "avgIO[KB]",
to_date(FIRST_LOAD_TIME,'yyyy-mm-dd/hh24:mi:ss') first_seen,
to_char(LAST_ACTIVE_TIME,'dd-mm-yyyy hh24:mi:ss') LAST_ACTIVE
FROM GV$SQL
WHERE
    sql_id = nvl('&sqlid' , sql_id)
AND to_char(inst_id) = nvl('&inst_id' , to_char(inst_id))
AND upper(sql_text) like upper(nvl('&sql_text_prefix%' , sql_text))
AND upper(PARSING_SCHEMA_NAME) like upper(nvl('&parsing_schema',PARSING_SCHEMA_NAME))
AND upper(PARSING_SCHEMA_NAME) not like upper(nvl('&NOT_THIS_parsing_schema','__deFinitelLy__does_not_Exisssst'))
AND sql_text not like 'SELECT /* donotlookup_thisquery */%'
AND PLAN_HASH_VALUE like nvl('&PLAN_HASH_VALUE%' , PLAN_HASH_VALUE)
AND FORCE_MATCHING_SIGNATURE = to_number( nvl('&FORCE_MATCHING_SIGNATURE',FORCE_MATCHING_SIGNATURE) )
AND LAST_ACTIVE_TIME > ( sysdate -   nvl('&How_Many_Minutes_Back',99999999)/24/60 )
AND    (elapsed_time/1000/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) >= to_number( nvl('&min_ela_secs',0) )
order by   "avg[s]/exec", LAST_ACTIVE_TIME, CHILD_NUMBER, LAST_ACTIVE_TIME,  total_execs
/

2 responses to “Calculations of I/O cost in 10053 trace vs. SQL Plan”

  1. Immense knowledge with conviction. Thanks for explaining detailed and well formatted.

    1. Nitin, thank you.
      Indeed, formatting takes time.. I slowly become the WP expert : )

Leave a Reply

Your email address will not be published. Required fields are marked *