AVG_DATA_BLOCKS_PER_KEY and AVG_LEAF_BLOCKS_PER_KEY demystified


You have surely come across these two columns while browsing the DBA_% views or the Oracle documentation.

AVG_LEAF_BLOCKS_PER_KEY
AVG_DATA_BLOCKS_PER_KEY

You spot them here and there, yet I bet you tend to skip them as unimportant – or as not really well understood, and therefore unimportant.
I will try to demystify them and show how valuable they can be when you are trying to understand a sudden performance drop of some SQL queries, for example in the following scenario.
Imagine a table reorganization (redefinition, etc.) whose target table has been created with a thoughtfully chosen ORDER BY clause. On the one hand the resulting table is created with nicely presorted column(s) – surely there was a reason for that.
There is a danger lurking behind the scenes though. The values of another column – until then perhaps fairly nicely presorted within the source table – are now scattered around the blocks of the new table. The situation gets really nasty if this shuffled column appears in the predicates of SQLs which now, unexpectedly, run unbearably slowly.
Why?
The CBO got new statistics and recalculated the cost of the SQL; it then either stuck with the index range scan, but with a much higher share of physical reads, or switched to a full table scan – which, paradoxically, may even be the better choice if all of this takes place on Exadata.
Anyway, let us assume things went wrong.
While we are at it, let's clear up another fairly common misunderstanding: that rebuilding the index will help. Unfortunately, it won't. Only a change carried out at the level of the underlying table can improve the performance, and hopefully this will become obvious soon enough.
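Jumping ahead a bit: the kind of table-level change meant here boils down to physically reordering the table data, e.g. a CTAS with ORDER BY on the troubled column – exactly the approach used in the live test later in this post. A minimal sketch with hypothetical names (constraints, grants, indexes and synonyms would have to be recreated or validated separately):

-- sketch only: rebuild the table presorted on the column used in the slow predicates
create table MYSCHEMA.MYTAB_NEW tablespace MYTBS
as select * from MYSCHEMA.MYTAB order by TROUBLED_COL ;

-- then swap the tables (all names are placeholders)
-- alter table MYSCHEMA.MYTAB     rename to MYTAB_OLD ;
-- alter table MYSCHEMA.MYTAB_NEW rename to MYTAB ;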

Introduction

Both AVG_LEAF_BLOCKS_PER_KEY and AVG_DATA_BLOCKS_PER_KEY are to be found in DBA_INDEXES and the family thereof (DBA_IND_PARTITIONS, etc.).
But while AVG_LEAF_BLOCKS_PER_KEY is indeed strictly an index characteristic, AVG_DATA_BLOCKS_PER_KEY is not (at least not so strictly).
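A quick way to eyeball them for a given table (a minimal sketch; the owner and table name are placeholders):

select owner, index_name, table_name,
       avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor
from   dba_indexes
where  owner = '&OWNER'
and    table_name = '&TABLE_NAME' ;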

Let's reach out to the documentation for both columns (ALL_INDEXES): https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/ALL_INDEXES.html

Here is what you can read about AVG_LEAF_BLOCKS_PER_KEY :
Average number of leaf blocks in which each distinct value in the index appears, rounded to the nearest integer. For indexes that enforce UNIQUE and PRIMARY KEY constraints, this value is always 1.

And here is the definition of AVG_DATA_BLOCKS_PER_KEY :
Average number of data blocks in the table that are pointed to by a distinct value in the index rounded to the nearest integer. This statistic is the average number of data blocks that contain rows that contain a given value for the indexed columns.

It may easily slip under the radar that AVG_DATA_BLOCKS_PER_KEY is, first and foremost, a column that describes the underlying table, not the index. A bit misleading, considering these statistics sit in views describing indexes, isn't it?
So how should that be interpreted?

Let’s make up an example.
Your SQL plan picks an index range scan and needs to read, say, 4 index leaf blocks for a value of MYSTRING = 'ABC'.
It then extracts the rowids and jumps to the table's blocks.
Ideally it visits just 4 table blocks to satisfy that range scan and retrieve the selected value (one not present in the index) from the table, e.g. some ADDRESS column.

So, keeping it simple, ideally it would need just 9 physical reads (I added the root index block: 1 + 4 + 4).
Unfortunately, it may also need more reads from the table, say, 16.
That means the indexed values are scattered around the table's blocks, i.e. not densely packed (sorted along the index).
In that case the query needs 21 physical reads (1 + 4 + 16).
So, anticipating a bit: in the latter case AVG_LEAF_BLOCKS_PER_KEY and AVG_DATA_BLOCKS_PER_KEY would read 4 and 16, respectively.
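If you want a crude back-of-envelope figure for your own index, the same arithmetic can be pulled from the dictionary. A sketch (placeholders for owner and index name; BLEVEL stands in for the root/branch visits, and this is deliberately not the CBO's actual cost formula):

-- rough estimate of block visits needed to fetch all rows for one key value
select index_name,
       blevel + avg_leaf_blocks_per_key + avg_data_blocks_per_key as approx_blocks_per_key
from   dba_indexes
where  owner = '&OWNER'
and    index_name = '&INDEX_NAME' ;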

Note that AVG_LEAF_BLOCKS_PER_KEY and AVG_DATA_BLOCKS_PER_KEY can give you a good sense of how "messy" the data is within the table, particularly when the indexed value isn't unique. While these metrics (especially AVG_DATA_BLOCKS_PER_KEY) obviously aren't the same as the clustering factor (CLUF), they tend to be closely related under certain conditions. We'll dive deeper into this connection shortly. I will also post another article explaining the anatomy of the clustering factor, also supported by live examples, so don't be surprised that the term clustering factor emerges here and there throughout this post.
In addition, although AVG_LEAF_BLOCKS_PER_KEY and AVG_DATA_BLOCKS_PER_KEY are listed in the 10053 trace, they do not seem to be used by the optimizer when calculating the cost of a SQL query (CLUF is) – yet they are still useful indicators for... you.

To make it more pictorial, consider a query:
select COLOR from Table where STRING3 = 'ABC';
Looking at my masterpiece below and assuming an index range scan, this query needs to visit just one table block to retrieve the values of COLOR based on the predicate STRING3 = 'ABC', since all the required data sits in a single table block:

Data in the index leaf block (obviously sorted):
 ______________
|              |
|rowidx ABC    |
|rowidy ABC    |
|rowidz ABC    |
|ABD           |
|ABE           |
|ABE           |
|ABF           |
|______________|

Corresponding table block (block #n)
 ____________________
|                    |
|rowidx ABC 1 Blue   |
|rowidz ABC 2 Black  |
|rowidy ABC 2 Grey   |
|(....) ABD 1 White  |
|...                 |
|____________________|


Now, sticking to this scenario but assuming scattered data in the table, the same query:
select COLOR from Table where STRING3 = 'ABC'
will now need three physical reads from the table to fetch the values of column COLOR based on the same predicate 'ABC' (see below).

The rows containing 'ABC' are no longer so nicely packed – they are scattered across three blocks.
Note that this can lead to an even less optimistic scenario: the blocks themselves obviously do not need to be adjacent on disk and may sit far away from one another. That will surely make things worse, since – without getting into details of possible caching at the storage level, etc. – when read directly from an HDD, such reads may involve prolonged head movement and seek time.

Data in the index leaf block (index is sorted – nothing changes here):
 ______________
|              |
|rowidx ABC    |
|rowidy ABC    |
|rowidz ABC    |
|ABD           |
|ABE           |
|ABE           |
|ABF           |
|______________|


But the corresponding data in the table now spans 3 blocks :

table block #5
 ____________________
|                    |
|XYZ 1 Green         |
|rowidx ABC 2 Black  |
|AVF 5 Brown         |
|ABD 1 White         |
|____________________|

table block #804
 ____________________
|                    |
|rowidz ABC 1 Blue   |
|HGZ 0 Purple        |
|ZUF 9 Brown         |
|ABD 1 White         |
|____________________|

table block #14264
 ____________________
|                    |
|XRT 4 Pink          |
|ABR 3 Black         |
|AVU 8 Brown         |
|rowidy ABC 2 Grey   |
|____________________|


So let's dig even deeper and set up a live scenario

I have created a simple table, making sure that the column TSDATE gets very messy, meaning that the values which will later be nicely sorted within the index are scattered around the table's blocks.
I achieved that with: sysdate-trunc( dbms_random.value(0,365*10) ) as TSDATE

SYS@TSTCDB1->TSTPDB01> create table TSUSR.TESTAVGD 
as 
select rownum as ID, sysdate-trunc( dbms_random.value(0,365*10) ) as TSDATE , rpad('X',100,'X') as padding100 
from dual connect by level <= 20e6 ;

Table created.



SYS@TSTCDB1->TSTPDB01> create index TSUSR.idx_TESTAVGD_TSDATE on TSUSR.TESTAVGD (TSDATE) tablespace TSTTSPACE ;

Index created.


SYS@TSTCDB1->TSTPDB01> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => '&OWNER' , tabname => '&TAB', cascade => true, estimate_percent => &estimate_percent, granularity => 'ALL', degree => &parallel_slaves);

Enter value for owner: TSUSR
Enter value for tab: TESTAVGD
Enter value for estimate_percent: 100
Enter value for parallel_slaves: 4

PL/SQL procedure successfully completed.

I will now create a logically identical table, but this time with perfectly sorted TSDATE values.
I then created the corresponding index IDX_TBAVGD_SORTED_TSDATE on TSDATE and gathered statistics (a sketch of those commands follows the CREATE TABLE below).

SYS@TSTCDB1->TSTPDB01> create table TSUSR.TESTAVGD_SORTED tablespace TSTTSPACE as select * from TSUSR.TESTAVGD order by TSDATE ;

Table created.
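For completeness, the corresponding index and its statistics were created along the same lines as for the first table; the exact commands are not shown above, so treat this as a sketch (the tablespace and DBMS_STATS parameters are assumed to mirror the earlier ones):

create index TSUSR.IDX_TBAVGD_SORTED_TSDATE on TSUSR.TESTAVGD_SORTED (TSDATE) tablespace TSTTSPACE ;
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'TSUSR', tabname => 'TESTAVGD_SORTED', cascade => true, estimate_percent => 100, granularity => 'ALL', degree => 4);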


This is what I got :

INDEX_NAME                                                                       COLUMN_NAME                    ORDER 
-------------------------------------------------------------------------------- ------------------------------ ----- 
TSUSR.TESTAVGD_SORTED->IDX_TBAVGD_SORTED_TSDATE                                  TSDATE                             1 
TSUSR.TESTAVGD->IDX_TESTAVGD_TSDATE                                              TSDATE                             1 

So, for both indexes the table blocks (TabBlks), leaf blocks, table rows and AVG_LEAF_BLOCKS_PER_KEY are obviously the same.

What largely differs are:
CLUSTF (clustering factor)
AVG_DATA_BLOCKS_PER_KEY

Please note these two numbers for future reference:
AVG_DATA_BLOCKS_PER_KEY for the pre-sorted table : 91
AVG_DATA_BLOCKS_PER_KEY for the messy table : 5435

Admittedly, it's an extreme scenario, but not entirely unrealistic. Here is the full picture for both indexes:

TABLENAME                      INDEX_NAME                      AVG_LEAF_BLOCKS_PER_KEY   AVG_DATA_BLOCKS_PER_KEY CLUSTF      TabBlks      TabRows       NDV LEAF_BLOCKS      
------------------------------ ------------------------------  ------------------------  ----------------------- ----------- ------------ -----------  ---- ------------
TESTAVGD_SORTED                IDX_TBAVGD_SORTED_TSDATE        14                        91                       333,802    334,146      20,000,000   3650 53051            
TESTAVGD                       IDX_TESTAVGD_TSDATE             14                        5435                     19,838,660 334,148      20,000,000   3650 53051            


TEST #1

Test with the messed-up table: high ratio of AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY, high clustering factor.

Let's look at the SQL plan of a query on the "messy" table which we would expect to use an index range scan and then reach out to the table.

SYS@TSTCDB1->TSTPDB01> select * from TSUSR.TESTAVGD where TSDATE >=  sysdate-90 ;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4026916625
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |   498K|    54M| 91054   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS STORAGE FULL| TESTAVGD |   498K|    54M| 91054   (1)| 00:00:04 |
--------------------------------------------------------------------------------------

Oops... the CBO refused to use the index: having calculated the cost of a full table scan of TESTAVGD and compared it with the alternative cost of the index range scan, it went for the full scan.

So why exactly did the CBO reject the index range scan?
The answer: the final value of the formula expressing the cost of the index range scan got largely inflated by just one component – the CLUF (clustering factor).
Having compared the costs of both the FTS and the index range scan, the CBO decided to go for the full table scan.
Note that the "index component" of the cost of such an index range scan is not huge, but adding the cost of reading the corresponding rows from the table makes it a game changer.
I have shown in another post how the cost of an index range scan is calculated and how it is compared with the cost of accessing the same data through a full table scan.

So, for our purpose I need to "encourage" the CBO to choose the index range scan.
There are several ways to do it; let's fake the number of table blocks (see the faked "BLOCKS" below). With that, the cost of the FTS increases considerably, so the index range scan appears more attractive to the CBO:

SYS@TSTCDB1->TSTPDB01> exec dbms_stats.set_table_stats('&owner','&table',numblks=>&set_new_number_of_blocks ) ;
Enter value for owner: TSUSR
Enter value for table: TESTAVGD
Enter value for set_new_number_of_blocks: 334148*10
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.10
SYS@TSTCDB1->TSTPDB01>
SYS@TSTCDB1->TSTPDB01> @lstab
 
TABLE_NAME                                       TABLESPACE_NAME COMPRESS COMPRESS_FOR       BLKSIZE     BLOCKS         NUM_ROWS AVG_ROW_LEN   
------------------------------------------------ --------------- -------- --------------- ---------- ---------- ---------------- ----------- 
TSUSR.TESTAVGD_SORTED                            TSTTSPACE      DISABLED                       8192     334146       20,000,000         115  
TSUSR.TESTAVGD                                   TSTTSPACE      DISABLED                       8192    3341480       20,000,000         115  
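As a side note, a less invasive way to force the index access path just for such a test would have been a plain INDEX hint – a sketch below. I stick with the faked statistics here so that the CBO picks the index "on its own".

select /*+ index(t IDX_TESTAVGD_TSDATE) */ *
from   TSUSR.TESTAVGD t
where  TSDATE >= sysdate-90 ;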



Let’s try again – this time the FTS won’t appear so attractive and the index access path will be chosen :

SYS@TSTCDB1->TSTPDB01> SET AUTOTRACE TRACE EXPLAIN STATISTICS

 
SYS@TSTCDB1->TSTPDB01> @tr
SYS@TSTCDB1->TSTPDB01> set echo on
SYS@TSTCDB1->TSTPDB01>
SYS@TSTCDB1->TSTPDB01> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 249
Enter value for serial_to_trace: 4463
 
PL/SQL procedure successfully completed.
 
SYS@TSTCDB1->TSTPDB01> set echo off
SYS@TSTCDB1->TSTPDB01> select * from TSUSR.TESTAVGD where TSDATE >=  sysdate-90 ;
 
471382 rows selected.
 
Elapsed: 00:00:24.36
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1510503301
 
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |   476K|    52M|   473K  (1)| 00:00:19 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TESTAVGD            |   476K|    52M|   473K  (1)| 00:00:19 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TESTAVGD_TSDATE |   476K|       |  1301   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TSDATE">=SYSDATE@!-90)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     500667  consistent gets
     254913  physical reads
          0  redo size
   11161666  bytes sent via SQL*Net to client
     346098  bytes received via SQL*Net from client
      31427  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     471382  rows processed

As you noticed, I have also set the 10046 event to trace my session :
exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);

We have several statistics worth examining.
First, physical reads.

Since AUTOTRACE reports physical reads in units of blocks (8KB here), we can try to match the number reported above against the physical reads captured in the raw 10046 trace file (I flushed the buffer cache beforehand).
Let’s get the object ids first in order to match them in the 10046 trace file :
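Something along the lines of this sketch against DBA_OBJECTS will do (the listing below was produced with my own formatting, but the columns are the same):

select owner || '.' || object_name as object,
       subobject_name, object_id, data_object_id, object_type
from   dba_objects
where  owner = 'TSUSR'
and    object_name in ('TESTAVGD', 'IDX_TESTAVGD_TSDATE') ;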

OBJECT                                                       SUBOBJECT_NAME             OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE          
------------------------------------------------------------ ------------------------- ---------- -------------- -------------------- 
TSUSR.TESTAVGD                                                                             138275         138275 TABLE                
TSUSR.IDX_TESTAVGD_TSDATE                                                                  138277         138277 INDEX                

..and apply some grepping on the raw trace file :

[oracle @ TSTCDB1 @ testdbhost01 ]$ # cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | wc -l
31998

Wait... AUTOTRACE says I read 254913 8KB blocks, while the 10046 raw trace file reports 31998. It does not match.
Let's grep out just the index-related physical reads (object_id = 138277) and subtract that number from the total – this gives us the physical reads against the table:

[oracle @ TSTCDB1 @ testdbhost01 ]$ # cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | grep 138277 | wc -l
1326

31998-1326 = 30672 --> so this should be the number of physical read waits performed on behalf of the table.
It still does not match – not even close.
Well, actually it does... and it doesn't.
Take a close look again at what the SQL plan tells you: TABLE ACCESS BY INDEX ROWID BATCHED.
The reads are batched, and that is what is reported in the 10046 trace file. Even more interesting – we'll see it later – the trace file reports just the "placeholders", silently skipping the real I/Os.

An example of one such entry from the 10046 trace:

WAIT #140708079983920: nam='cell list of blocks physical read' ela= 358 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655735269

If you take a look at cellhash#=0 and diskhash#=0 (yes, I am on Exadata), it is obvious this cannot be the final physical I/O. Still, there is a very useful piece of data that should draw your attention: blocks=15.
So, in that very line the batch size amounts to 15 physical I/Os, which are not reported individually here but were surely performed as a cluster of reads underneath.

As opposed to this, the index read:

WAIT #140708079983920: nam='cell single block physical read: flash cache' ela= 283 cellhash#=1810516308 diskhash#=3394455528 bytes=8192 obj#=138277 tim=34169656139538

This is clearly the ultimate I/O call – note that the cellhash and diskhash are provided, and so is the size of the I/O (8192 bytes, e.g. a single index leaf block).

Let's go a bit further: what's the average batch size?
If we rely on AUTOTRACE, it should be the number of physical reads reported by AUTOTRACE divided by the number of read waits issued against the table's blocks:

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     500667  consistent gets
     254913  physical reads
          0  redo size
   11161666  bytes sent via SQL*Net to client
     346098  bytes received via SQL*Net from client
      31427  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     471382  rows processed

254913 / 30672 = 8.310 – so this many blocks, on average, are read by a single batched read.
You might ask: why do I talk about an average value – shouldn't it just be a fixed value? Are there any deviations? We will get to that soon.

First, let’s verify that number based on the raw 10046 trace file.

To better understand what I am after:
in sample #1 I grep for the physical reads directed to the table (just a tiny fragment of the trace file);
in sample #2 I list the same lines, but narrow the result down to the batch size only.

sample #1:
cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | grep -v 138277   --> I filtered out all index related I/Os and left over physical reads directed to the table :
 
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 358 cellhash#=0 diskhash#=0 blocks=13 obj#=138275 tim=34169655687570
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 568 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655689565
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 419 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655691236
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 319 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655692890
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 505 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655694254
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 395 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655695751
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 451 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655696782
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 512 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655698108
WAIT #140708079983920: nam='cell list of blocks physical read' ela= 431 cellhash#=0 diskhash#=0 blocks=14 obj#=138275 tim=34169655699284
 
 
Sample #2:
cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | grep -v 138277  | awk -F "blocks=" '{print $2}' | awk -F " obj#" '{print $1}'
8
11
12
9
8
11
6
7
13
8

Finally, the average batch size :


cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | grep -v 138277  | awk -F "blocks=" '{print $2}' | awk -F " obj#" '{sum += $1} END {print "Avg of batch size : " sum/NR}'
Avg of batch size : 8.47607 [batch size]
 
Pretty close to the result obtained from AUTOTRACE:  254913/30672 = 8.310 [batch size]

Coming back to "why do I talk about the average value?" – let's return to that now.

If you had the raw trace file at hand and took a close look, you would notice that the very first lines report mostly "15"; then... just glance through (I used 'less' to scroll the output from the beginning, jumping several pages down, and then even further):

cat TSTCDB1_ora_210677.trc  | grep physical | egrep 'obj#=138277|obj#=138275' | grep -v 138277  | awk -F "blocks=" '{print $2}' | awk -F " obj#" '{print $1}' | less

Very beginning of the trace file (almost all batches consist of 15 single-block reads):

13
15
15
15
15
15
15
15
14
15
15

 
But some hundreds of lines further down there are not so many "fifteens" anymore:
11
12
14
15
11
14
13
9
14
12
10
9
12
9
13
10
 
Then again, scrolling further down leads us to gradually smaller numbers : 
11
10
9
7
11
7
9
9
10
12
9
6
7
 
Then at the very end of the trace file, look: they are not even close to 15 : 
9
8
4
6
8
3
7
7
3
10
7
3
3
3

Clearly, as time progresses, the batch size decreases.

The reason becomes obvious once you realize that TSDATE is scattered around the table's blocks – it is anything but adjacent to the ordered data in the index.

With that in mind, you'll reach the following conclusion:
The first batched reads brought back rows which were not needed immediately, but would surely be needed later.
In the meantime they got cached in the SGA.
As the scan continued, the batch size gradually shrank because some of the blocks holding rows we still needed were already in the cache, having been read during earlier operations.

So, a nice optimization, right? This is the best Oracle can do under such circumstances, but it comes at a cost.
Let's be clear – such unsorted data negatively impacts both buffer cache and CPU utilization.
The excessive buffer cache usage is straightforward: we may have to read significantly more blocks to return the portion of data that satisfies the predicate, and these blocks, once cached, occupy more space in the SGA.
But why does it impact the CPU? This is somewhat speculative, but it feels accurate.
Each subsequent batched I/O must be checked against the buffer cache to potentially trim the batch if some blocks our query needs are already cached. Essentially, the system is asking: "why fetch from disk if it's already in memory?" However, scanning through longer chains of cached buffers requires background mechanisms – and one thing is for sure: all of these checks burn CPU cycles.
I'm not saying that with perfectly pre-sorted data within the table's blocks these checks don't happen – they do. The key difference is that with unsorted data there are, statistically speaking, significantly more of them.

Let's come back to the main topic of this post though.
A kind reminder:

OWNER              TABLENAME                      INDEX_NAME                     INDEX_TYPE      UNIQUENES COMPRESSION   IDX_PX         CLUSTF          TabBlks          TabRows       NDV LEAF_BLOCKS COLS PAR BLEVEL LEAF_BL_KEY DATA_BL_KEY
------------------ ------------------------------ ------------------------------ --------------- --------- ------------- -------- ------------ ---------------- ---------------- --------- ----------- ---- --- ------ ----------- -----------
TSUSR              TESTAVGD                       IDX_TESTAVGD_TSDATE            NORMAL          NONUNIQUE DISABLED      1          19,838,660          334,148       20,000,000      3650       53051    1 NO       2          14        5435

The most important numbers from above (scroll to the right) :

AVG_LEAF_BLOCKS_PER_KEY  (in my script LEAF_BL_KEY) : 14
AVG_DATA_BLOCKS_PER_KEY  (in my script DATA_BL_KEY) : 5435
Clustering factor: 19,838,660
TabRows: 20,000,000
Table Blocks (TabBlks): 334,148

So any range scan with index access underneath is bound to be highly inefficient here.
If you scroll up to my simple example explaining the relationship between AVG_LEAF_BLOCKS_PER_KEY and AVG_DATA_BLOCKS_PER_KEY and apply the same simple math, you'll get the following:

AVG_DATA_BLOCKS_PER_KEY divided by AVG_LEAF_BLOCKS_PER_KEY : 5435 / 14 = 388.

That means that, on average, for every index leaf block read we need to carry out 388 physical reads from the table. Pretty high, compared with the effort of an index range scan based on a lower AVG_DATA_BLOCKS_PER_KEY (i.e. with IDX_TBAVGD_SORTED_TSDATE).
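That ratio can be pulled straight from the dictionary; a small sketch:

select index_name,
       round( avg_data_blocks_per_key / avg_leaf_blocks_per_key ) as tab_blocks_per_leaf_block_read
from   dba_indexes
where  owner = 'TSUSR'
and    index_name in ('IDX_TESTAVGD_TSDATE', 'IDX_TBAVGD_SORTED_TSDATE') ;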

Let’s verify that.
I will look at the very beginning of the raw trace file (the reason was explained above).
The pattern clearly confirms the calculation:
the first line corresponds to a single index physical read (an index leaf block), followed by a series of batched reads from the table (lines 2-27), almost all of them encompassing 15 blocks.
To show the cycle I kept the second portion, which follows the same pattern: the next index read (1 block) at line 29, followed again by a fairly large portion of reads from the table (lines 30-54).
As a reminder: obj#=138277 in the 10046 raw trace file is the OBJECT_ID of the index and obj#=138275 is the OBJECT_ID of the table.

1  WAIT #140708079983920: nam='cell single block physical read: flash cache' ela= 397 cellhash#=1810516308 diskhash#=3394455528 bytes=8192 obj#=138277 tim=34169655716067 --> index block leaf read
 2  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 542 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655717178
 3  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 524 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655718470
 4  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 531 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655720133
 5  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 542 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655721400
 6  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 329 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655722895
 7  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 372 cellhash#=0 diskhash#=0 blocks=14 obj#=138275 tim=34169655724360
 8  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 425 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655725483
 9  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 379 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655726903
10  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 533 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655728490
11  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 355 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655729570
12  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 409 cellhash#=0 diskhash#=0 blocks=14 obj#=138275 tim=34169655731064
13  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 295 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655732372
14  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 376 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655733831
15  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 358 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655735269
16  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 330 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655737073
17  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 381 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655738204
18  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 512 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655739783
19  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 361 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655741177
20  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 476 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655742697
21  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 453 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655743619
22  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 341 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655744986
23  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 897 cellhash#=0 diskhash#=0 blocks=13 obj#=138275 tim=34169655746324
24  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 403 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655747490
25  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 545 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655748464
26  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 705 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655750273
27  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 712 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655751740
28
29  WAIT #140708079983920: nam='cell single block physical read: flash cache' ela= 363 cellhash#=1810516308 diskhash#=3394455528 bytes=8192 obj#=138277 tim=34169655752338 --> again, index block leaf read
30  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 360 cellhash#=0 diskhash#=0 blocks=14 obj#=138275 tim=34169655753277
31  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 507 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655754216
32  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 561 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655755196
33  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 501 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655756125
34  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 309 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655756976
35  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 380 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655757899
36  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 317 cellhash#=0 diskhash#=0 blocks=14 obj#=138275 tim=34169655758686
37  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 372 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655759807
38  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 398 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655760646
39  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 531 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655761678
40  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 427 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655762569
41  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 268 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655763325
42  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 400 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655764137
43  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 408 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655765036
44  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 376 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655765844
45  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 343 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655766621
46  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 315 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655767480
47  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 414 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655768370
48  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 578 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655769352
49  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 370 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655770182
50  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 647 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655771226
51  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 455 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655772100
52  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 324 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655772894
53  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 375 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655773762
54  WAIT #140708079983920: nam='cell list of blocks physical read' ela= 445 cellhash#=0 diskhash#=0 blocks=15 obj#=138275 tim=34169655774627

Let’s do the math.

I took the lines numbered 2-27 (26 lines), which correspond to the physical reads from the table (obj#=138275).
Almost each of them carried out 15 single (batched) block reads (see blocks=15; in just two cases it was 14 and once 13 – for simplicity I will assume 15 for all).
So the total number of single 8KB blocks read from the table comes to roughly: 26 * 15 = 390.

Again – very close to what we obtained straight from the optimizer statistics :
AVG_DATA_BLOCKS_PER_KEY divided by AVG_LEAF_BLOCKS_PER_KEY  : 
5435/14 = 388.

It's important to note that while these three metrics – CLUF, AVG_DATA_BLOCKS_PER_KEY and AVG_LEAF_BLOCKS_PER_KEY – are closely related, CLUF doesn't directly depend on the uniqueness of an indexed value.

Take, for example, a primary key (PK) index created on an ID column that stores integer values (NUMBER). This index is obviously unique, but the clustering factor (CLUF) can still be high or low, depending on how scattered the data is in the table; the CLUSTERING_FACTOR statistic will reflect that distribution.
However, in such a case, even with a high CLUF, both AVG_DATA_BLOCKS_PER_KEY and AVG_LEAF_BLOCKS_PER_KEY will still be equal to 1, because there is only one row per index key (due to its uniqueness), so each key points to exactly one leaf block and one table block.
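If you want to see that for yourself, a sketch along these lines should do it. The index name is hypothetical and the tablespace is assumed; since the IDs got shuffled by the ORDER BY TSDATE, a unique index on ID of the sorted table should show a clustering factor close to the number of rows, while both AVG_*_PER_KEY columns should come back as 1.

-- sketch: a unique index, so expect AVG_LEAF_BLOCKS_PER_KEY = AVG_DATA_BLOCKS_PER_KEY = 1,
-- but a high CLUSTERING_FACTOR (the IDs are scattered across the table blocks)
create unique index TSUSR.IDX_TBAVGD_SORTED_ID on TSUSR.TESTAVGD_SORTED (ID) tablespace TSTTSPACE ;
exec DBMS_STATS.GATHER_INDEX_STATS(ownname => 'TSUSR', indname => 'IDX_TBAVGD_SORTED_ID');

select avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor
from   dba_indexes
where  owner = 'TSUSR' and index_name = 'IDX_TBAVGD_SORTED_ID';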

TEST #2.

Test with the pre-sorted table: low ratio of AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY, low clustering factor.

Let us compare this with an index range scan against the table with an almost ideal CLUF, i.e. very close to the number of blocks in the table (and much lower than the number of rows).
At the same time, the ratio of AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY is now much lower than in the previous test.
That means we have a situation in which the indexed values are densely packed within the table's blocks, so the index range scan will be much cheaper.
Again, the related data is below (I have copy-pasted both lines, but we will be dealing with the table TESTAVGD_SORTED and its index IDX_TBAVGD_SORTED_TSDATE).

As a reminder, the table was created as shown below: a sorted chain of TSDATE values packed into the table blocks, ideally aligned with its index.
The command was:
SQL> create table TSUSR.TESTAVGD_SORTED as select * from TSUSR.TESTAVGD order by TSDATE ;

OWNER              TABLENAME                      INDEX_NAME                     INDEX_TYPE      UNIQUENES COMPRESSION   IDX_PX         CLUSTF          TabBlks          TabRows       NDV LEAF_BLOCKS COLS PAR BLEVEL LEAF_BL_KEY DATA_BL_KEY 
------------------ ------------------------------ ------------------------------ --------------- --------- ------------- -------- ------------ ---------------- ---------------- --------- ----------- ---- --- ------ ----------- ----------- 
TSUSR              TESTAVGD_SORTED                IDX_TBAVGD_SORTED_TSDATE       NORMAL          NONUNIQUE DISABLED      1             333,802          334,146       20,000,000      3650       53051    1 NO       2          14          91 
TSUSR              TESTAVGD                       IDX_TESTAVGD_TSDATE            NORMAL          NONUNIQUE DISABLED      1          19,838,660          334,148       20,000,000      3650       53051    1 NO       2          14        5435 

Let’s execute the test query.

By the way, besides its main purpose, "SET AUTOTRACE TRACE EXPLAIN STATISTICS" (I will set it again) is a nice trick to squeeze the time out of the FETCH phase when your query returns a large number of rows you are not really interested in seeing.

Another small manipulation: I have set "_optimizer_batch_table_access_by_rowid" to false, which disables batched reads. This is also visible in the SQL plan below by the absence of the keyword BATCHED (it shows "TABLE ACCESS BY INDEX ROWID", not "TABLE ACCESS BY INDEX ROWID BATCHED").
With batching switched off we will clearly see the pattern of the cycle: an index leaf block read followed by single-block (8KB) reads from the table.
By the way, setting this parameter to FALSE in the previous test did not bring the expected result – it decreased the batching, but did not entirely eliminate it.

The test itself (I flushed the buffer cache from another session, so that command is not visible below).

SYS@TSTCDB1->TSTPDB01> @tr
SYS@TSTCDB1->TSTPDB01> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 3155
Enter value for serial_to_trace: 47361
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.00
SYS@TSTCDB1->TSTPDB01>  alter session set "_optimizer_batch_table_access_by_rowid" = false ;
 
Session altered.
 
Elapsed: 00:00:00.00
SYS@TSTCDB1->TSTPDB01> @setts
SYS@TSTCDB1->TSTPDB01> SET AUTOTRACE TRACE EXPLAIN STATISTICS
SYS@TSTCDB1->TSTPDB01> select * from  TSUSR.TESTAVGD_SORTED where TSDATE >= sysdate-90 ;
 
471382 rows selected.
 
Elapsed: 00:00:04.27
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1986553601
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                          |   475K|    52M|  9210   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID        | TESTAVGD_SORTED          |   475K|    52M|  9210   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TBAVGD_SORTED_TSDATE |   475K|       |  1267   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TSDATE">=SYSDATE@!-90)
 
 
Statistics
----------------------------------------------------------
         47  recursive calls
          0  db block gets
      71926  consistent gets
       9429  physical reads
          0  redo size
   62634601  bytes sent via SQL*Net to client
     346105  bytes received via SQL*Net from client
      31427  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
     471382  rows processed
 
 
 
Again, as a reminder : 
 
OBJECT                                                       SUBOBJECT_NAME             OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE          
------------------------------------------------------------ ------------------------- ---------- -------------- -------------------- 
TSUSR.IDX_TBAVGD_SORTED_TSDATE                                                             138280         138280 INDEX                
TSUSR.TESTAVGD_SORTED                                                                      138278         138278 TABLE                

Let’s try to match the numbers reported by AUTOTRACE with the physical reads from the 10046 raw trace file (they match perfectly) :

[oracle @ TSTCDB1 @ testdbhost01 ]$ # cat   /u01/app/oracle/diag/rdbms/gctest/TSTCDB1/trace/TSTCDB1_ora_314171.trc  | grep physical | egrep '138278|138280' | wc -l
9429

Let’s do some math again.
First, I will reach out to the bare statistics:

OWNER              TABLENAME                      INDEX_NAME                     INDEX_TYPE      UNIQUENES COMPRESSION   IDX_PX         CLUSTF          TabBlks          TabRows       NDV LEAF_BLOCKS COLS PAR BLEVEL LEAF_BL_KEY DATA_BL_KEY 
------------------ ------------------------------ ------------------------------ --------------- --------- ------------- -------- ------------ ---------------- ---------------- --------- ----------- ---- --- ------ ----------- ----------- 
TSUSR              TESTAVGD_SORTED                IDX_TBAVGD_SORTED_TSDATE       NORMAL          NONUNIQUE DISABLED      1             333,802          334,146       20,000,000      3650       53051    1 NO       2          14          91 


AVG_DATA_BLOCKS_PER_KEY divided by AVG_LEAF_BLOCKS_PER_KEY   :  91/14 = 6.5
So, this time we may expect just about 6.5 single-block reads from the table per index leaf block read.
Do you remember the ratio from the test above, with its high AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY ratio?
It was 5435/14 = 388 table blocks per index leaf block read – a huge difference!

Now let's verify the calculated value of 6.5 against the true state of affairs, i.e. the trace file.
With batching set to false it can't get any clearer: in the excerpt below, each read of an index leaf block is followed by roughly six single-block (8KB) physical reads from the table.
I deliberately left the next two cycles in so the pattern is plain to see – one index leaf block read followed by ~6 reads of the table's blocks.

10046 (excerpt) :
  
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 351 cellhash#=3018416126 diskhash#=710373631 bytes=8192 obj#=138280 tim=24836891217090 --> index block leaf read
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 205 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891217628
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 233 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891218152
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 195 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891218689
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 220 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891219208
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 198 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891219694
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 301 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891220356 
  
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 295 cellhash#=3018416126 diskhash#=710373631 bytes=8192 obj#=138280 tim=24836891220735 --> index block leaf read
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 418 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891221467
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 276 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891222290
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 277 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891222837
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 258 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891223443
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 202 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891223937
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 192 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891224474
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 178 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891224907
  
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 267 cellhash#=3018416126 diskhash#=710373631 bytes=8192 obj#=138280 tim=24836891225247 --> index block leaf read
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 288 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891225859
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 327 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891226511
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 229 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891227051
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 192 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891227571
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 212 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891228036
WAIT #139732705131064: nam='cell single block physical read: flash cache' ela= 173 cellhash#=3018416126 diskhash#=264260487 bytes=8192 obj#=138278 tim=24836891228558

Two immediate observations:

1. The low ratio of AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY (low clustering factor) saves memory and CPU cycles.
Note:       9429  physical reads
These 9429 blocks got cached using far less memory, and there was no need to revisit the buffers during this execution as the time (and the range scan of the table) progressed.
Conversely, a high ratio of AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY (an inefficient clustering factor) tends to waste buffer cache and CPU cycles, as explained above.

2. The value reported in the SQL plan as "Bytes" does not take AVG_DATA_BLOCKS_PER_KEY / AVG_LEAF_BLOCKS_PER_KEY (or CLUF) into account.
Seemingly obvious – but the "Cost" clearly does.

Take a look at these two SQL plans –
the first from TSUSR.TESTAVGD_SORTED (low AVG_DATA_BLOCKS_PER_KEY / AVG_LEAF_BLOCKS_PER_KEY)
and
the second from TSUSR.TESTAVGD (high AVG_DATA_BLOCKS_PER_KEY / AVG_LEAF_BLOCKS_PER_KEY).
Note that both report the same number of "Bytes" (52M), but very different costs:

SYS@TSTCDB1->TSTPDB01> set autotr trace expl
SYS@TSTCDB1->TSTPDB01> select * from  TSUSR.TESTAVGD_SORTED where TSDATE >= sysdate-90 ;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 252577507
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                          |   475K|    52M|  9208   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TESTAVGD_SORTED          |   475K|    52M|  9208   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_TBAVGD_SORTED_TSDATE |   475K|       |  1267   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
 
 
SYS@TSTCDB1->TSTPDB01> select * from  TSUSR.TESTAVGD where TSDATE >= sysdate-90 ;

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1106486315
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |   475K|    52M|   473K  (1)| 00:00:19 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TESTAVGD            |   475K|    52M|   473K  (1)| 00:00:19 |
|*  2 |   INDEX RANGE SCAN          | IDX_TESTAVGD_TSDATE |   475K|       |  1267   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Where does that 52M figure come from?
"Bytes" is apparently calculated as the sum of AVG_COL_LEN over all selected columns (I selected all of them: select * from ...) multiplied by the estimated number of "Rows":

COLUMN_ID OWNER_TABLE_COL                       DATA_TYPE     LENGTH PRECISION SCALE NULLS LAST_ANALYZED                  GLO AVG_COL_LEN 
--------- ------------------------------------- ------------- ------ --------- ----- ----- ------------------------------ --- ----------- 
        1 TSUSR.TESTAVGD_SORTED -> ID           NUMBER            22                 Y     17-04-2024 16:48:09            YES           6 
        2 TSUSR.TESTAVGD_SORTED -> TSDATE       DATE               7                 Y     17-04-2024 16:48:09            YES           8 
        3 TSUSR.TESTAVGD_SORTED -> PADDING100   VARCHAR2         100                 Y     17-04-2024 16:48:09            YES         101 
                                                                                                                              -----------
sum                                                                                                                                   115
 
 
COLUMN_ID OWNER_TABLE_COL                        DATA_TYPE     LENGTH PRECISION SCALE NULLS LAST_ANALYZED                  GLO AVG_COL_LEN 
--------- -------------------------------------- ------------- ------ --------- ----- ----- ------------------------------ --- ----------- 
        1 TSUSR.TESTAVGD -> ID                   NUMBER            22                 Y     17-04-2024 16:47:46            YES           6 
        2 TSUSR.TESTAVGD -> TSDATE               DATE               7                 Y     17-04-2024 16:47:46            YES           8 
        3 TSUSR.TESTAVGD -> PADDING100           VARCHAR2         100                 Y     17-04-2024 16:47:46            YES         101 
                                                                                                                               -----------
sum                                                                                                                                    115

475,000 "Rows" * 115 bytes (sum of AVG_COL_LEN of all selected columns) = 54,625,000 [bytes]
54,625,000 / 1024 = 53,344 [KB]
53,344 / 1024 = 52 [MB] – the value reported in the SQL plan in the "Bytes" column.
That said, it's clear that the SQL plan relies entirely on the CBO statistics.
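The same arithmetic can be reproduced straight from the dictionary; a minimal sketch (the 475K figure is the cardinality estimate taken from the plan, hard-coded here just for illustration):

select sum(avg_col_len)          as est_row_len,
       475000 * sum(avg_col_len) as est_bytes
from   dba_tab_columns
where  owner = 'TSUSR'
and    table_name = 'TESTAVGD_SORTED' ;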

Calculations of I/O cost in 10053 trace vs. SQL Plan
If you are interested in how the "Cost" is calculated (arguably far more important than "Rows"), why the Cost=1267 next to the "INDEX RANGE SCAN" is the same for both SQLs, why the costs differ so hugely in the "TABLE ACCESS BY INDEX ROWID" line, and where these numbers come from, I encourage you again to visit that post ("Calculations of I/O cost in 10053 trace vs. SQL Plan").
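Just as a quick sanity check here – using a simplified, widely quoted approximation of the index range scan I/O cost (blevel + leaf_blocks * selectivity + clustering_factor * selectivity, ignoring CPU costing and other refinements; the full treatment is in that post) – the dictionary numbers line up quite well with the plans above:

-- selectivity roughly 475,000 / 20,000,000 ~= 0.0238 (90 days out of ~3650 distinct days)
select index_name,
       round(blevel + leaf_blocks * 0.0238 + clustering_factor * 0.0238) as approx_io_cost
from   dba_indexes
where  owner = 'TSUSR'
and    index_name in ('IDX_TBAVGD_SORTED_TSDATE', 'IDX_TESTAVGD_TSDATE') ;

With the statistics shown earlier this works out to roughly 9,200 for IDX_TBAVGD_SORTED_TSDATE and roughly 473,000 for IDX_TESTAVGD_TSDATE – the same ballpark as the costs in the two plans.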


Last but not least (maybe you spotted it already): ultimately, time is what matters here.
These two test queries took 24 seconds and 4 seconds, respectively – a significant gap. The discrepancy stems from two main factors: considerably more I/O and far higher CPU usage in the first query.
While 24 seconds versus 4 seconds for a single run might not seem alarming, it becomes a serious concern in the context of a busy, highly concurrent database with relatively long-running SQLs.
In such cases an inefficient index range scan buried in a deeply nested NESTED LOOP can become a real worry, especially when each subsequent scan must retrieve its data from disk because some buffers have been flushed from the cache in the meantime.

Elapsed: 00:00:24.36 with the high AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY ratio of ~388 (and a high CLUF).
Elapsed: 00:00:04.27 with the low AVG_DATA_BLOCKS_PER_KEY to AVG_LEAF_BLOCKS_PER_KEY ratio of ~6 (and a low CLUF).
