Clustering Factor (CLUF) – the anatomy and its impact on performance – part 4

This post is a continuation of the previous part.

True price of high Clustering Factor: elapsed time, SGA and CPU

This time let’s try to quantify the resource usage with a much broader index range scan.
I’ll force retrieving 10% (1 million) of the rows and will pay close attention to:
Elapsed times with low and high CLUF combined with analysis of impact of batched I/O
Memory used (buffer cache) for low and high CLUF
Utilized CPU time for low and high CLUF

As usual, AUTOTRACE will accompany the analysis.
On top of that, to show the impact of I/O batching (you surely noticed the TABLE ACCESS BY INDEX ROWID BATCHED in the SQL plan), I will toggle _optimizer_batch_table_access_by_rowid between true and false.
By default _optimizer_batch_table_access_by_rowid is set to true.




Elapsed times with low and high CLUF combined with analysis of impact of batched I/O


Test with low CLUF (batched / not batched)


Retrieving 1 million rows (10%). Low CLUF, batched

SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;

System altered.

SYS@CDBTST1->TPDB1> SET AUTOTRACE TRACE EXPLAIN STATISTICS

SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6 ;
Enter value for comm: LOWCLUF
old   1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
new   1: select /* LOWCLUF */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6

1000000 rows selected.

Elapsed: 00:00:16.25

Execution Plan
----------------------------------------------------------
Plan hash value: 4196823974

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_SORTED"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
      40690  physical reads
          0  redo size
  295802927  bytes sent via SQL*Net to client
     733784  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Not getting into details yet – it took 16 seconds.

Let’s disable I/O batching and repeat the test (buffer cache will get flushed).
Retrieving 1 million rows (10%). Low CLUF, not batched

SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;

System altered.

SYS@CDBTST1->TPDB1> alter session set "_optimizer_batch_table_access_by_rowid" = false ;

Session altered.

SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6 ;
Enter value for comm: 1e6_lowcluf_nobatched
old   1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
new   1: select /* 1e6_lowcluf_nobatched */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6

1000000 rows selected.

Elapsed: 00:00:16.12

Execution Plan
----------------------------------------------------------
Plan hash value: 4266397642

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TSCLUF        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_SORTED"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
      40690  physical reads
          0  redo size
  295802927  bytes sent via SQL*Net to client
     733802  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Time elapsed: still 16 seconds.
Note that in the SQL plan the line “TABLE ACCESS BY INDEX ROWID BATCHED” got replaced with bare “TABLE ACCESS BY INDEX ROWID”, but batching had not even been engaged in the first scenario anyway.
It’s not clearly visible from the two outputs above (well, partially it is – the elapsed time is the same in both cases), but it would be apparent from the 10046 traces. The difference will be clear, though, in the case of high CLUF – coming up next.


Test with high CLUF (batched / not batched)

Retrieving 1 million rows (10%). High CLUF, batched.

SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;

System altered.

SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: HICLUF_BATCHED
old   1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
new   1: select /* HICLUF_BATCHED */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6

1000329 rows selected.

Elapsed: 00:00:33.36

Execution Plan
----------------------------------------------------------
Plan hash value: 3542504988

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  1000K|   256M|  1002K  (1)| 00:00:40 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF      |  1000K|   256M|  1002K  (1)| 00:00:40 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_MESS |  1000K|       |  2346   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_MESS"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1069104  consistent gets
     361957  physical reads
          0  redo size
  295900271  bytes sent via SQL*Net to client
     734033  bytes received via SQL*Net from client
      66690  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000329  rows processed

Time elapsed: 33 seconds.
Nearly twice as long as the query with low CLUF.
Clearly, the number of physical reads is much higher, due to the reasons explained in the previous parts of this post.
Admittedly, this time Oracle applied its “defense mechanism” and creatively engaged the I/O batching.
Let’s see how much we gained.
Let’s temporarily disable the batching and re-run the same test – still with high CLUF.
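Before that, it is worth sketching why high CLUF inflates the table-block visits in the first place. Below is a toy simulation (Python, with made-up row and block counts – only the counting logic matters); it mimics exactly what the clustering factor measures, namely how often the table block changes between consecutive index entries:

```python
import random

ROWS_PER_BLOCK = 100         # hypothetical rows per table block
N_ROWS = 100_000             # rows returned by the range scan
N_BLOCKS = N_ROWS // ROWS_PER_BLOCK

def block_visits(rowid_blocks):
    """Count table-block visits the way a plain (non-batched) index range
    scan does: a new visit every time the block number changes between
    consecutive index entries (this is also how CLUF is computed)."""
    visits, prev = 0, None
    for blk in rowid_blocks:
        if blk != prev:
            visits += 1
            prev = blk
    return visits

# Low CLUF: index order matches the physical order of rows in the blocks.
low = [i // ROWS_PER_BLOCK for i in range(N_ROWS)]

# High CLUF: the very same rows, but scattered across the blocks.
high = low[:]
random.seed(42)
random.shuffle(high)

print(block_visits(low))     # one visit per block
print(block_visits(high))    # close to one visit per row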

Retrieving 1 million rows (10%). High CLUF, not batched.

SYS@CDBTST1->TPDB1> alter session set "_optimizer_batch_table_access_by_rowid" = false ;

Session altered.

SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;

System altered.

SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: HICLUF_NOT_BATCHED
old   1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
new   1: select /* HICLUF_NOT_BATCHED */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6

1000329 rows selected.

Elapsed: 00:00:48.24

Execution Plan
----------------------------------------------------------
Plan hash value: 4078491162

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |  1000K|   256M|  1002K  (1)| 00:00:40 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TSCLUF      |  1000K|   256M|  1002K  (1)| 00:00:40 |
|*  2 |   INDEX RANGE SCAN          | IDX_ID_MESS |  1000K|       |  2346   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_MESS"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1069093  consistent gets
     361957  physical reads
          0  redo size
  295900271  bytes sent via SQL*Net to client
     734037  bytes received via SQL*Net from client
      66690  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000329  rows processed

Ah – time elapsed: 48 seconds.
The I/O recalculated in units of 8KB stays intact (see “Statistics” for both executions), yet this time I/O batching was off.
Admittedly, I should rather say: it was not utilized that much, or not that aggressively. “Not that much”, since the engine does not entirely obey the disabling – it’s visible if you profile the 10046 trace counting the single-block reads (“cell single block physical read” – each 8KB) vs. the batched reads (“cell list of blocks physical read”).
Disabling the batching with alter session set "_optimizer_batch_table_access_by_rowid" = false; does not completely eliminate it.
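Profiling the raw trace for that is easy to script. Below is a minimal sketch of tallying physical-read wait events in a 10046 trace (the event names are the Exadata ones quoted above; on non-Exadata storage you would look for “db file sequential read” vs. “db file parallel read”; the sample lines are made up, merely mimicking the trace format):

```python
import re
from collections import Counter

def profile_reads(trace_lines):
    """Tally the physical-read wait events found in raw 10046 trace lines."""
    counts = Counter()
    for line in trace_lines:
        m = re.search(r"WAIT #\d+: nam='([^']+)'", line)
        if m and "physical read" in m.group(1):
            counts[m.group(1)] += 1
    return counts

# Hypothetical lines, mimicking the raw 10046 trace format:
sample = [
    "WAIT #140605198666136: nam='cell single block physical read' ela= 512 ...",
    "WAIT #140605198666136: nam='cell list of blocks physical read' ela= 1893 ...",
    "WAIT #140605198666136: nam='cell single block physical read' ela= 498 ...",
]
print(profile_reads(sample))
```

Run against the real trace of the “not batched” execution, you would still see a non-zero count of the batched event – which is exactly the point made above.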


Memory used (buffer cache)

These two comparisons below are the immediate result of the respective executions above – that is, with low and high CLUF.
As a reminder, the buffer cache was emptied each time.
Note, I/O batching does not influence the memory usage – in both cases Oracle had to read a certain amount of data from disk into memory to satisfy the predicate, and it doesn’t matter which algorithm it applied to accomplish that.

Buffer Cache Memory usage (low CLUF)


If you query the physical sizes of the involved segments [MB] and compare them with the amount of buffers cached (also expressed in [MB]), you’ll easily spot that – as expected – roughly 10% of each physical segment has been cached.

Size on disk: 

SQL> @segsize

SEGMENT_NAME                                                 SEGMENT_TYPE       TABLESPACE_NAME       SEG_MB   LOBSIZE_MB     TOTAL_MB
------------------------------------------------------------ ------------------ --------------- ------------ ------------ ------------
TSTUS.IDX_ID_SORTED                                          INDEX              TS01                     192                       192
TSTUS.TSCLUF                                                 TABLE              TS01                   3,012                     3,012
Amount of cached buffers upon the execution of the test query:

SQL> @bhobj

INST_ID OWNER                          OBJECT_NAME                    OBJECT_TYPE                   B_CACHE   MB_CACHE
------- ------------------------------ ------------------------------ -------------------- ---------------- ----------
        TSTUS                          IDX_ID_SORTED                  INDEX                      18,251,776         17
        TSTUS                          TSCLUF                         TABLE                     315,080,704        300


Buffer Cache Memory usage (high CLUF)

Let’s check the buffer cache for high CLUF now.
First, let’s check the size of the involved segments on disk.
Needless to say, the size of the table is the same – it’s the same table.
Seemingly obvious – the size of the index is the same as well.
At worst it would be very similar, if not identical – after all, the indexed entries are (nearly) the same and they are ordered.

Size on disk: 
SQL> @segsize


SEGMENT_NAME                                                 SEGMENT_TYPE       TABLESPACE_NAME                      SEG_MB   LOBSIZE_MB     TOTAL_MB
------------------------------------------------------------ ------------------ ------------------------------ ------------ ------------ ------------
TSTUS.IDX_ID_MESS                                            INDEX              PSTS                                    192                       192
TSTUS.TSCLUF                                                 TABLE              TS01                                    3,012                   3,012
Amount of cached buffers upon the execution of the test query (high CLUF):
SQL> @bhobj

INST_ID OWNER                          OBJECT_NAME                    OBJECT_TYPE                   B_CACHE   MB_CACHE
------- ------------------------------ ------------------------------ -------------------- ---------------- ----------
      1 TSTUS                          IDX_ID_MESS                    INDEX                      18,276,352         17
        TSTUS                          TSCLUF                         TABLE                   2,946,875,392       2810

Ah, we wasted quite an amount of memory!
For logically the same amount of data Oracle now had to read 2810 MB (93% of the entire TSCLUF segment), as opposed to the case with low CLUF: 300 MB (~10% of the TSCLUF table).
Such high wastage is no wonder once you realize that in the tested high-CLUF scenario (again: our case is extreme, but not completely unrealistic) each block of the test table holds just one row satisfying the predicate.
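A quick back-of-the-envelope check of that wastage, plugging in the figures from the @segsize and @bhobj outputs above:

```python
SEG_MB = 3012            # TSCLUF size on disk (from @segsize)
cached_low_mb = 300      # cached after the low-CLUF range scan (from @bhobj)
cached_high_mb = 2810    # cached after the high-CLUF range scan (from @bhobj)

print(f"low  CLUF: {cached_low_mb / SEG_MB:.0%} of the table cached")    # ~10%
print(f"high CLUF: {cached_high_mb / SEG_MB:.0%} of the table cached")   # ~93%
print(f"overhead:  {cached_high_mb / cached_low_mb:.1f}x more memory")   # ~9.4x
```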

One more thing to spot.
As you can see, the amount of memory consumed by both indexes is virtually the same: 17 MB, which constitutes ~10% of each index’s size.
Again, this should not be astonishing once you realize that the “price” of range scanning the index is basically the same regardless of CLUF – in both cases we logically and physically need roughly 10% of the entire index segment, since an index is a structure sorted across its leaf blocks (I am not touching hash-based and/or reverse-key indexes here).

Scripts used above :

-- segsize.sql (Piotr Sajda)
compute sum of SEG_MB on report
compute sum of LOBSIZE_MB on report
compute sum of total_mb on report
break on report

col LOBSIZE_MB form 999,999,999
col SEG_MB     form 999,999,999
col TOTAL_MB   form 999,999,999
col segment_name form a60

undefine owner

WITH lobs as
(
SELECT   l.OWNER owner, l.TABLE_NAME table_name,  round( (sum(seg.bytes)/1024/1024) , 2)  LOBSIZE_MB
  FROM DBA_LOBS l, dba_segments seg
  WHERE
      seg.owner like nvl(upper('&&owner'),seg.owner)
  and  (l.SEGMENT_NAME = seg.SEGMENT_NAME or l.INDEX_NAME  = seg.SEGMENT_NAME)
  and (seg.segment_type like 'LOB%' or seg.segment_type like 'INDEX%')
  and l.OWNER        = seg.OWNER
  group by l.OWNER, l.TABLE_NAME
)
SELECT seg.owner||'.'||seg.segment_name as segment_name, seg.segment_type,  seg.TABLESPACE_NAME,
trunc(sum(bytes)/1024/1024,2) SEG_MB, max(lobs.LOBSIZE_MB) LOBSIZE_MB, trunc(sum(bytes)/1024/1024,2) +  nvl( max(lobs.LOBSIZE_MB), 0 )  as total_mb
FROM dba_segments seg, lobs
WHERE
    seg.owner like nvl(upper('&&owner'),seg.owner)
and seg.segment_name like upper(nvl('&object',seg.segment_name))
and seg.segment_type like upper(nvl('&type',seg.segment_type))
and seg.TABLESPACE_NAME like upper(nvl('&ts',seg.TABLESPACE_NAME))
AND seg.owner = lobs.owner(+)
AND seg.segment_name = lobs.table_name(+)
AND seg.segment_type not like 'LOB%'
group by seg.owner , seg.TABLESPACE_NAME, seg.segment_name, seg.segment_type
order by total_mb
/

undefine owner
--bhobj.sql (Piotr Sajda)
break on INST_ID
compute sum of MB_CACHE on INST_ID

col B_CACHE form 999,999,999,999


select
b.inst_id, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE, trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)) B_CACHE,
round(  trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024),  1  ) MB_CACHE
  from dba_objects o, gv$bh b, sys.TS$ t
  where o.DATA_OBJECT_ID =b.OBJD
  and b.status not like 'free'
  and b.TS#=t.TS#
  and b.inst_id =  to_number( nvl('&inst',b.inst_id) )
  and o.OWNER like upper(nvl('%&user%',o.OWNER))
  and upper(o.OBJECT_NAME) like upper(nvl('&objname',o.OBJECT_NAME))
group by b.inst_id,  o.OWNER, o.OBJECT_NAME,o.OBJECT_TYPE, t.BLOCKSIZE
having round(  trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024),  1  )  >= to_number( nvl('&min_mb',0) )
order by b.inst_id, B_CACHE,  o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE,  B_CACHE
/


Utilized CPU time for low and high CLUF

In order to estimate how much we are about to waste in terms of CPU cycles (time spent on CPU), I applied the following approach:
For both scenarios I executed the test SQL and did not flush the buffer cache. The subsequent execution predominantly utilizes CPU, performing logical reads (LIOs).

For both, I set up 10046 tracing; this time, though, I will not analyze the raw files but will examine the output of tkprof, which nicely profiles the CPU components, down to the CPU time spent accessing the cached buffers on behalf of each segment.

Let’s start from the low CLUF.
Here’s the second execution of the test SQL :

SQL> SET AUTOTRACE TRACE EXPLAIN STATISTICS

SQL> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 1700
Enter value for serial_to_trace: 60567

PL/SQL procedure successfully completed.

SQL> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
/
Enter value for comm: lowww
old   1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
new   1: select /* lowww */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6

1000000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4196823974

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |  1000K|   256M| 40883   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF        |  1000K|   256M| 40883   (1)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_SORTED |  1000K|       |  2409   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_SORTED"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     171311  consistent gets
          0  physical reads
          0  redo size
   28603451  bytes sent via SQL*Net to client
     733785  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Clearly, the Statistics show 0 physical reads (all reads were LIOs).
Let’s check the CPU utilization – as mentioned, I have used tkprof with default parameters and navigated to the following section of the output file:

Tkprof stats for low CLUF:


select /* lowww */ /*+ index(T IDX_ID_SORTED) */ *
from
 TSCLUF T where id_sorted <= 1e6


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66668      1.52       1.93          0     171311          0     1000000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66670      1.53       1.93          0     171311          0     1000000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   1000000    1000000    1000000  TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF (cr=171311 pr=0 pw=0 time=876584 us starts=1 cost=40883 size=269000000 card=1000000)
   1000000    1000000    1000000   INDEX RANGE SCAN IDX_ID_SORTED (cr=68746 pr=0 pw=0 time=320505 us starts=1 cost=2409 size=0 card=1000000)(object id 181826)

The total time spent on CPU is 1.53 seconds. If we extract just the fetch phase (1.52 seconds), which is crucial for us, and break it down into the time spent performing the logical I/Os (so, engaging CPU) on behalf of each involved segment, i.e. IDX_ID_SORTED (index) and TSCLUF (table), we obtain the following CPU time components:
IDX_ID_SORTED with time=320505 us, which makes ~320 ms
TSCLUF with time=876584 us, which makes ~877 ms

That makes 1197 ms (877 ms + 320 ms), i.e. 1.2 s, which is the lion’s share of the 1.52 s from above.
Not getting into the details of the missing CPU time components, the figure for TSCLUF (time=876584 us) is crucial for the comparison with the high-CLUF scenario, which we will run now.
The hard parse time (mis=1) has apparently been almost omitted due to rounding. It’s 2.7 ms – see below.

Hard parse time directly from the raw 10046 trace :

PARSING IN CURSOR #140605198666136 len=93 dep=0 uid=0 oct=3 lid=0 tim=50234004390152 hv=2932894152 ad='4b315f910' sqlid='gdazb6yrd0uf8'
select /* lowww */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
END OF STMT
PARSE #140605198666136:c=2700,e=2663,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=4196823974,tim=50234004390151
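Pulling those per-row-source time= figures out of a tkprof output can easily be scripted. A minimal sketch (the sample string mirrors the low-CLUF row-source lines above; the regex simply grabs the word preceding “(cr=” as the segment name and the cumulative time in microseconds):

```python
import re

# Row-source lines copied from the tkprof output above.
rowsource = """
TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF (cr=171311 pr=0 pw=0 time=876584 us starts=1 cost=40883 size=269000000 card=1000000)
 INDEX RANGE SCAN IDX_ID_SORTED (cr=68746 pr=0 pw=0 time=320505 us starts=1 cost=2409 size=0 card=1000000)
"""

# Extract segment name and cumulative row-source time (in microseconds).
for m in re.finditer(r"(\w+)\s+\(cr=\d+ pr=\d+ pw=\d+ time=(\d+) us", rowsource):
    name, us = m.group(1), int(m.group(2))
    print(f"{name}: time={us} us (~{round(us / 1000)} ms)")
```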


Let’s estimate the CPU component for high CLUF

Basically, I will repeat the above scenario with all its prerequisites, this time obviously using ID_MESS <= 1e6 for high CLUF.
Here it is:

SQL> SET AUTOTRACE TRACE EXPLAIN STATISTICS

SQL> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 7631
Enter value for serial_to_trace: 4189

PL/SQL procedure successfully completed.


SQL> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: dddd
old   1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6
new   1: select /* dddd */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6

1000329 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3542504988

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |  1000K|   256M|  1002K  (1)| 00:00:40 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF      |  1000K|   256M|  1002K  (1)| 00:00:40 |
|*  2 |   INDEX RANGE SCAN                  | IDX_ID_MESS |  1000K|       |  2346   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID_MESS"<=1e6)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    1069093  consistent gets
          0  physical reads
          0  redo size
   26778663  bytes sent via SQL*Net to client
     734023  bytes received via SQL*Net from client
      66690  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000329  rows processed

As before, the Statistics show 0 physical reads (just LIOs, which keep the CPU busy).
Worth mentioning – unlike in the test with low CLUF, this time forcing the index range scan with /*+ index(T IDX_ID_MESS) */ could be excused.
With a higher CLUF, the costing of a full table scan has a higher chance of winning against an index range scan – I explain this in detail in this post.

Tkprof stats for high CLUF: 


select /* dddd */ /*+ index(T IDX_ID_MESS) */ *
from
 TSCLUF T where id_mess <= 1e6 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    66690      2.84       3.47          0    1069093          0     1000329
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    66692      2.84       3.48          0    1069093          0     1000329

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
   1000329    1000329    1000329  TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF (cr=1069093 pr=0 pw=0 time=2236932 us starts=1 cost=1002540 size=269000269 card=1000001)
   1000329    1000329    1000329   INDEX RANGE SCAN IDX_ID_MESS (cr=68767 pr=0 pw=0 time=339050 us starts=1 cost=2346 size=0 card=1000001)(object id 181827)

As before, let’s focus on the fetch phase (2.84 seconds).
As before, we need to break it down into the time spent performing the logical I/Os on behalf of each involved segment, so this time: IDX_ID_MESS (index) and TSCLUF (table).
We have the following CPU time components assigned to traversing and filtering the buffers of the cached portions of these two segments:

IDX_ID_MESS with time=339050 us, which makes ~339 ms (virtually no difference – and no wonder)
TSCLUF with time=2236932 us, which makes ~2237 ms (~2.2 seconds!)

Let’s shout it out loud:
Because of the high CLUF, the CPU time accounted to logical reads of the affected table jumped from ~877 ms to ~2237 ms (more than 2.5x).
Such a huge difference becomes obvious once you realize that this time Oracle had to perform many more logical reads in order to filter out virtually the same number of rows.
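Putting the two tkprof outputs side by side quantifies how the extra logical reads translate into time spent on the table access (figures copied from above):

```python
# Fetch-phase figures from the two tkprof outputs above.
low  = {"lio": 171311,  "table_time_us": 876584}
high = {"lio": 1069093, "table_time_us": 2236932}

print(f"LIO ratio:        {high['lio'] / low['lio']:.1f}x")                      # ~6.2x
print(f"table time ratio: {high['table_time_us'] / low['table_time_us']:.1f}x")  # ~2.6x
```

Interestingly, a ~6x jump in LIOs costs “only” ~2.6x the time on the table row source – which hints that part of the per-fetch cost does not scale with the number of buffer gets.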

You might say: “oh, it’s still within the range of seconds, so who cares”.
Well, yes and no.
If this is indeed just a single SQL statement not being executed concurrently, we’ll lose “just a couple of CPU cycles” and – let’s not forget – waste memory.
Things change if high concurrency comes into play.
Then we may add the following scenario – arguably the most painful one.
Namely, let’s imagine that such a table is accessed within a nested loop as the “driven” table, i.e. the one accessed over the join predicate.
Now, if that join predicate (changing with each loop iteration) yields a rather high clustering factor, accessing the driven table produces plenty of wasted LIOs (so, hammering the CPU) and wastes memory – we saw that just 10% of the table’s content satisfying the predicate incurred 93% of that table landing in the cache.
Under unfortunate circumstances (assume the driving table/rowset is rather big, say within the range of many tens of thousands of rows), the content of such a driven table will over time probably become fully cached. Virtually, as if a full table scan of a relatively large table got fully cached.
But let’s not forget – these cached buffers represent rather high redundancy if we take the efficiency of a single range scan into account.
Even worse: unlike buffers from full table scans, these buffers will survive in the buffer cache longer (as they originate from single-block reads), while still being relatively useless.

Last but not least.
If after studying these 4 parts you’re still in doubt, reach out to Richard’s article – you’ll find this topic very well explained there (you’ll need to navigate to it among the other topics in his post), accompanied by several other “gotchas”.
