This post is a continuation of the previous part:
True price of high Clustering Factor: elapsed time, SGA and CPU
This time let’s try to quantify resource usage applying a much broader index range scan.
I’ll force retrieving 10% (1 million) of the rows and will pay close attention to:
Elapsed times with low and high CLUF combined with analysis of impact of batched I/O
Memory used (buffer cache) for low and high CLUF
Utilized CPU time for low and high CLUF
As usual, AUTOTRACE will accompany the analysis.
On top of that, to show the impact of I/O batching (you surely noticed the TABLE ACCESS BY INDEX ROWID BATCHED in the SQL plan), I will toggle _optimizer_batch_table_access_by_rowid between true and false.
By default, _optimizer_batch_table_access_by_rowid is set to true.
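As a quick refresher, the clustering factor is essentially the number of data-block switches Oracle registers while walking the index in key order. A minimal sketch of the idea (a hypothetical illustration, not Oracle's actual code):

```python
def clustering_factor(block_ids_in_index_order):
    """Count data-block changes while scanning rowids in index key order.
    A result close to the number of table blocks means good clustering;
    a result close to the number of rows means poor clustering."""
    cluf = 0
    prev = None
    for blk in block_ids_in_index_order:
        if blk != prev:
            cluf += 1
            prev = blk
    return cluf

# Well clustered: consecutive keys live in the same blocks
print(clustering_factor([1, 1, 1, 2, 2, 2]))  # 2 -> near the block count
# Scattered: every consecutive key points to a different block
print(clustering_factor([1, 2, 1, 3, 2, 3]))  # 6 -> near the row count
```

Oracle stores the computed value in DBA_INDEXES.CLUSTERING_FACTOR during statistics gathering; "low" means close to the number of table blocks, "high" means close to the number of rows.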
Elapsed times with low and high CLUF combined with analysis of impact of batched I/O
Test with low CLUF (batched / not batched)
Retrieving 1 million rows (10%). Low CLUF, batched
SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;
System altered.
SYS@CDBTST1->TPDB1> SET AUTOTRACE TRACE EXPLAIN STATISTICS
SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6 ;
Enter value for comm: LOWCLUF
old 1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
new 1: select /* LOWCLUF */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
1000000 rows selected.
Elapsed: 00:00:16.25
Execution Plan
----------------------------------------------------------
Plan hash value: 4196823974
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 40883 (1)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF | 1000K| 256M| 40883 (1)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_ID_SORTED | 1000K| | 2409 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_SORTED"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
171311 consistent gets
40690 physical reads
0 redo size
295802927 bytes sent via SQL*Net to client
733784 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed
Not getting into details yet – it took 16 seconds.
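A side note on the Statistics above: the 66668 SQL*Net roundtrips follow directly from SQL*Plus's fetch behavior – it fetches the first row alone, then continues in chunks of the arraysize (15 by default). A quick cross-check, assuming defaults:

```python
import math

def sqlplus_fetches(rows, arraysize=15):
    """SQL*Plus fetches the first row alone, then arraysize rows at a time."""
    return 1 + math.ceil((rows - 1) / arraysize)

print(sqlplus_fetches(1_000_000))  # 66668 -> matches the roundtrips above
print(sqlplus_fetches(1_000_329))  # 66690 -> matches the high-CLUF runs later
```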
Let’s disable I/O batching and repeat the test (buffer cache will get flushed).
Retrieving 1 million rows (10%). Low CLUF, not batched
SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;
System altered.
SYS@CDBTST1->TPDB1> alter session set "_optimizer_batch_table_access_by_rowid" = false ;
Session altered.
SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6 ;
Enter value for comm: 1e6_lowcluf_nobatched
old 1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
new 1: select /* 1e6_lowcluf_nobatched */ /*+ index(T IDX_ID_SORTED) */ * from TSTUS.TSCLUF T where id_sorted <= 1e6
1000000 rows selected.
Elapsed: 00:00:16.12
Execution Plan
----------------------------------------------------------
Plan hash value: 4266397642
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 40883 (1)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| TSCLUF | 1000K| 256M| 40883 (1)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_ID_SORTED | 1000K| | 2409 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_SORTED"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
171311 consistent gets
40690 physical reads
0 redo size
295802927 bytes sent via SQL*Net to client
733802 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed
Time elapsed: still 16 seconds.
Note that in the SQL plan the line “TABLE ACCESS BY INDEX ROWID BATCHED” got replaced with a bare “TABLE ACCESS BY INDEX ROWID”, yet batching had not even been engaged in the first scenario.
It’s not clearly visible from the two outputs above (well, partially it is – the elapsed time is the same in both cases), but it would be apparent from the 10046 traces. The difference will become clear, though, in the high-CLUF case shortly.
Test with high CLUF (batched / not batched)
Retrieving 1 million rows (10%). High CLUF, batched.
SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;
System altered.
SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: HICLUF_BATCHED
old 1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
new 1: select /* HICLUF_BATCHED */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
1000329 rows selected.
Elapsed: 00:00:33.36
Execution Plan
----------------------------------------------------------
Plan hash value: 3542504988
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 1002K (1)| 00:00:40 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF | 1000K| 256M| 1002K (1)| 00:00:40 |
|* 2 | INDEX RANGE SCAN | IDX_ID_MESS | 1000K| | 2346 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_MESS"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1069104 consistent gets
361957 physical reads
0 redo size
295900271 bytes sent via SQL*Net to client
734033 bytes received via SQL*Net from client
66690 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000329 rows processed
Time elapsed: 33 seconds.
Nearly twice as long as the query with low CLUF.
Clearly, the number of physical reads is much higher due to reasons explained in the previous parts of this post.
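A quick back-of-the-envelope check on the two batched runs (figures copied from the AUTOTRACE Statistics sections above):

```python
# Figures from the AUTOTRACE Statistics of the low- and high-CLUF batched runs
low  = {"elapsed_s": 16.25, "physical_reads": 40_690,  "consistent_gets": 171_311}
high = {"elapsed_s": 33.36, "physical_reads": 361_957, "consistent_gets": 1_069_104}

for metric in ("elapsed_s", "physical_reads", "consistent_gets"):
    ratio = high[metric] / low[metric]
    print(f"{metric:16s} high/low ratio: {ratio:5.2f}x")
```

So roughly 9 times more physical reads and 6 times more logical reads translate into "only" a doubled elapsed time – the batched I/O discussed below takes part of the credit for that.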
Admittedly, this time Oracle applied its “defense mechanism” and creatively engaged the I/O batching.
Let’s see how much we gained.
Let’s temporarily disable the batching and re-run the same test – still with high CLUF.
Retrieving 1 million rows (10%). High CLUF, not batched.
SYS@CDBTST1->TPDB1> alter session set "_optimizer_batch_table_access_by_rowid" = false ;
Session altered.
SYS@CDBTST1->TPDB1> alter system flush buffer_cache ;
System altered.
SYS@CDBTST1->TPDB1> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: HICLUF_NOT_BATCHED
old 1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
new 1: select /* HICLUF_NOT_BATCHED */ /*+ index(T IDX_ID_MESS) */ * from TSTUS.TSCLUF T where id_mess <= 1e6
1000329 rows selected.
Elapsed: 00:00:48.24
Execution Plan
----------------------------------------------------------
Plan hash value: 4078491162
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 1002K (1)| 00:00:40 |
| 1 | TABLE ACCESS BY INDEX ROWID| TSCLUF | 1000K| 256M| 1002K (1)| 00:00:40 |
|* 2 | INDEX RANGE SCAN | IDX_ID_MESS | 1000K| | 2346 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_MESS"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1069093 consistent gets
361957 physical reads
0 redo size
295900271 bytes sent via SQL*Net to client
734037 bytes received via SQL*Net from client
66690 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000329 rows processed
Ah – time elapsed: 48 seconds.
The amount of I/O expressed in 8 KB units stays intact (see the “Statistics” of both executions), yet this time I/O batching was off.
To be precise, I should say: it was not utilized that much, or not that aggressively. “Not that much”, since the engine does not entirely obey the disabling – this becomes visible if you profile the 10046 trace, counting single-block reads (“cell single block physical read”, 8 KB each) versus batched reads (“cell list of blocks physical read”).
Disabling the batching with alter session set "_optimizer_batch_table_access_by_rowid" = false does not completely eliminate it.
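Conceptually (a simplified model, not Oracle's actual implementation), batched access buffers a window of rowids coming from the index and fetches their data blocks with one vectored request (“cell list of blocks physical read”), instead of issuing one single-block read per row:

```python
def read_requests(block_ids, batch=1):
    """Count physical read requests for a stream of data-block ids taken
    from rowids in index order.
    batch=1  models plain single-block reads (one request per row).
    batch>1  models batched access: all blocks of a window are fetched
             with a single vectored request."""
    requests = 0
    for i in range(0, len(block_ids), batch):
        window = block_ids[i:i + batch]
        requests += len(window) if batch == 1 else 1
    return requests

# With high CLUF the rowid stream hops between blocks on nearly every row:
scattered = [5, 9, 5, 12, 9, 30, 5, 12]
print(read_requests(scattered, batch=1))   # 8 requests
print(read_requests(scattered, batch=8))   # 1 request
```

The amount of 8 KB blocks read is identical in both modes – only the number of read requests (and thus waits) differs, which is consistent with the identical Statistics sections above and the drop from ~48 s to ~33 s (~31%) once batching is engaged.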
Memory used (buffer cache)
The two comparisons below are the immediate result of the respective executions above – that is, with low and high CLUF.
As a reminder, the buffer cache was flushed before each run.
Note that I/O batching does not influence memory usage – in both cases Oracle had to read a certain amount of data from disk into memory to satisfy the predicate, and it doesn’t matter which algorithm it applied to accomplish that.
Buffer Cache Memory usage (low CLUF)
If you query the physical sizes of the involved segments [MB] and compare them with the amount of buffers cached (also expressed in [MB]), you’ll easily spot that – as expected – roughly 10% of each physical segment has been cached.
Size on disk:
SQL> @segsize
SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME SEG_MB LOBSIZE_MB TOTAL_MB
------------------------------------------------------------ ------------------ --------------- ------------ ------------ ------------
TSTUS.IDX_ID_SORTED INDEX TS01 192 192
TSTUS.TSCLUF TABLE TS01 3,012 3,012
Amount of cached buffers upon the execution of the test query:
SQL> @bhobj
INST_ID OWNER OBJECT_NAME OBJECT_TYPE B_CACHE MB_CACHE
------- ------------------------------ ------------------------------ -------------------- ---------------- ----------
TSTUS IDX_ID_SORTED INDEX 18,251,776 17
TSTUS TSCLUF TABLE 315,080,704 300
Buffer Cache Memory usage (high CLUF)
Let’s check the buffer cache for high CLUF now.
First, let’s check the size of the involved segments on disk.
Needless to say, the size of the table is the same – it’s the same table.
Seemingly obvious – the size of the index is the same as well.
In the worst case it would be very similar, if not identical – after all, the indexed entries are (nearly) the same and are ordered.
Size on disk:
SQL> @segsize
SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME SEG_MB LOBSIZE_MB TOTAL_MB
------------------------------------------------------------ ------------------ ------------------------------ ------------ ------------ ------------
TSTUS.IDX_ID_MESS INDEX PSTS 192 192
TSTUS.TSCLUF TABLE TS01 3,012 3,012
Amount of cached buffers upon the execution of the test query (high CLUF):
SQL> @bhobj
INST_ID OWNER OBJECT_NAME OBJECT_TYPE B_CACHE MB_CACHE
------- ------------------------------ ------------------------------ -------------------- ---------------- ----------
1 TSTUS IDX_ID_MESS INDEX 18,276,352 17
TSTUS TSCLUF TABLE 2,946,875,392 2810
Ah, we wasted quite an amount of memory!
For logically the same amount of data, Oracle now had to read 2810 MB (93% of the entire TSCLUF segment), as opposed to the low-CLUF case: 300 MB (~10% of the TSCLUF table).
Such high wastage is no wonder once you realize that in the tested high-CLUF scenario (again: our case is extreme, but not completely unrealistic) Oracle stored just one row satisfying the predicate per block of the test table.
One more thing to spot.
As you can see, the amount of memory consumed by both indexes is virtually the same: 17 MB, which constitutes ~10% of each index’s size.
Again, this should not be astonishing once you realize that the “price” of range scanning the index is basically the same regardless of the CLUF – in both cases we logically and physically need roughly 10% of the entire index segment, since an index is a sorted structure across its leaf blocks (I am not touching hashed and/or reverse-key indexes here).
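Putting the numbers from @segsize and @bhobj side by side (a small sanity check; values copied from the outputs above):

```python
# (size on disk in MB, MB cached right after the test query)
segments = {
    "TSCLUF, low CLUF run":  (3012, 300),
    "TSCLUF, high CLUF run": (3012, 2810),
    "IDX_ID_SORTED":         (192, 17),
    "IDX_ID_MESS":           (192, 17),
}
for name, (disk_mb, cached_mb) in segments.items():
    print(f"{name:22s} {cached_mb:>5} MB cached = {cached_mb / disk_mb:6.1%} of the segment")
```

Same predicate selectivity, same index footprint – but the high-CLUF run dragged 93% of the table into the cache instead of 10%.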
Scripts used above:
-- segsize.sql (Piotr Sajda)
compute sum of SEG_MB on report
compute sum of LOBSIZE_MB on report
compute sum of total_mb on report
break on report
col LOBSIZE_MB form 999,999,999
col SEG_MB form 999,999,999
col TOTAL_MB form 999,999,999
col segment_name form a60
undefine owner
WITH lobs as
(
SELECT l.OWNER owner, l.TABLE_NAME table_name, round( (sum(seg.bytes)/1024/1024) , 2) LOBSIZE_MB
FROM DBA_LOBS l, dba_segments seg
WHERE
seg.owner like nvl(upper('&&owner'),seg.owner)
and (l.SEGMENT_NAME = seg.SEGMENT_NAME or l.INDEX_NAME = seg.SEGMENT_NAME)
and (seg.segment_type like 'LOB%' or seg.segment_type like 'INDEX%')
and l.OWNER = seg.OWNER
group by l.OWNER, l.TABLE_NAME
)
SELECT seg.owner||'.'||seg.segment_name as segment_name, seg.segment_type, seg.TABLESPACE_NAME,
trunc(sum(bytes)/1024/1024,2) SEG_MB, max(lobs.LOBSIZE_MB) LOBSIZE_MB, trunc(sum(bytes)/1024/1024,2) + nvl( max(lobs.LOBSIZE_MB), 0 ) as total_mb
FROM dba_segments seg, lobs
WHERE
seg.owner like nvl(upper('&&owner'),seg.owner)
and seg.segment_name like upper(nvl('&object',seg.segment_name))
and seg.segment_type like upper(nvl('&type',seg.segment_type))
and seg.TABLESPACE_NAME like upper(nvl('&ts',seg.TABLESPACE_NAME))
AND seg.owner = lobs.owner(+)
AND seg.segment_name = lobs.table_name(+)
AND seg.segment_type not like 'LOB%'
group by seg.owner , seg.TABLESPACE_NAME, seg.segment_name, seg.segment_type
order by total_mb
/
undefine owner
--bhobj.sql (Piotr Sajda)
break on INST_ID
compute sum of MB_CACHE on INST_ID
col B_CACHE form 999,999,999,999
select
b.inst_id, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE, trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)) B_CACHE,
round( trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024), 1 ) MB_CACHE
from dba_objects o, gv$bh b, sys.TS$ t
where o.DATA_OBJECT_ID =b.OBJD
and b.status not like 'free'
and b.TS#=t.TS#
and b.inst_id = to_number( nvl('&inst',b.inst_id) )
and o.OWNER like upper(nvl('%&user%',o.OWNER))
and upper(o.OBJECT_NAME) like upper(nvl('&objname',o.OBJECT_NAME))
group by b.inst_id, o.OWNER, o.OBJECT_NAME,o.OBJECT_TYPE, t.BLOCKSIZE
having round( trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024), 1 ) >= to_number( nvl('&min_mb',0) )
order by b.inst_id, B_CACHE, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE, B_CACHE
/
Utilized CPU time for low and high CLUF
In order to estimate how much we are about to waste in terms of CPU cycles (time spent on CPU), I applied the following approach:
For both scenarios I executed the test SQL without flushing the buffer cache, so the subsequent execution predominantly utilizes the CPU performing logical reads (LIOs).
For both, I set up 10046 tracing; this time, though, I will not analyze the raw files but examine the tkprof output, which nicely profiles the CPU components, down to the CPU time spent while accessing the buffers cached on behalf of each segment.
Let’s start with the low CLUF.
Here’s the second execution of the test SQL :
SQL> SET AUTOTRACE TRACE EXPLAIN STATISTICS
SQL> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 1700
Enter value for serial_to_trace: 60567
PL/SQL procedure successfully completed.
SQL> select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
/
Enter value for comm: lowww
old 1: select /* &comm */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
new 1: select /* lowww */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
1000000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4196823974
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 40883 (1)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF | 1000K| 256M| 40883 (1)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | IDX_ID_SORTED | 1000K| | 2409 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_SORTED"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
171311 consistent gets
0 physical reads
0 redo size
28603451 bytes sent via SQL*Net to client
733785 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed
Clearly, the Statistics show 0 physical reads (all reads were LIOs).
Let’s check the CPU utilization – as mentioned, I have used tkprof with default parameters and navigated to the following section of the output file:
Tkprof stats for low CLUF:
select /* lowww */ /*+ index(T IDX_ID_SORTED) */ *
from
TSCLUF T where id_sorted <= 1e6
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66668 1.52 1.93 0 171311 0 1000000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66670 1.53 1.93 0 171311 0 1000000
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1000000 1000000 1000000 TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF (cr=171311 pr=0 pw=0 time=876584 us starts=1 cost=40883 size=269000000 card=1000000)
1000000 1000000 1000000 INDEX RANGE SCAN IDX_ID_SORTED (cr=68746 pr=0 pw=0 time=320505 us starts=1 cost=2409 size=0 card=1000000)(object id 181826)
The total time spent on CPU is 1.53 seconds. If we extract just the fetch phase (1.52 seconds), which is crucial for us, and break it down into the time spent performing the logical I/Os (thus engaging the CPU) on behalf of each involved segment, i.e. IDX_ID_SORTED (index) and TSCLUF (table), we obtain the following CPU time components:
IDX_ID_SORTED with time=320505 us, which makes ~320 ms
TSCLUF with time=876584 us, which makes ~877 ms
That makes 1197 ms (877 ms + 320 ms), i.e. ~1.2 s – the lion’s share of the 1.52 s above.
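The same arithmetic as a quick check (times copied from the row source operations above, in microseconds):

```python
idx_us = 320_505  # INDEX RANGE SCAN IDX_ID_SORTED, time= from the row source stats
tab_us = 876_584  # TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF
total_ms = (idx_us + tab_us) / 1000
print(f"index + table LIO time: {total_ms:.0f} ms of the 1520 ms fetch CPU")
```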
Not getting into details of the missing CPU time components, the figure for TSCLUF (time=876584 us) is the crucial one for the comparison with the high-CLUF scenario, which we will run now.
The hard parse time (mis=1) has apparently almost vanished due to rounding. It’s 2.7 ms – see below.
Hard parse time directly from the raw 10046 trace :
PARSING IN CURSOR #140605198666136 len=93 dep=0 uid=0 oct=3 lid=0 tim=50234004390152 hv=2932894152 ad='4b315f910' sqlid='gdazb6yrd0uf8'
select /* lowww */ /*+ index(T IDX_ID_SORTED) */ * from TSCLUF T where id_sorted <= 1e6
END OF STMT
PARSE #140605198666136:c=2700,e=2663,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=4196823974,tim=50234004390151
Let’s estimate the CPU component for high CLUF
Basically, I will repeat the above scenario with all its prerequisites, this time obviously using ID_MESS <= 1e6 for the high CLUF.
Here it is:
SQL> SET AUTOTRACE TRACE EXPLAIN STATISTICS
SQL> exec dbms_monitor.session_trace_enable(session_id=>&sid_to_trace,serial_num=>&serial_to_trace,binds=>true,waits=>true);
Enter value for sid_to_trace: 7631
Enter value for serial_to_trace: 4189
PL/SQL procedure successfully completed.
SQL> select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6 ;
Enter value for comm: dddd
old 1: select /* &comm */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6
new 1: select /* dddd */ /*+ index(T IDX_ID_MESS) */ * from delme.TSCLUF T where id_mess <= 1e6
1000329 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3542504988
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 256M| 1002K (1)| 00:00:40 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TSCLUF | 1000K| 256M| 1002K (1)| 00:00:40 |
|* 2 | INDEX RANGE SCAN | IDX_ID_MESS | 1000K| | 2346 (1)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID_MESS"<=1e6)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1069093 consistent gets
0 physical reads
0 redo size
26778663 bytes sent via SQL*Net to client
734023 bytes received via SQL*Net from client
66690 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000329 rows processed
As before, the Statistics show 0 physical reads (just LIOs, which keep the CPU busy).
Worth mentioning: unlike in the test with low CLUF, this time forcing the index range scan with /*+ index(T IDX_ID_MESS) */ could be excused.
With a higher CLUF, the costing of a full table scan has a better chance of winning against an index range scan – I explain this in detail in this post.
Tkprof stats for high CLUF:
select /* dddd */ /*+ index(T IDX_ID_MESS) */ *
from
TSCLUF T where id_mess <= 1e6
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 66690 2.84 3.47 0 1069093 0 1000329
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 66692 2.84 3.48 0 1069093 0 1000329
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1000329 1000329 1000329 TABLE ACCESS BY INDEX ROWID BATCHED TSCLUF (cr=1069093 pr=0 pw=0 time=2236932 us starts=1 cost=1002540 size=269000269 card=1000001)
1000329 1000329 1000329 INDEX RANGE SCAN IDX_ID_MESS (cr=68767 pr=0 pw=0 time=339050 us starts=1 cost=2346 size=0 card=1000001)(object id 181827)
As before, let’s focus on the fetch phase (2.84 seconds).
As before, we need to break it down into the time spent performing the logical I/Os on behalf of each involved segment, so this time: IDX_ID_MESS (index) and TSCLUF (table).
We have the following CPU time components, assigned to traversing and filtering the cached buffers of these two segments:
IDX_ID_MESS with time=339050 us, which makes ~339 ms (virtually no difference – and no wonder)
TSCLUF with time=2236932 us, which makes ~2237 ms (~2.2 seconds!)
Let’s shout it out loud:
Because of the high CLUF, the CPU time accounted to logical reads of the affected table jumped from ~877 ms to ~2237 ms (over 2.5 times).
Such a huge difference becomes obvious once you realize that this time Oracle had to perform many more logical reads in order to filter out virtually the same number of rows.
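The jump, quantified (row-source times and consistent gets copied from the two tkprof outputs):

```python
low_tab_us, high_tab_us = 876_584, 2_236_932   # TSCLUF row-source time per run
low_gets,   high_gets   = 171_311, 1_069_093   # consistent gets per run
print(f"table CPU time ratio: {high_tab_us / low_tab_us:.2f}x")
print(f"logical read ratio:   {high_gets / low_gets:.2f}x")
```

So the CPU penalty tracks the extra logical reads, though sub-linearly here – presumably because each visited high-CLUF table buffer yields just a single row to process.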
You might say: “oh, it’s still within the range of seconds, so who cares”.
Well – yes and no.
If this is indeed just a single SQL statement not executed concurrently, we lose “just a couple of CPU cycles” and – let’s not forget – waste memory.
Things change once high concurrency comes into play.
Then we may add the following scenario – arguably the most painful one.
Namely, imagine that such a table is accessed within a nested loop as the “driven” table, i.e. the one accessed over the join predicate.
Now, if that join predicate (changing with each loop iteration) exhibits a rather high clustering factor, each access to the driven table produces plenty of wasted LIOs (hammering the CPU) and wastes memory – we saw that retrieving just 10% of the table’s rows dragged 93% of the table into the cache.
Under unfortunate circumstances (say, the driving table/rowset is rather big – in the range of many tens of thousands of rows), the driven table will over time probably end up fully cached – virtually as if a full table scan of a relatively large table got fully cached.
But let’s not forget – these cached buffers represent rather high redundancy if we consider the efficiency of a single range scan.
Even worse: unlike buffers from full table scans, these buffers (originating from single-block reads) will survive longer in the buffer cache, while remaining relatively useless.
Last but not least.
If after studying these four parts you are still in doubt, check out Richard’s article – you’ll find this topic very well explained there (you’ll need to navigate to it among the other topics in his post), along with several other gotchas.