Note: this is a continuation of the previous post.
Now, let’s finally simulate my problem.
To do that, I will force reading the table into the SGA.
To remain fair, I won’t cache it entirely.
Why do I consider this a “fair” approach?
It’s hard to expect that, after your query has executed, an entire large table, say 200 GB in size, will be fully cached.
In most cases we don’t even want a huge table to flood the majority of the available buffer cache.
That somewhat contradicts what I stated before, right?
I mean, a large cache creates good conditions for caching.
Good, but not perfect.
Even with a large buffer cache you can’t expect the entire 200 GB table to be present in a buffer cache of, say, 500 GB.
Even if you get close, it’s hard to expect that all the buffers are at the SCN corresponding to the start of your query.
If they are not, you’ll face the expensive task of reconstructing the “before image” from UNDO buffers (perhaps even from UNDO on disk), so that the required rows represent the state as of the start of your query.
My real case had an extra add-on: the caching was distributed among 4 nodes, which made the situation much worse.
I will skip all the oddities of the large-object caching algorithms; they are described here in a comprehensive manner.
Specifically, I will skip the Automatic Big Table Cache (ABTC).
My post refers to the situation which arguably comes up most often, that is, no extra caching algorithms are in place.
So, let’s first cache the table to the extent that the size of the object and, broadly speaking, the size of the buffer cache allow.
The size of the table obviously by far exceeds the _small_table_threshold.
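By the way, if you want to check that threshold on your own system, the usual way is to peek at the hidden parameter. A minimal sketch, assuming a SYS connection (hidden parameters are of course unsupported territory):

-- current value of _small_table_threshold (expressed in blocks)
select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm = '_small_table_threshold';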
21:07:20 SYS@MYTEST1->PDBXX> alter session set "_serial_direct_read" = NEVER ;
Session altered.
Elapsed: 00:00:00.00
21:07:26 SYS@MYTEST1->PDBXX>
21:07:26 SYS@MYTEST1->PDBXX>
21:07:27 SYS@MYTEST1->PDBXX> select /*+ full (T) */ * from TESTUS.TESTOF T where ID <= 10e3;
10000 rows selected.
Elapsed: 00:00:27.27
Execution Plan
----------------------------------------------------------
Plan hash value: 3294238222
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 19M| 289K (1)| 00:00:12 |
|* 1 | TABLE ACCESS STORAGE FULL| TESTOF | 10000 | 19M| 289K (1)| 00:00:12 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("ID"<=10e3)
filter("ID"<=10e3)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1667375 consistent gets
1666668 physical reads
0 redo size
295769 bytes sent via SQL*Net to client
7754 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
This is how much I managed to cache in the SGA:
INST_ID OWNER OBJECT_NAME OBJECT_TYPE B_CACHE MB_CACHE
------- ------------------------------ ------------------------------ -------------------- ---------------- ----------
1 TESTUS TESTOF TABLE 4,843,184,128 4618
1 select
2 b.inst_id, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE, trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)) B_CACHE,
3 round( trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024), 1 ) MB_CACHE
4 from dba_objects o, gv$bh b, sys.TS$ t
5 where o.DATA_OBJECT_ID =b.OBJD
6 and b.status not like 'free'
7 and b.TS#=t.TS#
8 and b.inst_id = to_number( nvl('&inst',b.inst_id) )
9 and o.OWNER like upper(nvl('%&user%',o.OWNER))
10 and upper(o.OBJECT_NAME) like upper(nvl('&objname',o.OBJECT_NAME))
11 group by b.inst_id, o.OWNER, o.OBJECT_NAME,o.OBJECT_TYPE, t.BLOCKSIZE
12 having round( trunc((count(o.OBJECT_NAME)*t.BLOCKSIZE)/1024/1024), 1 ) >= to_number( nvl('&min_mb',0) )
13* order by b.inst_id, B_CACHE, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE, B_CACHE
Now, the real thing. Let’s run the test query again (I will enable nsmtio tracing before executing the query).
A short reminder: the buffer cache has been populated and now holds roughly 4.6 GB of the table. The total segment size is ~13 GB.
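That ~13 GB is simply what the data dictionary reports for the segment; if you want to double-check it yourself, the obvious lookup would be something like:

-- segment size as seen by the dictionary (should come back at around 13 GB here)
select owner, segment_name, round(bytes/1024/1024) mb
from dba_segments
where owner = 'TESTUS'
and segment_name = 'TESTOF';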
21:16:46 SYS@MYTEST1->PDBXX> @off03
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX> @mysid
21:16:48 SYS@MYTEST1->PDBXX> col OSPID form a10
21:16:48 SYS@MYTEST1->PDBXX> col TRACEFILE form a100
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX> WITH mysidq AS (select max(sid) maxsid from v$mystat)
21:16:48 2 SELECT s.inst_id, s.sid, s.serial#, p.spid OSPID , p.PID, TRACEFILE
21:16:48 3 FROM gv$session s, gv$process p , mysidq
21:16:48 4 where
21:16:48 5 s.sid = mysidq.maxsid AND
21:16:48 6 p.addr=s.paddr AND
21:16:48 7 p.inst_id = s.inst_id
21:16:48 8 /
INST_ID SID SERIAL# OSPID PID TRACEFILE
------- ----------- ------- ---------- ---------- ----------------------------------------------------------------------------------------------------
1 8357 42446 367903 285 /u01/app/oracle/diag/rdbms/MYTEST/MYTEST1/trace/MYTEST1_ora_367903.trc
Elapsed: 00:00:00.04
21:16:48 SYS@MYTEST1->PDBXX> @settrf
21:16:48 SYS@MYTEST1->PDBXX> set echo on
21:16:48 SYS@MYTEST1->PDBXX>
21:16:48 SYS@MYTEST1->PDBXX> ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix';
Enter value for trfile_suffix: CACHED
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='CACHED'
Session altered.
Elapsed: 00:00:00.00
21:16:54 SYS@MYTEST1->PDBXX>
21:16:54 SYS@MYTEST1->PDBXX>
21:16:54 SYS@MYTEST1->PDBXX> alter session set events 'trace[nsmtio]';
Session altered.
Elapsed: 00:00:00.00
21:16:54 SYS@MYTEST1->PDBXX>
21:16:54 SYS@MYTEST1->PDBXX> @ssqlid
21:16:54 SYS@MYTEST1->PDBXX> set echo on
21:16:54 SYS@MYTEST1->PDBXX> set feedback on sql_id
21:16:54 SYS@MYTEST1->PDBXX> set echo off
21:16:54 SYS@MYTEST1->PDBXX> SET AUTOTRACE TRACE EXPLAIN STATISTICS
21:16:54 SYS@MYTEST1->PDBXX>
21:16:54 SYS@MYTEST1->PDBXX>
21:16:54 SYS@MYTEST1->PDBXX> select /*+ full (T) */ * from TESTUS.TESTOF T where ID <= 10e3;
10000 rows selected.
SQL_ID: 8s6pdjnxuzsx3
Elapsed: 00:00:27.71
SQL_ID: g72kdvcacxvtf
Execution Plan
----------------------------------------------------------
SQL_ID: gwn5z121vwdj8
Plan hash value: 3294238222
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 19M| 289K (1)| 00:00:12 |
|* 1 | TABLE ACCESS STORAGE FULL| TESTOF | 10000 | 19M| 289K (1)| 00:00:12 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("ID"<=10e3)
filter("ID"<=10e3)
SQL_ID: 3s1hh8cvfan6w
SQL_ID: g72kdvcacxvtf
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1667375 consistent gets
1666667 physical reads
0 redo size
295787 bytes sent via SQL*Net to client
7750 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
Trace file (nsmtio) – important excerpts : MYTEST1_ora_367903_CACHED.trc
Partial short call stack signature: 0xd57df041b0226129
NSMTIO: kcbism: islarge 1 next 0 nblks 1668786 type 2, bpid 3, kcbisdbfc 0 kcbnhl 16384 kcbstt 4368 keep_nb 0 nbuf 945194 kcbnwp 8 pstt 0 Exadata 1 bsz 8192 objd 75105
NSMTIO: kcbimd: nblks 1668786 kcbstt 4368 kcbnbh 94519 kcbisdbfc 3 is_medium 0 objd 0
NSMTIO: kcbivlo: nblks 1668786 vlot 500 pnb 945194 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 8, objd: 75105, objn: 75105
ckpt: 0, nblks: 1668786, ntcache: 590102, ntdist:1078684, Exadata:1, bsz:8192, rule 0, threshold 104857600, rule_enabled 0\nDirect Path for pdb 3 tsn 8 objd 75105 objn 75105
Direct Path 0 ckpt 0, nblks 1668786 ntcache 590102 ntdist 1078684
Direct Path mndb 0 tdiob 6 txiob 0 tciob 122
Direct path diomrc 128 dios 2 kcbisdbfc 0
kcbdpc: kx 1.500000 kc 2.000000 lhs 1618026.000000 rhs_ckpt 1668786.000000 rhs_dp 1668786.000000 rio 1.000000 cached 1668786 Exadata 1 bsz 8192 offload_rule 0 threshold 104857600 rule_enabled 0
NSMTIO: Additional Info: VLOT=4725970
Object# = 75105, Object_Size = 1668786 blocks
SqlId = 8s6pdjnxuzsx3, plan_hash_value = 3294238222, Partition# = 0
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]
ntcache: number of blocks cached (very close to what my script returned)
nblks: number of blocks from the table statistics
ntdist: nblks – ntcache
not very relevant, just mentioning: kcbstt 4368 is the _small_table_threshold.
ntcache in bytes: 590102*8192 = 4834115584 (bytes)
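Just to tie the trace values together, here is the same arithmetic done in one go (numbers copied from the excerpt above):

-- sanity check of the kcbism/kcbdpc numbers from the trace
select 1668786 - 590102 as ntdist_check,                 -- matches ntdist 1078684
       590102 * 8192 as ntcache_bytes,                   -- 4,834,115,584 bytes, matches the v$bh query
       round(590102 / 1668786 * 100, 1) as pct_cached    -- ~35.4% of the table
from dual;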
Clearly, the system decided to revert to block-shipping mode because the number of buffers already present in the buffer cache was substantial enough. Rather than using direct path reads, the data was accessed through the buffer cache, as the system deemed that more efficient given the size of the cached portion of the table:
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]
Unfortunately, the trace does not say where the threshold of that amount has been set (or at least I haven’t managed to spot it).
There is a pretty good study carried out by Alex Fatkulin in his article, but its result does not seem to hold true any longer in 19c (his tests were carried out on 11g).
Quoting Alex: “Direct path reads stops happening after certain amount of your table’s blocks are in the buffer cache already (…) which happens to be half of the table’s blocks”.
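For what it’s worth, the kcbdpc line in the 19c trace does print the terms it seems to weigh against each other (kx, kc, lhs, rhs_ckpt, rhs_dp). Purely as an observation on the numbers in this post (I would not treat it as a documented formula), lhs works out to kx * ntdist in both of my traces, while rhs_ckpt simply equals nblks:

-- observation only: recomputing the lhs values printed by kcbdpc (kx = 1.5)
select 1.5 * 1078684 as lhs_first_trace,   -- 1618026, as printed in the trace above
       1.5 * 1002119 as lhs_second_trace   -- 1503178.5, as printed in the second trace further below
from dual;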
Let’s be pragmatic – do we care that much where this threshold is?
All in all, would you really try to limit the extent of caching, especially when the blocks are cached as a result of index access? See the next scenario, which indirectly touches on that topic.
Nonetheless, the decision to revert to block shipping turned out really badly: 27 seconds vs. less than 1 second with a smart scan and nice offload.
Note that these 27 seconds were achieved under ideal conditions: no need to reconstruct the consistent image (no undo buffers used), as all the buffers were at the required SCN.
No caching in the remote SGAs either.
A side note :
The runtime decision whether or not to apply a direct path read apparently also takes compression (OLTP/EHCC) into account.
Intuitively obvious: the higher the compression ratio, the bigger the chance of reverting to block shipping in place of a direct path read (and thus eliminating offload).
Potentially dangerous if, all of a sudden, offload for a query stops working following a nice EHCC compression project on a large segment (partition).
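If you ever suspect that scenario, a cheap first check is the declared compression of the segment in the dictionary (table level shown below; partition-level settings live in dba_tab_partitions):

-- declared compression of the test table
select owner, table_name, compression, compress_for
from dba_tables
where owner = 'TESTUS'
and table_name = 'TESTOF';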
Let’s see where the time was wasted.
This time I used the 10046 event to trace the calls throughout the entire run, from hitting ENTER until the test query returned (~27 sec).
I used both the raw trace file and tkprof for profiling.
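I won’t paste the exact commands, but the setup looked roughly like this (the trace file name in the tkprof call is made up for illustration):

-- 10046 at level 8 (waits included), set right before the test query
alter session set events '10046 trace name context forever, level 8';
-- ... run the test query ...
alter session set events '10046 trace name context off';
-- then profile the raw trace on the OS side, e.g.:
-- tkprof MYTEST1_ora_367903_10046.trc 10046_profile.txt sys=no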
This is what I got from tkprof (relevant excerpt):
SQL ID: 8s6pdjnxuzsx3 Plan Hash: 3294238222
select /*+ full (T) */ *
from
TESTUS.TESTOF T where ID <= 10e3
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 668 12.39 27.57 1666667 1667375 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 670 12.39 27.57 1666667 1667375 0 10000
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
10000 10000 10000 TABLE ACCESS STORAGE FULL TESTOF (cr=1667375 pr=1666667 pw=0 time=12379 us starts=1 cost=289341 size=20630000 card=10000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 668 0.00 0.00
Disk file operations I/O 29 0.00 0.00
cell multiblock physical read 13053 0.00 16.29
SQL*Net message from client 668 0.00 0.05
PGA memory operation 2 0.00 0.00
********************************************************************************
So, the ~27 seconds can essentially be split into two components:
- CPU component (~12 sec)
- wait component, the lion’s share of which can be assigned to a single wait: cell multiblock physical read (~16 sec). On a non-Exadata platform that would be “db file scattered read”.
Why is it so slow?
“Because you’ve set the 10046” – in fact, it did have an impact, but that’s only a tiny part of it.
All in all, I had nearly 35% of all blocks cached.
Note that we are accessing this table with a full table scan, so the rows required for filtering are not accessed in a two-step process where the cached portion is read and filtered first, followed by reading “the rest” from disk.
Anticipating a bit: that will be the case later, but with a different type of caching (see below).
Here, the blocks are virtually all read from disk and by default placed at the cold end of the LRU list (the edge which is the first candidate to age out). That means that during the scan virtually all buffers were recycled: read from disk, put in the cache, and quickly aged out again.
When you run any script that lets you observe the start/end value of the “free buffer requested” statistic for your session (I used snapper again), you’ll notice that it is very close to the number of physical reads expressed in 8 KB blocks (in my case: 1,666,695). So it is nearly the same as the “physical reads” reported above (1,666,667) – see the “Statistics” section:
Statistics
----------------------------------------------------------
1666667 physical reads
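If you don’t have snapper at hand, a plain before/after snapshot of the two statistics for your own session shows the same thing:

-- compare 'free buffer requested' with 'physical reads' for the current session
select sn.name, st.value
from v$mystat st, v$statname sn
where sn.statistic# = st.statistic#
and sn.name in ('free buffer requested', 'physical reads');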
Quick math again: 1,666,667 (physical reads) * 8192 = 13,653,336,064 bytes.
13,653,336,064 bytes / 1024 / 1024 ≈ 13,020 MB, which is essentially the size of the entire table.
That means virtually none of the cached buffers were of any use to us. All were recycled.
That also explains the high CPU utilization: in terms of elapsed time, it was nearly as high as the time required to complete the physical I/Os.
Another test scenario: let’s change the quality of the cached blocks to “high quality”.
The general idea is to improve the “quality” of the cached blocks by using a trick to influence the optimizer’s cost calculations. The goal is to make the cost of a large index range scan appear more favorable than a full table scan (FTS). To do this, I’ll adjust the optimizer’s view of the table size by artificially inflating the number of table blocks, let’s say by a factor of 10.
By doing so, the cost of using the index will look more appealing, pushing the optimizer to choose the index access path instead of the FTS. It’s important to remember that the number of blocks in the table plays a key role in these calculations, so I’ll fake that number to create the desired effect (if you feel like taking a look under the hood, see my other post on matching the 10053 trace with SQL plan lines).
Before starting, I’ll flush the buffer cache to make space for the new cached blocks. Then I’ll cache a similar number of blocks as before, but this time through index access (single block reads). The SQL query I’ll use for this is: SELECT * FROM TESTUS.TESTOF T WHERE ID <= 2e6;
TABLE_NAME TABLESPACE_NAME COMPRESS COMPRESS_FOR BLKSIZE BLOCKS NUM_ROWS AVG_ROW_LEN
------------------------------------------------ --------------- -------- --------------- ---------- ---------- ---------------- -----------
TESTUS.TESTOF PSTS DISABLED 8192 1668786 5,000,000 2063
22:40:59 SYS@GCTEST1->PDBXX> @fb
22:41:00 SYS@GCTEST1->PDBXX> alter system flush buffer_cache ;
System altered.
Elapsed: 00:00:00.28
22:41:00 SYS@GCTEST1->PDBXX>
22:41:00 SYS@GCTEST1->PDBXX> set echo off
22:41:01 SYS@GCTEST1->PDBXX>
22:41:01 SYS@GCTEST1->PDBXX> exec dbms_stats.set_table_stats('&owner','&table',numblks=>&set_new_number_of_blocks) ;
Enter value for owner: TESTUS
Enter value for table: TESTOF
Enter value for set_new_number_of_blocks: 1668786*10
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03
22:41:16 SYS@GCTEST1->PDBXX>
22:41:17 SYS@GCTEST1->PDBXX>
22:41:29 SYS@GCTEST1->PDBXX>
22:41:32 SYS@GCTEST1->PDBXX> @setts
22:41:33 SYS@GCTEST1->PDBXX> SET AUTOTRACE TRACE EXPLAIN STATISTICS
22:41:34 SYS@GCTEST1->PDBXX>
22:41:34 SYS@GCTEST1->PDBXX> SELECT * from TESTUS.TESTOF T where ID <= 2e6 ;
2000000 rows selected.
Elapsed: 00:02:39.83
Execution Plan
----------------------------------------------------------
Plan hash value: 3219724843
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2000K| 3934M| 670K (1)| 00:00:27 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TESTOF | 2000K| 3934M| 670K (1)| 00:00:27 |
|* 2 | INDEX RANGE SCAN | IDX_ON_TESTOF | 2000K| | 3971 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"<=2e6)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1204030 consistent gets
670984 physical reads
0 redo size
4180837046 bytes sent via SQL*Net to client
1467070 bytes received via SQL*Net from client
133335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2000000 rows processed
Caching :
INST_ID OBJECT_ID OBJECT_NAME OBJECT_TYPE BYTES_CACHE MB_CACHE
------- ---------- ------------------------------------------------------------ -------------------- ---------------- --------------
1 75109 TESTUS.IDX_ON_TESTOF INDEX 35,364,864 33.7
75105 TESTUS.TESTOF TABLE 5,461,336,064 5,208.3
As you can see, the CBO chose the index range scan. This decision is obviously the result of the fake statistics it received; in the first attempt, without this trick, the CBO was clever enough to choose the FTS. Yes, an /*+ index */ hint would also do the job 🙂
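For completeness, the hinted variant (index name taken from the plan above) would look something like this:

select /*+ index(T IDX_ON_TESTOF) */ *
from TESTUS.TESTOF T
where ID <= 2e6;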
Now, having achieved the “high quality” caching, let’s revert the faked statistics and set the number of blocks back to the real value.
Next, I will repeat the entire test. Although the basic decision not to apply the direct path read stays the same, there will be a notable difference.
22:46:29 SYS@GCTEST1->PDBXX> exec dbms_stats.set_table_stats('&owner','&table',numblks=>&set_new_number_of_blocks) ;
Enter value for owner: TESTUS
Enter value for table: TESTOF
Enter value for set_new_number_of_blocks: 1668786
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02
22:46:54 SYS@GCTEST1->PDBXX>
22:46:54 SYS@GCTEST1->PDBXX> alter session set events 'trace[nsmtio]';
Session altered.
Elapsed: 00:00:00.00
22:47:08 SYS@MYTEST1->PDBXX> @setts
22:47:25 SYS@MYTEST1->PDBXX> SET AUTOTRACE TRACE EXPLAIN STATISTICS
22:47:26 SYS@MYTEST1->PDBXX>
22:47:26 SYS@MYTEST1->PDBXX> @settrf
22:58:28 SYS@MYTEST1->PDBXX> set echo on
22:58:28 SYS@MYTEST1->PDBXX>
22:58:28 SYS@MYTEST1->PDBXX> ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix';
Enter value for trfile_suffix: nsmtio_highq_cache
old 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'
new 1: ALTER SESSION SET TRACEFILE_IDENTIFIER='nsmtio_highq_cache'
Session altered.
Elapsed: 00:00:00.00
22:59:34 SYS@MYTEST1->PDBXX>
22:59:36 SYS@MYTEST1->PDBXX> select /*+ full (T) */ * from TESTUS.TESTOF T where ID <= 10e3 ;
10000 rows selected.
Elapsed: 00:00:17.94
Execution Plan
----------------------------------------------------------
Plan hash value: 3294238222
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 19M| 289K (1)| 00:00:12 |
|* 1 | TABLE ACCESS STORAGE FULL| TESTOF | 10000 | 19M| 289K (1)| 00:00:12 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("ID"<=10e3)
filter("ID"<=10e3)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1667375 consistent gets
1000002 physical reads
0 redo size
295769 bytes sent via SQL*Net to client
7755 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
The important excerpt of the trace file focused on NSMTIO :
Partial short call stack signature: 0xd57df041b0226129
NSMTIO: kcbism: islarge 1 next 0 nblks 1668786 type 2, bpid 3, kcbisdbfc 0 kcbnhl 16384 kcbstt 4368 keep_nb 0 nbuf 945194 kcbnwp 8 pstt 0 Exadata 1 bsz 8192 objd 75105
NSMTIO: kcbimd: nblks 1668786 kcbstt 4368 kcbnbh 94519 kcbisdbfc 3 is_medium 0 objd 0
NSMTIO: kcbivlo: nblks 1668786 vlot 500 pnb 945194 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 8, objd: 75105, objn: 75105
ckpt: 0, nblks: 1668786, ntcache: 666667, ntdist:1002119, Exadata:1, bsz:8192, rule 0, threshold 104857600, rule_enabled 0\nDirect Path for pdb 3 tsn 8 objd 75105 objn 75105
Direct Path 0 ckpt 0, nblks 1668786 ntcache 666667 ntdist 1002119
Direct Path mndb 0 tdiob 6 txiob 0 tciob 135
Direct path diomrc 128 dios 2 kcbisdbfc 0
kcbdpc: kx 1.500000 kc 2.000000 lhs 1503178.500000 rhs_ckpt 1668786.000000 rhs_dp 1668786.000000 rio 1.000000 cached 1668786 Exadata 1 bsz 8192 offload_rule 0 threshold 104857600 rule_enabled 0
NSMTIO: Additional Info: VLOT=4725970
Object# = 75105, Object_Size = 1668786 blocks
SqlId = 8mdrfxy35m7rp, plan_hash_value = 3294238222, Partition# = 0
Short summary of the relevant values :
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]
nblks: 1668786 (number of blocks from statistics)
ntcache: 666667 (this many blocks were cached, which matches the result of the v$bh query above: 666,667 buffers * 8192 = 5,461,336,064 bytes).
Again, the decision not to apply the direct path read (thus: no offload) has not changed. The reason is the same: too many blocks are cached.
There is a notable difference though: the cached buffers were sitting in a privileged location (pole position :), that is, in the hot area of the LRU list, and as such they were actually used.
There is obviously a logic behind it: blocks accessed with single block reads (which is what I forced with the index range scan) are considered more useful for subsequent visits, as they were potentially selected more carefully than the indiscriminate result of a full table scan.
Therefore they should not be evicted from the cache as quickly as the leftovers of an FTS, which are statistically almost useless.
This is confirmed above: unlike in the previous scenario, this time the cached portion of the data was indeed used (read from cache), and the rest was laboriously read from disk. So this time the work was clearly divided between cache reads (LIO) and disk reads (PIO).
Worth pointing out: the elapsed time dropped to ~17 sec (from ~27 sec), which, by the way, is still nowhere near the pure smart scan result (below 1 sec).
The proportions also roughly match (well, it’s a bit of an apples-and-oranges comparison, as I forced the caching in a slightly more “generous” way than the pure FTS would), anyway:
The savings on PIOs (expressed in 8192-byte blocks): 1666667 - 1000002 = 666665. 666665/1666667 ≈ 40%
The saving on time elapsed: 27-17 = 10 sec. 10/27 = 37%
Talking numbers:
A total of 1,000,002 physical reads were performed. Since this statistic is measured in blocks of 8,192 bytes, it translates to: 8,192,016,384 bytes (1,000,002 * 8,192), or approximately 7,812 MB of data read from disk.
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1667375 consistent gets
1000002 physical reads
The number of cached buffers amounts to 666,667 (see ntcache above), which translates to 5,461,336,064 bytes (approximately 5,208 MB).
Now, if we sum both values:
- 8,192,016,384 bytes (from physical reads)
- 5,461,336,064 bytes (from cached buffers)
We get a total of 13,653,352,448 bytes, or approximately 13,020 MB. This total is very close to the size of the entire table segment.
This indicates that we indeed had to read the entire table. However, this time, the rows delivered for filtering could be divided into two parts:
- being read from cache (~5208 MB)
- being read from disk (~7812 MB)
Keep reading here