This post continues from "When OEM meets the CBO: cardinality chaos across containers – Part 1".
Getting down to a single, simple query and using "FROM CONTAINERS" to my own advantage
Let's further "divide and conquer".
To confirm that each branch of the "UNION ALL" executed at the CDB level (the excerpt shown directly below) spawns as many queries as there are PDBs within that CDB, I isolated just a single branch of that "UNION ALL" and ran it at the CDB level.
To be precise, I manually executed this small subsection of the main query at the CDB level (I chose the version of the SQL as transformed by the CBO):
SELECT
"K"."CON_ID" "CON_ID",
"K"."TABLESPACE_NAME" "TABLESPACE_NAME",
SUM("K"."BYTES") "BYTES"
FROM
CONTAINERS ( "SYS"."DBA_FREE_SPACE" ) "K"
WHERE
"K"."CON_ID" <>40 AND "K"."CON_ID" <>41
GROUP BY
"K"."CON_ID",
"K"."TABLESPACE_NAME"
Indeed, it became immediately clear that this "root query" descended into each PDB and spawned the following SQL in each of them:
SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1
I stripped it down to the portion that goes after the data in DBA_FREE_SPACE, since that part caused most of the problems (at least, SQL Monitor suggested so).
Once I realized the real action took place at the tenants – which were supposedly the source of the problem – I wanted a way to look at all PDBs at once, in a kind of "combined" view, especially since there were a bunch of PDBs to search through.
That’s exactly what CONTAINERS(<some_dba_%>) gives you.
So why not use it to my own advantage?
With that in mind, I wrote a script (actually, just rebuilt one I use often) that iterates through all PDBs and queries GV$SQL "at the source", in order to identify the specific SQL statements triggered by the main OEM query responsible for detecting outliers.
The script source is attached at the end of this post; at a high level, though, the idea is to use FROM CONTAINERS(gv$sql) and join it to gv$pdbs on con_id, so you can retrieve the PDB names associated with the gv$sql data returned from each PDB. Obviously, the script needs to be executed from the CDB.
This approach – using CONTAINERS(v$...) – by the way opens up a variety of possibilities for retrieving data across PDBs with a single script executed from the CDB, while still querying the v$ views at the PDB level.
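For the record, here is a minimal sketch of that idea. It is simplified and hypothetical – the real script (attached at the end) carries far more filters and columns – and the 20-second threshold is hard-coded here just for illustration:

```sql
-- Minimal, hypothetical sketch - run from CDB$ROOT.
-- CONTAINERS() exposes a CON_ID column, which we join to GV$PDBS
-- (on con_id and inst_id) to translate it into a PDB name.
SELECT p.name    AS pdb_name,
       s.inst_id,
       s.con_id,
       s.sql_id,
       ROUND(s.elapsed_time / NULLIF(s.executions, 0) / 1e6) AS avg_ela_secs
FROM   CONTAINERS(gv$sql) s
       JOIN gv$pdbs p
         ON  p.con_id  = s.con_id
         AND p.inst_id = s.inst_id
WHERE  s.elapsed_time / NULLIF(s.executions, 0) / 1e6 >= 20
ORDER  BY avg_ela_secs DESC;
```

The NULLIF guards against cursors with zero recorded executions; the rest is plain division of microseconds into seconds per execution.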
So, in order to navigate to the outliers at the PDB level, I first executed the main query (the one originally executed by OEM), waited around 40 seconds, and then ran my script from the CDB, entering "20" at the "Enter value for min_ela_secs:" prompt (the remaining filters were ignored as I kept hitting <ENTER>).
With that condition supplied (basically, I asked my script to list all PDBs along with all sql_ids whose execution time exceeded 20 seconds), bzps4qgdag0v1 emerged as the common outlier across all PDBs.
Needless to say, bzps4qgdag0v1 is my well-known:
SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1
An example of its execution, along with sample output, is shown below.
Unfortunately, this output represents the already tuned state – see "avg[s]/exec" (average elapsed time per execution, in seconds) showing 0-2 seconds.
Before tuning, this column showed many hundreds (for some PDBs: thousands) of seconds for each listed PDB (I have changed the names of the PDBs).
PDB_NAME INST_ID CON_ID SQL_ID CHLD Sens Aware? Binds Prof Patch PLAN_HASHVAL CBO_HASHVAL PARS_SCHEMA TOTAL_EXECS cpu/exec[us] avg[ms]/exec avg[s]/exec Offl[%] Flash[%] rows/exec buffgets/exec phyreqs/exec
---------- ------- ---------- --------------- ----- ----- -------- ----- ---- ----- ------------ ----------- ------------- ------------ ---------------- ------------ ----------- ------- -------- ------------ ------------- ------------
PDB0001 4 24 bzps4qgdag0v1 0 N N No No No 1937079756 2083106959 SYS 6 4,740 5 0 .00 .00 2,321 238 0
PDB0002 4 33 bzps4qgdag0v1 2 N N No No No 2728508444 2083106959 SYS 6 5,802 6 0 .00 .00 2,434 442 0
PDB0003 4 36 bzps4qgdag0v1 4 N N No No No 2728508444 2083106959 SYS 6 6,988 7 0 .00 .00 2,444 666 0
PDB0004 4 42 bzps4qgdag0v1 5 N N No No No 1937079756 2083106959 SYS 6 7,582 8 0 .00 .00 2,574 380 0
PDB0005 4 32 bzps4qgdag0v1 1 N N No No No 1937079756 2083106959 SYS 6 8,781 9 0 .00 100.00 2,569 408 13
PDB0006 4 65 bzps4qgdag0v1 7 N N No No No 1605568276 2083106959 SYS 6 169,138 178 0 .00 100.00 4,090 7500 1152
PDB0007 5 87 bzps4qgdag0v1 11 N N No No No 1937079756 2083106959 SYS 6 4,411 5 0 .00 .00 1,557 197 0
PDB0008 5 68 bzps4qgdag0v1 4 N N No No No 1937079756 2083106959 SYS 6 4,505 5 0 .00 .00 2,507 158 0
PDB0009 5 27 bzps4qgdag0v1 1 N N No No No 1937079756 2083106959 SYS 6 5,095 6 0 .00 .00 2,184 226 0
PDB0010 5 67 bzps4qgdag0v1 3 N N No No No 3445310011 2083106959 SYS 6 11,364 12 0 .00 100.00 2,543 1302 17
PDB0011 5 74 bzps4qgdag0v1 5 N N No No No 3742954135 2083106959 SYS 6 12,448 13 0 .00 .00 2,390 2331 0
PDB0012 5 86 bzps4qgdag0v1 10 N N No No No 2963266590 2083106959 SYS 6 18,184 18 0 .00 100.00 2,001 1049 63
PDB0013 5 85 bzps4qgdag0v1 9 N N No No No 269387889 2083106959 SYS 6 22,258 23 0 .00 100.00 3,287 1951 31
PDB0014 5 84 bzps4qgdag0v1 8 N N No No No 41022335 2083106959 SYS 6 37,805 38 0 .00 100.00 1,525 1874 246
PDB0015 5 82 bzps4qgdag0v1 6 N N No No No 4252579553 2083106959 SYS 6 65,915 67 0 .00 100.00 1,251 2347 492
PDB0016 5 83 bzps4qgdag0v1 7 N N No No No 1220745601 2083106959 SYS 6 80,796 82 0 .00 100.00 3,443 3477 361
PDB0017 5 39 bzps4qgdag0v1 2 N N No No No 1430812067 2083106959 SYS 6 454,546 459 0 .00 100.00 136,934 67728 457
PDB0018 6 29 bzps4qgdag0v1 0 N N No No No 1937079756 2083106959 SYS 6 5,726 6 0 .00 100.00 1,758 134 11
PDB0019 6 64 bzps4qgdag0v1 4 N N No No No 2728508444 2083106959 SYS 6 7,412 8 0 .00 100.00 2,143 264 14
PDB0020 6 63 bzps4qgdag0v1 3 N N No No No 2963266590 2083106959 SYS 6 39,338 40 0 .00 100.00 2,741 1722 165
PDB0021 6 80 bzps4qgdag0v1 5 N N No No No 4274080761 2083106959 SYS 6 206,677 209 0 .00 100.00 2,577 20273 53
PDB0022 6 59 bzps4qgdag0v1 2 N N No No No 2407526118 2083106959 SYS 6 741,912 760 1 .00 100.00 12,163 18757 6355
PDB0023 6 37 bzps4qgdag0v1 1 N N No No No 2052579814 2083106959 SYS 6 1,106,642 1,125 1 .00 100.00 84,116 22788 6839
PDB0024 7 72 bzps4qgdag0v1 6 N N No No No 1937079756 2083106959 SYS 6 4,115 4 0 .00 100.00 2,151 136 4
PDB0025 7 61 bzps4qgdag0v1 3 N N No No No 1937079756 2083106959 SYS 6 4,535 5 0 .00 100.00 2,538 156 4
PDB0026 7 71 bzps4qgdag0v1 5 N N No No No 1937079756 2083106959 SYS 6 6,446 7 0 .00 100.00 2,466 618 4
PDB0027 7 30 bzps4qgdag0v1 0 N N No No No 1652492838 2083106959 SYS 6 13,453 14 0 .00 100.00 7,447 544 13
PDB0028 7 41 bzps4qgdag0v1 2 N N No No No 3483489860 2083106959 SYS 6 317,780 329 0 .00 99.79 38,088 43656 388
PDB0029 7 70 bzps4qgdag0v1 4 N N No No No 2685656366 2083106959 SYS 6 576,215 582 1 .00 100.00 19,428 40536 416
PDB0030 7 40 bzps4qgdag0v1 1 N N No No No 814635842 2083106959 SYS 6 1,945,995 1,988 2 .00 100.00 62,423 110501 9850
Before getting busy with the tuning described in the subsequent sections of this article, I was able (by looking at the avg[ms]/exec column) to pinpoint fairly precisely the SQLs and the PDBs which were the outliers.
This in turn enabled me to connect to the relevant PDB(s) for further analysis.
Example of tuning focused on one of the outliers (one PDB)
Following the execution of my script, I was able to navigate to all the long-running queries spawned at each PDB on behalf of the main (OEM) query executed at the CDB level.
So, getting back to the example of one selected PDB – see avg[s]/exec showing 311 seconds (and still not finished) spent on behalf of SQL_ID bzps4qgdag0v1: SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1 – definitely a candidate worth looking at:
SYS @ CDB> @lssqlid_across_pdbs.sql
Enter value for sqlid:
old 45: sql_id = nvl('&sqlid' , sql_id)
new 45: sql_id = nvl('' , sql_id)
Enter value for inst_id:
old 46: AND q.inst_id = to_number( nvl('&inst_id' , q.inst_id) )
new 46: AND q.inst_id = to_number( nvl('' , q.inst_id) )
Enter value for sql_text_prefix:
old 47: AND upper(sql_text) like upper(nvl('&sql_text_prefix%' , sql_text))
new 47: AND upper(sql_text) like upper(nvl('%from DELME.XLCHI%%' , sql_text))
Enter value for parsing_schema:
old 48: AND upper(PARSING_SCHEMA_NAME) like upper(nvl('&parsing_schema',PARSING_SCHEMA_NAME))
new 48: AND upper(PARSING_SCHEMA_NAME) like upper(nvl('',PARSING_SCHEMA_NAME))
Enter value for plan_hash_value:
old 50: AND PLAN_HASH_VALUE like nvl('&PLAN_HASH_VALUE%' , PLAN_HASH_VALUE)
new 50: AND PLAN_HASH_VALUE like nvl('%' , PLAN_HASH_VALUE)
Enter value for force_matching_signature:
old 51: AND FORCE_MATCHING_SIGNATURE = to_number( nvl('&FORCE_MATCHING_SIGNATURE',FORCE_MATCHING_SIGNATURE) )
new 51: AND FORCE_MATCHING_SIGNATURE = to_number( nvl('',FORCE_MATCHING_SIGNATURE) )
Enter value for how_many_minutes_back:
old 52: AND LAST_ACTIVE_TIME > ( sysdate - nvl('&How_Many_Minutes_Back',99999999)/24/60 )
new 52: AND LAST_ACTIVE_TIME > ( sysdate - nvl('',99999999)/24/60 )
Enter value for min_ela_secs:20
Here is the line (PDB) of interest – this time still before tuning (I removed many trailing columns of the script's output to keep the elapsed-time columns visible):
Bind Bind
PDB_NAME INST_ID CON_ID SQL_ID CHLD Sens Aware? cpu/exec[us] avg[ms]/exec avg[s]/exec
---------- ------- ---------- --------------- ----- ----- -------- ------------ ---------------- ------------
PDB0030 7 40 bzps4qgdag0v1 1 N N 304,450,454 311,134 311
SYS@CDB->PDB0030> @sqltxt
Enter value for sqlid: bzps4qgdag0v1
SQL_FULLTEXT
-----------------------------------------------------------------------------------------------------------------------------
SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1
As I haven't changed anything within the SQL text, the SQL_ID is identical to the one caught by the script.
Let's follow the same strategy: add the /*+ monitor */ hint so we can observe SQL Monitor's output, and re-run the statement manually, this time directly from the PDB (PDB0030):
SYS@CDB7->PDB0030> @mysid
INST_ID SID SERIAL# OSPID PID TRACEFILE
------- ----------- ------- ---------- ---------- -------------------------------------------------------------------
7 6324 49834 107384 1078 /u02/app/oracle/diag/rdbms/cdb7/CDB7/trace/PDB0030_ora_107384.trc
SYS@CDB7->PDB0030> @ssqlid
12:51:29 SYS@CDB7->PDB0030> set feedback on sql_id
12:51:29 SYS@CDB7->PDB0030> set echo off
SYS@CDB7->PDB0030> @setts
SYS@CDB7->PDB0030> SET AUTOTRACE TRACE EXPLAIN STATISTICS
SYS@CDB7->PDB0030> SELECT /*+ monitor NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1 ;
SQL Monitor output (the SQL_ID changed because adding the hint changed the SQL text – this time it appeared as SQL_ID: gxvnh0fbwkkqx):
Enter value for sql_id: gxvnh0fbwkkqx
old 1: select dbms_sqltune.REPORT_SQL_MONITOR(sql_id=>'&&sql_id', report_level=>'ALL', type=>'TEXT') as report from dual
new 1: select dbms_sqltune.REPORT_SQL_MONITOR(sql_id=>'gxvnh0fbwkkqx', report_level=>'ALL', type=>'TEXT') as report from dual
SQL Monitoring Report
SQL Text
------------------------------
SELECT /*+ monitor NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1
Global Information
------------------------------
Status : EXECUTING
Instance ID : 7
Session : SYS (6324:49834)
SQL ID : gxvnh0fbwkkqx
SQL Execution ID : 117440512
Execution Started : 06/27/2024 12:51:49
First Refresh Time : 06/27/2024 12:51:49
Last Refresh Time : 06/27/2024 13:07:42
Duration : 955s
Fetch Calls : 2405
Global Stats
====================================================================================================
| Elapsed | Cpu | IO | Concurrency | Cluster | Other | Fetch | Buffer | Read | Read |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets | Reqs | Bytes |
====================================================================================================
| 954 | 907 | 1.15 | 0.16 | 27 | 19 | 2405 | 67M | 14M | 107GB |
====================================================================================================
SQL Plan Monitoring Details (Plan Hash Value=2579583833)
=================================================================================================================================================================================================================
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Mem | Activity | Activity Detail |
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | | (%) | (# samples) |
=================================================================================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 894 | +0 | 1 | 36070 | | | . | 0.32 | Cpu (3) |
| 1 | VIEW | DBA_FREE_SPACE | 350 | 90 | 4 | +0 | 1 | 36070 | | | . | | |
| 2 | UNION-ALL | | | | 4 | +0 | 1 | 36070 | | | . | | |
| 3 | NESTED LOOPS | | 1 | 1 | | | 1 | | | | . | | |
| 4 | NESTED LOOPS | | 1 | 1 | 1 | +0 | 1 | 0 | | | . | | |
| 5 | INDEX FULL SCAN | I_FILE2 | 275 | 1 | 1 | +0 | 1 | 278 | | | . | | |
| 6 | TABLE ACCESS CLUSTER | FET$ | 1 | | 1 | +0 | 278 | 0 | | | . | | |
| 7 | INDEX UNIQUE SCAN | I_TS# | 1 | | 1 | +0 | 278 | 278 | | | . | | |
| 8 | TABLE ACCESS CLUSTER | TS$ | 1 | | | | | | | | . | | |
| 9 | INDEX UNIQUE SCAN | I_TS# | 1 | | | | | | | | . | | |
| 10 | NESTED LOOPS | | 346 | 78 | 4 | +0 | 1 | 36070 | | | . | | |
| 11 | HASH JOIN | | 346 | 78 | 4 | +0 | 1 | 36070 | | | . | | |
| 12 | TABLE ACCESS STORAGE FULL | TS$ | 274 | 78 | 1 | +0 | 1 | 278 | | | . | | |
| 13 | FIXED TABLE FULL | X$KTFBFE | 347 | | 4 | +0 | 1 | 36070 | 112 | 896KB | . | | |
| 14 | INDEX UNIQUE SCAN | I_FILE2 | 1 | | 4 | +0 | 36070 | 36070 | | | . | | |
| 15 | HASH JOIN | | 1 | 3 | 1 | +3 | 1 | 0 | | | 3MB | | |
| 16 | NESTED LOOPS | | 1 | 3 | 121 | +3 | 1 | 1 | | | . | | |
| 17 | STATISTICS COLLECTOR | | | | 121 | +3 | 1 | 3273 | | | . | | |
| 18 | NESTED LOOPS | | 1 | 2 | 121 | +3 | 1 | 3273 | | | . | | |
| 19 | NESTED LOOPS | | 1 | 2 | 937 | +3 | 1 | 3273 | | | . | | |
| -> 20 | TABLE ACCESS STORAGE FULL | RECYCLEBIN$ | 5 | 2 | 953 | +3 | 1 | 584 | | | . | | |
| -> 21 | FIXED TABLE FIXED INDEX | X$KTFBUE (ind:1) | 1 | | 955 | +1 | 585 | 3273 | 14M | 107GB | . | 90.67 | gc buffer busy acquire (14) |
| | | | | | | | | | | | | | Cpu (588) |
| | | | | | | | | | | | | | cell single block physical read: RDMA (231) |
| | | | | | | | | | | | | | cell single block read request (12) |
| -> 22 | INDEX UNIQUE SCAN | I_FILE2 | 1 | | 953 | +3 | 3273 | 3273 | | | . | | |
| 23 | TABLE ACCESS CLUSTER | TS$ | 1 | 1 | | | | | | | . | | |
| 24 | INDEX UNIQUE SCAN | I_TS# | 1 | | | | | | | | . | | |
| 25 | TABLE ACCESS STORAGE FULL | TS$ | 1 | 1 | | | | | | | . | | |
| 26 | NESTED LOOPS | | 1 | 5 | | | | | | | . | | |
| 27 | NESTED LOOPS | | 1 | 5 | | | | | | | . | | |
| 28 | MERGE JOIN CARTESIAN | | 1375 | 5 | | | | | | | . | | |
| 29 | TABLE ACCESS STORAGE FULL | RECYCLEBIN$ | 5 | 2 | | | | | | | . | | |
| 30 | BUFFER SORT | | 275 | 3 | | | | | | | . | | |
| 31 | INDEX STORAGE FAST FULL SCAN | I_FILE2 | 275 | 1 | | | | | | | . | | |
| 32 | TABLE ACCESS CLUSTER | UET$ | 1 | | | | | | | | . | | |
| 33 | INDEX UNIQUE SCAN | I_FILE#_BLOCK# | 1 | | | | | | | | . | | |
| 34 | TABLE ACCESS CLUSTER | TS$ | 1 | | | | | | | | . | | |
| 35 | INDEX UNIQUE SCAN | I_TS# | 1 | | | | | | | | . | | |
| 36 | HASH JOIN | | 1 | 3 | | | | | | | . | | |
| 37 | NESTED LOOPS | | 1 | 3 | | | | | | | . | | |
| 38 | STATISTICS COLLECTOR | | | | | | | | | | . | | |
| 39 | NESTED LOOPS | | 1 | 2 | | | | | | | . | | |
| 40 | TABLE ACCESS STORAGE FULL | NEW_LOST_WRITE_EXTENTS$ | 1 | 2 | | | | | | | . | | |
| 41 | TABLE ACCESS CLUSTER | TS$ | 1 | | | | | | | | . | | |
| 42 | INDEX UNIQUE SCAN | I_TS# | 1 | | | | | | | | . | | |
| 43 | INDEX RANGE SCAN | I_FILE2 | 1 | 1 | | | | | | | . | | |
| 44 | INDEX FULL SCAN | I_FILE2 | 1 | 1 | | | | | | | . | | |
=================================================================================================================================================================================================================
Didn’t I mention that..? Probably did..
The really great thing about SQL Monitor is that it not only shows the discrepancies between estimated and actual cardinality, but also gives you a notion of where (in which SQL Plan Id) the time is being wasted.
So, looking at "Time Active", it is clear the execution got stuck somewhere around Plan IDs 19, 20, 21, 22 – clearly the NESTED LOOP, which is the usual suspect when the cardinality of the driving table is underestimated – but we'll get there soon.
SQL Monitor reports the Plan ID: 21 (FIXED TABLE FIXED INDEX: X$KTFBUE) as the one which took the longest time: 90.67% of (up to now) 955 seconds – see the “Activity(%)“.
One needs to read “between the lines” though – in the literal and metaphoric sense.
The true culprits sit “one and two floors up”, namely in the Plan Id: 20 and Plan Id: 19 – the first being the root cause and the second (Plan Id: 19) being the victim – victim of the math, so to say.
If you take a more attentive glance, you'll notice that Rows (Estim) for ID 20 = 5 (reading RECYCLEBIN$ with FTS), whereas in reality (see Rows (Actual)) it has gathered 584 rows so far – and it is still executing.
As you can see, it has already taken 955 seconds for just a single PDB and a single branch of the "UNION ALL".
Although, to be fair, the phrase "just a single PDB" is somewhat misleading – under normal circumstances this excerpt would be executed by one of the many PX slaves, so at least in terms of elapsed time, it wouldn't add up per PDB.
On the other hand, keep in mind that this query appears as a parallel query only from the perspective of the root container. Each “PX slave” is effectively a single-threaded SQL statement executing at the PDB level. From the operating system’s point of view, it makes no difference whether that work originates in the CDB or a PDB.
So if each of those statements spends most of its time on CPU (see “Activity Detail (# samples)” in the last column of SQL Monitor), we could end up with very high CPU utilization across the whole cluster. In the worst case, you may see nearly one CPU core saturated per PDB per RAC node (assuming enough concurrency).
End of digression.
If you realize that ID=20 (TABLE ACCESS STORAGE FULL: RECYCLEBIN$) is the driving rowset of the NESTED LOOP, joined to ID=21 (X$KTFBUE) as the driven rowset, you'll start wondering: NL would probably be fine with the 5 rows (see Rows (Estim)) expected from RECYCLEBIN$, but a cardinality two orders of magnitude higher makes that type of join questionable.
There is one more place in that report which makes the situation clear (i.e. why the over-optimistically costed NESTED LOOP join is the real problem).
Take a look at these two columns: Rows (Actual) at ID 20 and Execs at ID 21.
Under “Rows (Actual)” at Plan ID: 20 (FTS on RECYCLEBIN$) you can see: 584.
Now get down one line to Plan ID: 21 and navigate to “Execs” – it reads 585.
The Plan ID: 20 is the driving rowset and Plan ID: 21 is the driven one, both being children of the join represented by Plan ID: 19 – NESTED LOOP.
So the Plan ID: 21 is accessed (joined over some join predicate) for each row of the Plan ID: 20 (driving rowset).
If you then realize that (up to now) this cycle was executed 585 times (see: Execs), you'll come to the conclusion that the real "silent criminal" is Plan ID: 19, that is: the NESTED LOOP.
Why?
Because the optimizer "thought" it was supposed to "exec" (touch) Plan ID: 21 just 5 times = "Rows (Estim)". If the CBO had known the true cardinality of the driving rowset (which in reality is much higher), it would perhaps have decided to apply a HASH JOIN.
While we're at it, there is one more detail worth explaining, namely the placement of the "Activity (%)" attributed to Plan ID: 21.
An unreflective, automatic interpretation might lead you to the conclusion that Plan ID 21 is the main culprit – after all, this is where the "Activity (%)" accumulated, reading 90.67% (of the total time).
So Plan ID: 21 (touching the "innocent" X$KTFBUE) accumulated practically all the wait events and CPU cycles, and managed to read an enormous 107 GB.
So, should we revert our efforts to examining X$KTFBUE?
Perhaps. But what can you really do with a fixed table like X$? Exactly — nothing. Which makes the whole situation mildly ironic.
And it's not that SQL Monitor reports untruths. Yes, most of the elapsed time was indeed spent on processing X$KTFBUE, but X$KTFBUE is merely the sampled branch; the Nested Loop's driving (say, deciding) first child – RECYCLEBIN$ – is what keeps sending it work.
I am aware that I will technically be repeating myself here, but perhaps it's worth taking a look at this (fairly typical) situation from a different angle.
So, with the false estimate (5 rows) of the cardinality of RECYCLEBIN$ versus the facts (584 rows up to now), X$KTFBUE was "hammered" 585 times, returning 3273 rows in total.
Although the term "hammered" is an exaggeration – after all, visiting an object 585 times shouldn't last "forever", should it?
So let's keep going with the harassment: if you divide 3273 (total rows returned from X$KTFBUE) by the number of times it was "touched" – 585 – you get a modest 5.6 rows returned, on average, per loop iteration.
It gets even scarier if you push the numbers a step further.
The SQL has run for 955 seconds so far, and 90.67% of that time is attributed to “touching” X$KTFBUE. Converting that percentage into elapsed time is straightforward: 0.9067 × 955 ≈ 866 seconds spent accessing X$KTFBUE.
Pulling together the remaining details from SQL Monitor: if you take 866 seconds and divide by 585 accesses to X$KTFBUE, you end up at roughly 1.5 seconds per touch — an enormous number for what is essentially an in-memory X$ structure that should be quick to read (yet still it reports I/O family waits: cell single block physical read).
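The arithmetic behind those two figures can be sanity-checked from any SQL*Plus session; the numbers below are simply the ones reported by SQL Monitor above (955 s elapsed, 90.67% activity, 585 execs, 3273 rows):

```sql
-- Quick sanity check of the per-touch math.
SELECT ROUND(955 * 90.67 / 100)          AS secs_on_ktfbue,  -- ~866
       ROUND(955 * 90.67 / 100 / 585, 2) AS secs_per_touch,  -- ~1.48
       ROUND(3273 / 585, 1)              AS rows_per_touch   -- ~5.6
FROM   dual;
```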
This is admittedly speculative, but STATISTICS COLLECTOR (Plan ID: 17) and the associated adaptive behavior may have contributed to the problem.
Either way, STATISTICS COLLECTOR is usually a symptom of cardinality uncertainty: Oracle is collecting runtime row counts because the optimizer’s estimates are unreliable – and that’s consistent with the estimate-vs-actual discrepancy we see here.
Put differently: across those ~1.5 seconds, Oracle returns only about 5.6 rows on average from X$KTFBUE. That’s a strong indication that something is seriously off in the way X$KTFBUE is being accessed.
We shouldn’t dig any deeper than X$ itself — especially since we already suspect (and will soon confirm) that the CBO was working with incorrect information.
Anyway, I regret not having traced it with 10046 — it might have revealed more, especially since SQL Monitor clearly reported I/O waits for Plan ID 21, and a 10046 trace would most likely provide the finer details.
“Then take p1, p2, p3 and obj# from Active Session History — or even from Historical ASH (DBA_HIST_ACTIVE_SESS_HISTORY) if you’ve already lost the data from GV$ACTIVE_SESSION_HISTORY.”
That’s true. ASH is very generous in terms of the information it exposes – and it even ties those details to a specific Plan ID (in my case, I’d be looking at Plan ID 21).
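For completeness, such an ASH lookup could look roughly like this (a hypothetical sketch; swap in DBA_HIST_ACTIVE_SESS_HISTORY, which carries the same columns, once the samples have aged out of GV$):

```sql
-- Wait parameters (p1/p2/p3) and the current object, tied to the
-- plan line of interest (here: Plan ID 21 of our hinted SQL_ID).
SELECT sample_time,
       session_state,
       event,
       p1, p2, p3,
       current_obj#,
       sql_plan_line_id
FROM   gv$active_session_history
WHERE  sql_id = 'gxvnh0fbwkkqx'
AND    sql_plan_line_id = 21
ORDER  BY sample_time;
```

CURRENT_OBJ# can then be resolved against DBA_OBJECTS to see which segment the single-block reads were really hitting.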
All of that said, I’ll shamelessly disarm you: I’m finishing this post in February 2026, while the analysis was done back in June 2024 (you can easily spot the dates in the SQL Monitor reports — ahem… yet another “great feature” of SQL Monitor 😉). So by now I’ve lost both the data and access to that cluster.
Interim summary: this is, by the way, a pretty typical (and sometimes disastrous) situation, where the cardinality of the driving table is significantly underestimated, making the CBO cost the join based on NL (Nested Loop) lower than HJ (Hash Join).
If you examine the SQL Monitor output thoroughly, you'll notice another (more severe?) disaster lurking in the bushes, namely Plan Ids: 28, 29, 30.
The SQL execution didn't even reach these IDs (see Time Active and/or Start Active – both are blank).
Now, the Plan ID: 29 shows again the FTS on RECYCLEBIN$ with wrong estimation (I promise, we’ll get to that): again just 5 rows, being the driving rowset to the driven rowset under Plan ID: 30 (effectively: I_FILE2).
This time though CBO went for Cartesian Join of the two, expecting (false) 1375 rows as the join cardinality: 5 x 275 – following the math which is applied to Cartesian Joins:
( all rows of the first branch ) x ( all rows of second branch ).
In reality, the join cardinality would be many orders of magnitude higher.
All of the above assumes that the cardinality of I_FILE2 is indeed close to 275 (which may well be true, at least in terms of order of magnitude, as it probably represents the number of files).
As you may conclude, with a Cartesian Join even a minor underestimation of either branch may leave a disastrous footprint on the elapsed time.
With all that in mind, I am almost afraid to extrapolate the final elapsed time for that query. Plan IDs 19, 20, 21 alone would end up at roughly 5 hours:
Rough calculation, assuming a linear relationship:
585 loop iterations took 955 seconds, so with 10940 rows (the true cardinality of RECYCLEBIN$ – see below) it should take roughly 17,900 seconds, which makes ~5 hours.
I am aware I am not entirely accurate here, given that these Plan Ids account for 90% (not 100%) of the elapsed time so far.
All in all – it sounds scary.
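Both back-of-the-envelope calculations – the linear extrapolation of the NL branch (using the Execs figure of 585 from SQL Monitor) and the Cartesian join cardinality – can be checked with one throwaway query:

```sql
-- (a) scale 955 s over 585 loops up to the true RECYCLEBIN$
--     cardinality of 10940 rows;
-- (b) Cartesian cardinality: the CBO's 5 x 275 estimate vs the
--     same math with the true row count.
SELECT ROUND(955 / 585 * 10940)           AS projected_secs,        -- ~17859
       ROUND(955 / 585 * 10940 / 3600, 1) AS projected_hours,       -- ~5
       5 * 275                            AS cartesian_estimated,   -- 1375
       10940 * 275                        AS cartesian_true_card    -- 3008500
FROM   dual;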
There is yet another point to take care of.
You can sing poems about SQL Monitor – it has plenty of advantages. However, there's one drawback: it doesn't show the filter predicates. It doesn't even flag the relevant Plan ID with an asterisk (*) the way DBMS_XPLAN.DISPLAY_CURSOR does.
Because of that, you might assume that the FTS on SYS.RECYCLEBIN$ – which obviously needs to read all 10940 rows (see below) – would leave just those 5 rows as a result of some filtering, to be then joined via the Nested Loop to X$KTFBUE.
In such case NL might be completely legitimate.
But this is not the case here.
How do I know it?
First, even if a filter had been applied, the estimate would have been wrong anyway: the operation had already returned 584 rows (estimated: 5). So at that point the estimate was off by roughly two orders of magnitude.
Secondly – and this is where the classical SQL Plan output helps – if you take a look at the output of dbms_xplan.display_cursor, you won’t see any predicate which would filter out these 10k rows down to 5.
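The check itself is a one-liner (assuming the cursor is still in the shared pool; passing NULL as the child number returns all children):

```sql
-- Look for the "Predicate Information" section of the output - for
-- this plan it shows no filter that would reduce RECYCLEBIN$ from
-- ~10k rows down to 5.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR('gxvnh0fbwkkqx', NULL, 'TYPICAL'));
```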
So, even without checking the statistics, I have reasons to believe that the 5 rows represent the FTS without any accompanying filtering.
Anyway – let’s find out the reality – namely, let us simply check the true cardinality of the SYS.RECYCLEBIN$ on that PDB:
SQL> select count(*) cnt from SYS.RECYCLEBIN$ ;
CNT
----------------
10,940
Oh – much higher than 5. Although there is still another trap here, which I will get to soon.
For now, let's confirm what the CBO "knows".
Here it is – derived from dba_tab_statistics: indeed, just 5 rows:
select * from dba_tab_statistics where upper(owner) = upper( nvl('&owner',owner) ) and upper(TABLE_NAME) like upper('&table_name')
@pr
SYS> @lstabv
Enter value for owner: sys
Enter value for table_name: RECYCLEBIN$
old 3: select * from dba_tab_statistics where upper(owner) = upper( nvl('&owner',owner) ) and upper(TABLE_NAME) like upper('&table_name')
new 3: select * from dba_tab_statistics where upper(owner) = upper( nvl('sys',owner) ) and upper(TABLE_NAME) like upper('RECYCLEBIN$')
==============================
OWNER : SYS
TABLE_NAME : RECYCLEBIN$
PARTITION_NAME :
PARTITION_POSITION :
SUBPARTITION_NAME :
SUBPARTITION_POSITION :
OBJECT_TYPE : TABLE
NUM_ROWS : 5
BLOCKS : 1
EMPTY_BLOCKS : 0
AVG_SPACE : 0
CHAIN_CNT : 0
AVG_ROW_LEN : 81
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
AVG_CACHED_BLOCKS :
AVG_CACHE_HIT_RATIO :
IM_IMCU_COUNT :
IM_BLOCK_COUNT :
IM_STAT_UPDATE_TIME :
SCAN_RATE :
SAMPLE_SIZE : 5
LAST_ANALYZED : 16-11-2022 15:07:22
GLOBAL_STATS : YES
USER_STATS : NO
STATTYPE_LOCKED :
STALE_STATS : YES
NOTES :
SCOPE : SHARED
=========================
"pr" is a great little helper that pivots the output, which I shamelessly took from Tanel Poder's library.
Adequate statistics are always in great demand
Let's finally give the CBO the facts and gather statistics on SYS.RECYCLEBIN$.
With that in place, the CBO should hopefully move away from the NESTED LOOP and go for a HASH JOIN, with SYS.RECYCLEBIN$ as either the hash (build) table or the probed table.
SYS@CDB->PDB0030> exec DBMS_STATS.GATHER_TABLE_STATS ('SYS','RECYCLEBIN$') ;
SYS@CDB->PDB0030> @lstabv
Enter value for owner: sys
Enter value for table_name: RECYCLEBIN$
old 3: select * from dba_tab_statistics where upper(owner) = upper( nvl('&owner',owner) ) and upper(TABLE_NAME) like upper('&table_name')
new 3: select * from dba_tab_statistics where upper(owner) = upper( nvl('sys',owner) ) and upper(TABLE_NAME) like upper('RECYCLEBIN$')
==============================
OWNER : SYS
TABLE_NAME : RECYCLEBIN$
PARTITION_NAME :
PARTITION_POSITION :
SUBPARTITION_NAME :
SUBPARTITION_POSITION :
OBJECT_TYPE : TABLE
NUM_ROWS : 10940
BLOCKS : 142
EMPTY_BLOCKS : 0
AVG_SPACE : 0
CHAIN_CNT : 0
AVG_ROW_LEN : 83
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
AVG_CACHED_BLOCKS :
AVG_CACHE_HIT_RATIO :
IM_IMCU_COUNT :
IM_BLOCK_COUNT :
IM_STAT_UPDATE_TIME :
SCAN_RATE :
SAMPLE_SIZE : 10940
LAST_ANALYZED : 27-06-2024 15:23:12
GLOBAL_STATS : YES
USER_STATS : NO
STATTYPE_LOCKED :
STALE_STATS : NO
NOTES :
SCOPE : SHARED
=========================
Finally, let's execute that tiny part:
SYS@CDB-> PDB0030> select /*+ monitor NO_ADAPTIVE_PLAN NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=FALSE) */ "TABLESPACE_NAME","BYTES" FROM "SYS"."DBA_FREE_SPACE" "K" WHERE 1=1 ;
62302 rows selected.
SQL_ID: 7mxgy3n3hra71
Elapsed: 00:00:02.24
Ah, look – the execution of this query took just ~2 seconds!
As expected, the Nested Loop got replaced by a Hash Join, with SYS.RECYCLEBIN$ as the probed table (see the SQL Monitor output below and the marked lines).
In addition, SYS.RECYCLEBIN$ is no longer joined directly to X$KTFBUE, but to TS$.
Most importantly, the cardinality of SYS.RECYCLEBIN$ is now estimated almost exactly (no wonder – I have just gathered the statistics) – see Plan ID 21 and E-Rows/ID vs A-Rows/ID.
SQL Monitor (my version – you may need to scroll to the right to see the objects) :
SQL_ID SQL_EXEC_ID MIN(SQL_EXEC_START) MAX(LAST_REFRESH_TI MAX(SQL_PLAN_HASH_VALUE) ELASEC
--------------- ----------- ------------------- ------------------- ------------------------ ----------
7mxgy3n3hra71 117440512 27-06-2024 15:36:35 27-06-2024 15:36:38 814635842 3
==============================
STATUS : DONE (ALL ROWS)
FIRST_REFRESH_TIME : 27-06-2024 15:36:35
LAST_REFRESH_TIME : 27-06-2024 15:36:38
REFRESH_COUNT : 4158
SQL_ID : 7mxgy3n3hra71
SQL_EXEC_ID : 117440512
SQL_PLAN_HASH_VALUE : 814635842
SQL_CHILD_ADDRESS : 0000002DAAEEC130
> Elapsed Time [Sec / Mins] : 2 / 0
> CPU Time [Sec / Mins] : 2 / 0
> I/O Time [Sec / Mins] : 0 / 0
> QUEUING_TIME [Sec / Mins] : 0 / 0
> Cluster WT [Sec / Mins] : 0 / 0
> Concurrency Wait [Sec / Min: 0 / 0
> PL/SQL Exec Time [Sec / Min: 0 / 0
> Java Exec Time [Sec / Mins] : 0 / 0
FETCHES : 4155
BUFFER_GETS : 115757
DISK_READS : 10042
DIRECT_WRITES : 0
ACTION :
PROCESS_NAME : ora
USERNAME : SYS
sid:serial : 6324:49834
IS_ADAPTIVE_PLAN : N
IS_FINAL_PLAN : Y
Exec from: PL/SQL:proc: : top-lvl-SQL
=========================
Last(s/ed)
A-Tmp / A-PGA / is/was active Executed Samples
Run MaxTmp[MB] MaxPGA[MB] A-Read MB A-Ret MB % Offl PATH STARTS OUTPUT_ROWS A-Rows/ID E-Rows/ID [Secs] Secs ago from ASH[%] ID P-ID(POS) DEPTH TEXT_LINE
--- --------------- -------------- ------------ ------------ ------- ----------------------------------- ---------------- ---------------- ---------------- -------------------- ------------- -------- -------------- ------- --------------- ----- ---------------------------------------------------------------
. / . / 0 0 .00 0 1 62,302 62,302 3 79 . 0 0 SELECT STATEMENT
. / . / 0 0 .00 0-1 1 62,302 62,302 10,922 3 79 . 1 P-ID: 0 (1) 1 VIEW -> SYS.DBA_FREE_SPACE
. / . / 0 0 .00 0-1-2 1 62,302 62,302 3 79 . 2 P-ID: 1 (1) 2 UNION-ALL
. / . / 0 0 .00 0-1-2-3 1 0 0 1 . 3 P-ID: 2 (1) 3 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4 1 0 0 1 0 82 . 4 P-ID: 3 (1) 4 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4-5 1 278 278 275 0 82 . 5 P-ID: 4 (1) 5 INDEX FULL SCAN -> SYS.I_FILE2
. / . / 0 0 .00 0-1-2-3-4-5 278 0 0 1 0 82 . 6 P-ID: 4 (2) 5 TABLE ACCESS CLUSTER -> SYS.FET$
. / . / 0 0 .00 0-1-2-3-4-5-6 278 278 1 1 0 82 . 7 P-ID: 6 (1) 6 INDEX UNIQUE SCAN -> SYS.I_TS#
. / . / 0 0 .00 0-1-2-3-4 0 0 0 1 . 8 P-ID: 3 (2) 4 TABLE ACCESS CLUSTER -> SYS.TS$
. / . / 0 0 .00 0-1-2-3-4-5 0 0 0 1 . 9 P-ID: 8 (1) 5 INDEX UNIQUE SCAN -> SYS.I_TS#
. / . / 0 0 .00 0-1-2-3 1 36,066 36,066 346 1 81 . 10 P-ID: 2 (2) 3 NESTED LOOPS
. / . / 4.7 0 0 .00 0-1-2-3-4 1 36,066 36,066 346 1 81 . 11 P-ID: 10 (1) 4 HASH JOIN
. / . / 0 0 .00 0-1-2-3-4-5 1 278 278 274 0 82 . 12 P-ID: 11 (1) 5 TABLE ACCESS STORAGE FULL -> SYS.TS$
. / . / 2 2 .00 0-1-2-3-4-5 1 36,066 36,066 347 1 81 . 13 P-ID: 11 (2) 5 FIXED TABLE FULL -> SYS.X$KTFBFE
. / . / 0 0 .00 0-1-2-3-4 36,066 36,066 1 1 1 81 . 14 P-ID: 10 (2) 4 INDEX UNIQUE SCAN -> SYS.I_FILE2
. / . / 0 0 .00 0-1-2-3 1 26,236 26,236 10,573 2 79 . 15 P-ID: 2 (3) 3 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4 1 26,236 26,236 10,746 2 79 . 16 P-ID: 15 (1) 4 NESTED LOOPS
. / . / 4.2 0 0 .00 0-1-2-3-4-5 1 10,764 10,764 10,746 2 79 . 17 P-ID: 16 (1) 5 HASH JOIN
. / . / 0 0 .00 0-1-2-3-4-5-6 1 278 278 274 0 81 . 18 P-ID: 17 (1) 6 JOIN FILTER CREATE -> SYS.:BF0000
. / . / 0 0 .00 0-1-2-3-4-5-6-7 1 278 278 274 0 81 . 19 P-ID: 18 (1) 7 TABLE ACCESS STORAGE FULL -> SYS.TS$
. / . / 0 0 .00 0-1-2-3-4-5-6 1 10,764 10,764 10,940 2 79 . 20 P-ID: 17 (2) 6 JOIN FILTER USE -> SYS.:BF0000
. / . / 0 0 .00 0-1-2-3-4-5-6-7 1 10,764 10,764 10,940 2 79 . 21 P-ID: 20 (1) 7 TABLE ACCESS STORAGE FULL -> SYS.RECYCLEBIN$
. / . / 76 76 .00 0-1-2-3-4-5 10,764 26,236 2 1 2 79 (1) 50[%] 22 P-ID: 16 (2) 5 FIXED TABLE FIXED INDEX -> SYS.X$KTFBUE (ind:1)
. / . / 0 0 .00 0-1-2-3-4 26,236 26,236 1 1 2 79 . 23 P-ID: 15 (2) 4 INDEX UNIQUE SCAN -> SYS.I_FILE2
. / . / 0 0 .00 0-1-2-3 1 0 0 1 . 24 P-ID: 2 (4) 3 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4 1 0 0 842 . 25 P-ID: 24 (1) 4 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4-5 1 0 0 1 . 26 P-ID: 25 (1) 5 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4-5-6 1 0 0 1 . 27 P-ID: 26 (1) 6 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4-5-6-7 1 0 0 1 . 28 P-ID: 27 (1) 7 TABLE ACCESS STORAGE FULL -> SYS.UET$
. / . / 0 0 .00 0-1-2-3-4-5-6-7 0 0 0 1 . 29 P-ID: 27 (2) 7 TABLE ACCESS CLUSTER -> SYS.TS$
. / . / 0 0 .00 0-1-2-3-4-5-6-7-8 0 0 0 1 . 30 P-ID: 29 (1) 8 INDEX UNIQUE SCAN -> SYS.I_TS#
. / . / 0 0 .00 0-1-2-3-4-5-6 0 0 0 1 . 31 P-ID: 26 (2) 6 INDEX UNIQUE SCAN -> SYS.I_FILE2
. / . / 0 0 .00 0-1-2-3-4-5 0 0 0 842 . 32 P-ID: 25 (2) 5 INDEX RANGE SCAN -> SYS.RECYCLEBIN$_TS
. / . / 0 0 .00 0-1-2-3-4 0 0 0 2 . 33 P-ID: 24 (2) 4 TABLE ACCESS BY INDEX ROWID -> SYS.RECYCLEBIN$
. / . / 0 0 .00 0-1-2-3 1 0 0 1 . 34 P-ID: 2 (5) 3 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4 1 0 0 1 . 35 P-ID: 34 (1) 4 NESTED LOOPS
. / . / 0 0 .00 0-1-2-3-4-5 1 0 0 1 . 36 P-ID: 35 (1) 5 TABLE ACCESS STORAGE FULL -> SYS.NEW_LOST_WRITE_EXTENTS$
. / . / 0 0 .00 0-1-2-3-4-5 0 0 0 1 . 37 P-ID: 35 (2) 5 TABLE ACCESS CLUSTER -> SYS.TS$
. / . / 0 0 .00 0-1-2-3-4-5-6 0 0 0 1 . 38 P-ID: 37 (1) 6 INDEX UNIQUE SCAN -> SYS.I_TS#
. / . / 0 0 .00 0-1-2-3-4 0 0 0 1 . 39 P-ID: 34 (2) 4 INDEX RANGE SCAN -> SYS.I_FILE2
I’ll be happy to publish the script once I’ve removed a few annoying bugs.
It was absolutely not my intention to copy-paste something, whet your appetite, and then leave you hungry – this is simply the only dump I had at hand while working on the case, something that shows the end result.
In essence, it doesn’t differ much from the output of DBMS_SQLTUNE.REPORT_SQL_MONITOR. In fact, it contains fewer details.
There are, however, a few points worth mentioning – things I decided to include (or add on top) which SQL Monitor doesn’t provide.
The script is based on V$SQL_MONITOR and V$SQL_PLAN_MONITOR.
The fields I find useful, but that are not shown in DBMS_SQLTUNE.REPORT_SQL_MONITOR (or: I never found a way to make them visible), are:
PLAN_POSITION → shown as (POS) in brackets next to P-ID. It gives the ordinal numbers (1) and (2) for the rowsets joined under a common (join) parent, shown first to the left – very useful when trying to match distant branches of a join. As you can see, though, the script sometimes shows numbers like (5) – I will take a look at that; it obviously makes no sense (a 5th child of a join?)
OUTPUT_ROWS → total actual #rows (then per exec = OUTPUT_ROWS / STARTS, reported as A-Rows/ID). I keep the numbers unrounded (no M/G), however large they get. Matching A-Rows/ID from the “driving” rowset of a Nested Loop with STARTS of the “driven” rowset is a pretty good way to trace the progress of that NL (with a reasonable number of refreshes).
PLAN_CARDINALITY (E-Rows/ID) → estimated rows, also unrounded.
% Offl (derived from PHYSICAL_READ_BYTES and IO_INTERCONNECT_BYTES) → offload percentage per Plan ID (relevant on Exadata).
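The core of those columns can be sketched with a simplified query against V$SQL_PLAN_MONITOR – this is not my full script, just a minimal illustration of where the fields come from (the offload formula is a rough derivation from the byte counters and can be refined):

```sql
-- Per-plan-line actuals and estimates, straight from V$SQL_PLAN_MONITOR.
SELECT plan_line_id,
       plan_parent_id,
       plan_position,                              -- the (POS) column
       lpad(' ', plan_depth) || plan_operation
           || ' ' || plan_options                  AS operation,
       starts,
       output_rows,                                -- total actual rows
       round(output_rows / nullif(starts, 0))      AS a_rows_per_id,
       plan_cardinality                            AS e_rows_per_id,
       -- rough offload %: bytes saved on the interconnect vs bytes read
       CASE WHEN physical_read_bytes > 0
            THEN round(100 * (physical_read_bytes - io_interconnect_bytes)
                           / physical_read_bytes, 2)
       END                                         AS offload_pct
FROM   v$sql_plan_monitor
WHERE  sql_id = '&sql_id'
ORDER  BY plan_line_id;
```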
Two notes
First: “the star”.
I did not capture the SQL plan with dbms_xplan.display_cursor (or the like) before the SQL got tuned.
I only did it afterwards, once the tuning had put the HASH JOIN in place of the NESTED LOOP.
In the tuned version there was an asterisk (*) next to SYS.RECYCLEBIN$, so you might think there was some filtering coming from a predicate.
But here lies yet another small “trap” which needs to be properly interpreted: the filtering came from the bloom filter (JOIN FILTER USE: BF0000) applied before hash-joining the probed table. It may look like this:
20 JOIN FILTER USE | :BF0000
* 21 TABLE ACCESS STORAGE FULL | RECYCLEBIN$
Second: “X$KTFBUE looks ok now“
If you take a close look at the SQL Plan from my version of the SQL Monitor and navigate to where X$KTFBUE is accessed, you’ll notice that this time X$KTFBUE was (again) accessed as the “driven” table of a NESTED LOOP join (see Plan ID: 16) – at Plan ID: 22 – with the “driving” rowset being the product of the HASH JOIN (of TS$ and RECYCLEBIN$) at Plan ID: 17. This is where the POS mentioned above comes in handy – see “17 P-ID: 16 (1)” and “22 P-ID: 16 (2)”.
The (1) and (2) tell me that these are the first and second branch of the parent join found at Plan ID: 16 – the NESTED LOOP.
So X$KTFBUE has been accessed 10,764 times with no problem at all – whereas before, the same structure accessed only 585 times took 955 seconds?
Honestly, I have no credible explanation for that, except one (vital?) observation: the estimated (E-Rows/ID) and actual (A-Rows/ID) cardinalities for the innermost parts of the SQL Plan (specifically, for the problematic joins of TS$, RECYCLEBIN$, X$KTFBUE and I_FILE2) are very close now – and definitely not off by an order of magnitude, so there is also no need for an adaptive plan.
The first fairly large discrepancy (E: 347 rows, A: 36,066 rows) comes at Plan ID: 13, with… a sibling structure on the stage: X$KTFBFE.
It’s worth noting that Plan ID: 13 is in fact a one-pass full table scan of X$KTFBFE (being the “probed table” of the HASH JOIN at Plan ID: 11). This might suggest that HASH JOINs are by far safer, even if the cardinality of the hash table or the probe table (or both) is largely misestimated. After all, each is scanned only once, possibly filtered before joining.
Some “gotchas”
In the end, I can’t resist adding some gotchas.
How many times have you heard the statement: “it’s running really slow, but we changed nothing”?
Now, imagine the following scenario (which may very well have been what happened above).
You manage a fairly large multitenant environment with a high number of PDBs (typical for development). After a series of failed tests, developers run a large wave of DROP TABLE statements across many – or even all – PDBs.
That obviously leaves thousands of fresh entries in SYS.RECYCLEBIN$ in each PDB – something the optimizer (so far) has no clue about.
As a result, the optimizer still assumes that SYS.RECYCLEBIN$ contains only a tiny number of rows, chooses a Nested Loop with SYS.RECYCLEBIN$ as the driving table, and heads straight into disaster.
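In this scenario, the cure is the one that fixed the plan earlier in this case: refreshing the recycle bin statistics so the optimizer sees realistic numbers again. A one-line sketch, to be run in each affected PDB:

```sql
-- Refresh optimizer statistics on the recycle bin base table;
-- after this, E-Rows for RECYCLEBIN$ reflects the post-drop row count.
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SYS', tabname => 'RECYCLEBIN$');
```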
It obviously can work the other way around: “we did nothing and suddenly the problem is gone”.
How could that happen?
A DBA runs PURGE DBA_RECYCLEBIN across all PDBs, effectively reducing the row count in every SYS.RECYCLEBIN$ back to zero. In that case the optimizer doesn’t even need to “notice” the earlier spike after the drops – because now a Nested Loop (or even a Cartesian Join) at that point in the query becomes perfectly reasonable again, and the query runs like a dream.
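That clean-up itself is a single statement per container – the multi-PDB part is just a matter of how you fan it out (for example with catcon.pl, or a loop over ALTER SESSION SET CONTAINER):

```sql
-- Run as a privileged user inside each PDB:
-- empties SYS.RECYCLEBIN$ for the whole container.
PURGE DBA_RECYCLEBIN;
```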
In a way, this is also my attempt to defend the proverbial usual suspect: “OEM is always slow”.
But as the saying goes, the shoemaker’s children go barefoot: we tune the client’s application and database, while our own backyard is often left neglected 😉
The source of the SQL script which I used above to navigate to PDBs and problematic SQL_IDs:
-- lssqlid_across_pdbs.sql
-- Author: Piotr Sajda (piotrsajda.com)
-- The script operates across containers in multitenant environment and it needs to be executed from the CDB
-- It reports the elapsed time (and many other details) for SQL_ID across PDBs
-- Elapsed time is expressed as: cpu/exec[us] , avg[ms]/exec , avg[s]/exec
set echo off
set lines 300
col first_seen form a20
col type form a7
col LAST_ACTIVE form a20
col "cpu/exec[us]" form 999,999,999,999
col CBO_HASHVAL form 9999999999
col ins form 999
col chld form 9999
col PARS_SCHEMA form a20
col "bi-sens" form a5
col "bi-aware" form a8
col "sha" form a5
col "#inval" form 9999
col "#pars" form 999,999
col "Binds" form a5
col "SQLProf" form a4 head "SQL|Prof"
col "SQLPatch" form a5 head "SQL|Patch"
col "SQLBsln" form a5 head "SQL|Bsln"
col "avgIO[KB]" form 9999
col px form 999
col "Exec-from:" form a12
col "conc/exec" form 999,999
col "avg[s]/exec" form 999,999 head "avg[s]|exec"
col "avg[ms]/exec" form 999,999,999
col "I/O_svd[%]" form 99.99
col PLAN_HASHVAL form 99999999999
col "Offl[%]" form 999.99
col "Flash[%]" form 999.99
col "Line#" form 9999999
col "Exe-from:line#" form a28
col "Reoptim?" form A8 head "Candid.|4reoptim.|cardfeed"
col PLSQL_EXEC_TIME_MS_PER_EXEC form 999,999 head "PL/SQL|exec[ms]"
col "bi-aware" head "Bind|Aware?"
col "Adap-sql" form a5 head "Adapt|Plan"
col "bi-sens" head "Bind|Sens"
col FORCE_MATCHING_SIGNATURE form 99999999999999999999999999999
col total_execs form 999,999,999
col "rows/exec" form 999,999,999
col "buffgets/exec" form 999,999,999,999
col "phyreqs/exec" form 999,999,999
col PDB_NAME form a10
SELECT /* donotlookup_thisquery */
p.name AS pdb_name,
q.inst_id ,
p.con_id ,
sql_id,
CHILD_NUMBER chld,
IS_BIND_SENSITIVE "bi-sens" ,
IS_BIND_AWARE "bi-aware" ,
case when BIND_DATA is null then 'No' else 'Yes' end "Binds",
case when SQL_PROFILE is null then 'No' else 'Yes' end "SQLProf",
case when SQL_PATCH is null then 'No' else 'Yes' end "SQLPatch",
case when SQL_PLAN_BASELINE is null then 'No' else 'Yes' end "SQLBsln",
PLAN_HASH_VALUE PLAN_HASHVAL,
OPTIMIZER_ENV_HASH_VALUE CBO_HASHVAL,
PARSING_SCHEMA_NAME pars_schema,
-- case when PROGRAM_ID is null or PROGRAM_ID = 0 then 'no-dep' else 'exec from PL/SQL' end "Exe-from:line#",
-- case when PROGRAM_ID is null then 'no-dep' else (select object_name from dba_objects o where PROGRAM_ID = o.OBJECT_ID)||':'||PROGRAM_LINE# end "Exe-from:line#",
-- case when PROGRAM_ID is null OR PROGRAM_ID = 0 then 'top-lvl-sql' else (select substr(object_name,1,10) from dba_objects o where PROGRAM_ID = o.OBJECT_ID) end "Exec-from:",
-- PROGRAM_LINE# "Line#" ,
executions total_execs,
CPU_TIME/decode(nvl(executions,0),0,1,executions) "cpu/exec[us]",
-- PLSQL_EXEC_TIME/1000/to_number(decode(executions,0,1,executions)) PLSQL_EXEC_TIME_MS_PER_EXEC,
round( (elapsed_time/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) , 2) "avg[ms]/exec",
-- CONCURRENCY_WAIT_TIME/decode(nvl(executions,0),0,1,executions) "conc/exec",
round( (elapsed_time/1000/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) , 2) "avg[s]/exec",
px_servers_executions/decode(nvl(executions,0),0,1,executions) px,
case when IO_CELL_OFFLOAD_ELIGIBLE_BYTES > 0 then
decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,0,100 * (IO_CELL_OFFLOAD_ELIGIBLE_BYTES - IO_INTERCONNECT_BYTES)/decode(IO_CELL_OFFLOAD_ELIGIBLE_BYTES,0,1,IO_CELL_OFFLOAD_ELIGIBLE_BYTES))
else 0 end "Offl[%]",
case when OPTIMIZED_PHY_READ_REQUESTS > 0 then
-- decode(PHYSICAL_READ_REQUESTS,0,0,100 * (OPTIMIZED_PHY_READ_REQUESTS-PHYSICAL_READ_REQUESTS) / decode(PHYSICAL_READ_REQUESTS,0,1,PHYSICAL_READ_REQUESTS))
OPTIMIZED_PHY_READ_REQUESTS/PHYSICAL_READ_REQUESTS*100
else 0 end "Flash[%]",
round(ROWS_PROCESSED/ decode(nvl(executions,0),0,1,executions) ,2) "rows/exec",
round(BUFFER_GETS/ decode(nvl(executions,0),0,1,executions) ,0) "buffgets/exec",
round(PHYSICAL_READ_REQUESTS/decode(nvl(executions,0),0,1,executions) ,0) "phyreqs/exec"
-- round( ( (PHYSICAL_READ_BYTES/1024)/ decode(nvl(executions,0),0,1,executions) ) / (decode(nvl(PHYSICAL_READ_REQUESTS,0),0,1,PHYSICAL_READ_REQUESTS)/ decode(nvl(executions,0),0,1,executions) ) ,1 ) "avgIO[KB]",
-- to_date(FIRST_LOAD_TIME,'yyyy-mm-dd/hh24:mi:ss') first_seen,
-- to_char(LAST_ACTIVE_TIME,'dd-mm-yyyy hh24:mi:ss') LAST_ACTIVE
FROM CONTAINERS( GV$SQL ) q
JOIN GV$PDBS p
ON p.inst_id = q.inst_id
AND p.con_id = q.con_id
WHERE
sql_id = nvl('&sqlid' , sql_id)
AND q.inst_id = to_number( nvl('&inst_id' , q.inst_id) )
AND upper(sql_text) like upper(nvl('&sql_text_prefix%' , sql_text))
AND upper(PARSING_SCHEMA_NAME) like upper(nvl('&parsing_schema',PARSING_SCHEMA_NAME))
AND sql_text not like 'SELECT /* donotlookup_thisquery */%'
AND PLAN_HASH_VALUE like nvl('&PLAN_HASH_VALUE%' , PLAN_HASH_VALUE)
AND FORCE_MATCHING_SIGNATURE = to_number( nvl('&FORCE_MATCHING_SIGNATURE',FORCE_MATCHING_SIGNATURE) )
AND LAST_ACTIVE_TIME > ( sysdate - nvl('&How_Many_Minutes_Back',99999999)/24/60 )
AND (elapsed_time/1000/1000)/decode(nvl(executions,0),0,1,executions) / decode(px_servers_executions,0,1,px_servers_executions/decode(nvl(executions,0),0,1,executions)) >= to_number( nvl('&min_ela_secs',0) )
order by "avg[s]/exec", LAST_ACTIVE_TIME, CHILD_NUMBER, total_execs
/
