Be cautious when the optimizer estimates a cardinality of 1 - Part2 - Piotr Sajda

Let us finally focus on my problem.
I will come back to my query :

SELECT
(some columns from the tables below)
FROM
    VERYSECOWNR.pr_secrtb_secrtpos_info  inf,
    VERYSECOWNR.pr_secrtb_secrtpos_value val,
    VERYSECOWNR.pr_secrtb_my_versec      sta,
    VERYSECOWNR.pr_secrtb_det1           uni,
    VERYSECOWNR.pr_rverysecr             res,
    VERYSECOWNR.pr_realsec               dev,
    VERYSECOWNR.pr_reallysec             loc
WHERE
        inf.hist_id = val.hist_secret_id (+)
    AND inf.myversecret_column = sta.secrid (+)
    AND inf.sensor_unit_autoid = uni.secrid
    AND inf.secretlo = loc.secrid (+)
    AND uni.secdev = dev.secrid
    AND dev.mysecret_ref = res.secrid (+)
    AND inf.hist_id >= :1
    AND val.hist_id (+) >= :2
order by inf.hist_id , inf.verysecret_timstmp
FETCH NEXT 50000 ROWS WITH TIES

As a reminder – the predicates which are of the greatest importance are the following :

    AND inf.hist_id >= :1
    AND val.hist_id (+) >= :2

The state before tuning

If not timed out after 2 minutes (such constraint was applied at the application layer) with the SQL Plan depicted below this query was running until being hit by ORA-01555 (snapshot too old).
In practice, this meant more than one hour, but the actual duration still depended heavily on the size of the range.
That said, if UNDO were not the limiting factor, but the query would be exposed to the need of reconstructing the buffers (or physical blocks) to the required point in time, I can easily imagine this query running days long (this will be addressed further).
At this point I can’t refrain from shortly reporting certain topic of a debate I was involved in, in which my opponent stated that (simplifying) “nowadays, hardware is both, cheap and powerful enough to mitigate deficiency of every SQL and by that replace SQL tuning”.
Well, in theory in the far end – yes.
On the other hand, I am trying to imagine achieving such hardware-based result of optimization getting to 1 second with the original SQL running days long.
Btw. it’s fun how easily one can demonstrate the scale of impact of cardinality misestimation during CBO calculation by applying sort of “slider”.
By “slider” I mean shifting it from “left to right” and by that leaving a broader or narrower range of rows to be processed by NESTED LOOP, thus achieving longer or shorter exec time, respectively.

I will focus on the example whose execution time took ~490 seconds (upon tuning, not even 1 second).

Let’s bring the facts. First: statistics.
Important are: LOW_VAL and HI_VAL.

OWNER              TABLE_NAME                     COLUMN_NAME                    DATA_TYPE           NDV     DENSITY  NUM_NULLS LOW_VAL                             HI_VAL                              HISTOGRAM       NUM_BUCKETS LAST_ANALYZED         
------------------ ------------------------------ ------------------------------ ------------ ---------- ----------- ---------- ----------------------------------- ----------------------------------- --------------- ----------- --------------------
VERYSECOWNR        PR_SECRTB_SECRTPOS_INFO        HIST_ID                        NUMBER          1496333   .00000067          0 283,438,051                         284,934,609                           NONE                      1 10-03-2026 22:01:04 
VERYSECOWNR        PR_SECRTB_SECRTPOS_VALUE       HIST_ID                        NUMBER          3108134   .00000032          0 583,330,185                         586,438,812                           NONE                      1 10-03-2026 22:01:10

Intercepted (from V$SQL_MONITOR) bind variables of the execution which took place around 15:14.
It is worth noting that the LOW_VAL and HI_VAL values shown above — that is, the minimum and maximum values — reflect the point in time at which the statistics were gathered (22:01 on the previous day).


<binds>
  <bind name=":1" pos="1" dty="2" dtystr="NUMBER" maxlen="22" len="6">285,970,450</bind>
  <bind name=":2" pos="2" dty="2" dtystr="NUMBER" maxlen="22" len="6">588,592,848</bind>
</binds>

The reality :

15:14:43 SYS @ REALDB01> @minmax
15:14:44 SYS @ REALDB01> select min(HIST_ID) min_hist_id_info, max(HIST_ID) max_hist_id_info, max(HIST_ID) - min(HIST_ID) min_max_range from VERYSECOWNR.PR_SECRTB_SECRTPOS_INFO ;

MIN_HIST_ID_INFO MAX_HIST_ID_INFO    MIN_MAX_RANGE
---------------- ---------------- ----------------
     284,496,403      285,983,828        1,487,425

15:14:44 SYS @ REALDB01> select min(HIST_ID) min_hist_id_val, max(HIST_ID) max_hist_id_val , max(HIST_ID) - min(HIST_ID) min_max_range from VERYSECOWNR.PR_SECRTB_SECRTPOS_VALUE ;

MIN_HIST_ID_VAL  MAX_HIST_ID_VAL    MIN_MAX_RANGE
---------------- ---------------- ----------------
     585,525,554      588,621,902        3,096,348

Referring to the intro, let’s recap the corresponding data.

The values of both bind variables fall outside the min–max range indicated by the statistics. As we will see shortly, this fact is reported in the 10053 trace file (CBO trace) as follows:
We will shortly see that this fact will be reported in the 10053 (CBO trace file) as :
Using prorated density: of col # as selectivity of out-of-range/non-existent value pred0

Getting closer to the numbers:

:1 Bind: Max from stats: 284,934,609 Real max: 285,983,828
Value of the 1st bind var: 285,970,450
:2 Bind: Max from stats: 586,438,812
Real max: 588,621,902 Value of the 2nd bind var: 588,592,848

As you can see, this data has been sampled at 15:14.
The discrepancy between the statistics (gathered during the usual time window, i.e. at 22:01) and reality is already significant.
What’s more important though, the values of both bind variables are “out of range” and this will produce the estimated cardinality for both predicates equal to 1.
Within a few hours, this will lead straight to truly poor performance.
Why not now (at 15:14) but in few hours?
If you take a look at the “gap” between the value of the first bind variable (let’s take the one which controlled the CBO’s decision to pick the driving table) and the real max, here is by how far the CBO got fooled:
Real max (285,983,828) – Bind value (285,970,450) = 13378.
If you recall the predicate inf.hist_id >= :1 and you’ll get through the CBO 10053 trace lines below, you’ll come to conclusion that 13378 will be the number of times of lookups (visits to the joined table) in the critical NESTED LOOP.
This discrepancy did already occasionally trigger the timeout (ran over 120 seconds), so it was bad enough for the business.
Sometimes, however, it finished within 120 seconds, so it sneaked in.
Needless to say, these circumstances produced quite a confusion.

`Data blocks consistent reads - undo records applied`

Why did the behavior appear so random — sometimes the statement finished successfully, while at other times it failed with a timeout — although the total number of processed rows around that time was very similar and the SQL plan was unchanged? You might say: on the surface everything was essentially the same.
Explanation came with the associating of frequently (coming in waves) changing rows – perhaps you remember the matter of frequent DELETEs and INSERTs which I have mentioned.
That obviously must have made certain executions (most likely the timed-out cases) to reconstruct the buffers back to the required SCN – which obviously added significant footprint.
You can find it out and measure the impact by observing the statistic data blocks consistent reads - undo records applied for the session executing your query.
How? For example with the “Snapper” by Tanel Põder.

Alternatively (I used both), compare “consistent gets” from several executions of the same query (with the same SQL Plan) upon setting the “SET AUTOTRACE TRACE STATISTICS”.

Example (the numbers are derived from an unrelated SQL execution) :

SQL> SET AUTOTRACE TRACE STATISTICS 

<run your query here>

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3270  consistent gets
        327  physical reads
        220  redo size
        812  bytes sent via SQL*Net to client
        392  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     133233  rows processed

If with the same (or very similar) values of “rows processed” and “physical reads” (without reaching out to UNDO tablespace), you’ll notice significant discrepancy in “consistent gets” (probably also time elapsed) between two executions, it’s a good reason to assume that the execution with bigger number of “consistent gets” must have built up the required buffers reaching out to the buffers with before images.

Yet another “gotcha”.
You might say – wait, we’re talking of 13’000 rows. Even if all of them had to be reconstructed from the undo buffers, this would not leave such a significant difference between the times elapsed.
Yes/No.
Yes, possibly – if these 13’000 rows were left alone, being retrieved at “one go”.
No, if you realize that these 13’000 rows (or just 15 minutes later: nearly 30’000) make up the driving table being the part of the NESTED LOOP join.
As with NESTED LOOP, such driving rowset has the “driven” (looked-up) partner, which is usually visited in much broader extent, as being touched many times – exactly that many as the cardinality of the driving table (rowset).
Usually though, each such look up encompasses (many) more than just 1 row, in many situations representing the PK-FK relationship.
Therefore, provided that the content of the driven (joined) table may also change during the query execution (it did in a big scale in my case), building up consistent images will usually add bigger footprint while reaching out to the looked-up table than the driving one.

Although — and this should be emphasized — the reconstruction of the consistent image of the buffers belonging to the rowset that represents the driving rowset does not go unnoticed either.
This can be misleading when looking at the SQL Plan in e.g. SQL Monitor, (note, the classical execution plan does not show it) — specifically when focusing on the Execs column.
The Execs value indicates how many times a given Plan ID was executed over the course of the processing of a query.
Therefore, next to the Plan ID of the driving rowset it reads: 1.
At first glance, one might think: “So at least the driving rowset is read in a single go — more or less all at once — similar to how the hash table is built during a HASH JOIN.”
Not quite. Technically, the driving rowset is indeed collected in a single pass. Unfortunately, however, this collection happens simultaneously (along) with the lookups into the joined rowset.
This is clearly visible during particularly long-running NESTED LOOP operations observed across several SQL Monitor samples: the Rows (Actual) value for the driving rowset grows steadily, rather than appearing all at once with the final figure.
Thus, the buffers of both: the driving and the joined rowset are virtually equally exposed to the potential need of reconstructing of the consistent image, although the joined rowset is, by its nature, somewhat more exposed for the reason mentioned above.

Subtle difference in costing may lead to disastrous results

There is a “sub-gotcha” which depicts another trap, this time directly related to the caveat of estimated cardinality of 1.
Let’s start from the predicates which are involved in joining and/or accessing both main tables :

WHERE
        inf.hist_id = val.hist_secret_id (+)
    AND inf.hist_id >= 285970450
    AND val.hist_id (+) >= 588592848

Specifically: inf.hist_id = val.hist_secret_id (+)
If you look at the related excerpt of the SQL Plan, you’ll surely notice that – although there is a corresponding index on hist_secret_id: IX_SECRTB_SECRTPOS_VALUE_IX1, it is not used while joining both tables.
Strange enough, since with this join condition, it is intuitive that this index should be used.

DDL of that index (shortened) :
CREATE INDEX "VERYSECOWNR"."IX_SECRTB_SECRTPOS_VALUE_IX1" ON "VERYSECOWNR"."PR_SECRTB_SECRTPOS_VALUE" ("hist_secret_id")

Finally, the CBO chose not to utilize this index (below is the excerpt of the SQL Plan which represents the join of these two tables) :

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------------
…
|   7 |        NESTED LOOPS OUTER                  |                             |     1 |   174 |     7   (0)| 00:00:01 |
|   8 |         TABLE ACCESS BY INDEX ROWID BATCHED| PR_SECRTB_SECRTPOS_INFO     |     1 |   107 |     4   (0)| 00:00:01 |
|*  9 |          INDEX RANGE SCAN                  | IX_SECRTB_SECRTPOS_INFO_PK  |     1 |       |     3   (0)| 00:00:01 |
|* 10 |         TABLE ACCESS BY INDEX ROWID BATCHED| PR_SECRTB_SECRTPOS_VALUE    |     1 |    67 |     3   (0)| 00:00:01 |
|* 11 |          INDEX RANGE SCAN                  | IX_SECRTB_SECRTPOS_VALUE_PK |     1 |       |     2   (0)| 00:00:01 |
…
--------------------------------------------------------------------------------------------------------------------------

To be specific –
This above is the “driven” branch of the NESTED LOOP from Id = 7 which intuitively supposed to be accessed by IX_SECRTB_SECRTPOS_VALUE_IX1 based on the join condition : inf.hist_id = val.hist_secret_id.
Sort of :

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                              |    21 | 34692 |    70   (2)| 00:00:01 |
…
|   7 |          NESTED LOOPS OUTER                  |                              |    21 |  3654 |    48   (0)| 00:00:01 |
|   8 |           TABLE ACCESS BY INDEX ROWID BATCHED| PR_SECRTB_SECRTPOS_INFO      |    21 |  2247 |     5   (0)| 00:00:01 |
|*  9 |            INDEX RANGE SCAN                  | IX_SECRTB_SECRTPOS_INFO_PK   |    21 |       |     3   (0)| 00:00:01 |
|* 10 |           TABLE ACCESS BY INDEX ROWID BATCHED| PR_SECRTB_SECRTPOS_VALUE     |     1 |    67 |     3   (0)| 00:00:01 |
|* 11 |            INDEX RANGE SCAN                  | IX_SECRTB_SECRTPOS_VALUE_IX1 |     1 |       |     2   (0)| 00:00:01 |
…
-----------------------------------------------------------------------------------------------------------------------------

I slightly manipulated the cardinality of the driving rowset of the NESTED LOOP from Id=9, making the optimizer “thinking” it was supposed to get 21 rows to form a driving part (so, this time not 1 row).
What I did: I have manipulated the crucial predicate in the following way: inf.hist_id >= max(hist_id from the stats) – 20 and forced (by hint) the NESTED LOOP (CBO chose HASH JOIN).
This turned out far better. Not that this SQL Plan is ideal – the CBO still misses the facts. At least though it had reasons not to apply the cardinality of 1 and – as the result – the query ran much faster.

Optimizer decision paths – ACCESS PATH

The question arises: why it didn’t choose the index which seems to be the natural choice?
Anticipating: this decision drew the elapsed time of the query execution by several orders of magnitude.

Let’s dissect the internals of optimizer’s decision paths.

First, we need the data which optimizer uses in the below formulas (can be found in the 10053 CBO trace) :


  Index: IX_SECRTB_SECRTPOS_VALUE_PK  Col#: 18
  LVLS: 2  #LB: 7397  #DK: 3108287  LB/K: 1.00  DB/K: 1.00  CLUF: 158149.00  NRW: 3108287.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1

  Index: IX_SECRTB_SECRTPOS_VALUE_IX1  Col#: 21
  LVLS: 2  #LB: 13977  #DK: 1273984  LB/K: 1.00  DB/K: 1.00  CLUF: 165551.00  NRW: 3108261.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1

The total I/O cost calculated by the optimizer for an index range scan consists of two components, and the final I/O cost is the sum of these two values :

Cost of Index Access : Index_LVL + ceil( #LB * ix_sel )
and
Cost of the table access: ceil( CLUF * ix_sel )

This is btw. what can be found in the SQL Plan next to the corresponding Plan IDs.
Worth to know though, the figure (Cost = 4) which is put next to TABLE, e.g.
TABLE ACCESS BY INDEX ROWID BATCHED -> PR_SECRTB_SECRTPOS_INFO
is the accumulated value (it’ll get clarified soon).

So :
Total I/O Cost of Index Range Scan = [ Index_LVL + ceil( #LB * ix_sel ) ] + ceil( CLUF * ix_sel )
I did the math (more on that: here) using the corresponding numbers from the above summary and the figures withdrawn from the appropriate excerpts of the 10053 trace file (see right below).
The further required details are under the table below.

These are the excerpts which are direct copies from the CBO trace file representing the costing of Index Range Scans for both competing indexes.
Several not important lines were removed.


   2706  ****** Costing Index IX_SECRTB_SECRTPOS_VALUE_IX1
   2711   Using prorated density: 2.4172e-07 of col #21 as selectivity of out-of-range/non-existent value pred
   2712   Access Path: index (RangeScan)
   2713     Index: IX_SECRTB_SECRTPOS_VALUE_IX1
   2714     resc_io: 4.000000  resc_cpu: 29273
   2715     ix_sel: 2.4172e-07  ix_sel_with_filters: 2.4172e-07
   2716     Cost: 4.000854  Resp: 4.000854  Degree: 1

  2717  ****** Costing Index IX_SECRTB_SECRTPOS_VALUE_PK
   2722   Using prorated density: 1.6087e-07 of col #18 as selectivity of out-of-range/non-existent value pred
   2723   Access Path: index (RangeScan)
   2724     Index: IX_SECRTB_SECRTPOS_VALUE_PK
   2725     resc_io: 4.000000  resc_cpu: 29261
   2726     ix_sel: 1.6087e-07  ix_sel_with_filters: 1.6087e-07
   2727     Cost: 4.000853  Resp: 4.000853  Degree: 1

For better readability :
The ix_sel: 2.4172e-07 = 0.00000024172
The ix_sel: 1.6087e-07 = 0.00000016087

Btw. the conversion from scientific notation is fairly simple :

SQL> select to_char(&scientific_notation) from dual ;
Enter value for scientific_notation: 1.6087e-07
old   1: select to_char(&scientific_notation) from dual
new   1: select to_char(1.6087e-07) from dual

TO_CHAR(1.60
------------
.00000016087

If you put the appropriate numbers into the formula :
I/O Cost of Index Range Scan = [ Index_LVL + ceil( #LB * ix_sel ) ] + ceil( CLUF * ix_sel )
you’ll get the same I/O Cost for both: resc_io: 4.000000

Optimizer decision paths – CPU Cost conversion

Yet still the final cost figures (last lines: Cost: 4.000854 and Cost: 4.000853) slightly differ and for some reason the IX_SECRTB_SECRTPOS_VALUE_PK won (I am still not quite accurate with this statement but will come to the final.. speculation later).
The difference comes from the CPU part of the calculated cost, namely:
resc_cpu: 29273
and
resc_cpu: 29261.
These numbers can’t simply be added to the I/O cost (so, we cannot simply add resc_io + resc_cpu).
That said, resc_cpu needs to be converted to the common unit first.

Let’s carry out the conversion of cost_CPU to cost I/O to be able to add them both.

The main formula is:
converted CPU Cost = Cost_cpu_from_10053 / 1000 / SREADTIM / CPUSPEEDNW
Without getting into the meaning of the names used, let us proceed with the calculations, but if you need more examples and details, visit this article.

We need SYSTEM STATISTICS INFORMATION (the outcome of the system statistics), to be found in the 10053-trace file.

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
Using dictionary system stats.
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 2857 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM:  10 milliseconds (default is 10)
  MBRC:       NO VALUE blocks (default is 8)

The CPUSPEEDNW can be derived directly from the above table summary: 2857 (millions instructions/sec)
The Cost_cpu_from_10053 comes from the resc_cpu (29273 and 29261, respectively).
SREADTIM needs to be calculated as follows:
SREADTIM = IOSEEKTIM + db_block_size / IOTFRSPEED
where IOSEEKTIM and IOTFRSPEED come from the same table-summary.
After putting the appropriate figures into the above formulas, the converted CPU cost of both Index Range Scans is:
0.000854 (for IX_SECRTB_SECRTPOS_VALUE_IX1)
and
0.000853 (for IX_SECRTB_SECRTPOS_VALUE_PK).

Quick math:

IOTFRSPEED	IOSEEKTIME	db_block_size	CPUSPEED(NW)
4096	10	8192	2857

SREADTIM = 10 + 8192/4096 = 12

converted CPU Cost (IDX1) = 29273 / 1000 / 12 / 2857 = 0.000854
converted CPU Cost (PK) = 29261 / 1000 / 12 / 2857 = 0.000853

Since these figures represent the converted values, now we can (are allowed to) add them to the corresponding I/O Cost:

Total Cost = I/O Cost + Converted (to I/O) CPU Cost.

Finally:
4 + 0.000854 = 4.000854
and
4 + 0.000853 = 4.000853

These are exactly the figures withdrawn from the 10053 trace file which are then compared by the optimizer while deciding which access path will be selected (the one with lower cost).

The corresponding lines from the trace file :

Cost: 4.000854  Resp: 4.000854  Degree: 1 (representing the accumulated cost Index Range Scan using IX_SECRTB_SECRTPOS_VALUE_IX1)
Cost: 4.000853  Resp: 4.000853  Degree: 1 (representing the accumulated cost Index Range Scan using IX_SECRTB_SECRTPOS_VALUE_PK)

The summary:

	IdxLVL	#LB	ix_sel	CLUF	IdxCost (I/O)	Ceil(IdxCost(I/O)	TabCost(I/O)	Ceil(TabCost(I/O))	Cost(I/O)	Cost CPU from 10053	Converted CPU Cost	TotalCost
IX_SECRTB_SECRTPOS_VALUE_PK	2	7397	0.00000016087	158149	2.001190	3	0.025441	1	4.000000	29261	0.000853	4.000853
IX_SECRTB_SECRTPOS_VALUE_IX1	2	13977	0.00000024172	165551	2.003379	3	0.040017	1	4.000000	29273	0.000854	4.000854

At this point I need to raise a hand and emphasize how this subtle difference :
4.000853 < 4.000854
controlled the final decision made by CBO which then turned out disastrous.

For completeness, below is the calculated cost (65396.456320) of Full Table Scan of the table PR_SECRTB_SECRTPOS_VALUE.
The optimizer had no doubts while comparing its cost (much higher) with any of the two Index Range Scans, as ultimately it was comparing the figure 65396.456320 with 4.000854 and with 4.000853.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for PR_SECRTB_SECRTPOS_VALUE[VAL]
  SPD: Directive valid: dirid = 18798806276229922, state = 1, flags = 1, loc = 1, forDS = NO, forCG = YES {C(1337655)[18, 21]}
  SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE

kkecdn: Single Table Predicate:"VAL"."HIST_ID"(+)>=588592848
  Column (#18): HIST_ID(NUMBER)
    AvgLen: 7 NDV: 3108134 Nulls: 0 Density: 0.000000 Min: 583330185.000000 Max: 586438812.000000
  Using prorated density: 1.6087e-07 of col #18 as selectivity of out-of-range/non-existent value pred

kkecdn: Single Table Predicate:"VAL"."hist_secret_id"(+)>=285970450
  Using prorated density: 2.4172e-07 of col #21 as selectivity of out-of-range/non-existent value pred
  Table: PR_SECRTB_SECRTPOS_VALUE  Alias: VAL
    Card: Original: 3108134.000000  Rounded: 1  Computed: 0.000000  Non Adjusted: 0.000000
  Scan IO  Cost (Disk) =   65292.000000
  Scan CPU Cost (Disk) =   3426246362.240000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048  ("VAL"."HIST_ID"(+)>=588592848)
    io = NOCOST, cpu = 50.000000, sel = 0.000000 flag = 2048  ("VAL"."hist_secret_id"(+)>=285970450)
  Total Scan IO  Cost  =   65292.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 3108134.000000 (#rows))
                       =   65292.000000
  Total Scan CPU  Cost =   3426246362.240000 (scan (Disk))
                         + 155406725.000000 (cpu filter eval) (= 50.000008 (per row) * 3108134.000000 (#rows))
                       =   3581653087.240000
  Access Path: TableScan
    Cost:  65396.456320  Resp: 65396.456320  Degree: 0
      Cost_io: 65292.000000  Cost_cpu: 3581653087
      Resp_io: 65292.000000  Resp_cpu: 3581653087

Can something which takes 3 days be optimized down to 10 seconds?

Let me come back to this statement :
“At this point I need to raise a hand and emphasize how subtle difference (4.000853 < 4.000854) controlled the final decision made by CBO which then turned out disastrous”.
Why so cruel?

Look at the facts provided by SQL Monitor, specifically at the Rows(Actual) next to the joined rowset at Plan ID 11, so: 2G (2G means “at least 2,000,000,000 (index entries)”.
Plan ID = 11 --> Plan Id = 10
IX_SECRTB_SECRTPOS_VALUE_PK --> PR_SECRTB_SECRTPOS_VALUE.

SQL Plan Monitoring Details (Plan Hash Value=364305447)
=================================================================================================================================================================================
| Id |                  Operation                   |            Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                                              |                             | (Estim) |      | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
=================================================================================================================================================================================
|  0 | SELECT STATEMENT                             |                             |         |      |         1 |   +490 |     1 |    50004 |     . |          |                 |
|  1 |   VIEW                                       |                             |       1 |   12 |         1 |   +490 |     1 |    50004 |     . |          |                 |
|  2 |    WINDOW SORT PUSHED RANK                   |                             |       1 |   12 |       489 |     +2 |     1 |    71689 |  26MB |          |                 |
|  3 |     NESTED LOOPS OUTER                       |                             |       1 |   11 |       489 |     +2 |     1 |    71689 |     . |          |                 |
|  4 |      NESTED LOOPS                            |                             |       1 |   10 |       489 |     +2 |     1 |    71689 |     . |          |                 |
|  5 |       NESTED LOOPS                           |                             |       1 |    9 |       489 |     +2 |     1 |    71689 |     . |          |                 |
|  6 |        NESTED LOOPS OUTER                    |                             |       1 |    8 |       489 |     +2 |     1 |    71689 |     . |          |                 |
|  7 |         NESTED LOOPS OUTER                   |                             |       1 |    7 |       489 |     +2 |     1 |    71689 |     . |          |                 |
|  8 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_INFO     |       1 |    4 |       489 |     +2 |     1 |    32247 |     . |          |                 |
|  9 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_INFO_PK  |       1 |    3 |       489 |     +2 |     1 |    32247 |     . |          |                 |
| 10 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_VALUE    |       1 |    3 |       489 |     +2 | 32247 |    67143 |     . |    90.79 | Cpu (434)       |
| 11 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_VALUE_PK |       1 |    2 |       489 |     +2 | 32247 |       2G |     . |     8.99 | Cpu (43)        |
| 12 |         TABLE ACCESS BY INDEX ROWID          | PR_REALLYSEC                |       1 |    1 |       489 |     +2 | 71689 |    71663 |     . |          |                 |
| 13 |          INDEX UNIQUE SCAN                   | PK_RALOC_AUTOID             |       1 |      |       489 |     +2 | 71689 |    71663 |     . |          |                 |
| 14 |        TABLE ACCESS BY INDEX ROWID           | PR_SECRTB_DET1              |       1 |    1 |       489 |     +2 | 71689 |    71689 |     . |          |                 |
| 15 |         INDEX UNIQUE SCAN                    | PK_SENSORUNIT_AUTOID        |       1 |      |       489 |     +2 | 71689 |    71689 |     . |          |                 |
| 16 |       TABLE ACCESS BY INDEX ROWID            | PR_REALSEC                  |       1 |    1 |       489 |     +2 | 71689 |    71689 |     . |     0.21 | Cpu (1)         |
| 17 |        INDEX UNIQUE SCAN                     | PK_RADEVICE_AUTOID          |       1 |      |       489 |     +2 | 71689 |    71689 |     . |          |                 |
| 18 |      TABLE ACCESS BY INDEX ROWID             | PR_SECRTB_MY_VERSEC         |       1 |    1 |       489 |     +2 | 71689 |    71689 |     . |          |                 |
| 19 |       INDEX UNIQUE SCAN                      | PK_SENSOPST_AUTOID          |       1 |      |       489 |     +2 | 71689 |    71689 |     . |          |                 |
=================================================================================================================================================================================

If you divide that number by number of Execs (32247), you’ll get roughly 60’000 (it should be just 2) such entries returned with every single loop.
That reminds – let me call it that way – semi cartesian join.
In fact, it’s still away from a cartesian join understood in a literal sense, as its definition says: all rows from the driving branch multiplied by all rows from the joined branch.
All these numbers make sense and justify the usage of the index IX_SECRTB_SECRTPOS_VALUE_IX1 supporting the join which wasn’t used :
The numbers of rows filtered from the PR_SECRTB_SECRTPOS_VALUE upon traversing the PK index is indeed 67’143 – no wonder, IX_SECRTB_SECRTPOS_VALUE_PK is a primary key index.
Then the ratio between the total numbers of rows returned from the driving table (PR_SECRTB_SECRTPOS_INFO) and the joined one (PR_SECRTB_SECRTPOS_VALUE) is indeed close to 2, calculated as 67143 / 32247.
Now, with this SQL Plan here is what happened :
With each “loop” the joined index must have carried out large index range scan, returning ~60’000 index entries.
Note – had that index been hash-partitioned by HIST_ID (that might make sense, as this table is heavily populated by concurrent INSERTs), it would impose scanning of the entire index.
With these 60’000 index entries it had to carry out even more work: it had to map these 60’000 entries (by ROWIDs) to its table and ultimately return just.. 2 rows.
The amount of work (and its ratio) is btw. well depicted under the Activity(%):
~90% of work has been spent on filtering those 2 rows from PR_SECRTB_SECRTPOS_VALUE with every loop.
Things might have been even worse if the clustering factor between the IX_SECRTB_SECRTPOS_VALUE_PK and PR_SECRTB_SECRTPOS_VALUE was higher (in the examined case the CLUF looks quite ok).

To support the above, I executed the same query while forcing two different SQL plans by the means of hints.
I took the hints from the output of
select * from table(dbms_xplan.display_cursor('&sql_id',&child_no,'ALLSTATS +outline'))
The predicates were shaped as follows to ensure that the CBO calculates the cardinality as 1:

AND inf.hist_id >= 330106219 + 1
AND val.hist_id (+) >= 679793586 + 1

The 330106219 and 679793586 are the max. values of HIST_ID from the current statistics for both tables.

Note, I have not directly influenced the cardinalities with any of such hints like cardinality() or opt_estimate().
I just told the optimizer which index is to be used to service the joined (driven) part.
According to my expectation the cardinality in the crucial points of the plan was calculated as 1 (max value from stats + 1, so beyond the official min-max range).

The first plan uses the index IX_SECRTB_SECRTPOS_VALUE_IX1, whereas the other uses IX_SECRTB_SECRTPOS_VALUE_PK.
For both plans, the estimated cardinality and the cost figures are identical.
To identify the actual difference, it would be necessary to examine the CBO trace file in the same manner as shown above.
Ironically, this time the IX_SECRTB_SECRTPOS_VALUE_IX1 was originally chosen – again, with just the subtle difference between the both compared costs.

Looking at this SQL Plan, specifically at Rows(Actual) next to the marked Ids, one can apply the same analysis as above with (hopefully) the same conclusions withdrawn.
The number of Rows Actual is different, but the ratios stay intact, most notably the number of index entries returned at each “loop” : 2000000 / 769000 = ~2, so 2M divided by number of “loops” (769K) which comes from the true cardinality returned from the driving table (769K).
Most importantly: the elapsed time of the query: 10 seconds.

The query and the SQL Plan with IX_SECRTB_SECRTPOS_VALUE_IX1 (crucial hinting marked) :

SELECT /* &comment */
/*+
monitor
opt_param('optimizer_adaptive_plans','false')
      OUTLINE_LEAF(@"SEL$259F1CC8")
      ELIMINATE_JOIN(@"SEL$1" "RES"@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_008"@"SEL$2")
      INDEX_RS_ASC(@"SEL$259F1CC8" "INF"@"SEL$1" ("PR_SECRTB_SECRTPOS_INFO"."HIST_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$259F1CC8" "INF"@"SEL$1")
      INDEX_RS_ASC(@"SEL$259F1CC8" "VAL"@"SEL$1" ("PR_SECRTB_SECRTPOS_VALUE"."hist_secret_id")) --> hint forcing the IX_SECRTB_SECRTPOS_VALUE_IX1
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$259F1CC8" "VAL"@"SEL$1")
      INDEX_RS_ASC(@"SEL$259F1CC8" "LOC"@"SEL$1" ("PR_REALLYSEC"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "UNI"@"SEL$1" ("PR_SECRTB_DET1"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "DEV"@"SEL$1" ("PR_REALSEC"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "STA"@"SEL$1" ("PR_SECRTB_MY_VERSEC"."AUTOID"))
      LEADING(@"SEL$259F1CC8" "INF"@"SEL$1" "VAL"@"SEL$1" "LOC"@"SEL$1" "UNI"@"SEL$1" "DEV"@"SEL$1" "STA"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "VAL"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "LOC"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "UNI"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "DEV"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "STA"@"SEL$1")
*/
* 
FROM
    VERYSECOWNR.PR_SECRTB_SECRTPOS_INFO  inf,
    VERYSECOWNR.PR_SECRTB_SECRTPOS_VALUE val,
    VERYSECOWNR.PR_SECRTB_MY_VERSEC      sta,
    VERYSECOWNR.PR_SECRTB_DET1           uni,
    VERYSECOWNR.PR_RVERYSECR             res,
    VERYSECOWNR.PR_REALSEC               dev,
    VERYSECOWNR.PR_REALLYSEC             loc
WHERE
        inf.hist_id = val.hist_secret_id (+)
    AND inf.operational_status = sta.autoid (+)
    AND inf.sensor_unit_autoid = uni.autoid
    AND inf.location = loc.autoid (+)
    AND uni.device = dev.autoid
    AND dev.resource_ref = res.autoid (+)
    AND inf.hist_id >= 330106219 + 1
    AND val.hist_id (+) >= 679793586 + 1
order by inf.hist_id , inf.recorded_timestamp
FETCH NEXT 50000 ROWS WITH TIES
/

Global Information
------------------------------
Status              :  DONE (ALL ROWS)
SQL ID              :  0yb25p5udvkyr
Duration            :  10s

Global Stats
===========================================================================
| Elapsed |   Cpu   | Cluster  |  Other   | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
===========================================================================
|    9.07 |    8.46 |     0.58 |     0.03 |  3335 |    10M |  121 | 968KB |
===========================================================================

SQL Plan Monitoring Details (Plan Hash Value=2128396058)
=================================================================================================================================================================================================
| Id |                  Operation                   |             Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                                              |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
=================================================================================================================================================================================================
|  0 | SELECT STATEMENT                             |                              |         |      |         2 |     +9 |     1 |    50001 |      |       |     . |          |                 |
|  1 |   VIEW                                       |                              |       1 |   12 |         2 |     +9 |     1 |    50001 |      |       |     . |          |                 |
|  2 |    WINDOW SORT PUSHED RANK                   |                              |       1 |   12 |         9 |     +2 |     1 |     114K |      |       |  40MB |    11.11 | Cpu (1)         |
|  3 |     NESTED LOOPS OUTER                       |                              |       1 |   11 |         8 |     +2 |     1 |       2M |      |       |     . |          |                 |
|  4 |      NESTED LOOPS                            |                              |       1 |   10 |         8 |     +2 |     1 |       2M |      |       |     . |          |                 |
|  5 |       NESTED LOOPS                           |                              |       1 |    9 |         8 |     +2 |     1 |       2M |      |       |     . |          |                 |
|  6 |        NESTED LOOPS OUTER                    |                              |       1 |    8 |         8 |     +2 |     1 |       2M |      |       |     . |          |                 |
|  7 |         NESTED LOOPS OUTER                   |                              |       1 |    7 |         8 |     +2 |     1 |       2M |      |       |     . |          |                 |
|  8 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_INFO      |       1 |    4 |         8 |     +2 |     1 |     769K |      |       |     . |          |                 |
|  9 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_INFO_PK   |       1 |    3 |         8 |     +2 |     1 |     769K |      |       |     . |          |                 |
| 10 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_VALUE     |       1 |    3 |         8 |     +2 |  769K |       2M |      |       |     . |    11.11 | Cpu (1)         |
| 11 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_VALUE_IX1 |       1 |    2 |         8 |     +2 |  769K |       2M |  121 | 968KB |     . |    11.11 | Cpu (1)         |
| 12 |         TABLE ACCESS BY INDEX ROWID          | PR_REALLYSEC                 |       1 |    1 |         8 |     +2 |    2M |       2M |      |       |     . |          |                 |
| 13 |          INDEX UNIQUE SCAN                   | PK_RALOC_AUTOID              |       1 |      |         8 |     +2 |    2M |       2M |      |       |     . |          |                 |
| 14 |        TABLE ACCESS BY INDEX ROWID           | PR_SECRTB_DET1               |       1 |    1 |         9 |     +1 |    2M |       2M |      |       |     . |    22.22 | Cpu (2)         |
| 15 |         INDEX UNIQUE SCAN                    | PK_SENSORUNIT_AUTOID         |       1 |      |         8 |     +2 |    2M |       2M |      |       |     . |    11.11 | Cpu (1)         |
| 16 |       TABLE ACCESS BY INDEX ROWID            | PR_REALSEC                   |       1 |    1 |         8 |     +2 |    2M |       2M |      |       |     . |    11.11 | Cpu (1)         |
| 17 |        INDEX UNIQUE SCAN                     | PK_RADEVICE_AUTOID           |       1 |      |         8 |     +2 |    2M |       2M |      |       |     . |    11.11 | Cpu (1)         |
| 18 |      TABLE ACCESS BY INDEX ROWID             | PR_SECRTB_MY_VERSEC          |       1 |    1 |         8 |     +2 |    2M |       2M |      |       |     . |          |                 |
| 19 |       INDEX UNIQUE SCAN                      | PK_SENSOPST_AUTOID           |       1 |      |         8 |     +2 |    2M |       2M |      |       |     . |    11.11 | Cpu (1)         |
=================================================================================================================================================================================================

Now the query and the SQL Plan with IX_SECRTB_SECRTPOS_VALUE_PK (again, crucial hinting marked) :

SELECT /* &comment */
/*+
monitor
opt_param('optimizer_adaptive_plans','false')
      OUTLINE_LEAF(@"SEL$259F1CC8")
      ELIMINATE_JOIN(@"SEL$1" "RES"@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_008"@"SEL$2")
      INDEX_RS_ASC(@"SEL$259F1CC8" "INF"@"SEL$1" ("PR_SECRTB_SECRTPOS_INFO"."HIST_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$259F1CC8" "INF"@"SEL$1")
      INDEX_RS_ASC(@"SEL$259F1CC8" "VAL"@"SEL$1" ("PR_SECRTB_SECRTPOS_VALUE"."HIST_ID")) --> hint forcing the IX_SECRTB_SECRTPOS_VALUE_PK
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$259F1CC8" "VAL"@"SEL$1")
      INDEX_RS_ASC(@"SEL$259F1CC8" "LOC"@"SEL$1" ("PR_REALLYSEC"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "UNI"@"SEL$1" ("PR_SECRTB_DET1"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "DEV"@"SEL$1" ("PR_REALSEC"."AUTOID"))
      INDEX_RS_ASC(@"SEL$259F1CC8" "STA"@"SEL$1" ("PR_SECRTB_MY_VERSEC"."AUTOID"))
      LEADING(@"SEL$259F1CC8" "INF"@"SEL$1" "VAL"@"SEL$1" "LOC"@"SEL$1" "UNI"@"SEL$1" "DEV"@"SEL$1" "STA"@"SEL$1") --> order of joins
      USE_NL(@"SEL$259F1CC8" "VAL"@"SEL$1") --> Nested Loop from INF to VAL
      USE_NL(@"SEL$259F1CC8" "LOC"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "UNI"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "DEV"@"SEL$1")
      USE_NL(@"SEL$259F1CC8" "STA"@"SEL$1")
*/
*
FROM
    VERYSECOWNR.PR_SECRTB_SECRTPOS_INFO  inf,
    VERYSECOWNR.PR_SECRTB_SECRTPOS_VALUE val,
    VERYSECOWNR.PR_SECRTB_MY_VERSEC      sta,
    VERYSECOWNR.PR_SECRTB_DET1           uni,
    VERYSECOWNR.PR_RVERYSECR             res,
    VERYSECOWNR.PR_REALSEC               dev,
    VERYSECOWNR.PR_REALLYSEC             loc
WHERE
        inf.hist_id = val.hist_secret_id (+)
    AND inf.operational_status = sta.autoid (+)
    AND inf.sensor_unit_autoid = uni.autoid
    AND inf.location = loc.autoid (+)
    AND uni.device = dev.autoid
    AND dev.resource_ref = res.autoid (+)
    AND inf.hist_id >= 330106219 + 1
    AND val.hist_id (+) >= 679793586 + 1
order by inf.hist_id , inf.recorded_timestamp
FETCH NEXT 50000 ROWS WITH TIES
/

Global Information
------------------------------
Status              :  EXECUTING
Instance ID         :  1
Session             :  SYS (756:38268)
SQL ID              :  6rtt3y94t06ub
Duration            :  3847s

Global Stats
=============================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  |
=============================================================================
|    3847 |    3818 |     0.00 |        0.00 |     0.01 |       29 |   884M |
=============================================================================

SQL Plan Monitoring Details (Plan Hash Value=364305447)
==================================================================================================================================================================================
| Id    |                  Operation                   |            Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Mem | Activity | Activity Detail |
|       |                                              |                             | (Estim) |      | Active(s) | Active |       | (Actual) |     |   (%)    |   (# samples)   |
==================================================================================================================================================================================
|     0 | SELECT STATEMENT                             |                             |         |      |           |        |     1 |          |   . |          |                 |
|     1 |   VIEW                                       |                             |       1 |   12 |           |        |     1 |          |   . |          |                 |
|  -> 2 |    WINDOW SORT PUSHED RANK                   |                             |       1 |   12 |      3846 |     +2 |     1 |        0 | 8MB |          |                 |
|  -> 3 |     NESTED LOOPS OUTER                       |                             |       1 |   11 |      3846 |     +2 |     1 |    21882 |   . |          |                 |
|  -> 4 |      NESTED LOOPS                            |                             |       1 |   10 |      3846 |     +2 |     1 |    21882 |   . |          |                 |
|  -> 5 |       NESTED LOOPS                           |                             |       1 |    9 |      3846 |     +2 |     1 |    21882 |   . |          |                 |
|  -> 6 |        NESTED LOOPS OUTER                    |                             |       1 |    8 |      3846 |     +2 |     1 |    21882 |   . |          |                 |
|  -> 7 |         NESTED LOOPS OUTER                   |                             |       1 |    7 |      3846 |     +2 |     1 |    21882 |   . |          |                 |
|  -> 8 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_INFO     |       1 |    4 |      3846 |     +2 |     1 |    10229 |   . |          |                 |
|  -> 9 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_INFO_PK  |       1 |    3 |      3846 |     +2 |     1 |    10229 |   . |          |                 |
| -> 10 |          TABLE ACCESS BY INDEX ROWID BATCHED | PR_SECRTB_SECRTPOS_VALUE    |       1 |    3 |      3848 |     +1 | 10229 |    20037 |   . |    93.29 | Cpu (3506)      |
|    11 |           INDEX RANGE SCAN                   | IX_SECRTB_SECRTPOS_VALUE_PK |       1 |    2 |      3846 |     +2 | 10229 |       4G |   . |     6.71 | Cpu (252)       |
| -> 12 |         TABLE ACCESS BY INDEX ROWID          | PR_REALLYSEC                |       1 |    1 |      3846 |     +2 | 21882 |    21726 |   . |          |                 |
| -> 13 |          INDEX UNIQUE SCAN                   | PK_RALOC_AUTOID             |       1 |      |      3846 |     +2 | 21882 |    21726 |   . |          |                 |
| -> 14 |        TABLE ACCESS BY INDEX ROWID           | PR_SECRTB_DET1              |       1 |    1 |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
| -> 15 |         INDEX UNIQUE SCAN                    | PK_SENSORUNIT_AUTOID        |       1 |      |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
| -> 16 |       TABLE ACCESS BY INDEX ROWID            | PR_REALSEC                  |       1 |    1 |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
| -> 17 |        INDEX UNIQUE SCAN                     | PK_RADEVICE_AUTOID          |       1 |      |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
| -> 18 |      TABLE ACCESS BY INDEX ROWID             | PR_SECRTB_MY_VERSEC         |       1 |    1 |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
| -> 19 |       INDEX UNIQUE SCAN                      | PK_SENSOPST_AUTOID          |       1 |      |      3846 |     +2 | 21882 |    21882 |   . |          |                 |
==================================================================================================================================================================================

As you can see, the outcome is truly disastrous and exactly the same rules as outlined above apply (again, it’s worth to take a look at the Rows(Actual) for the marked lines and carry out the same calculations as above).
It’d be unfair to say it took 3847 seconds, because it finished (got aborted) with ORA-01555: snapshot too old :

SQL> /
Enter value for comment: PK
old   1: SELECT /* &comment */
new   1: SELECT /* PK */

    VERYSECOWNR.PR_SECRTB_SECRTPOS_INFO  inf,
                *
ERROR at line 51:
ORA-01555: snapshot too old: rollback segment number 53 with name "_SYSSMU53_522012075$" too small

SQL_ID: ▒ˍ
Elapsed: 01:11:31.16

Still, it’s interesting to learn how long it would have taken if the UNDO hadn’t been the limiting factor, so the below it’s purely theoretical.
Nonetheless, let’s try to extrapolate it.
Considering, the driving table (PR_SECRTB_SECRTPOS_INFO) would return the same number of rows as the query above (the one which finished after ~10 seconds), so 769K, and at the sampled moment (after 3847 seconds) it managed to return just 10229 rows, it would take over 3 days to finish this query with this SQL Plan (using the index IX_SECRTB_SECRTPOS_VALUE_PK).
I believe, this simple linear proportion can indeed be applied, as the heavy phase (matching the rows from the PK index with the content of PR_SECRTB_SECRTPOS_VALUE) seems to show the linear characteristic *.

	Rows	Seconds	Hours	Days (24h)
Current :	10229	3847	1.07
Extrapolation :	769000	289211.4	80.34	3.35

Note that I still did not take the need to reconstruct the consistent read image from undo into account.
Admittedly, it is reflected up to certain extent already, since the query eventually failed with a “snapshot too old” error, which proves that undo-based reconstruction must have been involved (and time wasted).
Over time, however, this overhead would likely become increasingly significant, as many — perhaps even all — buffers would need to be reconstructed using undo generated by far greater number of commits.

Continue to part3

Piotr Sajda Oracle Tuning Blog

Be cautious when the optimizer estimates a cardinality of 1 – Part2

The state before tuning

`Data blocks consistent reads - undo records applied`

Subtle difference in costing may lead to disastrous results

Optimizer decision paths – ACCESS PATH

Optimizer decision paths – CPU Cost conversion

Can something which takes 3 days be optimized down to 10 seconds?

Leave a Reply Cancel reply

All tags

Latest posts