Clustering Factor (CLUF) – the anatomy and its impact on performance – part2

This post is the continuation of the previous part.

Clustering Factor – tests with block dumps


My test environment comprises a table with 10 million rows and three columns with two indexes:
– ID_SORTED (unique, generated as sorted) with the index on it
– ID_MESS (almost unique, generated as scattered) with the index on it
– PAD (filler with 256 bytes)

SYS@CDBTST1->TPDB1> 
create table TSTUS.TSCLUF as 
select rownum as ID_SORTED, 
trunc( dbms_random.value(1,10e6) ) as ID_MESS, 
rpad('X',256,'X') as pad 
from dual connect by level <= 10e6 ;

Table created.

SYS@CDBTST1->TPDB1> create index TSTUS.IDX_ID_SORTED on TSTUS.TSCLUF ( ID_SORTED ) ;

Index created.

SYS@CDBTST1->TPDB1> create index TSTUS.IDX_ID_MESS on TSTUS.TSCLUF ( ID_MESS ) ;

Index created.

SYS@CDBTST1->TPDB1> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => '&OWNER' , tabname => '&tablename',cascade => true, estimate_percent => &estimate_percent, granularity => 'ALL', degree => &parallel_slaves);
Enter value for owner: TSTUS
Enter value for tablename: TSCLUF
Enter value for estimate_percent: 100
Enter value for parallel_slaves: 4

PL/SQL procedure successfully completed.

Data distribution and some other characteristics of these columns (NDV, LOW_VAL, HI_VAL) :

OWNER                     TABLE_NAME                     COL_ID COLUMN_NAME                    DATA_TYPE             NDV     DENSITY  NUM_NULLS AVG_COL_LEN LOW_VAL                             HI_VAL                              HISTOGRAM       
------------------------- ------------------------------ ------ ------------------------------ ------------ ------------ ----------- ---------- ----------- ----------------------------------- ----------------------------------- --------------- 
TSTUS                     TSCLUF                              2 ID_MESS                        NUMBER          6,321,696   .00000016          0           6 1                                   9999997                             NONE             
TSTUS                                                         1 ID_SORTED                      NUMBER         10,000,000   .00000010          0           6 1                                   10000000                            NONE              
TSTUS                                                         3 PAD                            VARCHAR2                1  1.00000000          0         257 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX    NONE              

Some data considering the both indexes.
Note, the numbers under CLUSTF for both (calculated values for the clustering factor – you may scroll a bit to the right).
– CLUF for IDX_ID_SORTED is close to the number of table’s blocks: CLUF: 384,616 and Table Blocks: 385,478
whereas :
– CLUF for IDX_ID_MESS – exactly opposite, that is: CLUF : 9,999,973 and number of rows: 10,000,000

OWNER              TABLENAME                      INDEX_NAME                     INDEX_TYPE      UNIQUENES COMPRESSION   IDX_PX         CLUSTF          TabBlks          TabRows       NDV LEAF_BLOCKS COLS PAR BLEVEL LEAF_BL_KEY DATA_BL_KEY ORPHANED_E STATUS     MYLAST_ANALYZED      VISIBILIT           MB
------------------ ------------------------------ ------------------------------ --------------- --------- ------------- -------- ------------ ---------------- ---------------- --------- ----------- ---- --- ------ ----------- ----------- ---------- ---------- -------------------- --------- ------------
TSTUS              TSCLUF                         IDX_ID_MESS                    NORMAL          NONUNIQUE DISABLED      1           9,999,973          385,478       10,000,000   6355456       23371    1 NO       2           1           1 NO         VALID      24-09-24 06:00       VISIBLE            192
TSTUS                                             IDX_ID_SORTED                  NORMAL          NONUNIQUE DISABLED      1             384,616          385,478       10,000,000   9914368       24002    1 NO       2           1           1 NO         VALID      24-09-24 06:00       VISIBLE            192
                   ******************************                                                                                                                                                                                                                                                   ------------
                   sum                                                                                                                                                                                                                                                                                       384

Let’s have it examined and confirmed down to the bottom, so block dump level.
Here it is.

Test with low Clustering Factor

First, I will dump the entire index tree of IDX_ID_SORTED – the one which indexes the aligned values in the table. For that I’ll need the object_id.
Somewhere in the trace file you’ll find the list of “leaves” (index leaf blocks) – just navigate to “begin tree dump” within the trace file.

OBJECT                                                       SUBOBJECT_NAME             OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE          
------------------------------------------------------------ ------------------------- ---------- -------------- -------------------- 
TSTUS.IDX_ID_SORTED                                                                        181826         181826 INDEX                

20:28:21 SYS@CDBTST1->TPDB1>
20:28:21 SYS@CDBTST1->TPDB1> ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix';
Enter value for trfile_suffix: IDX_SORT_TREE
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='IDX_SORT_TREE'

Session altered.

INST_ID      SID    SERIAL# OSPID                         PID TRACEFILE
------- -------- ---------- ------------------------ -------- ---------------------------------------
      1      252      21007 268114                        254 CDBTST1_ora_268114_IDX_SORT_TREE.trc

SYS@CDBTST1->TPDB1> alter session set events 'immediate trace name treedump level 181826' ;

Session altered.


----- begin tree dump
branch: 0x112e80bb 288260283 (0: nrow: 38, level: 2)
   branch: 0xc618836 207718454 (-1: nrow: 672, level: 1)
      leaf: 0x112e80bc 288260284 (-1: row:485.485 avs:824)
      leaf: 0x112e80bd 288260285 (0: row:479.479 avs:816)
      leaf: 0x112e80be 288260286 (1: row:479.479 avs:816)
      leaf: 0x112e80bf 288260287 (2: row:479.479 avs:816)
      leaf: 0x112e80c0 288260288 (3: row:479.479 avs:816)
      leaf: 0x112e80c1 288260289 (4: row:478.478 avs:830)
      leaf: 0x112e80c2 288260290 (5: row:479.479 avs:816)
      leaf: 0x112e80c3 288260291 (6: row:479.479 avs:816)
      leaf: 0x112e80c4 288260292 (7: row:479.479 avs:816)
      leaf: 0x112e80c5 288260293 (8: row:478.478 avs:830)
      leaf: 0x112e80c6 288260294 (9: row:479.479 avs:816)
      leaf: 0x112e80c7 288260295 (10: row:479.479 avs:816)
      leaf: 0x112e80c9 288260297 (11: row:479.479 avs:816)
      leaf: 0x112e80ca 288260298 (12: row:479.479 avs:816)
      leaf: 0x112e80cb 288260299 (19: row:478.478 avs:830)
      leaf: 0x112e80cc 288260300 (21: row:479.479 avs:816)
      leaf: 0x112e80cd 288260301 (15: row:479.479 avs:816)
      leaf: 0x112e80ce 288260302 (16: row:479.479 avs:816)
      leaf: 0x112e80cf 288260303 (19: row:479.479 avs:816)
      leaf: 0x112e80d0 288260304 (18: row:478.478 avs:830)
(...)
(...)

I took the first leaf with 485 entries.
Now let’s dump a specific leaf block (I’ll need file# and block#) :

INST_ID      SID    SERIAL# OSPID                         PID TRACEFILE
------- -------- ---------- ------------------------ -------- -------------------------------------
      1     6661      56320 328117                        307 CDBTST1_ora_328117_idx_bl_dump.trc



SYS@CDBTST1->TPDB1> select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(&&rba), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(&&rba)  from dual;
Enter value for dba: 288260284
old   1: select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(&&rba), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(&&rba)  from dual
new   1: select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(288260284), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(288260284)  from dual

DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(288260284) DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(288260284)
----------------------------------------------- ------------------------------------------------
                                             68                                          3047612


SYS@CDBTST1->TPDB1> alter system dump datafile 68 block 3047612 ;

System altered.

Indeed, that index block leaf contains 485 entries for SORTED_ID (rows are numbered starting from 0 – navigate in the dump to the row#0) :

row#0[8020] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 02              --> SORTED_ID = 1 as decimal
col 1; len 6; (6):  11 71 ad d3 00 00
row#1[8008] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03               --> SORTED_ID = 2 as decimal
col 1; len 6; (6):  11 71 ad d3 00 01
row#2[7996] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 04              --> SORTED_ID = 3 as decimal
col 1; len 6; (6):  11 71 ad d3 00 02
row#3[7984] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  11 71 ad d3 00 03
row#4[7972] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  11 71 ad d3 00 04
row#5[7960] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 07
col 1; len 6; (6):  11 71 ad d3 00 05
row#6[7948] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 08
col 1; len 6; (6):  11 71 ad d3 00 06
row#7[7936] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 09
col 1; len 6; (6):  11 71 ad d3 00 07
(…)
(…) - skipped ..
(…)
row#484[1830] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 56              --> SORTED_ID = 485 as decimal
col 1; len 6; (6):  11 71 ad de 00 10

As expected, the indexed values within the leaf block are sorted (no need to convert the hex values, it’s pretty clear – I took the first 3 entries and the last one) :

for ro#0: col 0; len 2; (2):  c1 02
for ro#1: col 0; len 2; (2):  c1 03
for ro#3: col 0; len 2; (2):  c1 04
(...)
for ro#484: col 0; len 2; (2):  c2 05 56

Word of caution.
The decoding of hexadecimal values to the decimal INTEGER values is not straight forward, there are bytes devoted to store the exponent, mantissa, etc.
How you can easily check the matching decimal values (the ones you’d see in your sql*plus session being a result of a SELECT SORTED_ID from TSCLUF) with their hex-representation within the dumps is sort of reverse checks like that :

SQL> select dump(1,16) from dual ;

DUMP(1,16)
-----------------
Typ=2 Len=2: c1,2      --> matching the row#0, col0 

SQL> select dump(2,16) from dual ;

DUMP(2,16)
-----------------
Typ=2 Len=2: c1,3     --> matching the row#1, col0

SQL> select dump(485,16) from dual ;

DUMP(485,16)
--------------------
Typ=2 Len=3: c2,5,56  --> matching the row#484, col0

Now, the moment of truth.
Let’s take the range of values: 1-485 which is the equivalent of a single index leaf block (the 485 entries from the dumped leaf) and count the number of blocks in the table for the selected range of ID_SORTED :

SYS@CDBTST1->TPDB1> select count(distinct DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) ) "amount_blks"  from TSTUS.TSCLUF i  
where id_sorted <= 485  ;

amount_blks
-----------
         19

Ah, the values of ID_SORTED ranging from 1 to 485 which fill the first leaf of the index, are distributed across 19 physical table blocks.
It does not tell us much until we compare this number to the corresponding number from high CLUF.

Let’s take a closer look how would this translate to a corresponding range scan of the table for a query with the predicate: WHERE ID_SORTED <= 485, so being an equivalent of a range of values encompassing just single index leaf block directed to the table.
With the below query I will simply count the number of adjacent values.

As a result, in each of these 19 table blocks you’ll find 26 rows that satisfy this predicate, giving us a clear picture of how the data is distributed and will eventually be accessed from within this table.
I applied quite a big filler (PAD column) – this is why just 26 rows, you’ll see it later.
Here it is :

SYS@CDBTST1->TPDB1> select DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) as "#File", DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) "#Block", count(*) "Number_rows_in_this_block" 
                             from  TSTUS.TSCLUF i  
                             where id_sorted <= 485 
                             group by DBMS_ROWID.ROWID_RELATIVE_FNO(rowid), DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) 
                             order by 1,2 ;

     #File     #Block Number_rows_in_this_block
---------- ---------- -------------------------
        54    2646760                        26
        54    2646761                        26
        54    2646762                        26
        54    2646763                        26
        54    2646764                        26
        54    2646765                        26
        54    2646766                        26
        54    2646767                        26
        69    3255763                        26
        69    3255764                        26
        69    3255765                        26
        69    3255766                        26
        69    3255767                        26
        69    3255769                        26
        69    3255770                        26
        69    3255771                        26
        69    3255772                        26
        69    3255773                        26
        69    3255774                        17

19 rows selected. --> also: number of table's blocks

How to interpret the above?

Let’s take the #Block = 2646760 (marked red) and play with a simple assumption :
If, for example, a query had the following condition in its predicate section: “Select * from TSTUS.TSCLUF where ID_SORTED between N and N+26”, we can expect that this query would return 26 rows accessing just one physical block of the table.

This obviously makes a huge and positive difference as oppose to the predicate characterized by high CLUF, that is, for the predicate: “where ID_MESS between N and N+26”.
We will see it soon.

For even better understanding let’s combine the two outputs from above and list the block# along with the ordinal number of each row within that block and the true value of ID_SORTED within that block#.
Here it is.

I left just several representative entries.
We can expect 19 different blocks with 26 rows (see #Row ranging from 0-25) in each block with altogether 485 rows.
This is indeed what we achieved :

SQL> break on #Block skip 1
SQL> select rowid, DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) as "#File", DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) "#Block" , DBMS_ROWID.ROWID_ROW_NUMBER(rowid) "#Row", ID_SORTED 
from TSTUS.TSCLUF i  where id_sorted <= 485 
order by "#Block", "#Row" ;


ROWID                   #File     #Block       #Row  ID_SORTED
------------------ ---------- ---------- ---------- ----------
AAAsZBAA2AAKGLoAAA         54    2646760          0        131
AAAsZBAA2AAKGLoAAB         54                     1        132
AAAsZBAA2AAKGLoAAC         54                     2        133
AAAsZBAA2AAKGLoAAD         54                     3        134
AAAsZBAA2AAKGLoAAE         54                     4        135
AAAsZBAA2AAKGLoAAF         54                     5        136
AAAsZBAA2AAKGLoAAG         54                     6        137
AAAsZBAA2AAKGLoAAH         54                     7        138
AAAsZBAA2AAKGLoAAI         54                     8        139
AAAsZBAA2AAKGLoAAJ         54                     9        140
AAAsZBAA2AAKGLoAAK         54                    10        141
AAAsZBAA2AAKGLoAAL         54                    11        142
AAAsZBAA2AAKGLoAAM         54                    12        143
AAAsZBAA2AAKGLoAAN         54                    13        144
AAAsZBAA2AAKGLoAAO         54                    14        145
AAAsZBAA2AAKGLoAAP         54                    15        146
AAAsZBAA2AAKGLoAAQ         54                    16        147
AAAsZBAA2AAKGLoAAR         54                    17        148
AAAsZBAA2AAKGLoAAS         54                    18        149
AAAsZBAA2AAKGLoAAT         54                    19        150
AAAsZBAA2AAKGLoAAU         54                    20        151
AAAsZBAA2AAKGLoAAV         54                    21        152
AAAsZBAA2AAKGLoAAW         54                    22        153
AAAsZBAA2AAKGLoAAX         54                    23        154
AAAsZBAA2AAKGLoAAY         54                    24        155
AAAsZBAA2AAKGLoAAZ         54                    25        156

AAAsZBAA2AAKGLpAAA         54    2646761          0        157
AAAsZBAA2AAKGLpAAB         54                     1        158
AAAsZBAA2AAKGLpAAC         54                     2        159
AAAsZBAA2AAKGLpAAD         54                     3        160
AAAsZBAA2AAKGLpAAE         54                     4        161
AAAsZBAA2AAKGLpAAF         54                     5        162
AAAsZBAA2AAKGLpAAG         54                     6        163
AAAsZBAA2AAKGLpAAH         54                     7        164
AAAsZBAA2AAKGLpAAI         54                     8        165
AAAsZBAA2AAKGLpAAJ         54                     9        166
AAAsZBAA2AAKGLpAAK         54                    10        167
AAAsZBAA2AAKGLpAAL         54                    11        168
AAAsZBAA2AAKGLpAAM         54                    12        169
AAAsZBAA2AAKGLpAAN         54                    13        170
AAAsZBAA2AAKGLpAAO         54                    14        171
AAAsZBAA2AAKGLpAAP         54                    15        172
AAAsZBAA2AAKGLpAAQ         54                    16        173
AAAsZBAA2AAKGLpAAR         54                    17        174
AAAsZBAA2AAKGLpAAS         54                    18        175
AAAsZBAA2AAKGLpAAT         54                    19        176
AAAsZBAA2AAKGLpAAU         54                    20        177
AAAsZBAA2AAKGLpAAV         54                    21        178
AAAsZBAA2AAKGLpAAW         54                    22        179
AAAsZBAA2AAKGLpAAX         54                    23        180
AAAsZBAA2AAKGLpAAY         54                    24        181
AAAsZBAA2AAKGLpAAZ         54                    25        182

(…)

AAAsZBABFAAMa3TAAA         69    3255763          0          1
AAAsZBABFAAMa3TAAB         69                     1          2
AAAsZBABFAAMa3TAAC         69                     2          3
AAAsZBABFAAMa3TAAD         69                     3          4
AAAsZBABFAAMa3TAAE         69                     4          5
AAAsZBABFAAMa3TAAF         69                     5          6
AAAsZBABFAAMa3TAAG         69                     6          7
AAAsZBABFAAMa3TAAH         69                     7          8
AAAsZBABFAAMa3TAAI         69                     8          9
AAAsZBABFAAMa3TAAJ         69                     9         10
AAAsZBABFAAMa3TAAK         69                    10         11
AAAsZBABFAAMa3TAAL         69                    11         12
AAAsZBABFAAMa3TAAM         69                    12         13
AAAsZBABFAAMa3TAAN         69                    13         14
AAAsZBABFAAMa3TAAO         69                    14         15
AAAsZBABFAAMa3TAAP         69                    15         16
AAAsZBABFAAMa3TAAQ         69                    16         17
AAAsZBABFAAMa3TAAR         69                    17         18
AAAsZBABFAAMa3TAAS         69                    18         19
AAAsZBABFAAMa3TAAT         69                    19         20
AAAsZBABFAAMa3TAAU         69                    20         21
AAAsZBABFAAMa3TAAV         69                    21         22
AAAsZBABFAAMa3TAAW         69                    22         23
AAAsZBABFAAMa3TAAX         69                    23         24
AAAsZBABFAAMa3TAAY         69                    24         25
AAAsZBABFAAMa3TAAZ         69                    25         26

AAAsZBABFAAMa3UAAA         69    3255764          0         27
AAAsZBABFAAMa3UAAB         69                     1         28
AAAsZBABFAAMa3UAAC         69                     2         29
AAAsZBABFAAMa3UAAD         69                     3         30
AAAsZBABFAAMa3UAAE         69                     4         31
AAAsZBABFAAMa3UAAF         69                     5         32
AAAsZBABFAAMa3UAAG         69                     6         33
AAAsZBABFAAMa3UAAH         69                     7         34
AAAsZBABFAAMa3UAAI         69                     8         35
AAAsZBABFAAMa3UAAJ         69                     9         36
AAAsZBABFAAMa3UAAK         69                    10         37
AAAsZBABFAAMa3UAAL         69                    11         38
AAAsZBABFAAMa3UAAM         69                    12         39
AAAsZBABFAAMa3UAAN         69                    13         40
AAAsZBABFAAMa3UAAO         69                    14         41
AAAsZBABFAAMa3UAAP         69                    15         42
AAAsZBABFAAMa3UAAQ         69                    16         43
AAAsZBABFAAMa3UAAR         69                    17         44
AAAsZBABFAAMa3UAAS         69                    18         45
AAAsZBABFAAMa3UAAT         69                    19         46
AAAsZBABFAAMa3UAAU         69                    20         47
AAAsZBABFAAMa3UAAV         69                    21         48
AAAsZBABFAAMa3UAAW         69                    22         49
AAAsZBABFAAMa3UAAX         69                    23         50
AAAsZBABFAAMa3UAAY         69                    24         51
AAAsZBABFAAMa3UAAZ         69                    25         52


(…)

485 rows selected --> roughly: 19 (blocks) x 26 (rows in each block)

The block# = 3255763 (marked above) is btw. the one which partially covers the corresponding index leaf block which I dumped above (see the values of ID_SORTED encompassing the range : 1 – 26. So: “row 0” – “row 25”)
Let’s verify that :

SYS@CDBTST1->TPDB1>  alter system dump datafile 69 block 3255763 ;

System altered.

The corresponding content of the table block representing the file# = 69 and block# = 3255763 is below (excerpt of the dump).

Start dump data blocks tsn: 8 file#:69 minblk 3255763 maxblk 3255763
….many other (interesting) entries skipped
Finally, the content of the dump of the table block we are after : 

block_row_dump:
tab 0, row 0, @0x1e71
tl: 271 fb: --H-FL-- lb: 0x0  cc: 3
col  0: [ 2]  c1 02                     --> value: 1 (integer)
col  1: [ 5]  c4 04 0e 58 36
col  2: [256]
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58
tab 0, row 1, @0x1d62
tl: 271 fb: --H-FL-- lb: 0x0  cc: 3
col  0: [ 2]  c1 03                     --> value: 2 (integer)
col  1: [ 5]  c4 04 30 02 3d
col  2: [256]
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58
…

… more entries … …
… until the last one: row 25 (26th entry) :

tab 0, row 25, @0x3fb
tl: 271 fb: --H-FL-- lb: 0x0  cc: 3
col  0: [ 2]  c1 1b                     --> value: 26 (integer)
col  1: [ 5]  c4 07 4e 22 51
col  2: [256]
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
58 58 58 58 58 58
end_of_block_dump

Indeed, 26 rows with the prefix “col 0” (e.g. the first one: “col  0: [ 2]  c1 02”) being the hex representation of ID_SORTED.

Most importantly: the values for “col 0” are sorted within the table block whereas the values for “col 1” being such representation for ID_MESS are everything but sorted – absolute mess! Almost needless to carry out the subsequent test : )
For completeness, the “col 2” is obviously the hex representation of PAD with 256 bytes filled with ‘X’ (hex 58 is ‘X’). You can check it with e.g.
select utl_raw.cast_to_varchar2(hextoraw( replace(‘&hex_value’,chr(32),”) )) mystr from dual ;
Obviously, the values 1-26 don’t have to be sorted within the block to keep the CLUF low – it’d be enough if the block was kind of closed compound for the values 1-26.
Meaning, the values 1-26 can also be scattered within the block and CLUF won’t change, hence the efficiency of any associated query with the predicate “where ID_SORTED between 1 and 26” will be exactly the same (and satisfactory).
Reason is simple: Oracle fetches the entire (e.g. 8KB) block from disk, then traverses over the rows within that block, possibly filtering them out.
They are indeed sorted within each block since the table was created in such way.


Test with high Clustering Factor

Let’s repeat the above analysis, this time with the column of MESS_ID, so the one which is also indexed (thus sorted within the index leaf blocks), but it’s scattered around the table’s blocks.
As before: I will dump the index tree, then will dump the selected block from that tree (first leaf block) and will carry on with the analogical analysis.

OBJECT                                                       SUBOBJECT_NAME             OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE          
------------------------------------------------------------ ------------------------- ---------- -------------- ------------- 
TSTUS.IDX_ID_MESS                                                                          181827         181827 INDEX                


SYS@CDBTST1->TPDB1> ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix';
Enter value for trfile_suffix: IDXTREE_MESS
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='IDXTREE_MESS'

Session altered.

20:45:28 SYS@CDBTST1->TPDB1>
20:45:28 SYS@CDBTST1->TPDB1>
20:45:28 SYS@CDBTST1->TPDB1> alter session set events 'immediate trace name treedump level 181827' ;

Session altered.



----- begin tree dump
branch: 0x10f640bb 284573883 (0: nrow: 40, level: 2)
   branch: 0xc142b14 202648340 (-1: nrow: 638, level: 1)
      leaf: 0x10f640bc 284573884 (-1: row:485.485 avs:821)
      leaf: 0x10f640bd 284573885 (0: row:479.479 avs:816)
      leaf: 0x10f640be 284573886 (1: row:478.478 avs:830)
      leaf: 0x10f640bf 284573887 (2: row:479.479 avs:818)
      leaf: 0x10f640c0 284573888 (3: row:478.478 avs:829)
      leaf: 0x10f640c1 284573889 (4: row:478.478 avs:827)
      leaf: 0x10f640c2 284573890 (5: row:479.479 avs:820)
      leaf: 0x10f640c3 284573891 (6: row:479.479 avs:821)
      leaf: 0x10f640c4 284573892 (7: row:478.478 avs:829)
(…)
      leaf: 0xda462ef 228877039 (338: row:422.422 avs:827)
      leaf: 0xda462f0 228877040 (339: row:422.422 avs:822)
      leaf: 0xda462f1 228877041 (340: row:41.41 avs:7299)
----- end tree dump

The anatomy of the index tree dump has been presented by Richard Foote here and I have nothing to add, just can learn a lot.
Btw. from Oracle 12c there’s been introduced a new format of the index tree and Richard covers that in his post.

Perhaps just one interesting thing to notice.
row:<number> informs you about the number of indexed entries (values) within a given index leaf block.
You might be wondering why the first leaf block contains 485 such entries but the “capacity” of a leaf block gets gradually decreased up until the last before last leaf with just 422 entries (the very last one can be skipped as simply not being fully stuffed with indexed values).
If you recall that B*Tree index presents a sorted structure of indexed values, and in our case the first value stored in the first leaf block is 1, the last is 10’000’000 you’ll surely come to the following conclusion :
along with the increase of the integer values (from 1 to 10’000’000) Oracle will simply need more room (measured in bytes) to store these values as hex values.
Therefore, as the leaf blocks grow from “left to right”, the less ID_MESS (or ID_SORTED – no difference) entries will be able to be accommodated withing a given leaf block because the numbers grow.
So, for example, the first leaf is able to accommodate 485 entries since the values of ID_SORTED start from 1 and ends with 485.
So, its hex representation spans the range 2 – 3 bytes with first hex: c1 02 (integer: 1) and the last: c2 05 56 (integer: 485).
The last before last block leaf (I am taking the one which is the last full) is able to accommodate just 422 entries, as it needs more space to store a single entry of ID_MESS because in that leaf we are around the values of 10’000’000.

Coming back to the test – let’s dump the selected index leaf.

SYS@CDBTST1->TPDB1> ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'; 
Enter value for trfile_suffix: idx_leaf_dump_mess
old   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='&trfile_suffix'
new   1: ALTER SESSION SET TRACEFILE_IDENTIFIER='idx_leaf_dump_mess'

Session altered.
20:51:25 SYS@CDBTST1->TPDB1>
20:51:26 SYS@CDBTST1->TPDB1> select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(&&rba), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(&&rba)  from dual;
Enter value for rba: 284573884
old   1: select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(&&rba), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(&&rba)  from dual
new   1: select  DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(284573884), DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(284573884)  from dual

DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(284573884) DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(284573884)
----------------------------------------------- ------------------------------------------------
                                             67                                          3555516


SYS@CDBTST1->TPDB1> alter system dump datafile 67 block 3555516 ;

System altered.

As in the case of the previous index, also the IDX_ID_MESS contains sorted values within its leaf blocks.
As before, I took the first leaf and dumped it. Also this leaf block contains 485 sorted entries (rows are numbered starting from 0).

See “col 0” entries :

Leaf block dump
===============
header address 5289869412=0x13b4d0064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 485
kdxcofbo 1006=0x3ee
kdxcofeo 1827=0x723
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 284573885=0x10f640bd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8032
row#0[8020] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 02                --> MESS_ID = 1 as decimal
col 1; len 6; (6):  0d a4 3f a3 00 15
row#1[8008] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 07                --> MESS_ID = 6 as decimal
col 1; len 6; (6):  11 71 fc b3 00 0e
row#2[7996] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 08                --> MESS_ID = 7 as decimal
col 1; len 6; (6):  10 f6 a3 d6 00 0f
row#3[7984] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 08
col 1; len 6; (6):  10 f6 ef b9 00 0e
row#4[7972] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 08
col 1; len 6; (6):  11 e0 85 48 00 0e
(..)
(..)
row#481[1866] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 5f
col 1; len 6; (6):  11 a1 df 2a 00 01
row#482[1853] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 60
col 1; len 6; (6):  11 71 f7 ab 00 15
row#483[1840] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 62
col 1; len 6; (6):  0d 9d 84 4a 00 08
row#484[1827] flag: -------, lock: 0, len=13
col 0; len 3; (3):  c2 05 62             --> MESS_ID = 497 as decimal
col 1; len 6; (6):  11 35 3c ce 00 18
----- end of leaf block Logical dump -----
----- end of leaf block dump -----

You might be wondering, why the values are not consecutive (1,2,3,4…485) – this is because I have used the trunc( dbms_random.value(1,10e6) ) as ID_MESS in order to generate the scattering and the truncating of the outcome of dbms_random ended up with quite a number of the same values (still, its range is broad enough – nearly 6.3 mln distinct values across the range of 10 mln of all rows – see the NDV in “Data distribution and some other characteristics of these columns” somewhere at the beginning of the post).
I could have utilized mod (modulo function) to populate the data. Having applied mod I would have also produced the perfect mess, but with the consecutive values scattered around the table’s blocks.
Every cloud has a silver lining though : ) – with that tiny “imperfectness” I have achieved quite an interesting caveat which I will describe below (or will put it in another short post).

Just as before, let’s check how many table’s blocks Oracle needs in order to store (and eventually visit) the first 485 values which the equivalent of one index leaf block storing the ID_MESS :

SYS@CDBTST1->TPDB1> select count(distinct DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) ) "amount_blks"  from TSTUS.TSCLUF i  where id_mess <= 485  ;

amount_blks
-----------
        469

Oh, it got much worse.

How to understand it?
If you remember, with low CLUF it’d be just 19 physical reads from the table (19 table’s blocks would satisfy the predicate of ID_SORTED <= 485).
Now we’d need 469 physical reads from the same table for the adequate predicate, this time: ID_MESS <= 485.
So, putting it in different way, with the high (admittedly, extremely high) CLUF reading of one index leaf block would trigger 469 physical reads from the corresponding table.
Huge difference, isn’t it?

Again, a bit deeper.
As above, I will run the same simple query grouping by :
DBMS_ROWID.ROWID_RELATIVE_FNO(rowid), DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)
and counting the number of rows in each block satisfying the predicate ID_MESS <= 485.

SYS@CDBTST1->TPDB1> select DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) as "#File", DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) "#Block", count(*) "Number_rows_in_this_block" 
from  TSTUS.TSCLUF i   
where id_mess <= 485 
group by DBMS_ROWID.ROWID_RELATIVE_FNO(rowid), DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) order by 1,2
;

     #File     #Block Number_rows_in_this_block
---------- ---------- -------------------------
        45    2048617                         1
        45    2048794                         1
        45    2048926                         1
        45    2049315                         1
        46    2057193                         1
        50    2352355                         1
        50    2352375                         1
        50    2352679                         1
        51    2196098                         1
        51    2196799                         1
        53    2458165                         1
        54    1523955                         1
(…)
(…)
        71    2153163                         1
        71    2153975                         1
        71    2156597                         1
        71    2157035                         1
        71    2157316                         1
        71    2158191                         1
        71    2159030                         1
        71    2159697                         1
        71    2162949                         1
        71    2162979                         1

469 rows selected.

No surprise – for this exemplary listing each such block contains just one row which satisfies the predicate!
Note, with low CLUF we had 26 rows!

To be consequent, let’s combine the two outputs from above and list the block# along with the ordinal number of each row within that block and the true value of ID_MESS within that block#.

Here it is (I left several representative entries).

SYS@CDBTST1->TPDB1> break on #Block skip 1

select rowid, DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) as "#File", DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) "#Block" , DBMS_ROWID.ROWID_ROW_NUMBER(rowid) "#Row", ID_MESS from TSTUS.TSCLUF i  where id_mess  <= 485 order by "#Block", "#Row"




ROWID                   #File     #Block       #Row    ID_MESS
------------------ ---------- ---------- ---------- ----------
AAAsZBAA2AAF0DzAAC         54    1523955          2        118

AAAsZBAA2AAF0D/AAJ         54    1523967          9        183

AAAsZBABHAAG+SQAAP         71    1827984         15        102

AAAsZBABHAAG+TdAAW         71    1828061         22        103

AAAsZBABHAAG+bZAAF         71    1828569          5        121

AAAsZBABHAAG+skAAS         71    1829668         18        269

AAAsZBABGAAHWXJAAT         70    1926601         19        484

AAAsZBAA2AAHYN8AAG         54    1934204          6        445

AAAsZBAA2AAHYRmAAD         54    1934438          3        458
(….)
(…)
AAAsZBABDAAN1saAAO         67    3627802         14        381
AAAsZBABDAAN1saAAR         67                    17          8

AAAsZBABDAAN1tlAAO         67    3627877         14         40

AAAsZBABDAAN18BAAN         67    3628801         13        330

AAAsZBABDAAN18EAAV         67    3628804         21         31

AAAsZBABDAAN2e1AAQ         67    3631029         16        129

AAAsZBABDAAN2zhAAZ         67    3632353         25        318

AAAsZBABDAAN3FQAAS         67    3633488         18        147

AAAsZBABDAAN3QPAAE         67    3634191          4         11

AAAsZBABDAAN3ROAAG         67    3634254          6        233

AAAsZBABDAAN3hrAAW         67    3635307         22        445

AAAsZBABDAAN30gAAL         67    3636512         11        270

AAAsZBABDAAN31/AAW         67    3636607         22        456

AAAsZBABDAAN4CtAAL         67    3637421         11         65

AAAsZBABDAAN4OhAAA         67    3638177          0        165

AAAsZBABDAAN4dSAAB         67    3639122          1        421


470 rows selected.

We were “a bit lucky” in only one case, that is only block# 3627802 (marked above) stores more than 1 row (but still only 2) satisfying the predicate <= 485.
if you will, it’s for the ID_MESS = 381 (at the ordinal position 14) and ID_MESS = 8 (at the position 17 in the table’s block).
In all other cases the rest of the table blocks which would need to be visited contain just ONE row satisfying the predicate.
That, of course, is highly inefficient and we will try to quantify the “scale of evil” in the subsequent sections.

So much about the dumps.
Let’s jump to the cases represented by SELECT statements and examine the true effort measured with the number of physical reads which are hidden behind a query retrieving its data from the table by the means of either one: ID_SORTED or ID_MESS in the predicate section.

Clustering Factor – part3.

Leave a Reply

Your email address will not be published. Required fields are marked *