오라클/Tuning

Cardinality 예측 실습

빵순이^^ 2010. 8. 20. 16:52


1. Histogram 없는 경우 : -> Density = 1/NDV

 

 

   -- 1) 동등질의 + 상수      = 1/NDV * Base Cardinality           where c1 = 1
   -- 2) 동등질의 + bind 변수 = 1/NDV * Base Cardinality           where c1 = :b1
   -- 3) 범위질의 + bind 변수 = 개당 5% * Base Cardinality         where c1 > :b1
   -- 4) 존재하지 않는 값     = Max와 Min값을 기준으로 벗어나는    where c1 = 6
                                비율로 일정 비율 감소하다가 일정
                                한계를 벗어나면 1 (p.423 and p.428)

 

                                 density * 전체 row수
                                ----------------------- = 구간별 감소 갯수
                                  최대값 - 최소값



2. Histogram 있는 경우

 


 (1) Frequency : -> Density = (1/전체 Row의 개수) / 2

 

   -- 1) 동등질의 + 상수      = 실제 Row의 개수                  where c1 = 1
   -- 2) 동등질의 + bind 변수 = 1/NDV * Base Cardinality         where c1 = :b1
   -- 3) 범위질의 + bind 변수 = 개당 5% * Base Cardinality       where c1 > :b1
   -- 4) 존재하지 않는 값     = 1 or minimum_frequency/2         where c1 = 6

 

(2) Height-balanced : -> Density = sum of the square of the frequency of the nonpopular values /
                                   (number of nonnull rows * number of nonpopular nonull rows)

                                    cf. 10.2.0.2 이후 버전 기준

 

    -- 1) 동등질의 + 상수 -> Popular Value      = (해당 버킷수 / 전체 버킷수) * Base Cardinality
                          -> Non Popular Value  = Density * Base Cardinality

    -- 2) 동등질의 + bind 변수                  = Density * Base Cardinality
    -- 3) 범위질의 + bind 변수                  = 개당 5% * Base Cardinality
    -- 4) 존재하지 않는 값                      = Max와 Min값을 기준으로 벗어나는   
                                                  비율로 일정 비율 감소하다가 일정
                                                  한계를 벗어나면 1 (p.423 and p.428)

Cardinality 예측 실습



더보기

drop table t1 purge;
create table t1(c1 int, c2 int);
create index t1_n1 on t1(c1+1);

-- c1 = frequency
-- c2 = height-balanced

insert into t1
select 1, mod(level, 2)+1
from dual
connect by level <= 10000
union all
select 2, mod(level, 300)+1
from dual
connect by level <= 1000
union all
select 3, mod(level, 50)+1
from dual
connect by level <= 100
union all
select 4, mod(level, 5)+1
from dual
connect by level <= 10
;


commit;


  

1. Histogram 없는 경우

SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1');

SQL> @tab_stat t1

TABLE_NAME                    : T1
NUM_ROWS                      : 11110
-----------------
TABLE_NAME                    : T1
COLUMN_NAME                   : C1
NUM_DISTINCT                  : 4
-----------------
TABLE_NAME                    : T1
COLUMN_NAME                   : C2
NUM_DISTINCT                  : 300
-----------------



   -- 1) select * from t1 where c1 = 1;     select 1/4 * 11110 from dual;
   -- 2) select * from t1 where c1 = :b1;   select 1/4 * 11110 from dual;     
   -- 3) select * from t1 where c1 > :b1;   select 0.05 * 11110 from dual;
   -- 4) select * from t1 where c1 = 5;     select (1/4 * 11110) - (1/4 * 11110 / (4-1)) from dual;
         select * from t1 where c1 = 6;     select (1/4 * 11110) - (1/4 * 11110 / (4-1)) - (1/4 * 11110 / (4-1)) from dual;


 

2. Histogram 있는 경우

SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size skewonly');

SQL> @tab_stat t1

TABLE_NAME                    : T1
COLUMN_NAME                   : C1
NUM_DISTINCT                  : 4
NUM_NULLS                     : 0
DENSITY                       : .000045004500450045
LOW_VALUE                     : C102
HIGH_VALUE                    : C105
HISTOGRAM                     : FREQUENCY
-----------------
TABLE_NAME                    : T1
COLUMN_NAME                   : C2
NUM_DISTINCT                  : 300
NUM_NULLS                     : 0
DENSITY                       : .406332971731016
LOW_VALUE                     : C102
HIGH_VALUE                    : C204
HISTOGRAM                     : HEIGHT BALANCED
-----------------

TABLE_NAME           COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE
-------------------- -------------------- --------------- --------------------
T1                   C1                             10000 1()
T1                   C1                             11000 2()
T1                   C1                             11100 3()
T1                   C1                             11110 4()
T1                   C2                               113 1()
T1                   C2                               228 2()
T1                   C2                               229 4()
... 생략 ...
T1                   C2                               253 286()
T1                   C2                               254 300()

 (1) Frequency : -> Density = (1/전체 Row의 개수) / 2

   -- 1) select * from t1 where c1 = 1;     10000
   -- 2) select * from t1 where c1 = :b1;   select 1/4 * 11110 from dual; 
   -- 3) select * from t1 where c1 > :b1;   select 0.05 * 11110 from dual;
   -- 4) select * from t1 where c1 = 5;     1
         select * from t1 where c1 = 6;     1

(2) Height-balanced : -> Density = sum of the square of the frequency of the nonpopular values /
                                   (number of nonnull rows * number of nonpopular nonull rows) (10.2.0.2 이후 버전 기준)

   -- 1) select * from t1 where c2 = 1;     select 113/254 * 11110 from dual;
         select * from t1 where c2 = 4;     select .406332971731016 * 11110 from dual;
   -- 2) select * from t1 where c2 = :b1;   select .406332971731016 * 11110 from dual;  
   -- 3) select * from t1 where c2 > :b1;   select 0.05 * 11110 from dual;
   -- 4) select * from t1 where c2 = 301;   select (.406332971731016 * 11110) - (.406332971731016 * 11110 / (300 - 1)) from dual;
         select * from t1 where c2 = 302;   select (.406332971731016 * 11110) - (.406332971731016 * 11110 / (300 - 1)) -(.406332971731016 * 11110 / (300 - 1)) from dual; 


 

 


출처 : http://cafe.naver.com/gseducation.cafe?iframe_url=/ArticleRead.nhn%3Farticleid=469