PostgreSQLスキルアップノート（自己啓発のための個人サイト）
HOT（Heap Only Tuple）を実機で確認


このページはそのままカット&ペーストで実行して試すことができます。
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
■■■■ PostgreSQL スキルアップノート
■■■■
■◆■■ HOT（Heap Only Tuple）を実機で確認
■■■■
■■■■
■■■■ 2012/12/11
■■■■ 使用環境：PostgreSQL9.1.4 (CentOS6.2)
                                                                   (C) 2012 ohdb
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
★自己都合により実機確認は9.1、マニュアルへのリンクは9.2としています。ご了承下さい。

マニュアルには詳しい記述が見つからない(?)のでリンクはありません。
下記Let's Postgresの記事を参照。

【マニュアル】

なし

【参考記事】
Let's Postgres→●[記事]




■１■ 確認の都合上、autovacuumを一旦停止
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

更新後すぐにautovacuum（※）されると確認に支障がでるため以下により一時的に停止。
※ postgresql.confのautovacuum_naptimeにもよる。（デフォルトなら１分）


【□】 pg_ctl restart -o '-c autovacuum=off' 



【参考】以下本題とは無関係

　・pg_ctlの-oオプションとは？

    起動時の一時的なパラメータ変更ができる。
    postgresql.confはそのままにしておけるので主に試行的な用途に使用。
    今回はautovacuumパラメータをoffにして起動(restart)している。

    →●[マニュアル]

    なお、複数指定する場合は以下のようにする。
    pg_ctl restart -o '-c autovacuum=off -c archive_mode=off' 


　・「-o autovacuum=off」ではなく「-o '-c autovacuum=off'」となるのはなぜ？   

    pg_ctlの-oオプションは、postgresコマンド(等)に渡すオプションをそのまま記述するためのもの。
    続けて指定している-c（シングルクォートの中）はpostgresコマンドの-cオプションを意味する。


■２■ テーブルの作成
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

以下の４テーブルを作成。

テーブル名  FILLFACTOR       備考
--------------------------------------------------------
t1          FILLFACTORなし   HOTは非適用になるはず
t2          FILLFACTOR=90    ある程度の効果？
t3          FILLFACTOR=75    かなりの効果？
t4          FILLFACTOR=50    十分に効く？　でも格納効率上は問題


【□】 psql -d test1 -U test1

【□】 以下のcreate文を実行。

-- テーブルt1・FILLFACTORなし
drop table if exists t1;
create table t1 (c1 serial primary key,idxcol varchar(20),noidxcol varchar(20));
create index t1_idx1 on t1(idxcol);

-- テーブルt2・FILLFACTOR 90%
drop table if exists t2;
create table t2 (c1 serial primary key,idxcol varchar(20),noidxcol varchar(20))  WITH (FILLFACTOR=90);
create index t2_idx1 on t2(idxcol);

-- テーブルt3・FILLFACTOR 75%
drop table if exists t3;
create table t3 (c1 serial primary key,idxcol varchar(20),noidxcol varchar(20))  WITH (FILLFACTOR=75);
create index t3_idx1 on t3(idxcol);

-- テーブルt4・FILLFACTOR 50%
drop table if exists t4;
create table t4 (c1 serial primary key,idxcol varchar(20),noidxcol varchar(20))  WITH (FILLFACTOR=50);
create index t4_idx1 on t4(idxcol);


【□】 FILLFACTORを確認

   FILLFACTORの設定内容はpg_classのreloptionsで確認できる。

select 
    relowner,
    (select rolname from pg_roles r where r.oid=c.relowner) as ownername,
    (select nspname from pg_namespace n where n.oid=c.relnamespace) as namespace,
    relkind,
    relname,
    reloptions
  from pg_class c 
  where
    relkind='r' and    relname ~'^t[1-4]$'
    order by relname
  ;



 relowner | ownername | namespace | relkind | relname |   reloptions★
----------+-----------+-----------+---------+---------+-----------------
   272004 | test1     | public    | r       | t1      |
   272004 | test1     | public    | r       | t2      | {fillfactor=90}
   272004 | test1     | public    | r       | t3      | {fillfactor=75}
   272004 | test1     | public    | r       | t4      | {fillfactor=50}

【参考】reloptionsに表示されない場合

　指定なしでデフォルトで作成されたものはreloptionsには値がない。
　この場合
　   テーブルであれば           100
  　 インデックスであれば        90    (Btreeの場合）
　を意味する。



■３■ データ格納
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

全カラムともc1のserial値と同じ値を入れておく。


【□】 以下によりテーブルのデータ格納用ファンクションを作成

-- ここから --------
CREATE OR REPLACE FUNCTION f_t1(kensu integer,tbl varchar) returns integer
AS $$
  DECLARE
   inscount integer;
   i integer;
  BEGIN
    FOR i IN 1..kensu LOOP
     execute 'insert into ' || tbl || ' values(default,cast(currval(''t1_c1_seq'') as varchar),cast(currval(''t1_c1_seq'') as varchar))';
     inscount=i;
    END LOOP;
    return inscount;
  END;
$$ LANGUAGE plpgsql
;
-- ここまで --------


以上で作成したファンクションf_t1を使い、４テーブルに同じデータを挿入してみる。


【□】 以下により各10万件のデータのinsert

-- ここから --------
\timing on
select f_t1(100000,'t1');
select f_t1(100000,'t2');
select f_t1(100000,'t3');
select f_t1(100000,'t4');
-- ここまで --------





■４■ 検証１ とりあえずHOTを発生をさせてみる
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

FILLFACTORの違うそれぞれのテーブルに対して実施


【□】 データのUPDATE

update t1 set noidxcol='9999';     /* t1　FILLFACTORなし */
update t2 set noidxcol='9999';     /* t2　FILLFACTORあり90 */
update t3 set noidxcol='9999';     /* t3　FILLFACTORあり75 */
update t4 set noidxcol='9999';     /* t4　FILLFACTORあり50 */



【参考】UPDATEにかかった処理時間　

t1 Time: 1081.013 ms
t2 Time: 954.607 ms
t3 Time: 979.461 ms
t4 Time: 308.316 ms


【□】 状態確認

select relname, n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
from pg_stat_user_tables where relname in('t1','t2','t3','t4') order by relname;

 relname | n_tup_ins | n_tup_upd | n_tup_del |★n_tup_hot_upd| n_live_tup |★n_dead_tup| last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
---------+-----------+-----------+-----------+---------------+------------+------------+-------------+-----------------+--------------+------------------
 t1      |    100000 |    100000 |         0 |             0 |     100000 |     100000 |             |                 |              |
 t2      |    100000 |    100000 |         0 |         13490 |     100000 |     100000 |             |                 |              |
 t3      |    100000 |    100000 |         0 |         40184 |     100000 |     100000 |             |                 |              |
 t4      |    100000 |    100000 |         0 |        100000 |     100000 |     100000 |             |                 |              |


【上記からわかること】

n_tup_hot_updの値より

・テーブルt1はHOTが効いていない。（FILLFACTORが100だから同じページ内には書けなかった）
・テーブルt2～t4はHOTが効いた
  ただし、HOTが適用できた度合いがテーブルによってかなり違う。【重要】
  FILLFACTORが90と高いテーブルt2は、わずか13490行に留まっている。

n_dead_tupの値より

・全テーブルともに10万行の不要な行が発生した。



autovacuumは止めてあるので、このままの状態で次の検証へ・・



■５■ 次の検証へのメモ
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

・現在autovacuumは停止中なので発生した不要領域はそのままになっている。

・autovacuumを再開させれば当然不要領域は回収される。
　しかし、vacuumしなくても解消されるはずのものがこの中に含まれている。

　・・・それは、n_dead_tupのうちHOTが効いた分（n_tup_hot_upd）

　つまり
            回収予定    非回収予定
   ------------------------------------------------
    t1              0 | 100000行
    t2          13490 |  86510行    (10万-13490)
    t3          40184 |  59816行    (10万-40184)
    t4         100000 |      0行    


  SELECTしただけで回収されていくはず。
  本当にそうなるのか、次項の検証で確認する。



■６■ 検証２ vacuumなしで不要領域が回収されることの検証
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

vacuumなしで不要領域が回収される状況を確認する。
以下のようにselectすることで不要領域の回収が行われるはず。


【□】 それぞれ以下のように参照してみる。

     select * from t1 where c1<=1000;
     select * from t2 where c1<=1000;
     select * from t3 where c1<=1000;
     select * from t4 where c1<=1000;


【□】 状態確認

select relname, n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
from pg_stat_user_tables where relname in('t1','t2','t3','t4') order by 1;

 relname | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup |★n_dead_tup| last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
---------+-----------+-----------+-----------+---------------+------------+------------+-------------+-----------------+--------------+------------------
 t1      |    100000 |    100000 |         0 |             0 |     100000 |     100000 |             |                 |              |
 t2      |    100000 |    100000 |         0 |         13490 |     100000 |★    86510 |             |                 |              |
 t3      |    100000 |    100000 |         0 |         40184 |     100000 |★    59816 |             |                 |              |
 t4      |    100000 |    100000 |         0 |        100000 |     100000 |★        4 |             |                 |              |


【上記からわかること】

★1000行のselectによって確かにn_dead_tupの値が減っている・・

・t1（FILLFACTORなし）はHOTupdateがないので、回収はされていない。
・t2（FILLFACTOR90）はHOTupdateされた13490行すべてが回収。
・t3（FILLFACTOR75）はHOTupdateされた40184行すべてが回収。
・t4（FILLFACTOR50）はHOTupdateされた100000行の大半（4行を除いて）がすべて回収。   

以上、HOTupdateにより発生した不要領域はvacuumしなくても大半が回収された。（ほぼ■５■項のとおり）



■７■ 実装にあたって考慮すべきこと（FILLFACTOR）
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

大量に更新する場合、HOTの効果はFILLFACTORの設定に大きく左右される。

今回のように全件UPDATEするようなケースの場合、FILLFACTORが90程度ではHOTは部分的にしか効かなかった。

インデックスカラムの更新があろうがなかろうが、結局は「postgresは追記型」であることに変わりがない。
このため、たとえわずかな（１バイトの）UPDATEでも１行追記するだけの空きが同じページ内に必要。
繰り返せば90程度のFILLFACTOR値ではすぐにページに入らなくなる。


★疑問
今回はUPDATE１回で１０万行の大量更新をさせたが、もし１行つづUPDATE（１０万回UPDATE）したらどうなるのだろう？
次の処理に回収をまかせている仕組み上、上とは異なる結果になる（はず）
・・・次の機会に確認



【参考】インデックスのFILLFACTOR

インデックスはデフォルトでFILLFACTOR 90が設定されている。
(ただしBtreeの場合でインデックスの種類により異なる）
→●[マニュアル]

マニュアルより
・ページ内いっぱいになるとページ分割
・テーブルの変化がほとんどない場合は100が最善
・更新が非常に多い場合は小さな値を設定しておく必要がある（ページ分割が少なくなる）

ただし、Let's Postgresの記事にはインデックスのFILLFACTORはあまり考慮する必要がないと記載あり。
「・・・インデックスはデフォルトで・・・90となっています。こちらは、特に変更する必要はないでしょう。」




■８■ 検証４ INDEXカラムの更新ではHOTが効かないことの検証
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

・autovacuumはされていない。
・前の検証から引き続き実施。　各値は前の検証時の状態からスタート

状態確認

select relname, n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
from pg_stat_user_tables where relname in('t1','t2','t3','t4') order by 1;



 relname | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
---------+-----------+-----------+-----------+---------------+------------+------------+-------------+-----------------+--------------+------------------
 t1      |    100000 |    100000 |         0 |             0 |     100000 |     100000 |             |                 |              |
 t2      |    100000 |    100000 |         0 |         13490 |     100000 |      86510 |             |                 |              |
 t3      |    100000 |    100000 |         0 |         40184 |     100000 |      59816 |             |                 |              |
 t4      |    100000 |    100000 |         0 |        100000 |     100000 |          4 |             |                 |              |



インデックスカラムの更新

update t1 set idxcol='9999';     /* FILLFACTORなしテーブル */
update t2 set idxcol='9999';     /* FILLFACTORあり90 テーブル*/
update t3 set idxcol='9999';     /* FILLFACTORあり75 テーブル*/
update t4 set idxcol='9999';     /* FILLFACTORあり50 テーブル*/



状態確認

select relname, n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
from pg_stat_user_tables where relname in('t1','t2','t3','t4') order by 1;



 relname | n_tup_ins | n_tup_upd | n_tup_del |★n_tup_hot_upd| n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
---------+-----------+-----------+-----------+---------------+------------+------------+-------------+-----------------+--------------+------------------
 t1      |    100000 |    200000 |         0 |             0 |     100000 |     200000 |             |                 |              |
 t2      |    100000 |    200000 |         0 |         13490 |     100000 |     186510 |             |                 |              |
 t3      |    100000 |    200000 |         0 |         40184 |     100000 |     159816 |             |                 |              |
 t4      |    100000 |    200000 |         0 |        100000 |     100000 |     100004 |             |                 |              |


【上記からわかること】

　n_tup_hot_updの値は前のまま。
　今回のupdateではHOTは発生しなかった。


■９■ 検証５ HOTupdateではインデックスは書き換えが発生していないことの確認
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

今回見送り



■１０■ 最後にvacuum実行
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

vacuumdbコマンドの場合の例


【□】 \! vacuumdb -a -z    

　　※ -aはクラスタ内の全ＤＢ対象。 -zはアナライズも実施


状態確認

select relname, n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze
from pg_stat_user_tables where relname in('t1','t2','t3','t4') order by 1;

 relname | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |          last_vacuum          | last_autovacuum |         last_analyze          | last_autoanalyze
---------+-----------+-----------+-----------+---------------+------------+------------+-------------------------------+-----------------+-------------------------------+------------------
 t1      |    100000 |    200000 |         0 |             0 |     100000 |          0 | 2012-12-17 02:04:20.605032+09 |                 | 2012-12-17 02:04:20.719179+09 |
 t2      |    100000 |    200000 |         0 |         13490 |     100000 |          0 | 2012-12-17 02:04:21.450947+09 |                 | 2012-12-17 02:04:21.606505+09 |
 t3      |    100000 |    200000 |         0 |         40184 |     100000 |          0 | 2012-12-17 02:04:21.900873+09 |                 | 2012-12-17 02:04:22.046578+09 |
 t4      |    100000 |    200000 |         0 |        100000 |     100000 |          0 | 2012-12-17 02:04:22.216231+09 |                 | 2012-12-17 02:04:22.390233+09 |


last_vacuumとlast_analyzeに実施時刻が入り、vacuum/analyzeが実行されたことを確認。


【補足】pg_stat_user_tablesのvacuum日時のカラムは２つ

last_vacuum と last_autovacuum

現在autovacuumを一時的に停止しているので、上記のようになるが、
通常であれば自動で実行され、その状況はlast_vacuumカラムでなく、
last_autovacuumカラムに値が入る。

analyzeについても同様でlast_analyze,last_autoanalyzeの２カラムがある。



■１１■ 後始末（vacuumデーモンを再開）
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


【□】 pg_ctl restart 


以上


■参考■
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


\d pg_stat_user_tables

test1=# \d pg_stat_user_tables
          View "pg_catalog.pg_stat_user_tables"
      Column       |           Type           | Modifiers
-------------------+--------------------------+-----------
 relid             | oid                      |
 schemaname        | name                     |
 relname           | name                     |
 seq_scan          | bigint                   |
 seq_tup_read      | bigint                   |
 idx_scan          | bigint                   |
 idx_tup_fetch     | bigint                   |
 n_tup_ins         | bigint                   |
 n_tup_upd         | bigint                   |
 n_tup_del         | bigint                   |
 n_tup_hot_upd     | bigint                   |
 n_live_tup        | bigint                   |
 n_dead_tup        | bigint                   |
 last_vacuum       | timestamp with time zone |
 last_autovacuum   | timestamp with time zone |
 last_analyze      | timestamp with time zone |
 last_autoanalyze  | timestamp with time zone |
 vacuum_count      | bigint                   |
 autovacuum_count  | bigint                   |
 analyze_count     | bigint                   |
 autoanalyze_count | bigint                   |



\d pg_stat_user_indexes

View "pg_catalog.pg_stat_user_indexes"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 relid         | oid    |
 indexrelid    | oid    |
 schemaname    | name   |
 relname       | name   |
 indexrelname  | name   |
 idx_scan      | bigint |
 idx_tup_read  | bigint |
 idx_tup_fetch | bigint |


test1=# \d pg_class
      Table "pg_catalog.pg_class"
     Column     |   Type    | Modifiers
----------------+-----------+-----------
 relname        | name      | not null
 relnamespace   | oid       | not null
 reltype        | oid       | not null
 reloftype      | oid       | not null
 relowner       | oid       | not null
 relam          | oid       | not null
 relfilenode    | oid       | not null
 reltablespace  | oid       | not null
 relpages       | integer   | not null
 reltuples      | real      | not null
 reltoastrelid  | oid       | not null
 reltoastidxid  | oid       | not null
 relhasindex    | boolean   | not null
 relisshared    | boolean   | not null
 relpersistence | "char"    | not null
 relkind        | "char"    | not null
 relnatts       | smallint  | not null
 relchecks      | smallint  | not null
 relhasoids     | boolean   | not null
 relhaspkey     | boolean   | not null
 relhasrules    | boolean   | not null
 relhastriggers | boolean   | not null
 relhassubclass | boolean   | not null
 relfrozenxid   | xid       | not null
 relacl         | aclitem[] |
 reloptions     | text[]    |
Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)


est1=# \d pg_roles
              View "pg_catalog.pg_roles"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 rolname        | name                     |
 rolsuper       | boolean                  |
 rolinherit     | boolean                  |
 rolcreaterole  | boolean                  |
 rolcreatedb    | boolean                  |
 rolcatupdate   | boolean                  |
 rolcanlogin    | boolean                  |
 rolreplication | boolean                  |
 rolconnlimit   | integer                  |
 rolpassword    | text                     |
 rolvaliduntil  | timestamp with time zone |
 rolconfig      | text[]                   |
 oid            | oid                      |





以上