Remote threads are started only when the Hive metastore runs as a standalone service, and they also require the compactor to be enabled.
There are the following threads:
1. AcidOpenTxnsCounterService: counts the currently open transactions
It counts the transactions in table TXNS whose state is open. This count is available in the Hive metrics.
2. AcidHouseKeeperService
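Periodically calls txnHandler.performTimeOuts();
The default transaction timeout is 300 s: transactions that have sent no heartbeat for 300 seconds are aborted. The default comes from: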
TXN_TIMEOUT("metastore.txn.timeout", "hive.txn.timeout", 300, TimeUnit.SECONDS,"time after which transactions are declared aborted if the client has not sent a heartbeat."),
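To make the timeout rule concrete, here is a minimal Java sketch. It assumes the TXNS table can be modeled as an in-memory map from transaction id to last heartbeat time in epoch milliseconds; TxnTimeoutSketch and abortTimedOut are hypothetical names, and the real performTimeOuts() works against the metastore database instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of the heartbeat timeout check.
// openTxns maps txnId -> last heartbeat time (epoch millis).
class TxnTimeoutSketch {
    // Mirrors the 300 s default of metastore.txn.timeout / hive.txn.timeout.
    static final long DEFAULT_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(300);

    // Removes (aborts) every open transaction whose last heartbeat is older than
    // nowMs - timeoutMs and returns the aborted ids in ascending order.
    static List<Long> abortTimedOut(Map<Long, Long> openTxns, long nowMs, long timeoutMs) {
        long cutoff = nowMs - timeoutMs;
        List<Long> aborted = new ArrayList<>();
        // Iterate over a sorted copy so the result order is deterministic.
        for (Map.Entry<Long, Long> e : new TreeMap<>(openTxns).entrySet()) {
            if (e.getValue() < cutoff) {
                aborted.add(e.getKey());
            }
        }
        aborted.forEach(openTxns::remove);
        return aborted;
    }
}
```

With now = 400000 ms and the default timeout, the cutoff is 100000 ms: a transaction last heard from at 0 is aborted, while one last heard from at exactly 100000 survives this round because the sketch uses a strict comparison.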
DumpDirCleanerTask
The dump directory is set by:
REPLDIR("hive.repl.rootdir","/user/hive/repl/","HDFS root dir for all replication dumps."),
The TTL of this directory (dump directories older than the TTL are removed by this task) is set by:
REPL_DUMPDIR_TTL("hive.repl.dumpdir.ttl", "7d",new TimeValidator(TimeUnit.DAYS),
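The cleanup rule can be sketched in Java as follows. This is a hypothetical illustration, not Hive's code: it uses the local file system through java.nio.file in place of HDFS and the Hadoop FileSystem API, and it treats a dump directory as expired when its last-modified time is older than the TTL. DumpDirCleanerSketch is an invented name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local file system stands in for HDFS.
class DumpDirCleanerSketch {
    // Deletes each direct child directory of root whose last-modified time is more than
    // ttlMillis before nowMillis; returns the names of the deleted directories.
    static List<String> clean(Path root, long ttlMillis, long nowMillis) {
        try {
            // Collect candidates first so nothing is deleted while the stream is open.
            List<Path> candidates = new ArrayList<>();
            try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, p -> Files.isDirectory(p))) {
                dirs.forEach(candidates::add);
            }
            List<String> deleted = new ArrayList<>();
            for (Path dir : candidates) {
                long age = nowMillis - Files.getLastModifiedTime(dir).toMillis();
                if (age > ttlMillis) {
                    deleteRecursively(dir);
                    deleted.add(dir.getFileName().toString());
                }
            }
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reverse path order visits children before their parents, so each directory is empty when deleted.
    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Calling clean(root, TimeUnit.DAYS.toMillis(7), System.currentTimeMillis()) would correspond to the 7d default of hive.repl.dumpdir.ttl under these assumptions.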
AcidCompactionHistoryService
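Periodically calls txnHandler.purgeCompactionHistory();
purgeCompactionHistory works as follows: for each compactable entity (a partition, or the table itself if it is not partitioned), only the most recent history entries are retained. Its Javadoc: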
/**
 * For any given compactable entity (partition, table if not partitioned) the history of compactions
 * may look like "sssfffaaasffss", for example. The idea is to retain the tail (most recent) of the
 * history such that a configurable number of each type of state is present. Any other entries
 * can be purged. This scheme has advantage of always retaining the last failure/success even if
 * it's not recent.
 * @throws MetaException
 */
@RetrySemantics.SafeToRetry
void purgeCompactionHistory() throws MetaException;
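The retention idea from the Javadoc can be sketched in Java: walk the history from the most recent entry backwards and keep an entry only while its state still has quota left. This is a hypothetical illustration; the per-state counts are parameters here (the Javadoc only says the number is configurable), and reading s, f and a as succeeded, failed and attempted is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tail retention over a compaction history, oldest entry first.
class CompactionHistorySketch {
    // Returns the retained entries in their original (oldest-first) order.
    static String retainTail(String history, Map<Character, Integer> keepPerState) {
        Map<Character, Integer> seen = new HashMap<>();
        StringBuilder kept = new StringBuilder();
        // Newest entries are visited first, so they use up each state's quota first.
        for (int i = history.length() - 1; i >= 0; i--) {
            char state = history.charAt(i);
            int count = seen.getOrDefault(state, 0);
            if (count < keepPerState.getOrDefault(state, 0)) {
                kept.append(state);
                seen.put(state, count + 1);
            }
        }
        // Entries were collected newest-first; reverse to restore chronological order.
        return kept.reverse().toString();
    }
}
```

With limits s=2, f=2, a=1, the history "sssfffaaasffss" is reduced to "affss". With s=2, f=1, the history "fsssss" keeps "fss": the old failure survives even though it is not recent, which is the property the Javadoc points out.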
RuntimeStatsCleanerTask
If this parameter is set to true, the operator statistics used by query re-execution are always collected.
HIVE_QUERY_REEXECUTION_ALWAYS_COLLECT_OPERATOR_STATS("hive.query.reexecution.always.collect.operator.stats", false,"If sessionstats are enabled; this option can be used to collect statistics all the time"),
RawStore ms = HMSHandler.getMSForConf(conf);
int maxRetainSecs = (int) MetastoreConf.getTimeVar(conf, MetastoreConf.ConfVars.RUNTIME_STATS_MAX_AGE, TimeUnit.SECONDS);
int deleteCnt = ms.deleteRuntimeStats(maxRetainSecs);
maxRetainSecs defaults to 3 days, so the task deletes the rows in table RUNTIME_STATS whose createTime is 3 days ago or earlier. The default comes from:
RUNTIME_STATS_MAX_AGE("runtime.stats.max.age", "hive.metastore.runtime.stats.max.age", 86400 * 3, TimeUnit.SECONDS,"Stat entries which are older than this are removed.")
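A minimal Java sketch of the deletion rule, assuming an in-memory list stands in for the RUNTIME_STATS table and that createTime is stored in epoch seconds (both assumptions). RuntimeStatsCleanerSketch, StatRow and deleteOlderThan are hypothetical names; the return value plays the role of deleteCnt above.

```java
import java.util.List;

// Hypothetical in-memory model of the RUNTIME_STATS cleanup.
class RuntimeStatsCleanerSketch {
    // Mirrors the default of runtime.stats.max.age: 3 days in seconds.
    static final int DEFAULT_MAX_AGE_SECS = 86400 * 3;

    static final class StatRow {
        final int id;
        final int createTime; // assumed to be epoch seconds

        StatRow(int id, int createTime) {
            this.id = id;
            this.createTime = createTime;
        }
    }

    // Deletes rows with createTime <= nowSecs - maxRetainSecs and returns how many were deleted.
    static int deleteOlderThan(List<StatRow> table, int nowSecs, int maxRetainSecs) {
        int maxCreateTime = nowSecs - maxRetainSecs;
        int before = table.size();
        table.removeIf(row -> row.createTime <= maxCreateTime);
        return before - table.size();
    }
}
```

Note the inclusive comparison: a row created exactly maxRetainSecs ago is deleted, matching the "createTime <= 3 days ago" rule described above.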
AcidWriteSetService
By default, calls txnHandler.performWriteSetGC() every 60 seconds.
It executes statements similar to the following:
select min(txn_id) commitHighWaterMark from TXNS where txn_state='OPEN';
delete from WRITE_SET where ws_commit_id < commitHighWaterMark;
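The same rule can be sketched in Java, with in-memory collections standing in for TXNS (the ids of open transactions) and WRITE_SET (the commit ids of its entries). WriteSetGcSketch and collectWriteSet are hypothetical names. The reason this is safe, presumably: a committing transaction only looks for conflicts among write-set entries that committed after it started, so an entry whose commit id is below every open transaction id can never be matched again. What happens when no transaction is open is an assumption of this sketch: it clears the whole write set.

```java
import java.util.Collection;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical in-memory model of the write-set garbage collection.
class WriteSetGcSketch {
    // Removes write-set entries whose commit id is below the commit high-water mark
    // (the smallest open transaction id) and returns how many were removed.
    static int collectWriteSet(Collection<Long> openTxnIds, List<Long> writeSetCommitIds) {
        int before = writeSetCommitIds.size();
        OptionalLong highWaterMark = openTxnIds.stream().mapToLong(Long::longValue).min();
        if (highWaterMark.isPresent()) {
            long mark = highWaterMark.getAsLong();
            writeSetCommitIds.removeIf(commitId -> commitId < mark);
        } else {
            // Assumption of this sketch: with no open transaction, no entry can conflict any more.
            writeSetCommitIds.clear();
        }
        return before - writeSetCommitIds.size();
    }
}
```

With open transactions {7, 5, 9} the high-water mark is 5, so entries with commit ids 3 and 4 are removed and 5 and 6 remain.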
EventCleanerTask
Periodically runs the following code:
RawStore ms = HMSHandler.getMSForConf(conf);
long deleteCnt = ms.cleanupEvents();
cleanupEvents deletes expired rows from table PARTITION_EVENTS.
The data in PARTITION_EVENTS is used only by HCatalog.
MaterializationsRebuildLockCleanerTask
Finds expired entries in table MATERIALIZATION_REBUILD_LOCKS and deletes them.
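All of the tasks above follow one pattern: a task that runs periodically on a background thread of the standalone metastore. A minimal Java sketch of such a scheduler, assuming a hypothetical PeriodicTask interface that reports its own run frequency; Hive's actual interface and thread pool may differ. Each run is wrapped in a try/catch because ScheduledExecutorService.scheduleAtFixedRate suppresses all later runs of a task once one run throws.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical interface for this sketch: a background task that reports how often it should run.
interface PeriodicTask extends Runnable {
    long runFrequency(TimeUnit unit);
}

class RemoteTaskSchedulerSketch {
    // Schedules every task at a fixed rate on one shared pool; the caller shuts the pool down.
    static ScheduledExecutorService startAll(List<PeriodicTask> tasks, int poolSize) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        for (PeriodicTask task : tasks) {
            long periodMs = task.runFrequency(TimeUnit.MILLISECONDS);
            // A RuntimeException escaping a fixed-rate task would cancel its later runs,
            // so each run is guarded and the failure is only logged to stderr.
            Runnable guarded = () -> {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    System.err.println("Task " + task + " failed: " + e);
                }
            };
            pool.scheduleAtFixedRate(guarded, periodMs, periodMs, TimeUnit.MILLISECONDS);
        }
        return pool;
    }
}
```

A cleaner such as the write-set GC would report 60 seconds from runFrequency under this design, so each task keeps control of its own period while sharing the pool.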