Hive msck 描述

MSCK SQL 语法如下：

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];

1. 背景

先创建3个分区，把分区文件删除。这时 metastore 有这个3个分区，文件上不存在。再在文件系统上创建其他两个分区，这两个分区在 metastore 不存在。

1.1 初始化SQL

drop table if exists t_drop_partitions;
create table  t_drop_partitions(c1 string)partitioned by (pt string,hr string) stored as textfile location '/tmp/t_drop_partitions';
alter table t_drop_partitions add partition(pt='20230101',hr='00') location '/tmp/t_drop_partitions/pt=20230101/hr=00';
alter table t_drop_partitions add partition(pt='20230101',hr='01') location '/tmp/t_drop_partitions/pt=20230101/hr=01';
alter table t_drop_partitions add partition(pt='20230101',hr='02') location '/tmp/t_drop_partitions/pt=20230101/hr=02';
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=00;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=01;
! hadoop fs -rm -r /tmp/t_drop_partitions/pt=20230101/hr=02;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=03;
! hadoop fs -mkdir -p /tmp/t_drop_partitions/pt=20230101/hr=04;

show partitions t_drop_partitions;
显示有 3 个分区

hive>  show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02

 ! hadoop fs -ls /tmp/t_drop_partitions/pt=20230101;Found 2 items
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=03
drwxr-xr-x   - hive hadoop          0 2023-04-23 14:52 /tmp/t_drop_partitions/pt=20230101/hr=04

2. MSCK TABLE

2.1 MSCK TABLE 没有选项

执行初始化SQL。

hive> MSCK TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Time taken: 0.094 seconds, Fetched: 2 row(s)

MSCK TABLE 不做任何操作，仅仅显示 metastore 和文件系统中不匹配的分区。
Partitions not in metastore: 这些分区存在于文件系统，但是不在 Metastore 。
Partitions missing from filesystem: 这些存在于Metastore，但是不在文件系统。

show partitions 还是显示3个分区

hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
Time taken: 0.105 s

2.2 MSCK REPAIR TABLE

MSCK [REPAIR] TABLE 对表的分区进行修复。后面可以根修复的选项，默认是 ADD。
ADD PARTITIONS: 把文件系统上存在，metastore 上不存在的分区添加到 metastore。
DROP PARTITIONS: 把 metastore 上存在，文件系统上不存在的分区从 metastore 删除。
SYNC PARTITIONS: 代表 ADD PARTITIONS 和 DROP PARTITIONS 同时选中。

2.2. 1ADD PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE。

hive>  MSCK REPAIR TABLE t_drop_partitions;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04

Repair: Added partition to metastore: 部分显示了 metastore 添加了哪些分区。

hive> show partitions t_drop_partitions;
OK
pt=20230101/hr=00
pt=20230101/hr=01
pt=20230101/hr=02
pt=20230101/hr=03
pt=20230101/hr=04

现在有5个分区。

2.2. DROP PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx DROP PARTITIONS。

hive>  MSCK REPAIR TABLE t_drop_partitions DROP PARTITIONS;
OK
Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00

Repair: Dropped partition: 部分显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。

2.3. SYNC PARTITIONS

执行初始化SQL。
执行 MSCK REPAIR TABLE xxx SYNC PARTITIONS。

Partitions not in metastore:	t_drop_partitions:pt=20230101/hr=03	t_drop_partitions:pt=20230101/hr=04
Partitions missing from filesystem:	t_drop_partitions:pt=20230101/hr=00	t_drop_partitions:pt=20230101/hr=01	t_drop_partitions:pt=20230101/hr=02
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=03
Repair: Added partition to metastore t_drop_partitions:pt=20230101/hr=04
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=00
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=02
Repair: Dropped partition from metastore test.t_drop_partitions:pt=20230101/hr=01
Time taken: 0.225 seconds, Fetched: 7 row(s)

Repair: Added partition to metastore: 显示了在 metastore 添加了哪些分区。
Repair: Dropped partition from metastore : 显示了从metastore 删除了哪些分区。

hive> show partitions t_drop_partitions;
OK
Time taken: 0.066 seconds

现在有0个分区。

Hive msck 描述

1. 背景

1.1 初始化SQL

2. MSCK TABLE

2.1 MSCK TABLE 没有选项

2.2 MSCK REPAIR TABLE

2.2. 1ADD PARTITIONS

2.2. DROP PARTITIONS

2.3. SYNC PARTITIONS

相关文章

vue实现多布局模式

MobTech MobPush|A/B测试提升运营决策

类加载和实例化

算法——分布式——一致性哈希、一致性hash图解动画

带你打开GCC的大门

DATAFAKER 使用方法记录

Python 读写 CSV 数据

为什么企业要做大规模敏捷？