Contents
- 1. Introduction
- 1.1 Hadoop
- 1.2 HBase
- 1.3 Hive
- 2. Beeline
- 2.1 Accessing Hive with Beeline
- 2.1.1 Connecting to Hive directly via beeline
- 2.1.2 Entering the beeline client first, then connecting to Hive
- 2.1.3 Entering the beeline client first, then connecting to MySQL
- 2.2 Beeline commands
- 3. Hive JDBC
- 3.1 Dependencies in pom.xml
- 3.2 The Util helper class
- 3.3 Code
- 3.4 Results
- References
1. Introduction
Although Hadoop and HBase both ship with start and stop scripts, launching them is still tedious, and Hive provides no such scripts at all, which makes starting and stopping it even harder. This section shows how to register Hadoop, HBase, and Hive as system services so that each can be started and stopped with a single service <name> start|stop command.
Commands to create the Hadoop, HBase, and Hive service scripts:
cd /etc/init.d
touch hadoop hbase hive
chmod +x hadoop hbase hive
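Once the three scripts below have been filled in, each component can be managed like any other SysV-style service; for example:
service hadoop start
service hadoop stop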
1.1 Hadoop
Open the script with vim hadoop, enter the following content, then press ESC and type :wq! to save.
#!/bin/bash
# Allow the Hadoop daemons to run as root.
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

LOG_FILE=/var/log/hadoop-service.log
RED='\033[0;31m'
NC='\033[0m'

case "$1" in
start)
    echo "$(date '+%Y-%m-%d %H:%M:%S') Starting Hadoop cluster..." >>$LOG_FILE
    # Start HDFS first, then YARN.
    if start-dfs.sh >>$LOG_FILE 2>&1; then
        if start-yarn.sh >>$LOG_FILE 2>&1; then
            echo "Cluster started successfully."
        else
            echo -e "${RED}YARN failed to start, check $LOG_FILE.${NC}"
            exit 1
        fi
    else
        echo -e "${RED}HDFS failed to start, check $LOG_FILE.${NC}"
        exit 1
    fi
    ;;
stop)
    echo "$(date '+%Y-%m-%d %H:%M:%S') Stopping Hadoop cluster..." >>$LOG_FILE
    stop-yarn.sh >>$LOG_FILE 2>&1
    stop-dfs.sh >>$LOG_FILE 2>&1
    echo "Cluster stopped."
    ;;
*)
    echo "Usage: $0 {start|stop}."
    exit 1
    ;;
esac
exit 0
1.2 HBase
Open the script with vim hbase, enter the following content, then press ESC and type :wq! to save. Because HBase depends on Hadoop, the service must verify that Hadoop is running before HBase starts; the check_dependencies function in the script below performs this check. (Hive, described next, also depends on Hadoop and needs the same function.)
#!/bin/bash
LOG_FILE=/var/log/hbase-service.log
RED='\033[0;31m'
NC='\033[0m'

# HBase depends on Hadoop: make sure both HDFS and YARN are up.
check_dependencies() {
    if ! hdfs dfsadmin -report &>/dev/null; then
        echo -e "${RED}Error: HDFS is not running!${NC}"
        return 1
    fi
    if ! yarn node -list &>/dev/null; then
        echo -e "${RED}Error: YARN is not running!${NC}"
        return 1
    fi
    return 0
}

case "$1" in
start)
    if ! check_dependencies; then
        exit 1
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') Starting HBase cluster..." >>$LOG_FILE
    if start-hbase.sh >>$LOG_FILE 2>&1; then
        echo "HBase started successfully."
    else
        echo -e "${RED}HBase failed to start, check $LOG_FILE.${NC}"
        exit 1
    fi
    ;;
stop)
    echo "$(date '+%Y-%m-%d %H:%M:%S') Stopping HBase cluster..." >>$LOG_FILE
    stop-hbase.sh >>$LOG_FILE 2>&1
    echo "HBase stopped."
    ;;
*)
    echo "Usage: $0 {start|stop}."
    exit 1
    ;;
esac
exit 0
1.3 Hive
Hive ships with no official start and stop scripts, so its service script is more involved than the previous two: before Hive can be accessed with Beeline, both the MetaStore and HiveServer2 services must be running. The officially documented way to start them is:
nohup hive --service metastore &
nohup hive --service hiveserver2 &
Running jps afterwards shows two new RunJar processes; these are MetaStore and HiveServer2. To stop them, look up their PIDs (the first column of the jps output) and run kill -9 pid1 pid2, where pid1 and pid2 are the PIDs of the two RunJar processes. Starting and stopping Hive by hand is therefore quite tedious.
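For reference, this manual shutdown can be collapsed into a single line; a minimal sketch, assuming pgrep is installed and nothing else on the machine matches the patterns metastore and hiveserver2:
# Kill both RunJar processes by matching their command lines
kill -9 $(pgrep -f metastore) $(pgrep -f hiveserver2)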
Open the script with vim hive, enter the following content, then press ESC and type :wq! to save. In the script below, check_dependencies verifies that Hadoop is running, start_metastore starts the MetaStore service, start_hiveserver2 starts the HiveServer2 service, and stop_process stops either service; the concrete calls that stop the two services are:
stop_process "MetaStore" "metastore"
stop_process "HiveServer2" "hiveserver2"
In addition, start_metastore and start_hiveserver2 use nc to test whether the MetaStore and HiveServer2 ports are open, which is how the script decides that each service is up. Because a port may take a moment to open, both functions retry the check up to RETRY_TIMES times.
#!/bin/bash
LOG_DIR=/var/log/hive
if [ ! -d $LOG_DIR ]; then
    mkdir -p $LOG_DIR
fi

# hostname -I may list several addresses; take the first one.
HOST=$(hostname -I | awk '{print $1}')
START_LOG=$LOG_DIR/hive-start.log
STOP_LOG=$LOG_DIR/hive-stop.log
METASTORE_PORT=9083
HIVESERVER2_PORT=10000
RETRY_TIMES=3
RED='\033[0;31m'
NC='\033[0m'

check_dependencies() {
    if ! nc --version &>/dev/null; then
        echo -e "${RED}Please use apt to install ncat!${NC}"
        return 1
    fi
    if ! pgrep --version &>/dev/null; then
        echo -e "${RED}Please use apt to install procps-ng!${NC}"
        return 1
    fi
    if ! hdfs dfsadmin -report &>/dev/null; then
        echo -e "${RED}Error: HDFS is not running!${NC}"
        return 1
    fi
    if ! yarn node -list &>/dev/null; then
        echo -e "${RED}Error: YARN is not running!${NC}"
        return 1
    fi
    return 0
}

start_metastore() {
    local retry=0
    echo "$(date '+%Y-%m-%d %H:%M:%S') Starting metastore service..." >>$START_LOG
    nohup hive --service metastore -p $METASTORE_PORT >>$START_LOG 2>&1 &
    # The port may take a few seconds to open; poll it up to RETRY_TIMES times.
    while [ $retry -lt $RETRY_TIMES ]; do
        sleep 5
        if nc -z $HOST $METASTORE_PORT; then
            return 0
        fi
        ((retry++))
    done
    echo -e "${RED}MetaStore failed to start, check $START_LOG.${NC}"
    return 1
}

start_hiveserver2() {
    local retry=0
    echo "$(date '+%Y-%m-%d %H:%M:%S') Starting hiveserver2 service..." >>$START_LOG
    nohup hive --service hiveserver2 >>$START_LOG 2>&1 &
    while [ $retry -lt $RETRY_TIMES ]; do
        sleep 5
        if nc -z $HOST $HIVESERVER2_PORT; then
            return 0
        fi
        ((retry++))
    done
    echo -e "${RED}HiveServer2 failed to start, check $START_LOG.${NC}"
    return 1
}

stop_process() {
    local name=$1
    local pattern=$2
    local pid
    pid=$(pgrep -f "$pattern")
    if [[ -n $pid ]]; then
        echo "Stopping $name..." >>$STOP_LOG
        kill $pid
        sleep 10
        # Escalate to SIGKILL if the process is still alive.
        if pgrep -f "$pattern" &>/dev/null; then
            kill -9 $pid
        fi
    fi
}

case "$1" in
start)
    echo "$(date '+%Y-%m-%d %H:%M:%S') Starting Hive cluster..." >>$START_LOG
    if ! check_dependencies; then
        exit 1
    fi
    if ! start_metastore; then
        exit 1
    fi
    if ! start_hiveserver2; then
        exit 1
    fi
    echo "Hive started successfully."
    ;;
stop)
    echo "$(date '+%Y-%m-%d %H:%M:%S') Stopping Hive cluster..." >>$STOP_LOG
    stop_process "MetaStore" "metastore"
    stop_process "HiveServer2" "hiveserver2"
    echo "Hive stopped."
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
exit 0
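After service hive start, you can confirm that both services are listening in the same way the script does; a quick manual check, assuming the default ports above:
nc -z $(hostname -I | awk '{print $1}') 9083 && echo "MetaStore is up"
nc -z $(hostname -I | awk '{print $1}') 10000 && echo "HiveServer2 is up"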
2. Beeline
Beeline is Hive's current command-line client; it connects to remote services over JDBC.
2.1 Accessing Hive with Beeline
The MetaStore and HiveServer2 services must be running first. If you have set up the Hadoop, HBase, and Hive services in /etc/init.d as described above, you can start both with service hadoop start && service hive start.
2.1.1 Connecting to Hive directly via beeline
beeline -u 'jdbc:hive2://172.18.0.2:10000' -n root
To quit, enter the Beeline command !exit.
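Beeline can also run a query non-interactively through its -e option; a small sketch using the same address and user as above:
beeline -u 'jdbc:hive2://172.18.0.2:10000' -n root -e 'show databases;'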
2.1.2 Entering the beeline client first, then connecting to Hive
First run hive to enter the beeline client (in Hive 4 the hive command launches Beeline), then type !connect 'jdbc:hive2://172.18.0.2:10000', enter the username root and password 123456 when prompted, and finally quit with !exit as before.
2.1.3 Entering the beeline client first, then connecting to MySQL
Type !connect 'jdbc:mysql://172.18.0.3:3306', then enter the username root and password 123456, and finally end the connection with !close. Note that connecting to MySQL requires the MySQL JDBC driver jar to be on Beeline's classpath, typically by placing it in $HIVE_HOME/lib.
2.2 Beeline commands
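A few frequently used Beeline commands (every Beeline command starts with !):
- !help: list all available commands
- !connect <url>: open a JDBC connection
- !close: close the current connection
- !tables: list the tables in the current database
- !outputformat <format>: change how result sets are rendered (e.g. table, csv2)
- !run <file>: execute the SQL statements in a script file
- !exit: quit beeline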
3. Hive JDBC
3.1 Dependencies in pom.xml
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-service</artifactId>
        <version>4.0.1</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-slf4j-impl</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>4.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-runtime</artifactId>
        <version>3.3.6</version>
    </dependency>
</dependencies>
3.2 The Util helper class
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class Util {
    // Loads the JDBC driver class and opens a connection.
    public static Connection getConnection(String driverName, String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverName);
        return DriverManager.getConnection(url, user, password);
    }

    public static Statement getStatement(Connection conn) throws SQLException {
        return conn.createStatement();
    }

    public static void closeStatement(Statement stmt) throws SQLException {
        stmt.close();
    }

    public static void closeConnection(Connection conn) throws SQLException {
        conn.close();
    }
}
3.3 Code
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class App {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Connection conn = Util.getConnection("org.apache.hive.jdbc.HiveDriver",
                "jdbc:hive2://172.18.0.2:10000", "root", "123456");
        Statement stmt = Util.getStatement(conn);

        // Recreate the table and populate it via insert and load data.
        stmt.execute("drop table if exists users");
        stmt.execute("create table users (user_id int, fname string, lname string) "
                + "row format delimited fields terminated by ','");
        stmt.execute("insert into users (user_id, fname, lname) values(222, 'yang', 'yang2')");
        stmt.execute("load data local inpath '/root/CodeProject/hive-advance/data.txt' into table users");

        String sql = "select * from users";
        ResultSet res = stmt.executeQuery(sql);
        ResultSetMetaData meta = res.getMetaData();
        // Print the column headers, then each row.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            System.out.print(meta.getColumnName(i) + "\t");
        }
        System.out.println();
        while (res.next()) {
            System.out.print(res.getInt(1) + "\t\t");
            System.out.print(res.getString(2) + "\t\t");
            System.out.print(res.getString(3));
            System.out.println();
        }

        sql = "show tables";
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.print(res.getString(1));
        }
        System.out.println();

        sql = "desc users";
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        Util.closeStatement(stmt);
        Util.closeConnection(conn);
    }
}
3.4 Results
References
吴章勇, 杨强. 大数据Hadoop3.X分布式处理实战.
https://stackoverflow.com/questions/60943753/new-hiveconf-exception-noclassdeffounderror-com-ctc-wstx-io-inputbootstrapper (this NoClassDefFoundError was resolved by adding the hadoop-client-runtime dependency).