A Step-by-Step Hive Installation Guide

1. Embedded mode

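Upload the archive to /opt/modules.
Extract it (run from /opt/modules):
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/installs/
Rename the extracted directory:
cd /opt/installs
mv apache-hive-3.1.2-bin/ hive
Configure the environment variables by adding these two export lines to /etc/profile:
vi /etc/profile
export HIVE_HOME=/opt/installs/hive
export PATH=$HIVE_HOME/bin:$PATH
Reload the environment variables:
source /etc/profile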
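If you set up several machines, or rerun the steps, editing /etc/profile by hand can leave duplicate export lines behind. The sketch below shows one way to append a line only when it is missing. The helper name add_line_once is made up for this example, and it writes to a scratch file rather than the real /etc/profile.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line is already present
add_line_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Scratch file for the demonstration; for real use you would pass /etc/profile instead
profile=$(mktemp)

add_line_once "$profile" 'export HIVE_HOME=/opt/installs/hive'
add_line_once "$profile" 'export PATH=$HIVE_HOME/bin:$PATH'
# Calling it again with a line that already exists changes nothing
add_line_once "$profile" 'export HIVE_HOME=/opt/installs/hive'

cat "$profile"
```

The single quotes matter: they keep $HIVE_HOME and $PATH unexpanded, so the file stores the variable references and not their current values.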
Configure hive-env.sh
Go to /opt/installs/hive/conf and create the file from its template:
cd /opt/installs/hive/conf
cp hive-env.sh.template hive-env.sh
Add the following to hive-env.sh:
export HIVE_CONF_DIR=/opt/installs/hive/conf
export JAVA_HOME=/opt/installs/jdk
export HADOOP_HOME=/opt/installs/hadoop
export HIVE_AUX_JARS_PATH=/opt/installs/hive/lib
Still in the conf folder, create hive-site.xml from its template:
cp hive-default.xml.template hive-site.xml
Then edit it:
Replace every occurrence of ${system:java.io.tmpdir} in hive-site.xml with /opt/installs/hive/tmp.
If no system user name is set by default, also replace every ${system:user.name} with the current user name, root.
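The template contains many of these placeholders, so replacing them by hand is tedious. Here is a minimal sed sketch of the two replacements. The function name fix_placeholders is made up for this example, and it runs on a two-line sample file rather than the real hive-site.xml, so nothing on your system is touched.

```shell
# fix_placeholders IN OUT: write a copy of IN to OUT with both placeholders replaced
fix_placeholders() {
  sed -e 's#\${system:java\.io\.tmpdir}#/opt/installs/hive/tmp#g' \
      -e 's#\${system:user\.name}#root#g' "$1" > "$2"
}

# Two sample lines standing in for hive-site.xml
sample=$(mktemp)
cat > "$sample" <<'EOF'
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
EOF

fixed=$(mktemp)
fix_placeholders "$sample" "$fixed"
cat "$fixed"
```

Other placeholders, such as ${hive.session.id}, are left alone because the patterns match only the two system: names.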

Start the cluster:

start-dfs.sh
start-yarn.sh

Create the directories Hive needs on HDFS:

[root@bigdata01 conf]# hdfs dfs -mkdir -p /user/hive/warehouse
[root@bigdata01 conf]# hdfs dfs -mkdir -p /tmp/hive/
[root@bigdata01 conf]# hdfs dfs -chmod 750 /user/hive/warehouse
[root@bigdata01 conf]# hdfs dfs -chmod 777 /tmp/hive

Initialize the metadata. Because this is embedded mode, the metastore database is Derby:

schematool --initSchema -dbType derby

hive-site.xml contains an illegal character at line 3215, column 96.
Delete that character and save the file.
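The deletion can also be done from the command line. The sketch below assumes the offending text is the character reference &#8; (the parser error for this problem usually mentions character code 0x8), so check line 3215 in your own file before relying on it. The function name strip_char_ref and the sample text are made up for this example.

```shell
# strip_char_ref FILE LINE_NUMBER OUT: delete the text "&#8;" on that one line, write the result to OUT
strip_char_ref() {
  sed -e "${2}s/&#8;//g" "$1" > "$3"
}

# Three made-up lines standing in for hive-site.xml; the bad reference is on line 2
sample=$(mktemp)
printf '%s\n' '<description>' '  locks for&#8;transactional tables' '</description>' > "$sample"

cleaned=$(mktemp)
strip_char_ref "$sample" 2 "$cleaned"   # for the real file the line number would be 3215
cat "$cleaned"
```

Restricting the substitution to one line number keeps the edit from touching anything else in a file that is several thousand lines long.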
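Then run the metadata initialization again:

schematool --initSchema -dbType derby

It should report that the initialization succeeded.
Run the initialization from Hive's home directory. When it finishes, a new folder, metastore_db, appears there.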
Test that it works:

Type hive to enter the shell, where you can write SQL:
hive> show databases;
OK
default

Testing embedded mode

-- Once inside, you can run the following commands:
hive> show databases;          -- list the databases
hive> show tables;             -- list the tables
-- create a table
hive> create table dog(id int,name string);
hive> select * from dog;
hive> insert into dog values(1,'wangcai');
hive> desc dog; -- show the table structure
hive> quit; -- exit

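2. Local mode (the most commonly used mode)

Step 1: Check that MySQL is running

systemctl status mysqld

Step 2: Delete the old Derby data

Go to the Hive directory and remove it:
cd /opt/installs/hive
rm -rf metastore_db/ derby.log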

Step 3: Edit the configuration file hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
  <!-- MySQL connection string -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <!-- MySQL JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- MySQL login user -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <!-- MySQL login password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- The two properties below do not need to be changed; they are here for reference -->
  <!-- Directory where Hive stores its table data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Directory where Hive stores its temporary files -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
</configuration>

Upload the MySQL JDBC driver jar to Hive's lib folder.
Initialize the metadata (in essence, this creates a database in MySQL and fills it with the metadata):

schematool --initSchema -dbType mysql

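Test: open two terminal windows and use Hive in both at the same time. This mode supports multiple sessions.

create database mydb01;
use mydb01;
create table stu (id int,name string);
An insert statement runs as a MapReduce job:
insert into stu values(1,'wangcai');
These selects do not run a MapReduce job:
select * from stu;
select * from stu limit 10;
When creating a table, a varchar column must specify its length, otherwise the statement fails.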

3. Hive remote mode

Step 1: Create a temporary directory

[root@bigdata01 ~]# cd /opt/installs/hive/
[root@bigdata01 hive]# mkdir iotmp
[root@bigdata01 hive]# chmod 777 iotmp

Step 2: Preparation

Add the following to hive-site.xml:

<!-- Local scratch space for Hive jobs -->
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/opt/installs/hive/iotmp/root</value>
</property>
<!-- Top-level directory for operation logs, if logging is enabled -->
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/opt/installs/hive/iotmp/root/operation_logs</value>
</property>
<!-- Location of Hive's structured runtime log files -->
<property>
  <name>hive.querylog.location</name>
  <value>/opt/installs/hive/iotmp/root</value>
</property>
<!-- Temporary local directory for resources added from a remote file system -->
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/opt/installs/hive/iotmp/${hive.session.id}_resources</value>
</property>

Edit Hadoop's core-site.xml:

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
</property>
<!-- Disable permission checking -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

core-site.xml has to change on all three nodes of the cluster. Edit one copy, sync it to the other nodes, then restart HDFS:

xsync.sh core-site.xml
stop-dfs.sh
start-dfs.sh

Step 3: Configure the two remote services

1) Configure the hiveserver2 service
Edit hive-site.xml:

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>bigdata01</value>
  <description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>

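The service can now be started:

1. The service listens on port 10000 by default.
2. It can run as a standalone process for remote clients to connect to. It has a built-in metastore service.
3. Ways to start it:
   Method 1: run hiveserver2 directly. It stays in the foreground, listening, and does not exit.
   Method 2: hive --service hiveserver2 &    # start in the background
   Method 3: nohup hive --service hiveserver2 >/dev/null 2>&1 &    # discard all output

Demonstrating the first method: hiveserver2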
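You can test it with beeline.

Ways to connect:
Method 1:
  step 1. Type beeline and press Enter.
  step 2. Type !connect jdbc:hive2://bigdata01:10000 and press Enter.
  step 3. Type the user name and press Enter. This is the database user name.
  step 4. Type the password and press Enter. This is the database password.
Method 2 (direct connection): beeline -u jdbc:hive2://bigdata01:10000 -n <user name>
Explanation:
  hive2 is the name of the Hive protocol.
  bigdata01 is the host (name or IP) where the HiveServer2 service runs.
  10000 is the HiveServer2 port.
To exit the client, press Ctrl+C.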
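HiveServer2 can take a little while after launch before it accepts connections, so a beeline attempt made right after starting the service may be refused. One way to cope is a small retry wrapper, sketched below. The helper name retry and its argument order are made up for this example.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS between tries.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  while :; do
    "$@" && return 0
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    retry_attempt=$((retry_attempt + 1))
    sleep "$retry_delay"
  done
}

# Harmless demonstration with a command that succeeds on the first try
retry 3 0 true && echo "command succeeded"
```

With the connection string from this section, and assuming the user name root, a call could look like: retry 10 5 beeline -u jdbc:hive2://bigdata01:10000 -n root -e 'show databases;'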

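2) The metastore service
Purpose of the metastore service: it gives other clients access to the metadata stored in MySQL.

Warning:
If you enter hive directly and work with a database, a metastore server has already been created for you under the hood. Call it ms01.
Commands run through hiveserver2 also get a metastore server created for them by default. Call it ms02. If many clients connect to the same MySQL this way, there will be many metastores, which uses a lot of resources.
The solution is to configure one dedicated metastore that alone acts as the proxy for MySQL. Every other client must go through it to talk to MySQL. This saves memory.
Warning: once a metastore is configured, it must be started, otherwise Hive reports an error!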

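Edit hive-site.xml

Note: every client that wants to connect to the metastore service must set the following property and value:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://bigdata01:9083</value>
</property>
Explanation:
  thrift is the protocol name.
  bigdata01 is the host (name or IP) where the metastore service runs.
  9083 is the default port.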

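Ways to start it:

Method 1: hive --service metastore &
Method 2: nohup hive --service metastore 2>&1 >/dev/null &    # send output to /dev/null
Explanation: on their own, the redirections 2>&1 >/dev/null first point standard error (2) at wherever standard output (1) currently goes, which is the screen, and then send standard output to /dev/null. So standard output is discarded while standard error is still printed to the screen.
Linux reserves three file descriptors, with these meanings:
0: standard input (stdin), System.in
1: standard output (stdout), System.out
2: standard error (stderr), System.err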
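The order of the two redirections is easy to get backwards, so here is a small demonstration that needs no Hive at all. The function emit is made up for this example. Inside $(...), standard output points at the capture buffer, which plays the role of the screen.

```shell
# emit writes one line to standard output and one line to standard error
emit() {
  echo "to-stdout"
  echo "to-stderr" >&2
}

# 2>&1 first: stderr is pointed at the capture buffer, then stdout alone moves to /dev/null
first=$(emit 2>&1 >/dev/null)

# >/dev/null first: stdout moves to /dev/null, then stderr follows it there
second=$(emit >/dev/null 2>&1)

echo "2>&1 >/dev/null captured: [$first]"
echo ">/dev/null 2>&1 captured: [$second]"
```

The first form keeps the error line and the second form discards everything, which is why the hiveserver2 command that discards all output writes >/dev/null before 2>&1.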

Test:

Before the metastore server is started, entering hive and running a query fails:
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
After starting it, test again and the query works:
hive> show databases;
OK
default
Time taken: 1.211 seconds, Fetched: 1 row(s)

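A startup script
The metastore and hiveserver2 services have to be started often, and the commands are rather long. For long-term use, you can wrap them in a script:

vim /usr/local/sbin/h-server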

#!/bin/bash
# Hive service control script: starts, stops and checks Hive's metastore and hiveserver2 services
# Usage: h-server [start|stop|status] [metastore|hiveserver2]
#   - start  : start both metastore and hiveserver2, or only the named service
#   - stop   : stop both metastore and hiveserver2, or only the named service
#   - status : show the state of both services, or only the named service

help_info() {
    echo "Controls Hive's metastore and hiveserver2 services."
    echo "Usage: h-server [start|stop|status] [metastore|hiveserver2]"
    echo "  first argument : the operation: start, stop or status"
    echo "  second argument: the service: metastore or hiveserver2 (default: both)"
    exit 1
}

# Operation to perform
op=$1
# Service to operate on
server=$2

# Check that the operation is valid
if [ -z "$op" ]; then
    help_info
elif [ "$op" != "start" ] && [ "$op" != "stop" ] && [ "$op" != "status" ]; then
    help_info
fi

# Look up the PIDs of any running services
metastore_pid=$(ps aux | grep org.apache.hadoop.hive.metastore.HiveMetaStore | grep -v grep | awk '{print $2}')
hiveserver2_pid=$(ps aux | grep proc_hiveserver2 | grep -v grep | awk '{print $2}')

# Create the log directory if it does not exist
log_dir=/var/log/my_hive_log
if [ ! -e "$log_dir" ]; then
    mkdir -p "$log_dir"
fi

# Start services
start_metastore() {
    # Start the metastore service only if it is not already running
    if [ -n "$metastore_pid" ]; then
        echo "metastore   is already running, PID: $metastore_pid, skipped"
    else
        nohup hive --service metastore >> "$log_dir/metastore.log" 2>&1 &
        echo "metastore   started, logging to $log_dir/metastore.log"
    fi
}
start_hiveserver2() {
    # Start the hiveserver2 service only if it is not already running
    if [ -n "$hiveserver2_pid" ]; then
        echo "hiveserver2 is already running, PID: $hiveserver2_pid, skipped"
    else
        nohup hive --service hiveserver2 >> "$log_dir/hiveserver2.log" 2>&1 &
        echo "hiveserver2 started, logging to $log_dir/hiveserver2.log"
    fi
}

# Stop services
stop_metastore() {
    if [ -n "$metastore_pid" ]; then
        kill -9 $metastore_pid
    fi
    echo "metastore   stopped"
}
stop_hiveserver2() {
    if [ -n "$hiveserver2_pid" ]; then
        kill -9 $hiveserver2_pid
    fi
    echo "hiveserver2 stopped"
}

# Query services
status_metastore() {
    if [ -n "$metastore_pid" ]; then
        echo "metastore   is running, PID: $metastore_pid"
    else
        echo "metastore   is not running"
    fi
}
status_hiveserver2() {
    if [ -n "$hiveserver2_pid" ]; then
        echo "hiveserver2 is running, PID: $hiveserver2_pid"
    else
        echo "hiveserver2 is not running"
    fi
}

# Dispatch: the function name is built from the operation and the service
if [ -z "$server" ]; then
    "${op}_metastore"
    "${op}_hiveserver2"
elif [ "$server" = "metastore" ]; then
    "${op}_metastore"
elif [ "$server" = "hiveserver2" ]; then
    "${op}_hiveserver2"
else
    echo "Unknown service: $server"
    help_info
fi

Make the script executable:

cd /usr/local/sbin
chmod u+x h-server
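The h-server script decides which function to run by gluing the operation name onto the service name, so start plus metastore becomes start_metastore. If that trick is unfamiliar, the miniature version below shows it in isolation. The names dispatch, start_demo and status_demo are made up for this example and are not part of the script.

```shell
# Two stand-in functions that follow the <operation>_<service> naming pattern
start_demo()  { echo "starting demo"; }
status_demo() { echo "demo is stopped"; }

# dispatch OPERATION: validate the operation, then call the function named after it
dispatch() {
  op=$1
  case $op in
    start|status) "${op}_demo" ;;
    *) echo "unknown operation: $op" >&2; return 1 ;;
  esac
}

dispatch start
dispatch status
```

Validating the operation before building the function name matters, because otherwise any string the user types would be run as a command with a suffix attached.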
