Kyuubi + Spark SQL: Hands-On Deployment and Connection


I. Download the Spark and Kyuubi packages

Download Spark from the official site:

https://spark.apache.org/downloads.html

Download Kyuubi from the official site:

https://www.apache.org/dyn/closer.lua/kyuubi/kyuubi-1.9.0/apache-kyuubi-1.9.0-bin.tgz
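
For reference, a typical download-and-unpack sequence for the versions used below (spark-3.5.1 and apache-kyuubi-1.9.0). The archive.apache.org URLs are assumptions; verify them against the download pages above:

# Download from the Apache archive mirrors
wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
wget https://archive.apache.org/dist/kyuubi/kyuubi-1.9.0/apache-kyuubi-1.9.0-bin.tgz
# Unpack; /home/soft matches the SPARK_HOME used in kyuubi-env.sh below
tar -zxf spark-3.5.1-bin-hadoop3.tgz -C /home/soft/
tar -zxf apache-kyuubi-1.9.0-bin.tgz -C /home/soft/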

II. Deploy Spark

1. Configure $SPARK_HOME/conf/spark-env.sh

YARN_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/etc/hadoop

2. To use the Hive metastore from Spark, copy Hive's hive-site.xml into $SPARK_HOME/conf.
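
The key entry Spark needs is the metastore URI. A minimal sketch, assuming the metastore runs on bigdata30:9083 (in practice, copy the real hive-site.xml from the Hive conf directory rather than writing one by hand):

<configuration>
  <!-- Thrift URI of the Hive metastore; host and port here are assumptions -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://bigdata30:9083</value>
  </property>
</configuration>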

III. Configure Kyuubi

1. kyuubi-defaults.conf

kyuubi.frontend.bind.host                bigdata30
kyuubi.frontend.protocols                THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port  10009
# kyuubi.frontend.rest.bind.port           10099

kyuubi.engine.type                       SPARK_SQL
kyuubi.engine.share.level                USER
# kyuubi.session.engine.initialize.timeout PT3M

# High availability
kyuubi.ha.enabled                           true
kyuubi.ha.client.class                      org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient
kyuubi.ha.addresses                         bigdata30:2181,bigdata31:2181,bigdata32:2181
kyuubi.ha.namespace                         kyuubi

# Required if Kerberos is enabled on ZooKeeper
# kyuubi.ha.zookeeper.auth.type               KERBEROS
kyuubi.ha.zookeeper.auth.principal          zookeeper/_HOST@HADOOP.COM
kyuubi.ha.zookeeper.auth.keytab             /etc/security/keytabs/zookeeper.keytab

# Kerberos authentication for the Kyuubi server
kyuubi.authentication                       KERBEROS
kyuubi.kinit.principal                      hive/_HOST@HADOOP.COM
kyuubi.kinit.keytab                         /etc/security/keytabs/hive.keytab

# Kyuubi engine exec pool
kyuubi.backend.engine.exec.pool.size  30
kyuubi.backend.engine.exec.pool.wait.queue.size  100

# Spark
spark.master           yarn
# spark.driver.memory    2g
# spark.executor.memory  4g
# spark.driver.cores     1
# spark.executor.cores   3

# Spark SQL optimization
spark.sql.adaptive.enabled              true
spark.sql.adaptive.forceApply              false
spark.sql.adaptive.logLevel              info
spark.sql.adaptive.advisoryPartitionSizeInBytes              256m
spark.sql.adaptive.coalescePartitions.enabled              true
spark.sql.adaptive.coalescePartitions.minPartitionNum              1
spark.sql.adaptive.coalescePartitions.initialPartitionNum              1
spark.sql.adaptive.fetchShuffleBlocksInBatch              true
spark.sql.adaptive.localShuffleReader.enabled              true
spark.sql.adaptive.skewJoin.enabled              true
spark.sql.adaptive.skewJoin.skewedPartitionFactor              5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes              400m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin              0.2
# spark.sql.adaptive.optimizer.excludedRules
spark.sql.autoBroadcastJoinThreshold              -1
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html

# Static resource allocation
# spark.executor.instances      2
# spark.executor.cores          2
# spark.executor.memory         2g

# Dynamic resource allocation
spark.dynamicAllocation.enabled              true
# false if you prefer shuffle tracking over ESS
# spark.shuffle.service.enabled              true
spark.dynamicAllocation.initialExecutors              1
spark.dynamicAllocation.minExecutors              1
spark.dynamicAllocation.maxExecutors              5
# spark.executor.cores 3
# spark.executor.memory 4g
spark.dynamicAllocation.executorAllocationRatio              0.5
spark.dynamicAllocation.executorIdleTimeout              60s
spark.dynamicAllocation.cachedExecutorIdleTimeout              30min
# true if you prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled              true
spark.dynamicAllocation.shuffleTracking.timeout              30min
spark.dynamicAllocation.schedulerBacklogTimeout              1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout              1s
spark.cleaner.periodicGC.interval              5min

# Per-user override (here for the user hive)
# ___hive___.spark.dynamicAllocation.maxExecutors  10

2. kyuubi-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_131
export SPARK_HOME=/home/soft/spark-3.5.1-bin-hadoop3
# export FLINK_HOME=/opt/flink
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hive
# export FLINK_HADOOP_CLASSPATH=/path/to/hadoop-client-runtime-3.3.2.jar:/path/to/hadoop-client-api-3.3.2.jar
# export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop
export YARN_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/etc/hadoop
export KYUUBI_JAVA_OPTS="-Xmx10g -XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=1024m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark -XX:+UseGCOverheadLimit -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -verbose:gc -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M"
export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark"

3. Place the keytab files at the paths configured above

In a high-availability setup, place the keytab files on every node.

[root@bigdata31 ~]# ll /etc/security/keytabs/
total 12
-rw-r--r-- 1 root root  970 Apr 28 23:12 hive.keytab
-rw-r--r-- 1 root root 1040 Apr 28 21:47 zookeeper.keytab
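
If the keytabs still need to be created, a sketch using an MIT Kerberos KDC (kadmin.local runs on the KDC host; the principal names are assumptions following the _HOST pattern configured above, so export one per node):

# On the KDC: export per-host keytabs (hypothetical principal names)
kadmin.local -q "xst -k /etc/security/keytabs/hive.keytab hive/bigdata30@HADOOP.COM"
kadmin.local -q "xst -k /etc/security/keytabs/zookeeper.keytab zookeeper/bigdata30@HADOOP.COM"
# Distribute to the other HA nodes
scp /etc/security/keytabs/*.keytab bigdata31:/etc/security/keytabs/
scp /etc/security/keytabs/*.keytab bigdata32:/etc/security/keytabs/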

4. Start and stop

sudo -u hive bin/kyuubi start
sudo -u hive bin/kyuubi stop
or
sudo -u hive bin/kyuubi restart
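
After starting, a quick way to confirm the server is up is to check the Thrift frontend port and the server log (exact log file names under logs/ vary with the launching user and host):

ss -tlnp | grep 10009    # the Thrift binary frontend should be listening
tail -n 50 logs/*.out    # check for startup errors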

IV. Test the connection

1. Connect with beeline

1.1 Non-HA connection

[root@bigdata30 apache-kyuubi-1.9.0-bin]# beeline -u 'jdbc:hive2://bigdata30:10009/;principal=hive/_HOST@HADOOP.COM'
Connecting to jdbc:hive2://bigdata30:10009/;principal=hive/_HOST@HADOOP.COM
Connected to: Spark SQL (version 3.5.1)
Driver: Hive JDBC (version 2.1.1-cdh6.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.2.1 by Apache Hive
0: jdbc:hive2://bigdata30:10009/> 

1.2 HA connection (via ZooKeeper discovery)

beeline -u 'jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/_HOST@HADOOP.COM'

[root@bigdata30 apache-kyuubi-1.9.0-bin]# beeline -u 'jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/_HOST@HADOOP.COM'
Connecting to jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/bigdata30@HADOOP.COM
24/04/28 22:56:40 [main]: INFO jdbc.HiveConnection: Connected to 10.8.3.30:10009
Connected to: Spark SQL (version 3.5.1)
Driver: Hive JDBC (version 2.1.1-cdh6.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.2.1 by Apache Hive
0: jdbc:hive2://bigdata30:2181,bigdata31:2181>
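
Once connected in either mode, a simple query verifies that the Spark SQL engine has launched on YARN and can serve requests (the first statement after connecting may take a while because it triggers the engine launch):

SHOW DATABASES;
SELECT 1;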

2. Connect with DBeaver

Driver package: hive-jdbc-uber-2.6.3.0-235.jar

2.1 Non-HA connection

URL template:

jdbc:hive2://{host}[:{port}][/{database}];AuthMech=1;KrbRealm=HADOOP.COM;KrbHostFQDN={host};KrbServiceName={server};KrbAuthType=2;principal={user}/_HOST@HADOOP.COM

Connection URL:

jdbc:hive2://bigdata30:10009/default;AuthMech=1;KrbRealm=HADOOP.COM;KrbHostFQDN=bigdata30;KrbServiceName=hive;KrbAuthType=2;principal=hive/_HOST@HADOOP.COM

There is also a beeline-style URL that is much more concise:

jdbc:hive2://{host}[:{port}][/{database}];principal={user}/_HOST@HADOOP.COM

jdbc:hive2://bigdata30:10009/default;principal=hive/_HOST@HADOOP.COM

Note that the database must be specified, otherwise the connection fails with an error.

2.2 HA connection
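
Following the beeline HA pattern above, the same ZooKeeper discovery string can be used as the DBeaver URL (a sketch, assuming the quorum and namespace configured in kyuubi-defaults.conf):

jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/_HOST@HADOOP.COM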

