Submitting Spark Job Tasks on a Container Cloud

news/2024/10/30 23:28:13/

Submitting Spark Job tasks on the container cloud

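To submit a Spark task as a Kind=Job resource on the container cloud, first request RBAC that grants permission to submit Job tasks, then write the corresponding YAML files, and finally use Spark's built-in spark-submit command to submit the user program (a jar package) to the cluster for execution.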

1. Create the RBAC permissions for Job submission

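Create the RBAC account and grant it resource permissions (see the Pod service account creation reference). The Kubernetes API resources can be listed with kubectl api-resources.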

cat > ecc-recommend-rbac.yaml << EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: item-dev-recommend
  labels:
    name: item-dev-recommend
---
# Create the service account spark-cdp in the namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
---
# Create the role and its resource permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - services
      - secrets
    verbs:
      - create
      - get
      - delete
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - create
      - get
      - delete
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - resourcequotas
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - update
      - patch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - create
      - get
      - update
      - delete
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs:
      - create
      - get
      - update
      - delete
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
      - scheduledsparkapplications
      - sparkapplications/status
      - scheduledsparkapplications/status
    verbs:
      - '*'
  - apiGroups:
      - scheduling.volcano.sh
    resources:
      - podgroups
      - queues
      - queues/status
    verbs:
      - get
      - list
      - watch
      - create
      - delete
      - update
  - apiGroups:
      - batch
    resources:
      - cronjobs
      - jobs
    verbs:
      - '*'
---
# Bind the service account spark-cdp to the role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-cdp
subjects:
  - kind: ServiceAccount
    name: spark-cdp
EOF
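
Once the manifest has been applied, it can be worth confirming that the service account really received the permissions the Job needs. The sketch below only assembles kubectl auth can-i commands and prints them, so it runs without a cluster. The helper names sa_user and can_i_cmd are hypothetical, and the three checks are an assumption about what matters for this setup, not an exhaustive list. The namespace and account name come from the manifest above.

```shell
#!/bin/sh
# Names taken from ecc-recommend-rbac.yaml.
NAMESPACE="item-dev-recommend"
SERVICE_ACCOUNT="spark-cdp"

# Kubernetes identifies a service account as
# system:serviceaccount:<namespace>:<name>.
sa_user() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Hypothetical helper: build one "kubectl auth can-i" command
# for a verb ($1) and a resource ($2).
can_i_cmd() {
  printf 'kubectl auth can-i %s %s --namespace %s --as %s' \
    "$1" "$2" "$NAMESPACE" "$(sa_user "$NAMESPACE" "$SERVICE_ACCOUNT")"
}

# Illustrative checks (an assumption, not a complete list).
can_i_cmd create jobs.batch; echo
can_i_cmd create pods; echo
can_i_cmd get configmaps; echo
```

Each printed line can be pasted into a shell that has cluster access; kubectl answers yes or no for that verb and resource.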

2. Spark PV and PVC

  • Build the PV
    Mount the NFS share and define the PV's access modes (accessModes) and storage capacity (capacity).
cat >ecc-recommend-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  # A PersistentVolume is cluster-scoped, so no namespace is set here
  name: dev-cdp-pv01
spec:
  capacity:
    storage: 10Gi
  accessModes:
    # Access modes include ReadWriteOnce, ReadOnlyMany, and ReadWriteMany
    - ReadWriteOnce
  nfs:
    path: /data/nfs
    server: 192.168.0.135
EOF
  • Build the PVC
cat >ecc-recommend-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-cdp-pvc01
  namespace: item-dev-recommend
spec:
  accessModes:
    # Must match the access mode offered by the PV
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
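
The claim binds to the volume only when the volume offers the requested access mode and at least the requested capacity. The sketch below checks those two conditions for the values used above. It is deliberately simplified: the helper names to_mib and pvc_fits_pv are hypothetical, only the Gi and Mi suffixes are handled, a single access mode per side is assumed, and the real Kubernetes binder also takes storage class, label selectors, and volume mode into account.

```shell
#!/bin/sh
# Convert a quantity such as 10Gi or 512Mi to MiB.
# Only the Gi and Mi suffixes are handled in this sketch.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the PV offers the requested access mode
# and has at least the requested capacity.
# Arguments: PV capacity, PV mode, PVC request, PVC mode.
pvc_fits_pv() {
  pv_capacity=$1
  pv_mode=$2
  pvc_request=$3
  pvc_mode=$4
  [ "$pv_mode" = "$pvc_mode" ] || return 1
  [ "$(to_mib "$pv_capacity")" -ge "$(to_mib "$pvc_request")" ]
}

# Values from ecc-recommend-pv.yaml and ecc-recommend-pvc.yaml.
if pvc_fits_pv 10Gi ReadWriteOnce 10Gi ReadWriteOnce; then
  echo "request of dev-cdp-pvc01 fits dev-cdp-pv01"
else
  echo "request of dev-cdp-pvc01 does not fit dev-cdp-pv01"
fi
```

On a real cluster, the STATUS column of kubectl get pvc -n item-dev-recommend shows whether the claim was actually bound.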

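3. Submit the task with spark-submit

Once the Java/Scala program has been developed and packaged, use the spark-submit command to submit the jar to the cluster for execution.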

# The heredoc delimiter is quoted so that the shell leaves $(KUBERNETES_SERVICE_HOST)
# and $(KUBERNETES_SERVICE_PORT) untouched for Kubernetes to expand.
cat >ecc-recommend-sparksubmit.yaml <<'EOF'
---
apiVersion: batch/v1
kind: Job
metadata:
  name: item-recommend-job
  namespace: item-dev-recommend
  labels:
    k8s-app: item-recommend-job
spec:
  template:
    metadata:
      labels:
        k8s-app: item-recommend-job
    spec:
      containers:
        - name: item-recommend-job
          args:
            - /opt/spark/bin/spark-submit
            - --class
            - com.www.ecc.com.recommend.ItemRecommender
            - --master
            - k8s://https://$(KUBERNETES_SERVICE_HOST):$(KUBERNETES_SERVICE_PORT)
            - --name
            - item-recommend-job
            - --jars
            - /opt/spark/jars/spark-cassandra-connector_2.11-2.3.4.jar
            - --conf
            - spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            - --conf
            - spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
            - --conf
            - spark.kubernetes.driver.limit.cores=3
            - --conf
            - spark.kubernetes.executor.limit.cores=8
            - --conf
            - spark.kubernetes.driver.limit.memory=5g
            - --conf
            - spark.kubernetes.executor.limit.memory=32g
            - --conf
            - spark.executor.instances=8
            - --conf
            - spark.sql.crossJoin.enabled=true
            - --conf
            - spark.executor.cores=6
            - --conf
            - spark.executor.memory=32g
            - --conf
            - spark.driver.cores=3
            - --conf
            - spark.driver.memory=5g
            - --conf
            - spark.sql.autoBroadcastJoinThreshold=-1
            - --conf
            - spark.kubernetes.namespace=item-dev-recommend
            - --conf
            - spark.driver.port=45970
            - --conf
            - spark.blockManager.port=45980
            - --conf
            - spark.kubernetes.container.image=acpimagehub.ecc.cn/spark:3.11
            - --conf
            - spark.executor.extraJavaOptions="-Duser.timezone=GMT+08:00"
            - --conf
            - spark.driver.extraJavaOptions="-Duser.timezone=GMT+08:00"
            - --conf
            - spark.default.parallelism=500
            - /odsdata/item-recommender-1.0.0-SNAPSHOT.jar
          env:
            - name: SPARK_SHUFFLE_PARTITIONS
              value: "100"
            - name: CASSANDR_HOST
              value: "192.168.0.1,192.168.0.2,192.168.0.3"
            - name: CASSANDRA_PORT
              value: "9042"
            - name: AUTH_USERNAME
              value: "user"
            - name: AUTH_PASSWORD
              value: "123456"
          image: acpimagehub.ecc.cn/spark:3.11
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9000
              name: 9000tcp2
              protocol: TCP
          # No --deploy-mode is given, so spark-submit runs in client mode and the
          # driver runs inside this container; size these limits for spark.driver.memory
          resources:
            limits:
              cpu: "3"
              memory: 2Gi
            requests:
              cpu: "3"
              memory: 2Gi
          volumeMounts:
            - mountPath: /odsdata
              name: item-spark-pvc
      volumes:
        - name: item-spark-pvc
          persistentVolumeClaim:
            claimName: dev-cdp-pvc01
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      hostname: item-recommend-job
      securityContext: {}
      serviceAccountName: spark-cdp
---
apiVersion: v1
kind: Service
metadata:
  name: item-recommend-job
  namespace: item-dev-recommend
spec:
  type: NodePort
  ports:
    - name: sparkjob-tcp4040
      port: 4040
      protocol: TCP
      targetPort: 4040
    # Spark driver port
    - name: sparkjob-tcp-45970
      port: 45970
      protocol: TCP
      targetPort: 45970
    # Spark UI
    - name: sparkjob-tcp-48080
      port: 48080
      protocol: TCP
      targetPort: 48080
    # Spark executor port
    - name: sparkjob-tcp-45980
      port: 45980
      protocol: TCP
      targetPort: 45980
  selector:
    k8s-app: item-recommend-job
EOF
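
The Job's args list follows a simple pattern: the master URL is derived from the in-cluster API server address, and every Spark setting is passed as its own --conf key=value pair ahead of the application jar. The sketch below assembles and prints such a command line in plain shell. The helper names master_url and conf_args are hypothetical, the default host and port are placeholders for running outside a pod, and only two of the settings are shown.

```shell
#!/bin/sh
# Inside a pod Kubernetes injects these two variables. The defaults
# below are placeholders so the sketch also runs outside a cluster.
: "${KUBERNETES_SERVICE_HOST:=10.96.0.1}"
: "${KUBERNETES_SERVICE_PORT:=443}"

# Build the k8s:// master URL from the API server address.
master_url() {
  printf 'k8s://https://%s:%s' "$KUBERNETES_SERVICE_HOST" "$KUBERNETES_SERVICE_PORT"
}

# Turn each "key=value" argument into "--conf key=value ".
# Every pair is followed by a space, including the last one.
conf_args() {
  for kv in "$@"; do
    printf '%s %s ' --conf "$kv"
  done
}

# Print the command line instead of running it.
echo "/opt/spark/bin/spark-submit --master $(master_url)" \
  "$(conf_args spark.executor.instances=8 spark.kubernetes.namespace=item-dev-recommend)/odsdata/item-recommender-1.0.0-SNAPSHOT.jar"
```

Printing rather than executing keeps the sketch safe to run anywhere; in the Job itself the same arguments are passed as separate list items, so no shell quoting is involved.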

4. Notes on the packaging plugins

<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <includes>
        <include>*.properties</include>
      </includes>
      <filtering>false</filtering>
    </resource>
  </resources>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <skipTests>true</skipTests>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.6.1</version>
      <configuration>
        <source>${java.version}</source>
        <target>${java.version}</target>
        <encoding>${project.build.sourceEncoding}</encoding>
      </configuration>
      <executions>
        <execution>
          <phase>compile</phase>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.2.2</version>
      <executions>
        <execution>
          <id>scala-compile-first</id>
          <phase>process-resources</phase>
          <goals>
            <goal>add-source</goal>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
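
With the shade plugin bound to the package phase, a regular Maven build should leave one self-contained jar under target/. The sketch below prints the commands one might use to build that jar and copy it to the NFS export that the Job mounts at /odsdata. Several things here are assumptions the article does not state: the artifactId item-recommender and version 1.0.0-SNAPSHOT are inferred from the jar name in the Job, Maven's default artifactId-version.jar naming is assumed to be in effect, and copying with scp is only one possible way to reach the NFS server. The helper name jar_name is hypothetical.

```shell
#!/bin/sh
ARTIFACT_ID="item-recommender"   # assumption, inferred from the jar name
VERSION="1.0.0-SNAPSHOT"         # assumption, inferred from the jar name
NFS_SERVER="192.168.0.135"       # from ecc-recommend-pv.yaml
NFS_PATH="/data/nfs"             # from ecc-recommend-pv.yaml

# Maven's default file name for the main artifact.
jar_name() {
  printf '%s-%s.jar' "$1" "$2"
}

# Print the commands instead of running them.
echo "mvn clean package"
echo "scp target/$(jar_name "$ARTIFACT_ID" "$VERSION") $NFS_SERVER:$NFS_PATH/"
```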

http://www.ppmy.cn/news/29844.html
