Problem Description:
SAP HANA Spark Controller (2.4.4) fails to connect to the HDFS cluster; the hana_controller.log shows the following error:
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)
Analysis and Recommendations:
This is a Kerberos issue and is not related to SHSC.
It is recommended to upgrade SHSC to the latest release (currently version 2.6.0).
The explanation from SAP development is as follows:
Kerberos (SHSC)
SHSC uses the SparkContext as the entry point to leverage Spark and subsequently connect to Hive. The related Kerberos configuration in a Hadoop
cluster for a long-running Spark application is fairly simple. Please see the references below for further details:
Long-Running Applications
Configuring Spark on YARN for Long-Running Applications
SHSC only has to provide the spark.yarn.keytab and spark.yarn.principal for the process.
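For illustration only, here is a minimal sketch of how these two properties could be supplied when a SparkContext is created directly rather than through spark-submit; the application name, keytab path, and principal are placeholders, and this is not SHSC's actual configuration mechanism.

import org.apache.spark.{SparkConf, SparkContext}

object KerberosConfSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder values: replace with the keytab and principal of the service user.
    val conf = new SparkConf()
      .setAppName("long-running-app")
      .set("spark.yarn.keytab", "/etc/security/keytabs/service.keytab")
      .set("spark.yarn.principal", "service/host.example.com@EXAMPLE.COM")

    // The YARN application started from this SparkContext runs as the given principal.
    val sc = new SparkContext(conf)
  }
}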
However, since SHSC doesn’t use the spark-submit entry point but creates a SparkContext directly (also a valid method), the tokens aren’t renewed
automatically. Unlike spark-submit jobs, SHSC has to renew them manually within the existing SparkContext, which it does periodically.
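As a rough illustration of such periodic manual refreshing (this is not SHSC's actual implementation; the principal, keytab path, and one-hour interval are assumptions), a long-running process that owns its SparkContext can re-login from the keytab on a schedule so that the Kerberos credentials used to obtain fresh tokens remain valid:

import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.security.UserGroupInformation

object KeytabReloginSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder principal and keytab path.
    val principal = "service/host.example.com@EXAMPLE.COM"
    val keytab    = "/etc/security/keytabs/service.keytab"

    // Initial Kerberos login for this JVM from the keytab.
    UserGroupInformation.loginUserFromKeytab(principal, keytab)

    // Re-login periodically so the ticket does not expire while the
    // application (and its SparkContext) keeps running.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit =
        UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
    }, 1, 1, TimeUnit.HOURS)
  }
}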
As explained in SAP Note 3004055, SHSC doesn’t perform any sort of authentication or authorization; it only passes on the required Kerberos information
such as the keytab, principal, and tokens.
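The call stack later in this note shows the standard Hadoop proxy-user pattern (as:hanaes (auth:PROXY) via a Kerberos service principal): the request runs under an impersonated identity while the service itself authenticates with Kerberos. A minimal sketch of that general pattern, with a placeholder HDFS path and not taken from SHSC's code, looks like this:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object ProxyUserSketch {
  def main(args: Array[String]): Unit = {
    val realUser  = UserGroupInformation.getLoginUser                        // Kerberos-authenticated service identity
    val proxyUser = UserGroupInformation.createProxyUser("hanaes", realUser) // impersonated end user

    proxyUser.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // The NameNode accepts this call only if the hadoop.proxyuser.* rules
        // allow the real user to impersonate the proxy user.
        val fs = FileSystem.get(new Configuration())
        fs.listStatus(new Path("/tmp")).foreach(status => println(status.getPath))
      }
    })
  }
}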
In addition, you may also want to read Hadoop Delegation Tokens Explained.
Error Message:
Regarding the error messages found in the SHSC logs: those are written and reported very clearly by Hadoop, not by SHSC,
as can be seen from the call stack information such as:
org.apache.hadoop.*
org.apache.hadoop.security.authentication.client.AuthenticationException
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)
Please refer to the attachment "SHSC_KerberosSequence_Error.png", which illustrates the calling sequence and the occurrence of the
Kerberos issue. As one can see, the authentication was already performed successfully from SHSC throughout the involved Hadoop components.
During query execution from Spark to Hadoop/HDFS, the authentication issue occurs with the call stack below.
22/11/08 11:53:23 DEBUG SAPHanaSpark-akka.actor.default-dispatcher-32 security.UserGroupInformation: PrivilegedActionException as:hanaes (auth:PROXY) via hanaes/sunpaphmp03.sun.nec.co.jp@SUN.NEC.CO.JP (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
22/11/08 11:53:23 ERROR SAPHanaSpark-akka.actor.default-dispatcher-32 network.AsyncExecutor: Exception happened in Async Executor. Forwarding to Handler
java.io.IOException: DestHost:destPort sunpaphmp02.sun.nec.co.jp:8020 , LocalHost:localPort SUNPAPHMP03.sun.nec.co.jp/10.179.8.32:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
    at org.apache.hadoop.ipc.Client.call(Client.java:1443)
    at org.apache.hadoop.ipc.Client.call(Client.java:1353)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy9.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1071)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy10.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:700)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1814)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:216)
    at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.listStatus(AvroContainerInputFormat.java:42)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    at org.apache.spark.sql.hana.DistributedDataSetImpl.coalesceToMaxPartitions(DistributedDataSetFactoryImpl.scala:178)
    at org.apache.spark.sql.hana.DistributedDataSetImpl.transferDatafromPartitions(DistributedDataSetFactoryImpl.scala:189)
    at com.sap.hana.spark.SparkFacade.dispatchQueryResultFn(SparkFacade.scala:302)
    at com.sap.hana.spark.SparkFacade.executeQueryAsync(SparkFacade.scala:395)
    at com.sap.hana.network.AsyncExecutor.executeJob(RequestRouter.scala:97)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1$$anon$2.run(RequestRouter.scala:53)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1$$anon$2.run(RequestRouter.scala:50)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
    at com.sap.hana.network.AsyncExecutor$$anonfun$receive$1.applyOrElse(RequestRouter.scala:50)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at com.sap.hana.network.AsyncExecutor.aroundReceive(RequestRouter.scala:40)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
    at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1558)
    at org.apache.hadoop.ipc.Client.call(Client.java:1389)
    ... 81 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:410)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
    ... 84 more
If there were an issue with missing tokens in a cache (which SHSC does not maintain), there would be related error messages
such as:
HDFS_DELEGATION_TOKEN can’t be found in cache
And the call stack shown above actually contains:
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)