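This post covers three topics:
- How to configure the ECR credential helper for Docker
- How an EKS cluster accesses ECR repositories
- A source-code walkthrough of how kubelet obtains ECR credentials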
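ECR credential helper
In a Docker environment, you can specify a credential helper program in $HOME/.docker/config.json; see the docker login documentation.
AWS also provides a credential helper, amazon-ecr-credential-helper, which removes the need to run the ECR authentication command by hand.
The helper is a single binary; it only needs to be discoverable in PATH.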
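As a rough illustration of what that configuration looks like, the sketch below generates the relevant part of config.json in Go. It assumes the helper binary is installed as docker-credential-ecr-login (the name the amazon-ecr-credential-helper project ships), which Docker refers to by the suffix "ecr-login". The registry host is a made-up placeholder, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig models only the credential-related keys of
// $HOME/.docker/config.json; all other keys are omitted in this sketch.
type dockerConfig struct {
	CredsStore  string            `json:"credsStore,omitempty"`
	CredHelpers map[string]string `json:"credHelpers,omitempty"`
}

// ecrHelperConfig (hypothetical helper) maps each registry host to the
// "ecr-login" helper, i.e. the docker-credential-ecr-login binary in PATH.
func ecrHelperConfig(registries ...string) dockerConfig {
	helpers := make(map[string]string, len(registries))
	for _, r := range registries {
		helpers[r] = "ecr-login"
	}
	return dockerConfig{CredHelpers: helpers}
}

// renderConfig serializes the config as indented JSON.
func renderConfig(cfg dockerConfig) (string, error) {
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Placeholder account ID and region, not taken from a real setup.
	cfg := ecrHelperConfig("123456789012.dkr.ecr.cn-north-1.amazonaws.com.cn")
	s, err := renderConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Setting credsStore to "ecr-login" instead would route every registry through the helper; the per-registry credHelpers map is the narrower choice.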
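How an EKS cluster accesses ECR repositories
Starting with 1.24 clusters, the default container runtime is containerd. After bringing up a cluster, we notice that nodes can still pull images from some private ECR repositories even when the repository itself has no policy granting access.
For example, the image below is hosted in an official AWS ECR repository, yet the node's $HOME/.docker/config.json
file contains no credential helper configuration: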
918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon-k8s-cni:v1.11.4-eksbuild.1
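It turns out that since Kubernetes v1.20, kubelet can use exec plugins to dynamically obtain credentials for a container image registry. Two kubelet flags need to be set:
- --image-credential-provider-config: the path to the credential provider plugin configuration file.
- --image-credential-provider-bin-dir: the path to the directory containing the credential provider plugin binaries.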
When kubelet starts on the node, its log shows these flags being set:
I0218 05:30:51.783345 5014 flags.go:64] FLAG: --image-credential-provider-bin-dir="/etc/eks/ecr-credential-provider"
I0218 05:30:51.783350 5014 flags.go:64] FLAG: --image-credential-provider-config="/etc/eks/ecr-credential-provider/ecr-credential-provider-config"
After removing the node's permission to access ECR, the pull fails with the following error:
Warning Failed 11s kubelet Failed to pull image "xxxxxx.dkr.ecr.cn-north-1.amazonaws.com.cn/amazonlinux:latest": rpc error: code = Unknown desc = failed to pull and unpack image "xxxxxx.dkr.ecr.cn-north-1.amazonaws.com.cn/amazonlinux:latest": failed to resolve reference "xxxxxx.dkr.ecr.cn-north-1.amazonaws.com.cn/amazonlinux:latest": pulling from host xxxxxx.dkr.ecr.cn-north-1.amazonaws.com.cn failed with status code [manifests latest]: 401 Unauthorized
Warning Failed 11s kubelet Error: ErrImagePull
The kubelet log on the node shows:
E0218 05:50:49.260465 5014 aws_credentials.go:184] error getting credentials from ECR for xxxxxx.dkr.ecr.cn-north-1.amazonaws.com.cn AccessDeniedException: User: arn:aws-cn:sts::xxxxxx:assumed-role/eksctl-test124-nodegroup-test124-NodeInstanceRole-1NVLMK3YZWYY3/i-0a598be817afd520b is not authorized to perform: ecr:GetAuthorizationToken on resource: * because no identity-based policy allows the ecr:GetAuthorizationToken action
The corresponding source code is shown here; it shows kubelet requesting credentials from ECR through the plugin:
cfg, err := p.getFromECR(parsed)
if err != nil {
	klog.Errorf("error getting credentials from ECR for %s %v", parsed.registry, err)
	return credentialprovider.DockerConfig{}
}
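In the kubelet credential provider configuration file, a matchImages entry matches an image when all of the following hold:
- Both contain the same number of domain parts and each part matches.
- The URL path of the matchImages entry must be a prefix of the target image URL path.
- If the matchImages entry contains a port, the port must match in the image as well.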
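The three rules above can be sketched in Go as follows. This is a simplified illustration, not kubelet's own matching code: splitImage and matchImage are hypothetical names, and glob matching of each domain part is done with path.Match from the standard library. The registry.example.com hosts are made up.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitImage splits "host[:port]/path" into host, port and path.
// The tag stays in the path, so a ":" after the first "/" is not a port.
func splitImage(s string) (host, port, p string) {
	hostport := s
	if i := strings.Index(s, "/"); i >= 0 {
		hostport, p = s[:i], s[i:]
	}
	host = hostport
	if i := strings.LastIndex(hostport, ":"); i >= 0 {
		host, port = hostport[:i], hostport[i+1:]
	}
	return host, port, p
}

// matchImage reports whether image satisfies a matchImages-style pattern.
func matchImage(pattern, image string) bool {
	pHost, pPort, pPath := splitImage(pattern)
	iHost, iPort, iPath := splitImage(image)

	// Rule 1: same number of domain parts, and each part matches (globs allowed).
	pParts := strings.Split(pHost, ".")
	iParts := strings.Split(iHost, ".")
	if len(pParts) != len(iParts) {
		return false
	}
	for k := range pParts {
		ok, err := path.Match(pParts[k], iParts[k])
		if err != nil || !ok {
			return false
		}
	}

	// Rule 3: a port in the pattern must also appear in the image.
	if pPort != "" && pPort != iPort {
		return false
	}

	// Rule 2: the pattern path must be a prefix of the image path.
	return strings.HasPrefix(iPath, pPath)
}

func main() {
	image := "918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon-k8s-cni:v1.11.4-eksbuild.1"
	fmt.Println(matchImage("*.dkr.ecr.*.amazonaws.com.cn", image))
	fmt.Println(matchImage("*.dkr.ecr.*.amazonaws.com", image))
	fmt.Println(matchImage("registry.example.com:5000/team", "registry.example.com:5000/team/app:v1"))
}
```

Note that a glob such as "*" covers exactly one domain part, so a pattern with six parts never matches a seven-part host.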
Configuring the kubelet image credential provider
$ cat /etc/eks/ecr-credential-provider/ecr-credential-provider-config
apiVersion: kubelet.config.k8s.io/v1beta1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
      - "*.dkr.ecr.*.amazonaws.cn"
      - "*.dkr.ecr-fips.*.amazonaws.com"
      - "*.dkr.ecr.us-iso-east-1.c2s.ic.gov"
      - "*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1beta1
    args:
      - get-credentials
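To make the exec plugin contract behind this configuration more concrete, here is a minimal Go sketch of the exchange. It assumes the v1beta1 protocol, in which kubelet writes a CredentialProviderRequest as JSON to the plugin's stdin and reads a CredentialProviderResponse from its stdout. This is not the source of ecr-credential-provider: the type and function names are hypothetical, and the returned credentials are placeholders where a real plugin would call ECR.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"strings"
)

// credentialRequest mirrors the fields of a v1beta1 CredentialProviderRequest.
type credentialRequest struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Image      string `json:"image"`
}

// authConfig holds the credentials for one registry.
type authConfig struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

// credentialResponse mirrors the fields of a v1beta1 CredentialProviderResponse.
type credentialResponse struct {
	APIVersion    string                `json:"apiVersion"`
	Kind          string                `json:"kind"`
	CacheKeyType  string                `json:"cacheKeyType"`
	CacheDuration string                `json:"cacheDuration"`
	Auth          map[string]authConfig `json:"auth"`
}

// buildResponse returns placeholder credentials keyed by the registry host.
// A real plugin would obtain them from the registry's token API instead.
func buildResponse(req credentialRequest) credentialResponse {
	registry := req.Image
	if i := strings.Index(req.Image, "/"); i >= 0 {
		registry = req.Image[:i]
	}
	return credentialResponse{
		APIVersion:    "credentialprovider.kubelet.k8s.io/v1beta1",
		Kind:          "CredentialProviderResponse",
		CacheKeyType:  "Registry",
		CacheDuration: "12h",
		Auth: map[string]authConfig{
			registry: {Username: "AWS", Password: "placeholder-password"},
		},
	}
}

// run decodes one request from in and encodes the response to out.
func run(in io.Reader, out io.Writer) error {
	var req credentialRequest
	if err := json.NewDecoder(in).Decode(&req); err != nil {
		return err
	}
	return json.NewEncoder(out).Encode(buildResponse(req))
}

func main() {
	// A real plugin would pass os.Stdin here; a sample request keeps the sketch runnable.
	sample := `{"apiVersion":"credentialprovider.kubelet.k8s.io/v1beta1","kind":"CredentialProviderRequest","image":"123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v1"}`
	if err := run(strings.NewReader(sample), os.Stdout); err != nil {
		panic(err)
	}
}
```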
Source code analysis
https://github.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/aws/aws_credentials.go
https://github1s.com/kubernetes/kubernetes/blob/master/pkg/credentialprovider/aws/aws_credentials.go#L187
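A side note on the github1s project: it lets you browse a GitHub repository in a VS Code-style interface in the browser, with syntax highlighting, code folding and so on, so there is no need to download the source.
First, the credential provider is initialized and a cache of ECR token getters is created: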
// init registers a credential provider for each registryURLTemplate and creates
// an ECR token getter factory with a new cache to store token getters
func init() {
	credentialprovider.RegisterCredentialProvider("amazon-ecr",
		newECRProvider(&ecrTokenGetterFactory{cache: make(map[string]tokenGetter)},
			ec2ValidationImpl,
		))
}
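The init function above creates an ecrProvider object, which holds a cache, a token getter factory, and a function that checks whether the environment is EC2:
type ecrProvider struct {
	cache         cache.Store
	getterFactory tokenGetterFactory
	isEC2         ec2ValidationFunc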
}
The core logic is a method named Provide:
func (p *ecrProvider) Provide(image string) credentialprovider.DockerConfig {
	parsed, err := parseRepoURL(image)
	if err != nil {
		return credentialprovider.DockerConfig{}
	}

	// To avoid AWS SDK latency on non-AWS platforms, check whether we are on EC2.
	// The check is ecrProvider's isEC2 function, which relies on
	// 1. reading the instance UUID and 2. obtaining a credentials session.
	// It runs only once.
	once.Do(func() {
		isEC2 = p.isEC2()
		if isEC2 && credentialprovider.AreLegacyCloudCredentialProvidersDisabled() {
			klog.V(4).Infof("AWS credential provider is now disabled. Please refer to sig-cloud-provider for guidance on external credential provider integration for AWS")
		}
	})

	// Look up the ECR token in the cache.
	if cfg, exists := p.getFromCache(parsed); exists {
		klog.V(3).Infof("Got ECR credentials from cache for %s", parsed.registry)
		return cfg
	}
	klog.V(3).Info("unable to get ECR credentials from cache, checking ECR API")

	// Request a token from ECR.
	cfg, err := p.getFromECR(parsed)
	if err != nil {
		klog.Errorf("error getting credentials from ECR for %s %v", parsed.registry, err)
		return credentialprovider.DockerConfig{}
	}
	return cfg
}
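Provide depends on the image being parsed into a registry host, a registry ID and a region. The sketch below shows one way such parsing could work; it is an assumption-laden simplification, not the upstream parseRepoURL. The regular expression covers only the amazonaws.com and amazonaws.com.cn domains, and ecrRef and parseECRImage are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ecrHost is a simplified pattern (an assumption, not the upstream regex):
// <12-digit account>.dkr.ecr[-fips].<region>.amazonaws.com[.cn]
var ecrHost = regexp.MustCompile(
	`^(\d{12})\.dkr\.ecr(-fips)?\.([a-zA-Z0-9][a-zA-Z0-9_-]*)\.amazonaws\.com(\.cn)?$`)

// ecrRef holds the pieces used when requesting a token.
type ecrRef struct {
	registryID string // AWS account ID that owns the registry
	region     string // region the registry lives in
	registry   string // full registry host
}

// parseECRImage extracts the registry ID and region from an image reference.
func parseECRImage(image string) (*ecrRef, error) {
	host := image
	if i := strings.Index(image, "/"); i >= 0 {
		host = image[:i]
	}
	m := ecrHost.FindStringSubmatch(host)
	if m == nil {
		return nil, fmt.Errorf("%q is not a recognized ECR registry host", host)
	}
	return &ecrRef{registryID: m[1], region: m[3], registry: host}, nil
}

func main() {
	ref, err := parseECRImage("918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon-k8s-cni:v1.11.4-eksbuild.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.registryID, ref.region, ref.registry)

	// A non-ECR image is rejected, which is why Provide returns an empty config for it.
	_, err = parseECRImage("docker.io/library/nginx:latest")
	fmt.Println(err != nil)
}
```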
On a new node, where nothing is cached yet, a token has to be requested from ECR:
func (p *ecrProvider) getFromECR(parsed *parsedURL) (credentialprovider.DockerConfig, error) {
	cfg := credentialprovider.DockerConfig{}
	// Get the token getter for the region.
	getter, err := p.getterFactory.GetTokenGetterForRegion(parsed.region)
	if err != nil {
		return cfg, err
	}
	// Build the parameters and send the token request to ECR.
	params := &ecr.GetAuthorizationTokenInput{RegistryIds: []*string{aws.String(parsed.registryID)}}
	output, err := getter.GetAuthorizationToken(params)
	...
	data := output.AuthorizationData[0]
	if data.AuthorizationToken == nil {
		return cfg, errors.New("authorization token in response is nil")
	}
	// Build the cache entry from the response.
	entry, err := makeCacheEntry(data, parsed.registry)
	cfg[entry.registry] = entry.credentials
	return cfg, nil
}
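The authorization token returned by ECR is a base64-encoded string of the form user:password, with AWS as the user name. A minimal sketch of the decoding step, which a cache entry would need before the credentials can be used, might look like this. decodeToken is a hypothetical name and the password is a made-up sample.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeToken splits a base64-encoded "user:password" token into its two parts.
func decodeToken(token string) (user, password string, err error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", "", fmt.Errorf("decoding authorization token: %w", err)
	}
	// Split on the first ":" only, since the password may itself contain colons.
	parts := strings.SplitN(string(raw), ":", 2)
	if len(parts) != 2 {
		return "", "", errors.New("authorization token is not in user:password form")
	}
	return parts[0], parts[1], nil
}

func main() {
	// Build a sample token locally instead of calling ECR.
	token := base64.StdEncoding.EncodeToString([]byte("AWS:example-password"))
	user, password, err := decodeToken(token)
	if err != nil {
		panic(err)
	}
	fmt.Println(user, len(password))
}
```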
The kubelet error seen earlier was caused by getFromECR failing, which is why kubelet logged "error getting credentials from ECR".