Prometheus is an open-source monitoring system originally developed at SoundCloud that provides real-time monitoring and alerting. Its strengths include:
It is easy to install and configure, so a monitoring system can be stood up quickly; it provides rich data-collection methods and a query language for building monitoring metrics fast; community support is broad, with many ready-made monitoring templates to draw on; and it integrates with many third-party tools, such as Kubernetes and Elasticsearch.
Prometheus's drawbacks include:
The built-in interface is relatively plain, with limited visualization and interactivity; for large-scale monitoring needs, scalability and performance can become a problem; and some community monitoring plugins may be unstable and require debugging and tuning on your own.
This guide uses Kubernetes to deploy the Prometheus service, with Prometheus data persisted to NFS. Personally tested and working.
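All of the manifests below live in the ns-monitor namespace, which they reference but never create. A minimal sketch of the prerequisite step (standard kubectl; the namespace name is taken from the manifests below):

# Create the namespace all following resources are deployed into
kubectl create namespace ns-monitor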
1. Deploy the ClusterRole and ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - watch
      - list
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - watch
      - list
  - nonResourceURLs: ["/metrics"]
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: ns-monitor
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
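Assuming the two objects above are saved together in a file called rbac.yaml (the file name is just an example), applying and verifying them might look like this:

# Apply the RBAC objects; they are cluster-scoped, so no namespace flag is needed
kubectl apply -f rbac.yaml
kubectl get clusterrole prometheus
kubectl get clusterrolebinding prometheus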
2. Deploy the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: ns-monitor
  labels:
    app: prometheus
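To apply and verify, assuming the manifest is saved as serviceaccount.yaml (an illustrative name):

kubectl apply -f serviceaccount.yaml
# The ClusterRoleBinding from step 1 points at this account in ns-monitor
kubectl get serviceaccount prometheus -n ns-monitor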
3. Deploy the ConfigMap
Any configuration in the ConfigMap that you do not need can simply be deleted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  prometheus.yml: |-
    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'grafana'
        static_configs:
          - targets:
              - 'grafana-service.ns-monitor:3000'

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          #
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      # Scrape config for nodes (kubelet).
      #
      # Rather than connecting directly to the node, the scrape is proxied though the
      # Kubernetes apiserver. This means it will work if Prometheus is running out of
      # cluster, or can't connect to nodes for some other reason (e.g. because of
      # firewalling).
      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics

      # Scrape config for Kubelet cAdvisor.
      #
      # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
      # (those whose names begin with 'container_') have been removed from the
      # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
      # retrieve those metrics.
      #
      # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
      # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
      # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
      # the --cadvisor-port=0 Kubelet flag).
      #
      # This job is not necessary and should be removed in Kubernetes 1.6 and
      # earlier versions, or it will cause the metrics to be scraped twice.
      - job_name: 'kubernetes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      #   to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      #   service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name

      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      # Example scrape config for probing ingresses via the Blackbox Exporter.
      #
      # The relabeling allows the actual ingress scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: ingress
        relabel_configs:
          - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_ingress_scheme, __address__, __meta_kubernetes_ingress_path]
            regex: (.+);(.+);(.+)
            replacement: ${1}://${2}${3}
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox-exporter.example.com:9115
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_ingress_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_ingress_name]
            target_label: kubernetes_name

      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
      #   pod's declared ports (default is a port-free target if none are declared).
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: ns-monitor
  labels:
    app: prometheus
data:
  cpu-usage.rule: |
    groups:
      - name: NodeCPUUsage
        rules:
          - alert: NodeCPUUsage
            expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
            for: 2m
            labels:
              severity: "page"
            annotations:
              summary: "{{$labels.instance}}: High CPU usage detected"
              description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
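Assuming both ConfigMaps are saved together as configmap.yaml (an example name), apply them as below. If you keep local copies of prometheus.yml and cpu-usage.rule, you can also sanity-check them with promtool, which ships with Prometheus 2.x (exact subcommand names can differ between versions):

kubectl apply -f configmap.yaml
# Optional local validation of the scrape config and the alert rule
promtool check config prometheus.yml
promtool check rules cpu-usage.rule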
4. Deploy the PersistentVolumeClaim (pvc.yaml)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: ns-monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      name: prometheus-data-pv
      release: stable
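Note that the claim selects a PV labeled name: prometheus-data-pv, which is only created in step 5, so the PVC will sit in Pending until then:

kubectl apply -f pvc.yaml
# Expect STATUS Pending until the matching PV exists
kubectl get pvc prometheus-data-pvc -n ns-monitor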
5. Deploy the PersistentVolume (pv.yaml)
apiVersion: v1
kind: PersistentVolume
metadata:name: "prometheus-data-pv"labels:name: prometheus-data-pvrelease: stable
spec:capacity:storage: 5GiaccessModes:- ReadWriteOncepersistentVolumeReclaimPolicy: Recyclenfs:path: /nfs/prometheus/dataserver: 192.168.0.1
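Before applying the PV, it is worth confirming that the NFS server actually exports /nfs/prometheus/data, since a missing export only surfaces later as a pod mount failure. A sketch, assuming the NFS client utilities are installed on the machine you run showmount from:

# List the exports of the NFS server named in the manifest
showmount -e 192.168.0.1
kubectl apply -f pv.yaml
# PV and PVC should now both report STATUS Bound
kubectl get pv prometheus-data-pv
kubectl get pvc prometheus-data-pvc -n ns-monitor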
6. Deploy the Deployment
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: ns-monitor
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 0
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /prometheus
              name: prometheus-data-volume
            - mountPath: /etc/prometheus/prometheus.yml
              name: prometheus-conf-volume
              subPath: prometheus.yml
            - mountPath: /etc/prometheus/rules
              name: prometheus-rules-volume
          ports:
            - containerPort: 9090
              protocol: TCP
      volumes:
        - name: prometheus-data-volume
          persistentVolumeClaim:
            claimName: prometheus-data-pvc
        - name: prometheus-conf-volume
          configMap:
            name: prometheus-conf
        - name: prometheus-rules-volume
          configMap:
            name: prometheus-rules
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
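Assuming the Deployment is saved as deployment.yaml (an example name), apply it and confirm the pod comes up; a crash-looping pod usually means a bad config or a failed NFS mount, which the logs will show:

kubectl apply -f deployment.yaml
kubectl get pods -n ns-monitor -l app=prometheus
# Inspect startup logs if the pod is not Running
kubectl logs -n ns-monitor -l app=prometheus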
7. Deploy the Service
kind: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: prometheus
  name: prometheus-service
  namespace: ns-monitor
spec:
  ports:
    - port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
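Assuming the Service is saved as service.yaml (an example name), apply it and read back the NodePort that Kubernetes allocates, shown as the second number in the PORT(S) column (e.g. 9090:31234/TCP):

kubectl apply -f service.yaml
kubectl get svc prometheus-service -n ns-monitor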
8. Verification
Once everything is deployed, verify the installation.
Enter the Service's IP address and port in a browser, e.g. http://10.101.171.168:9090/graph, and the Prometheus web UI should come up.
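Besides the web UI, Prometheus 2.x exposes simple health endpoints that are convenient for scripted checks; substitute your own Service address for the example IP above:

# Both should return HTTP 200 once Prometheus is up and serving
curl http://10.101.171.168:9090/-/healthy
curl http://10.101.171.168:9090/-/ready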