Multi-Cluster Monitoring with Prometheus and Thanos
Original article: https://particule.io/en/blog/thanos-monitoring/
Introduction
https://github.com/particuleio/teks/tree/main/terragrunt/live/thanos
https://github.com/particuleio/terraform-kubernetes-addons/tree/main/modules/aws
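The Kubernetes Prometheus stack
Prometheus: collects metrics
Alertmanager: sends alerts to various providers based on metric queries
Grafana: visualizes the metrics in rich dashboards
Enter Thanos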
Thanos Store
Thanos Sidecar
Thanos Query
Multi-cluster architecture
An observer cluster [3]
An observee cluster [4]
.
├── env_tags.yaml
├── eu-west-1
│   ├── clusters
│   │   └── observer
│   │       ├── eks
│   │       │   ├── kubeconfig
│   │       │   └── terragrunt.hcl
│   │       ├── eks-addons
│   │       │   └── terragrunt.hcl
│   │       └── vpc
│   │           └── terragrunt.hcl
│   └── region_values.yaml
└── eu-west-3
    ├── clusters
    │   └── observee
    │       ├── cluster_values.yaml
    │       ├── eks
    │       │   ├── kubeconfig
    │       │   └── terragrunt.hcl
    │       ├── eks-addons
    │       │   └── terragrunt.hcl
    │       └── vpc
    │           └── terragrunt.hcl
    └── region_values.yaml
Grafana enabled
Thanos sidecar uploading to a dedicated bucket
kube-prometheus-stack = {
  enabled                     = true
  allowed_cidrs               = dependency.vpc.outputs.private_subnets_cidr_blocks
  thanos_sidecar_enabled      = true
  thanos_bucket_force_destroy = true
  extra_values                = <<-EXTRA_VALUES
    grafana:
      deploymentStrategy:
        type: Recreate
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx
          cert-manager.io/cluster-issuer: "letsencrypt"
        hosts:
          - grafana.${local.default_domain_suffix}
        tls:
          - secretName: grafana.${local.default_domain_suffix}
            hosts:
              - grafana.${local.default_domain_suffix}
      persistence:
        enabled: true
        storageClassName: ebs-sc
        accessModes:
          - ReadWriteOnce
        size: 1Gi
    prometheus:
      prometheusSpec:
        replicas: 1
        retention: 2d
        retentionSize: "10GB"
        ruleSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ebs-sc
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
    EXTRA_VALUES
}
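This CA will be trusted by the ingress in front of the sidecar on the observee clusters
TLS certificates are generated for the Thanos querier components that will query the observee clusters
All Thanos components are deployed:
Query frontend, which serves as the data source endpoint for Grafana
Store gateway, which queries the observer bucket
Query, which runs queries against the store gateways and the other queriers
Thanos queriers configured with TLS, one for each observee cluster to be queried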
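To make the TLS querier in the list above more concrete, here is a minimal sketch of the arguments such a Thanos Query container might be started with. The certificate mount paths and the host name are assumptions for illustration only, and the addon used in this article may wire things differently. Recent Thanos releases prefer `--endpoint` over `--store`.

```yaml
# Hypothetical container args for a Thanos querier that speaks gRPC over TLS.
# Paths under /certs/client and the host name are placeholders, not from the article.
args:
  - query
  - --grpc-client-tls-secure                       # use TLS towards the remote StoreAPI
  - --grpc-client-tls-cert=/certs/client/tls.crt   # client certificate signed by the observer CA
  - --grpc-client-tls-key=/certs/client/tls.key    # matching private key
  - --grpc-client-tls-ca=/certs/client/ca.crt      # CA used to verify the remote server certificate
  - --store=thanos-sidecar.example.com:443         # observee sidecar exposed through its ingress
```

The remote ingress only accepts the connection if the client certificate chains to the CA it has been told to trust, which is why the same CA is distributed to the observee side.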
thanos-tls-querier = {
  "observee" = {
    enabled                 = true
    default_global_requests = true
    default_global_limits   = false
    stores = [
      "thanos-sidecar.${local.default_domain_suffix}:443"
    ]
  }
}

thanos-storegateway = {
  "observee" = {
    enabled                 = true
    default_global_requests = true
    default_global_limits   = false
    bucket                  = "thanos-store-pio-thanos-observee"
    region                  = "eu-west-3"
  }
}
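The bucket and region given to the observee store gateway above end up in a Thanos object storage configuration. As a rough sketch, such a configuration for S3 could look like the following. The endpoint value is an assumption derived from the region, and credentials are assumed to come from an IAM role attached to the pod rather than from static keys.

```yaml
# Sketch of a Thanos objstore configuration for the observee bucket.
# bucket and region come from the article; endpoint is an assumed regional S3 endpoint.
type: S3
config:
  bucket: thanos-store-pio-thanos-observee
  endpoint: s3.eu-west-3.amazonaws.com
  region: eu-west-3
```

The sidecar that writes blocks and the store gateway that reads them need to point at the same bucket for the data to show up in queries.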
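Thanos sidecar uploading to an observee-specific bucket
The Thanos sidecar is published through an Ingress object with TLS client authentication and trusts the observer cluster CA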
kube-prometheus-stack = {
  enabled                     = true
  allowed_cidrs               = dependency.vpc.outputs.private_subnets_cidr_blocks
  thanos_sidecar_enabled      = true
  thanos_bucket_force_destroy = true
  extra_values                = <<-EXTRA_VALUES
    grafana:
      enabled: false
    prometheus:
      thanosIngress:
        enabled: true
        ingressClassName: nginx
        annotations:
          cert-manager.io/cluster-issuer: "letsencrypt"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
          nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
          nginx.ingress.kubernetes.io/auth-tls-secret: "monitoring/thanos-ca"
        hosts:
          - thanos-sidecar.${local.default_domain_suffix}
        paths:
          - /
        tls:
          - secretName: thanos-sidecar.${local.default_domain_suffix}
            hosts:
              - thanos-sidecar.${local.default_domain_suffix}
      prometheusSpec:
        replicas: 1
        retention: 2d
        retentionSize: "6GB"
        ruleSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        podMonitorSelectorNilUsesHelmValues: false
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ebs-sc
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 10Gi
    EXTRA_VALUES
}
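The `auth-tls-secret` annotation above points at a secret named `thanos-ca` in the `monitoring` namespace. ingress-nginx expects that secret to carry the trusted CA chain under the `ca.crt` key. A minimal sketch of such a secret is shown below; the certificate body is a placeholder, and in this setup the secret is presumably created by the addon module rather than by hand.

```yaml
# Sketch of the CA secret referenced by nginx.ingress.kubernetes.io/auth-tls-secret.
# The PEM body is a placeholder for the observer cluster CA certificate.
apiVersion: v1
kind: Secret
metadata:
  name: thanos-ca
  namespace: monitoring
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded observer CA certificate>
    -----END CERTIFICATE-----
```

With `auth-tls-verify-client: "on"`, requests that do not present a client certificate signed by this CA are rejected at the ingress before they reach the sidecar.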
Thanos compactor to manage downsampling for this specific cluster
thanos = {
  enabled              = true
  bucket_force_destroy = true
  trusted_ca_content   = dependency.thanos-ca.outputs.thanos_ca
  extra_values         = <<-EXTRA_VALUES
    compactor:
      retentionResolution5m: 90d
    query:
      enabled: false
    queryFrontend:
      enabled: false
    storegateway:
      enabled: false
    EXTRA_VALUES
}
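The compactor keeps separate retention periods for raw samples and for the 5-minute and 1-hour downsampled series. The configuration above only sets the 5-minute one. As an illustrative sketch, all three could be set explicitly in the chart values as follows. The raw and 1-hour values here are assumptions chosen for the example, not taken from the article.

```yaml
# Sketch: explicit retention per resolution in the Thanos chart values.
# Only retentionResolution5m comes from the article; the other two are illustrative.
compactor:
  retentionResolutionRaw: 30d   # maps to --retention.resolution-raw
  retentionResolution5m: 90d    # maps to --retention.resolution-5m
  retentionResolution1h: 1y     # maps to --retention.resolution-1h
```

Longer retention on the downsampled resolutions keeps long-range dashboards usable while limiting how much raw data stays in the bucket.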
Digging a little deeper
kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 120m
kube-prometheus-stack-grafana-c8768466b-rd8wm 2/2 Running 0 120m
kube-prometheus-stack-kube-state-metrics-5cf575d8f8-x59rd 1/1 Running 0 120m
kube-prometheus-stack-operator-6856b9bb58-hdrb2 1/1 Running 0 119m
kube-prometheus-stack-prometheus-node-exporter-8hvmv 1/1 Running 0 117m
kube-prometheus-stack-prometheus-node-exporter-cwlfd 1/1 Running 0 120m
kube-prometheus-stack-prometheus-node-exporter-rsss5 1/1 Running 0 120m
kube-prometheus-stack-prometheus-node-exporter-rzgr9 1/1 Running 0 120m
prometheus-kube-prometheus-stack-prometheus-0 3/3 Running 1 120m
thanos-compactor-74784bd59d-vmvps 1/1 Running 0 119m
thanos-query-7c74db546c-d7bp8 1/1 Running 0 12m
thanos-query-7c74db546c-ndnx2 1/1 Running 0 12m
thanos-query-frontend-5cbcb65b57-5sx8z 1/1 Running 0 119m
thanos-query-frontend-5cbcb65b57-qjhxg 1/1 Running 0 119m
thanos-storegateway-0 1/1 Running 0 119m
thanos-storegateway-1 1/1 Running 0 118m
thanos-storegateway-observee-storegateway-0 1/1 Running 0 12m
thanos-storegateway-observee-storegateway-1 1/1 Running 0 11m
thanos-tls-querier-observee-query-dfb9f79f9-4str8 1/1 Running 0 29m
thanos-tls-querier-observee-query-dfb9f79f9-xsq24 1/1 Running 0 29m
kubectl -n monitoring get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
kube-prometheus-stack-grafana <none> grafana.thanos.teks-tg.clusterfrak-dynamics.io k8s-ingressn-ingressn-afa0a48374-f507283b6cd101c5.elb.eu-west-1.amazonaws.com 80, 443 123m
kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 39m
kube-prometheus-stack-kube-state-metrics-5cf575d8f8-ct292 1/1 Running 0 39m
kube-prometheus-stack-operator-6856b9bb58-4cngc 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-bs4wp 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-c57ss 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-cp5ch 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-tnqvq 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-z2p49 1/1 Running 0 39m
kube-prometheus-stack-prometheus-node-exporter-zzqp7 1/1 Running 0 39m
prometheus-kube-prometheus-stack-prometheus-0 3/3 Running 1 39m
thanos-compactor-7576dcbcfc-6pd4v 1/1 Running 0 38m
kubectl -n monitoring get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
kube-prometheus-stack-thanos-gateway nginx thanos-sidecar.thanos.teks-tg.clusterfrak-dynamics.io k8s-ingressn-ingressn-95903f6102-d2ce9013ac068b9e.elb.eu-west-3.amazonaws.com 80, 443 40m
kubectl -n monitoring logs -f thanos-tls-querier-observee-query-687dd88ff5-nzpdh
level=info ts=2021-02-23T15:37:35.692346206Z caller=storeset.go:387 component=storeset msg="adding new storeAPI to query storeset" address=thanos-sidecar.thanos.teks-tg.clusterfrak-dynamics.io:443 extLset="{cluster=\"pio-thanos-observee\", prometheus=\"monitoring/kube-prometheus-stack-prometheus\", prometheus_replica=\"prometheus-kube-prometheus-stack-prometheus-0\"}"
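The `extLset` in the log line above is what lets the querier tell clusters apart. The `prometheus` and `prometheus_replica` labels are added by the Prometheus Operator, while a `cluster` label has to be set explicitly. A minimal sketch of how that could be done through the kube-prometheus-stack values is shown below; the addon used here may already set it on its own, so treat this as an assumption about the mechanism rather than the article's exact configuration.

```yaml
# Sketch: setting the external label that identifies the observee cluster.
# The label value is taken from the log line; where it is set is an assumption.
prometheus:
  prometheusSpec:
    externalLabels:
      cluster: pio-thanos-observee
```

Every series uploaded by the sidecar or served through its StoreAPI then carries this label, so dashboards can filter or group by cluster.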
kubectl -n monitoring port-forward thanos-tls-querier-observee-query-687dd88ff5-nzpdh 10902
kubectl -n monitoring port-forward thanos-query-7c74db546c-d7bp8 10902
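The observer cluster's local Thanos aggregates:
Our store gateways (one for the remote observee cluster, one for the local observer cluster)
The local TLS querier, which can query the observee sidecar
Visualizing in Grafana
Conclusion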
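Since the query frontend serves as the data source endpoint for Grafana, a data source pointing at it is all Grafana needs to see every cluster. A minimal sketch using the `grafana.additionalDataSources` value of kube-prometheus-stack follows. The service name and port are assumptions based on the pod names shown earlier and common chart defaults, and the addon may provision this data source automatically.

```yaml
# Sketch: a Grafana data source that points at the Thanos query frontend.
# Service name, namespace, and port are assumed, not confirmed by the article.
grafana:
  additionalDataSources:
    - name: Thanos
      type: prometheus          # Thanos exposes a Prometheus-compatible query API
      access: proxy
      url: http://thanos-query-frontend.monitoring.svc:9090
      isDefault: false
```

Dashboards built against this data source can then use the `cluster` label to switch between the observer and observee clusters.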