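Deploying a Production Loki Cluster in Microservices Mode
Earlier we covered two Loki deployment modes, monolithic and simple scalable (read/write separation). Once your daily log volume exceeds the terabyte range, you will probably need to deploy Loki in microservices mode.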
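The microservices deployment mode instantiates Loki's components as separate processes. Each process is invoked with a target that specifies which component it runs, and each component starts a gRPC server for internal requests and an HTTP server for external API requests.
The components are: ingester, distributor, query-frontend, query-scheduler, querier, index-gateway, ruler, and compactor.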
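Running the components as separate microservices lets you scale by increasing the number of instances of each one, and the resulting cluster offers better observability of the individual components. Microservices mode is the most efficient way to run Loki, but it is also the most complex to set up and maintain.
Microservices mode is recommended for very large Loki clusters, or for clusters that need more control over scaling and cluster operations.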
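To make "one process per target" concrete, here is a minimal sketch that builds the command line for each component. `-target` and `-config.file` are Loki command-line flags; the config file path and the helper name `loki_args` are assumptions made for illustration, not something the chart guarantees.

```python
# Components named in the text; in microservices mode each runs as its own process.
COMPONENTS = [
    "ingester", "distributor", "query-frontend", "query-scheduler",
    "querier", "index-gateway", "ruler", "compactor",
]


def loki_args(target, config_file="/etc/loki/config/config.yaml"):
    """Return the argument list for one Loki process running a single target.

    The default config_file path is an assumption for this sketch.
    """
    if target not in COMPONENTS:
        raise ValueError(f"unknown target: {target}")
    return ["loki", f"-config.file={config_file}", f"-target={target}"]


if __name__ == "__main__":
    # Print one command line per component.
    for name in COMPONENTS:
        print(" ".join(loki_args(name)))
```

All processes can share the same configuration file; only the target differs, which is what allows each component to be scaled independently.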
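Microservices mode works best on Kubernetes. Two installation methods are provided: Jsonnet and a Helm chart.
Helm Chart
As before, we use the Helm chart to install Loki, this time in microservices mode. Before installing, remember to delete the Loki-related services installed in the previous chapters.
First, fetch the chart for microservices mode: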
$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm pull grafana/loki-distributed --untar --version 0.48.4
$ cd loki-distributed
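The chart supports the components shown in the table below. The ingester, distributor, querier, and query-frontend components are always installed; the others are optional.
Component | Optional | Enabled by default? |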
---|---|---|
gateway | ✅ | ✅ |
ingester | ❎ | n/a |
distributor | ❎ | n/a |
querier | ❎ | n/a |
query-frontend | ❎ | n/a |
table-manager | ✅ | ❎ |
compactor | ✅ | ❎ |
ruler | ✅ | ❎ |
index-gateway | ✅ | ❎ |
memcached-chunks | ✅ | ❎ |
memcached-frontend | ✅ | ❎ |
memcached-index-queries | ✅ | ❎ |
memcached-index-writes | ✅ | ❎ |
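The chart configures Loki in microservices mode and has been tested with boltdb-shipper and memberlist. Other storage and discovery options can also be used, but the chart does not support setting up Consul or Etcd for discovery; those would have to be configured separately. Instead, you can use memberlist, which does not require a separate key/value store. By default the chart creates a headless Service for memberlist, of which the ingester, distributor, querier, and ruler are members.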
Installing MinIO
In this example we use memberlist, boltdb-shipper, and MinIO for storage. Because the chart does not bundle MinIO, we need to install it separately first:
$ helm repo add minio https://helm.min.io/
$ helm pull minio/minio --untar --version 8.0.10
$ cd minio
Create a values file like the following:
# ci/loki-values.yaml
accessKey: "myaccessKey"
secretKey: "mysecretKey"
persistence:
  enabled: true
  storageClass: "local-path"
  accessMode: ReadWriteOnce
  size: 5Gi
service:
  type: NodePort
  port: 9000
  nodePort: 32000
resources:
  requests:
    memory: 1Gi
Install MinIO with the values file above:
$ helm upgrade --install minio -n logging -f ci/loki-values.yaml .
Release "minio" does not exist. Installing it now.
NAME: minio
LAST DEPLOYED: Sun Jun 19 16:56:28 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Minio can be accessed via port 9000 on the following DNS name from within your cluster:
minio.logging.svc.cluster.local
To access Minio from localhost, run the below commands:
1. export POD_NAME=$(kubectl get pods --namespace logging -l "release=minio" -o jsonpath="{.items[0].metadata.name}")
2. kubectl port-forward $POD_NAME 9000 --namespace logging
Read more about port forwarding here: http://kubernetes.io/docs/user-guide/kubectl/kubectl_port-forward/
You can now access Minio server on http://localhost:9000. Follow the below steps to connect to Minio server with mc client:
1. Download the Minio mc client - https://docs.minio.io/docs/minio-client-quickstart-guide
2. Get the ACCESS_KEY=$(kubectl get secret minio -o jsonpath="{.data.accesskey}" | base64 --decode) and the SECRET_KEY=$(kubectl get secret minio -o jsonpath="{.data.secretkey}" | base64 --decode)
3. mc alias set minio-local http://localhost:9000 "$ACCESS_KEY" "$SECRET_KEY" --api s3v4
4. mc ls minio-local
Alternately, you can use your browser or the Minio SDK to access the server - https://docs.minio.io/categories/17
After the installation completes, check the status of the Pod and the Service:
$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
minio-548656f786-gctk9 1/1 Running 0 2m45s
$ kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
minio NodePort 10.111.58.196 <none> 9000:32000/TCP 3h16m
MinIO can now be reached through the NodePort 32000 specified above.
Then remember to create a bucket named loki-data.
Installing Loki
With the object storage ready, we can install Loki in microservices mode. First create a values file like the following:
# ci/minio-values.yaml
loki:
  structuredConfig:
    ingester:
      max_transfer_retries: 0
      chunk_idle_period: 1h
      chunk_target_size: 1536000
      max_chunk_age: 1h
    storage_config:
      aws:
        endpoint: minio.logging.svc.cluster.local:9000
        insecure: true
        bucketnames: loki-data
        access_key_id: myaccessKey
        secret_access_key: mysecretKey
        s3forcepathstyle: true
      boltdb_shipper:
        shared_store: s3
    schema_config:
      configs:
        - from: 2022-06-21
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h
distributor:
  replicas: 2
ingester:
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path
querier:
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path
queryFrontend:
  replicas: 2
gateway:
  nginxConfig:
    httpSnippet: |-
      client_max_body_size 100M;
    serverSnippet: |-
      client_max_body_size 100M;
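The configuration above selectively overrides the defaults in the loki.config template. Most configuration parameters can be set externally through loki.structuredConfig. loki.config, loki.schemaConfig, and loki.storageConfig can also be used together with loki.structuredConfig; values in loki.structuredConfig take precedence.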
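Here loki.structuredConfig.storage_config.aws specifies the MinIO settings used to store the data. For high availability, the core components are configured with 2 replicas each, and the ingester and querier use persistent storage.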
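The precedence rule can be pictured as a recursive merge in which the override side wins. The sketch below is only a Python illustration of that idea, not the chart's actual template code; the `defaults` dictionary is a hypothetical stand-in for the templated loki.config, and the 30m value in it is an assumed placeholder.

```python
import copy


def merge_overwrite(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key.
            result[key] = merge_overwrite(result[key], value)
        else:
            # Scalars, lists, or new keys: take the override as-is.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical stand-in for the chart defaults (chunk_idle_period is an assumed value).
defaults = {
    "ingester": {
        "chunk_block_size": 262144,
        "chunk_encoding": "snappy",
        "chunk_idle_period": "30m",
    },
}
# A subset of the structuredConfig values used in this article.
structured = {
    "ingester": {"chunk_idle_period": "1h", "max_chunk_age": "1h"},
}

merged = merge_overwrite(defaults, structured)

if __name__ == "__main__":
    print(merged["ingester"])
```

Keys that appear only in the defaults survive, while keys set in both places take the structuredConfig value, so a values file only needs to list the settings it wants to change.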
Now install everything in one step with the values file above:
$ helm upgrade --install loki -n logging -f ci/minio-values.yaml .
Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Tue Jun 21 16:20:10 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Loki
Chart version: 0.48.4
Loki version: 2.5.0
***********************************************************************
Installed components:
* gateway
* ingester
* distributor
* querier
* query-frontend
This installs the following components: gateway, ingester, distributor, querier, and query-frontend. The corresponding Pods and Services look like this:
$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
loki-loki-distributed-distributor-5dfdd5bd78-nxdq8 1/1 Running 0 2m40s
loki-loki-distributed-distributor-5dfdd5bd78-rh4gz 1/1 Running 0 116s
loki-loki-distributed-gateway-6f4cfd898c-hpszv 1/1 Running 0 21m
loki-loki-distributed-ingester-0 1/1 Running 0 96s
loki-loki-distributed-ingester-1 1/1 Running 0 2m38s
loki-loki-distributed-querier-0 1/1 Running 0 2m2s
loki-loki-distributed-querier-1 1/1 Running 0 2m33s
loki-loki-distributed-query-frontend-6d9845cb5b-p4vns 1/1 Running 0 4s
loki-loki-distributed-query-frontend-6d9845cb5b-sq5hr 1/1 Running 0 2m40s
minio-548656f786-gctk9 1/1 Running 1 (123m ago) 47h
$ kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki-loki-distributed-distributor ClusterIP 10.102.156.127 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-gateway ClusterIP 10.111.73.138 <none> 80/TCP 22m
loki-loki-distributed-ingester ClusterIP 10.98.238.236 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-ingester-headless ClusterIP None <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-memberlist ClusterIP None <none> 7946/TCP 22m
loki-loki-distributed-querier ClusterIP 10.101.117.137 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-querier-headless ClusterIP None <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-query-frontend ClusterIP None <none> 3100/TCP,9095/TCP,9096/TCP 22m
minio NodePort 10.111.58.196 <none> 9000:32000/TCP 47h
The resulting Loki configuration file looks like this:
$ kubectl get cm -n logging loki-loki-distributed -o yaml
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    compactor:
      shared_store: filesystem
    distributor:
      ring:
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 5s
      tail_proxy_url: http://loki-loki-distributed-querier:3100
    frontend_worker:
      frontend_address: loki-loki-distributed-query-frontend:9095
    ingester:
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_idle_period: 1h
      chunk_retain_period: 1m
      chunk_target_size: 1536000
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_chunk_age: 1h
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - loki-loki-distributed-memberlist
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 5
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
    ruler:
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        local:
          directory: /etc/loki/rules
        type: local
    schema_config:
      configs:
      - from: "2022-06-21"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper
    server:
      http_listen_port: 3100
    storage_config:
      aws:
        access_key_id: myaccessKey
        bucketnames: loki-data
        endpoint: minio.logging.svc.cluster.local:9000
        insecure: true
        s3forcepathstyle: true
        secret_access_key: mysecretKey
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        shared_store: s3
      filesystem:
        directory: /var/loki/chunks
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
kind: ConfigMap
# ......
As in the previous mode, a gateway component routes each request to the correct component. It is again an nginx service, with the following configuration:
$ kubectl -n logging exec -it loki-loki-distributed-gateway-6f4cfd898c-hpszv -- cat /etc/nginx/nginx.conf
worker_processes 5; ## Default: 1
error_log /dev/stderr;
pid /tmp/nginx.pid;
worker_rlimit_nofile 8192;
events {
  worker_connections 4096; ## Default: 1024
}
http {
  client_body_temp_path /tmp/client_temp;
  proxy_temp_path /tmp/proxy_temp_path;
  fastcgi_temp_path /tmp/fastcgi_temp;
  uwsgi_temp_path /tmp/uwsgi_temp;
  scgi_temp_path /tmp/scgi_temp;
  default_type application/octet-stream;
  log_format main '$remote_addr - $remote_user [$time_local] $status '
                  '"$request" $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';
  access_log /dev/stderr main;
  sendfile on;
  tcp_nopush on;
  resolver kube-dns.kube-system.svc.cluster.local;
  client_max_body_size 100M;
  server {
    listen 8080;
    location = / {
      return 200 'OK';
      auth_basic off;
    }
    location = /api/prom/push {
      proxy_pass http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri;
    }
    location = /api/prom/tail {
      proxy_pass http://loki-loki-distributed-querier.logging.svc.cluster.local:3100$request_uri;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
    }
    # Ruler
    location ~ /prometheus/api/v1/alerts.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /prometheus/api/v1/rules.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /api/prom/rules.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /api/prom/alerts.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /api/prom/.* {
      proxy_pass http://loki-loki-distributed-query-frontend.logging.svc.cluster.local:3100$request_uri;
    }
    location = /loki/api/v1/push {
      proxy_pass http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri;
    }
    location = /loki/api/v1/tail {
      proxy_pass http://loki-loki-distributed-querier.logging.svc.cluster.local:3100$request_uri;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
    }
    location ~ /loki/api/.* {
      proxy_pass http://loki-loki-distributed-query-frontend.logging.svc.cluster.local:3100$request_uri;
    }
    client_max_body_size 100M;
  }
}
The configuration shows that the push endpoints /api/prom/push and /loki/api/v1/push are forwarded to http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri, that is, to the distributor service:
$ kubectl get pods -n logging -l app.kubernetes.io/component=distributor,app.kubernetes.io/instance=loki,app.kubernetes.io/name=loki-distributed
NAME READY STATUS RESTARTS AGE
loki-loki-distributed-distributor-5dfdd5bd78-nxdq8 1/1 Running 0 8m20s
loki-loki-distributed-distributor-5dfdd5bd78-rh4gz 1/1 Running 0 7m36s
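The gateway's routing rules can be modelled in a few lines to check which component a given path ends up at. This is a rough sketch rather than a full nginx emulation: it only mirrors the location blocks from the configuration above, relying on the nginx behaviour that exact (`=`) locations are checked first and regex (`~`) locations are then tried in declaration order. The `route` function and the "gateway" label for the health endpoint are names invented for this example.

```python
import re

# (kind, pattern, upstream) in the order the locations appear in nginx.conf.
LOCATIONS = [
    ("exact", "/", "gateway"),  # answered by nginx itself with 200 'OK'
    ("exact", "/api/prom/push", "distributor"),
    ("exact", "/api/prom/tail", "querier"),
    ("regex", r"/prometheus/api/v1/alerts.*", "ruler"),
    ("regex", r"/prometheus/api/v1/rules.*", "ruler"),
    ("regex", r"/api/prom/rules.*", "ruler"),
    ("regex", r"/api/prom/alerts.*", "ruler"),
    ("regex", r"/api/prom/.*", "query-frontend"),
    ("exact", "/loki/api/v1/push", "distributor"),
    ("exact", "/loki/api/v1/tail", "querier"),
    ("regex", r"/loki/api/.*", "query-frontend"),
]


def route(path):
    """Return the component that would serve `path`, or None if no location matches."""
    # Exact-match locations take priority over regex locations.
    for kind, pattern, upstream in LOCATIONS:
        if kind == "exact" and path == pattern:
            return upstream
    # Regex locations are unanchored and tried in declaration order.
    for kind, pattern, upstream in LOCATIONS:
        if kind == "regex" and re.search(pattern, path):
            return upstream
    return None


if __name__ == "__main__":
    for p in ["/loki/api/v1/push", "/loki/api/v1/query_range", "/api/prom/rules"]:
        print(p, "->", route(p))
```

Writes go to the distributor, tailing goes to the querier, rule and alert endpoints go to the ruler, and every other Loki API path falls through to the query-frontend.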
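So to write log data, we now send it to the push endpoint on the gateway. To verify that the deployment works, we next install Promtail and Grafana to write and read data.
Installing Promtail
Fetch the promtail chart and unpack it: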
$ helm pull grafana/promtail --untar
$ cd promtail
Create a values file like the following:
# ci/minio-values.yaml
rbac:
  pspEnabled: false
config:
  clients:
    - url: http://loki-loki-distributed-gateway/loki/api/v1/push
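Note that the Loki address configured in Promtail must be http://loki-loki-distributed-gateway/loki/api/v1/push. Promtail then sends log data to the gateway first, and the gateway forwards it to the distributor according to the endpoint rules shown earlier. Install Promtail with this values file: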
$ helm upgrade --install promtail -n logging -f ci/minio-values.yaml .
Release "promtail" does not exist. Installing it now.
NAME: promtail
LAST DEPLOYED: Tue Jun 21 16:31:34 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Promtail
Chart version: 5.1.0
Promtail version: 2.5.0
***********************************************************************
Verify the application is working by running these commands:
* kubectl --namespace logging port-forward daemonset/promtail 3101
* curl http://127.0.0.1:3101/metrics
Once the installation completes, one promtail Pod runs on every node:
$ kubectl get pods -n logging -l app.kubernetes.io/name=promtail
NAME READY STATUS RESTARTS AGE
promtail-gbjzs 1/1 Running 0 38s
promtail-gjn5p 1/1 Running 0 38s
promtail-z6vhd 1/1 Running 0 38s
Promtail is now collecting the logs of all containers on its node and pushing them to the gateway, which forwards them to the distributor. We can check the gateway logs:
$ kubectl logs -f loki-loki-distributed-gateway-6f4cfd898c-hpszv -n logging
10.244.2.26 - - [21/Jun/2022:08:41:24 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.244.2.1 - - [21/Jun/2022:08:41:24 +0000] 200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.22" "-"
10.244.2.26 - - [21/Jun/2022:08:41:25 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.244.1.28 - - [21/Jun/2022:08:41:26 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
......
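The gateway is continuously receiving /loki/api/v1/push requests, which are the ones sent by promtail. At this point the log data has been handed to the distributor and on to the ingesters, which store it in MinIO. You can confirm that MinIO now contains log data through the NodePort 32000 assigned to the MinIO service during installation.
This shows that data is being written correctly.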
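The write path can also be exercised by hand, without Promtail. The following is a minimal sketch that builds a request body in the JSON format accepted by Loki's /loki/api/v1/push endpoint (a list of streams, each with a label set and [nanosecond timestamp, line] pairs). The label `job="manual-test"` and the helper names are assumptions for illustration, and the actual request only succeeds from inside the cluster or through a port-forward to the gateway.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build a Loki push body: one stream with the given labels and log lines."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    # Timestamps are strings holding Unix epoch nanoseconds; offset each line by 1 ns.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(gateway_url, payload, timeout=5):
    """POST the payload to the gateway's push endpoint and return the HTTP status."""
    req = urllib.request.Request(
        gateway_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_push_payload({"job": "manual-test"}, ["hello from python"])
    print(json.dumps(payload, indent=2))
    # Uncomment when the gateway is reachable from where this script runs:
    # print(push("http://loki-loki-distributed-gateway", payload))
```

A successful push should show up in the gateway log as a 204 response on /loki/api/v1/push, just like the Promtail requests above.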
Installing Grafana
Next we verify the read path by installing Grafana and connecting it to Loki:
$ helm pull grafana/grafana --untar
$ cd grafana
Create a values file like the following:
# ci/minio-values.yaml
service:
  type: NodePort
  nodePort: 32001
rbac:
  pspEnabled: false
persistence:
  enabled: true
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  size: 1Gi
Install Grafana with the values file above:
$ helm upgrade --install grafana -n logging -f ci/minio-values.yaml .
Release "grafana" does not exist. Installing it now.
NAME: grafana
LAST DEPLOYED: Tue Jun 21 16:47:54 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace logging grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
grafana.logging.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace logging -o jsonpath="{.spec.ports[0].nodePort}" services grafana)
export NODE_IP=$(kubectl get nodes --namespace logging -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
3. Login with the password from step 1 and the username: admin
Retrieve the login password with the command from the notes above:
$ kubectl get secret --namespace logging grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Then log in to Grafana with that password and the username admin.
After logging in, add a data source in Grafana. Note that the URL must be the gateway address, http://loki-loki-distributed-gateway.
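After saving the data source, open the Explore page to filter logs, for example to view the logs of the gateway application in real time.
If you can see the latest log data, Loki has been deployed successfully in microservices mode. This mode is highly flexible, since each component can be scaled up or down as needed, but the operational cost is also considerably higher.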
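The read path that Grafana uses can be checked directly as well. The sketch below only constructs a URL for Loki's /loki/api/v1/query_range endpoint on the gateway; the selector `{namespace="logging"}` assumes Promtail attaches a `namespace` label, and `query_range_url` is a helper name made up for this example.

```python
from urllib.parse import urlencode


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a query_range URL for a LogQL query over [start_ns, end_ns] (Unix nanoseconds)."""
    params = urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


if __name__ == "__main__":
    import time

    end = time.time_ns()
    start = end - 5 * 60 * 10**9  # last five minutes
    print(query_range_url("http://loki-loki-distributed-gateway", '{namespace="logging"}', start, end))
```

Requesting the printed URL from inside the cluster (for example with curl) goes through the gateway to the query-frontend, the same route a Grafana data source query takes.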
In addition, we can add caches for queries and writes. The Helm chart used here supports memcached, and you can also switch to redis yourself.