Monitoring CustomResourceDefinition (CRD) metrics with kube-state-metrics
This article shows how to use kube-state-metrics to collect information about CustomResourceDefinition (CRD) resources and monitor them. kube-state-metrics is deployed with the prometheus-community/kube-state-metrics Helm chart.
kube-state-metrics
By default, kube-state-metrics can collect information about the following resources.
collectors:
- certificatesigningrequests
- configmaps
- cronjobs
- daemonsets
- deployments
- endpoints
- horizontalpodautoscalers
- ingresses
- jobs
- leases
- limitranges
- mutatingwebhookconfigurations
- namespaces
- networkpolicies
- nodes
- persistentvolumeclaims
- persistentvolumes
- poddisruptionbudgets
- pods
- replicasets
- replicationcontrollers
- resourcequotas
- secrets
- services
- statefulsets
- storageclasses
- validatingwebhookconfigurations
- volumeattachments
In addition, by describing the CustomResourceDefinitions (CRDs) you want to collect under customResourceState, you can also collect information about custom resources.
# Enabling support for customResourceState, will create a configMap including your config that will be read from kube-state-metrics
customResourceState:
  enabled: false
  # Add (Cluster)Role permissions to list/watch the customResources defined in the config to rbac.extraRules
  config: {}
Motivation
I wanted to notice when a Longhorn backup has not completed successfully, and when an Argo CD application's health status has become Degraded because of some problem.
Repositories
ArtifactHub: https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics
Helm Chart Repository: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics
Official repository: https://github.com/kubernetes/kube-state-metrics
values.yaml for kube-state-metrics
The following values.yaml for kube-state-metrics exposes metrics for Longhorn's backups.longhorn.io CRD and Argo CD's applications.argoproj.io CRD.
customResourceState:
  enabled: true
  config:
    spec:
      resources:
        - groupVersionKind:
            group: longhorn.io
            kind: Backup
            version: v1beta2
          metricNamePrefix: kube_custom_resouce_longhorn
          metrics:
            - name: backup_state_condition
              help: Longhorn backup state condition
              labelsFromPath:
                backup_id: [metadata, name]
                volume_name: [status, volumeName]
                snapshot_name: [status, snapshotName]
              each:
                type: StateSet
                stateSet:
                  path: [status, state]
                  labelName: state
                  list: ["InProgress", "Completed", "Error", "Unknown"]
        - groupVersionKind:
            group: argoproj.io
            kind: Application
            version: v1alpha1
          metricNamePrefix: kube_custom_resouce_argocd
          metrics:
            - name: application_state
              help: ArgoCD application state
              labelsFromPath:
                app_name: [metadata, name]
                namespace: [metadata, namespace]
              each:
                type: StateSet
                stateSet:
                  path: [status, health, status]
                  labelName: state
                  list: ["Healthy", "Degraded", "Progressing", "Suspended", "Missing", "Unknown"]
rbac:
  extraRules:
    - apiGroups:
        - longhorn.io
      resources:
        - backups
      verbs:
        - list
        - watch
    - apiGroups:
        - argoproj.io
      resources:
        - applications
      verbs:
        - list
        - watch
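The same StateSet pattern can, in principle, be pointed at other status fields. As a minimal sketch (not part of the original setup), the fragment below tracks the Argo CD sync status read from status.sync.status. The metric name application_sync_state, its help text, and the list of sync states are assumptions chosen for illustration; check them against the Application objects in your cluster before relying on them.

```yaml
# Sketch only: an additional StateSet metric for argoproj.io Applications.
# `application_sync_state` is a hypothetical metric name chosen for this example.
customResourceState:
  enabled: true
  config:
    spec:
      resources:
        - groupVersionKind:
            group: argoproj.io
            kind: Application
            version: v1alpha1
          metricNamePrefix: kube_custom_resouce_argocd
          metrics:
            - name: application_sync_state
              help: ArgoCD application sync state
              labelsFromPath:
                app_name: [metadata, name]
                namespace: [metadata, namespace]
              each:
                type: StateSet
                stateSet:
                  # Reads the sync status rather than the health status.
                  path: [status, sync, status]
                  labelName: state
                  # Assumed set of values; verify against your Argo CD version.
                  list: ["Synced", "OutOfSync", "Unknown"]
```

In practice this entry would probably be appended to the metrics list of the existing argoproj.io resource rather than declared as a second resource block. The rbac.extraRules for applications shown above already grant the list and watch permissions it needs.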
Add a kube-state-metrics job to scrape_configs in prometheus.yaml (excerpt).
scrape_configs:
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - kube-state-metrics.monitoring.svc.cluster.local:8080
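If Prometheus runs inside the same cluster, the static target could be replaced with Kubernetes service discovery. The following is a rough sketch under several assumptions: the Service is named kube-state-metrics in the monitoring namespace (as the static target above suggests), its port is named http, and Prometheus has RBAC permissions to discover endpoints.

```yaml
# Sketch only: discover kube-state-metrics through the Kubernetes API
# instead of a hard-coded DNS name.
scrape_configs:
  - job_name: kube-state-metrics
    scrape_interval: 1m
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only endpoints that belong to the kube-state-metrics Service.
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: kube-state-metrics
      # Keep only the port assumed to be named "http" (the metrics port).
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        action: keep
        regex: http
```

The static configuration is simpler and works well for a single replica; service discovery mainly helps when the Service name or namespace differs between environments.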
Metrics like the following then become available.
# HELP kube_custom_resouce_argocd_application_state ArgoCD application state
# TYPE kube_custom_resouce_argocd_application_state stateset
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Degraded"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Healthy"} 1
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Missing"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Progressing"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Suspended"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Unknown"} 0
# HELP kube_custom_resouce_longhorn_backup_state_condition Longhorn backup state condition
# TYPE kube_custom_resouce_longhorn_backup_state_condition stateset
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Completed",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 1
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Error",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="InProgress",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Unknown",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
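Because each StateSet series is 1 for the current state and 0 for all others, the output above can be aggregated into per-state counts. As an illustrative sketch, the recording rules below do this; the group name and the recorded series names (argocd:application_state:count, longhorn:backup_state:count) are hypothetical names chosen for this example.

```yaml
# Sketch only: recording rules that count custom resources per state.
groups:
  - name: custom-resource-state-recording
    rules:
      # Number of Argo CD applications currently in each health state.
      - record: argocd:application_state:count
        expr: sum by (state) (kube_custom_resouce_argocd_application_state == 1)
      # Number of Longhorn backups per volume in each state.
      - record: longhorn:backup_state:count
        expr: sum by (volume_name, state) (kube_custom_resouce_longhorn_backup_state_condition == 1)
```

Note that the `== 1` filter drops inactive series, so a state that no resource is currently in produces no recorded series rather than a value of 0.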
Define the alerting rules in rules.yaml for Prometheus or a compatible system.
groups:
  - name: kubernetes
    rules:
      # Fires as soon as any Longhorn backup reports the Error state.
      - alert: longhorn_backup_error
        expr: kube_custom_resouce_longhorn_backup_state_condition{state="Error"} == 1
        for: 0m
        labels:
          severity: alert
        annotations:
          summary: "Longhorn Backup {{$labels.backup_id}} (PVC/{{$labels.volume_name}}) Error"
          description: "Longhorn backup {{$labels.backup_id}} of PVC/{{$labels.volume_name}} failed"
      # Fires when an Argo CD application stays Degraded for 5 minutes.
      - alert: argocd_application_degraded
        expr: kube_custom_resouce_argocd_application_state{state="Degraded"} == 1
        for: 5m
        labels:
          severity: alert
        annotations:
          summary: "Argocd application {{$labels.app_name}} Degraded"
          description: "Argocd application/{{$labels.app_name}} has become Degraded"
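The rules above only catch the Error and Degraded states. A backup that never leaves InProgress would go unnoticed, so one possible extension is sketched below. The alert name, the group name, and the 1h threshold are assumptions for illustration and should be tuned to how long backups normally take in your environment.

```yaml
# Sketch only: alert on Longhorn backups that appear stuck.
groups:
  - name: kubernetes-custom-resources-extra
    rules:
      - alert: longhorn_backup_stuck_in_progress
        expr: kube_custom_resouce_longhorn_backup_state_condition{state="InProgress"} == 1
        # Assumed threshold; a healthy backup is expected to finish well within this window.
        for: 1h
        labels:
          severity: alert
        annotations:
          summary: "Longhorn Backup {{$labels.backup_id}} still InProgress"
          description: "Backup {{$labels.backup_id}} of volume {{$labels.volume_name}} has been InProgress for more than 1 hour"
```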