Monitoring CustomResourceDefinition (CRD) metrics with kube-state-metrics

This post shows how to use kube-state-metrics to collect information about resources defined by CustomResourceDefinitions (CRDs) and monitor them. kube-state-metrics is deployed with the prometheus-community/kube-state-metrics Helm Chart.

kube-state-metrics

By default, kube-state-metrics collects information about the following resources (the collectors list in the chart's values.yaml):

collectors:
  - certificatesigningrequests
  - configmaps
  - cronjobs
  - daemonsets
  - deployments
  - endpoints
  - horizontalpodautoscalers
  - ingresses
  - jobs
  - leases
  - limitranges
  - mutatingwebhookconfigurations
  - namespaces
  - networkpolicies
  - nodes
  - persistentvolumeclaims
  - persistentvolumes
  - poddisruptionbudgets
  - pods
  - replicasets
  - replicationcontrollers
  - resourcequotas
  - secrets
  - services
  - statefulsets
  - storageclasses
  - validatingwebhookconfigurations
  - volumeattachments

In addition, by defining the CRDs you want to collect under customResourceState, kube-state-metrics can also expose information about those custom resources. The chart's default values for this section look like this:

# Enabling support for customResourceState, will create a configMap including your config that will be read from kube-state-metrics
customResourceState:
  enabled: false
  # Add (Cluster)Role permissions to list/watch the customResources defined in the config to rbac.extraRules
  config: {}
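The important point in this snippet is that every resource added to customResourceState.config needs a matching list/watch rule in rbac.extraRules. As a minimal sketch (not part of the chart), the hypothetical Python helper crd_values below builds both halves for one resource so they cannot drift apart. The plural resource name is passed in by hand because it is the CRD's spec.names.plural and cannot be derived reliably from the kind; the Longhorn values mirror the values.yaml used later in this post.

```python
import json


def crd_values(group, version, kind, plural, prefix, metrics):
    """Build a Helm values fragment for one custom resource (hypothetical helper).

    plural is the CRD's spec.names.plural (e.g. "backups"); it is passed in
    explicitly because it cannot be derived reliably from kind.
    """
    return {
        "customResourceState": {
            "enabled": True,
            "config": {
                "spec": {
                    "resources": [
                        {
                            "groupVersionKind": {"group": group, "kind": kind, "version": version},
                            "metricNamePrefix": prefix,
                            "metrics": metrics,
                        }
                    ]
                }
            },
        },
        # Matching permission so kube-state-metrics can list/watch the resource.
        "rbac": {
            "extraRules": [
                {"apiGroups": [group], "resources": [plural], "verbs": ["list", "watch"]}
            ]
        },
    }


backup_state = {
    "name": "backup_state_condition",
    "help": "Longhorn backup state condition",
    "each": {
        "type": "StateSet",
        "stateSet": {
            "path": ["status", "state"],
            "labelName": "state",
            "list": ["InProgress", "Completed", "Error", "Unknown"],
        },
    },
}

values = crd_values("longhorn.io", "v1beta2", "Backup", "backups",
                    "kube_custom_resouce_longhorn", [backup_state])

# JSON is a subset of YAML, so this text can be saved and used as a values file.
print(json.dumps(values, indent=2))
```

Generating JSON keeps the sketch dependency-free; if PyYAML is available, dumping the same dict as YAML works just as well.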

Motivation

I wanted to be notified when a Longhorn backup did not complete successfully, or when an ArgoCD Application's health status turned Degraded because of some problem.

Repositories

ArtifactHub: https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics
Helm Chart Repository: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics
Official repository: https://github.com/kubernetes/kube-state-metrics

values.yaml for kube-state-metrics

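The following values.yaml makes kube-state-metrics export metrics for two CRDs: Longhorn's backups.longhorn.io and ArgoCD's applications.argoproj.io. The rbac.extraRules section grants kube-state-metrics list/watch permission on both resources.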

customResourceState:
  enabled: true
  config:
    spec:
      resources:
        - groupVersionKind:
            group: longhorn.io
            kind: Backup
            version: v1beta2
          metricNamePrefix: kube_custom_resouce_longhorn
          metrics:
            - name: backup_state_condition
              help: Longhorn backup state condition
              labelsFromPath:
                backup_id: [metadata, name]
                volume_name: [status, volumeName]
                snapshot_name: [status, snapshotName]
              each:
                type: StateSet
                stateSet:
                  path: [status, state]
                  labelName: state
                  list: ["InProgress", "Completed", "Error", "Unknown"]
        - groupVersionKind:
            group: argoproj.io
            kind: Application
            version: v1alpha1
          metricNamePrefix: kube_custom_resouce_argocd
          metrics:
            - name: application_state
              help: ArgoCD application state
              labelsFromPath:
                app_name: [metadata, name]
                namespace: [metadata, namespace]
              each:
                type: StateSet
                stateSet:
                  path: [status, health, status]
                  labelName: state
                  list: ["Healthy", "Degraded", "Progressing", "Suspended", "Missing", "Unknown"]
rbac:
  extraRules:
    - apiGroups:
        - longhorn.io
      resources:
        - backups
      verbs:
        - list
        - watch
    - apiGroups:
        - argoproj.io
      resources:
        - applications
      verbs:
        - list
        - watch
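To see how a StateSet definition turns into time series, here is a simplified emulation in Python. It is a sketch under assumptions, not kube-state-metrics' implementation: it resolves labelsFromPath and stateSet.path against a trimmed, hypothetical Application object and emits one series per entry in list, with value 1 for the entry equal to the current value and 0 for the rest. The real exporter also adds the customresource_group, customresource_kind and customresource_version labels, which are omitted here.

```python
def get_path(obj, path):
    """Walk a path such as ["status", "health", "status"]; None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def stateset_series(obj, labels_from_path, path, label_name, states):
    """Return (labels, value) pairs: one per state, 1 for the current state, else 0."""
    labels = {name: get_path(obj, p) for name, p in labels_from_path.items()}
    current = get_path(obj, path)
    return [({**labels, label_name: s}, 1 if current == s else 0) for s in states]


# Trimmed, hypothetical Application object (only the fields the config reads).
app = {
    "metadata": {"name": "ingress-aggregator", "namespace": "argocd"},
    "status": {"health": {"status": "Healthy"}},
}

series = stateset_series(
    app,
    {"app_name": ["metadata", "name"], "namespace": ["metadata", "namespace"]},
    ["status", "health", "status"],
    "state",
    ["Healthy", "Degraded", "Progressing", "Suspended", "Missing", "Unknown"],
)
for labels, value in series:
    print("kube_custom_resouce_argocd_application_state", labels, value)
```

This is why the alert rules can simply test for a specific state label equal to 1.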

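Next, add kube-state-metrics as a scrape target in prometheus.yaml (scrape_configs excerpt). The target address assumes the Service is named kube-state-metrics in the monitoring namespace.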

scrape_configs:
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
    - targets:
      - kube-state-metrics.monitoring.svc.cluster.local:8080
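Before relying on Prometheus, it can help to confirm that the endpoint actually serves the custom-resource metrics. A minimal sketch using only the Python standard library; the URL is the target from the scrape config above (it only resolves inside the cluster), and the default prefix assumes the kube_custom_resouce_ prefix from the values.yaml.

```python
import urllib.request

# Service address from the scrape config above; it only resolves inside the cluster.
METRICS_URL = "http://kube-state-metrics.monitoring.svc.cluster.local:8080/metrics"


def fetch_metrics(url=METRICS_URL, timeout=10):
    """Download the text exposition from kube-state-metrics."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def custom_metric_lines(text, prefix="kube_custom_resouce_"):
    """Keep only sample lines (no # HELP / # TYPE) whose metric name has the prefix."""
    return [line for line in text.splitlines() if line.startswith(prefix)]
```

From a Pod inside the cluster, print each element of custom_metric_lines(fetch_metrics()). From a workstation, port-forward the Service with kubectl port-forward and pass url="http://localhost:8080/metrics" instead.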

The metrics are then exposed as follows:

# HELP kube_custom_resouce_argocd_application_state ArgoCD application state
# TYPE kube_custom_resouce_argocd_application_state stateset
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Degraded"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Healthy"} 1
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Missing"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Progressing"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Suspended"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",customresource_group="argoproj.io",customresource_kind="Application",customresource_version="v1alpha1",namespace="argocd",state="Unknown"} 0
# HELP kube_custom_resouce_longhorn_backup_state_condition Longhorn backup state condition
# TYPE kube_custom_resouce_longhorn_backup_state_condition stateset
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Completed",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 1
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Error",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="InProgress",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
kube_custom_resouce_longhorn_backup_state_condition{backup_id="backup-6783bd10752b4b7e",customresource_group="longhorn.io",customresource_kind="Backup",customresource_version="v1beta2",snapshot_name="producti-abd6862e-988c-4ad2-85af-77d393386e37",state="Unknown",volume_name="pvc-529949c3-e126-4f73-9e2c-560878d0709e"} 0
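Because a StateSet emits one series per state, the current state of an object is the series whose value is 1. The Python sketch below recovers that from the text output shown above. The regular expressions are a simplification of the exposition format (they assume no "}" inside label values), so treat it as a debugging aid rather than a full parser.

```python
import re

# metric_name{label="value",...} value [timestamp]   (simplified exposition format)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)(?:\s+\S+)?$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def current_states(text, metric, key_labels, state_label="state"):
    """Map each object (tuple of key_labels values) to the state whose series is 1."""
    states = {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group(2)))
        if float(m.group(3)) == 1:
            key = tuple(labels.get(k, "") for k in key_labels)
            states[key] = labels.get(state_label, "")
    return states


sample = """\
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",namespace="argocd",state="Degraded"} 0
kube_custom_resouce_argocd_application_state{app_name="ingress-aggregator",namespace="argocd",state="Healthy"} 1
"""
print(current_states(sample, "kube_custom_resouce_argocd_application_state",
                     ["namespace", "app_name"]))
# {('argocd', 'ingress-aggregator'): 'Healthy'}
```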

Define the alerts in a Prometheus rules file (rules.yaml or similar):

groups:
- name: kubernetes
  rules:
  - alert: longhorn_backup_error
    expr: kube_custom_resouce_longhorn_backup_state_condition{state="Error"} == 1
    for: 0m
    labels:
      severity: alert
    annotations:
      summary: "Longhorn Backup {{$labels.backup_id}} Error"
      description: "Longhorn backup {{$labels.backup_id}} of volume {{$labels.volume_name}} failed"
  - alert: argocd_application_degraded
    expr: kube_custom_resouce_argocd_application_state{state="Degraded"} == 1
    for: 5m
    labels:
      severity: alert
    annotations:
      summary: "ArgoCD application {{$labels.app_name}} Degraded"
      description: "ArgoCD application/{{$labels.app_name}} has become Degraded"
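To check what an alert expression would match before loading the rules, you can run the same PromQL as an instant query against the Prometheus HTTP API (/api/v1/query). A minimal Python sketch; PROMETHEUS_URL is a hypothetical address, so adjust it to your environment.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus address; adjust to your environment.
PROMETHEUS_URL = "http://prometheus.monitoring.svc.cluster.local:9090"


def query_url(base, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def firing_labels(response_json, label):
    """Extract one label from every series in an instant-vector response."""
    body = json.loads(response_json)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return sorted(r["metric"].get(label, "") for r in body["data"]["result"])


def degraded_apps(base=PROMETHEUS_URL):
    """Names of Applications that the argocd_application_degraded expression matches now."""
    expr = 'kube_custom_resouce_argocd_application_state{state="Degraded"} == 1'
    with urllib.request.urlopen(query_url(base, expr), timeout=10) as resp:
        return firing_labels(resp.read().decode("utf-8"), "app_name")
```

This only shows what the expression matches at one instant; it ignores the for: duration. For proper unit tests of a rules file, promtool test rules is the dedicated tool.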
