Scaling Kubernetes Deployments with InfluxDB & Flux

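This article was written by InfluxDB community member and InfluxAce David Flanagan.

Eighteen hours ago, I was in a meeting with some colleagues discussing our Kubernetes initiatives and our grand plans for improving the integration and support for InfluxDB running on Kubernetes. During that meeting, I laid out what I believe are the missing pieces for InfluxDB to really shine on Kubernetes. I won't go into the details, but one of them was a metrics server integration, which we need in order to provide horizontal pod autoscaling (HPA) based on the data inside InfluxDB. As I was presenting the options we could take to kickstart this capability, my amazing colleague Giacomo called out:

“This already exists.”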

TL;DR

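  • You can deploy kube-metrics-adapter to your cluster; it supports annotating your HPA resources with a Flux query to control the scaling of your deployment resources.
  • InfluxData maintains a Helm Charts repository, which includes a chart for InfluxDB 2.
  • Telegraf can be used as a sidecar for local metric collection.
  • InfluxDB 2 has a component called pkger that provides a declarative interface for creating and managing InfluxDB resources through manifests (much like Kubernetes).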

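Scaling Deployments with Flux

Giacomo went on to explain in detail what had been built, but I'll keep it brief. It turns out that our former colleague, Lorenzo Affetti, submitted a few PRs to Zalando's kube-metrics-adapter project at the beginning of the year. Those PRs have been merged, and we can now use this project to scale our deployments by attaching a Flux query to an HPA.

How does it work? It's simple. Let me show you.

Deploying InfluxDB

This article assumes you already have InfluxDB 2 running in your cluster. If you don't, you can deploy InfluxDB in 30 seconds with our Helm chart. Starting the clock now…

If you're feeling brave, you can drop this into your terminal and hope for the best.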

kubectl create namespace monitoring 
helm repo add influxdata https://helm.influxdata.com/ 
helm upgrade --install influxdb --namespace=monitoring influxdata/influxdb2

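Deploying the Metrics Adapter

First, we need to deploy the metrics adapter to our Kubernetes cluster. Zalando doesn't provide a Helm chart for this, but Banzai Cloud does. Unfortunately, the Banzai Cloud chart needs a few tweaks to support the InfluxDB collector, so today we'll deploy it with custom manifests. I know this isn't perfect, but you only need to do it once.

Manifests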

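Before you blindly copy and paste the following into your cluster, take note: there are 3 hardcoded values in the args section of the Deployment resource. If you plan to deploy this to production, use Secrets and mount them as files or environment variables rather than the slapdash approach I'm using in this demo. Also note that the Namespace is declared near the end of the manifest, so on a fresh cluster the namespaced resources listed before it may be rejected on the first kubectl apply; creating the custom-metrics-server namespace beforehand, or applying the manifest a second time, gets around this.

The 3 hardcoded values are:

  • The InfluxDB URL
  • The organization name
  • The token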
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-metrics-apiserver
  namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
  - apiGroups:
      - custom.metrics.k8s.io
    resources:
      - "*"
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-server-resources
rules:
  - apiGroups:
      - external.metrics.k8s.io
    resources:
      - "*"
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
      - pods
      - services
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-collector
rules:
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - list
  - apiGroups:
      - apps
    resources:
      - deployments
      - statefulsets
    verbs:
      - get
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs:
      - get
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
  - kind: ServiceAccount
    name: horizontal-pod-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-external-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-server-resources
subjects:
  - kind: ServiceAccount
    name: horizontal-pod-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-collector
subjects:
  - kind: ServiceAccount
    name: custom-metrics-apiserver
    namespace: custom-metrics-server
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: kube-metrics-adapter
    namespace: custom-metrics-server
  version: v1beta1
  versionPriority: 100
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: kube-metrics-adapter
    namespace: custom-metrics-server
  version: v1beta1
  versionPriority: 100
---
apiVersion: v1
kind: Service
metadata:
  name: kube-metrics-adapter
  namespace: custom-metrics-server
spec:
  ports:
    - port: 443
      targetPort: 443
  selector:
    app: kube-metrics-adapter
---
apiVersion: v1
kind: Namespace
metadata:
  name: custom-metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-metrics-adapter
  name: kube-metrics-adapter
  namespace: custom-metrics-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-metrics-adapter
  template:
    metadata:
      labels:
        app: kube-metrics-adapter
    spec:
      containers:
        - args:
            - --influxdb-address=http://influxdb.monitoring.svc:9999
            - --influxdb-token=secret-token
            - --influxdb-org=InfluxData
          image: registry.opensource.zalan.do/teapot/kube-metrics-adapter:v0.1.5
          name: kube-metrics-adapter
      serviceAccountName: custom-metrics-apiserver
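
As a hedged sketch of the Secret-based approach recommended earlier (this is not the author's own manifest), the token could live in a Secret and be exposed to the adapter as an environment variable, which Kubernetes then expands inside args through its $(VAR_NAME) syntax. The Secret name influxdb-credentials, its key token, and the variable name INFLUXDB_TOKEN are hypothetical names chosen for illustration; everything else mirrors the Deployment above.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: influxdb-credentials        # hypothetical name
  namespace: custom-metrics-server
type: Opaque
stringData:
  token: secret-token               # replace with your real InfluxDB token
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-metrics-adapter
  name: kube-metrics-adapter
  namespace: custom-metrics-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-metrics-adapter
  template:
    metadata:
      labels:
        app: kube-metrics-adapter
    spec:
      containers:
        - name: kube-metrics-adapter
          image: registry.opensource.zalan.do/teapot/kube-metrics-adapter:v0.1.5
          env:
            # Populated from the Secret rather than written into the manifest
            - name: INFLUXDB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: influxdb-credentials
                  key: token
          args:
            - --influxdb-address=http://influxdb.monitoring.svc:9999
            # Kubernetes substitutes $(INFLUXDB_TOKEN) with the env var defined above
            - --influxdb-token=$(INFLUXDB_TOKEN)
            - --influxdb-org=InfluxData
      serviceAccountName: custom-metrics-apiserver
```

The token still ends up on the container's command line, so anyone who can inspect the running process can read it, but it no longer sits in the manifest you commit to version control.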

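The Big Demo

Now that InfluxDB and the metrics adapter are running in our cluster, let's scale some pods!

To make this demo as complete as possible, I'll cover using Telegraf as a sidecar to scrape metrics from nginx, and using Kubernetes initContainers to create our metrics bucket with pkger. To do both, we need to inject a ConfigMap that provides the Telegraf configuration file and the pkger manifest. Our nginx configuration, which enables the status page, is included too.

You must read the comments above each key in the YAML file.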

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-hpa
data:
  # This is our nginx configuration. It enables the status (/nginx_status) page to be scraped from Telegraf over the shared interface within the pod.
  default.conf: |
    server {
        listen       80;
        listen  [::]:80;
        server_name  localhost;

        location / {
            root   /usr/share/nginx/html;
            index  index.html index.htm;
        }

        location /nginx_status {
          stub_status;
          allow 127.0.0.1;	#only allow requests from localhost
          deny all;		#deny all other hosts
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   /usr/share/nginx/html;
        }
    }

  # This is our Telegraf configuration. It has the same hard coded values we mentioned earlier. You'll want to move them to secrets for a production deployment,
  # but I'm keeping that out of scope for this demo. We configure Telegraf to pull metrics from nginx and write to our local InfluxDB 2 instance.
  telegraf.conf: |
    [agent]
      interval = "2s"
      flush_interval = "2s"

    [[inputs.nginx]]
      urls = ["http://127.0.0.1/nginx_status"]
      response_timeout = "1s"

    [[outputs.influxdb_v2]]
      urls = ["http://influxdb.monitoring.svc:9999"]
      bucket = "nginx-hpa"
      organization = "InfluxData"
      token = "secret-token"

  # Finally, we need a bucket to store our metrics. You don't need a long retention, as it's only used for HPA.
  buckets.yaml: |
    apiVersion: influxdata.com/v2alpha1
    kind: Bucket
    metadata:
      name: nginx-hpa
    spec:
      description: Nginx HPA Example Bucket
      retentionRules:
      - type: expire
        everySeconds: 900
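
One possible way to keep the token out of this ConfigMap, sketched here under assumptions rather than taken from the original demo: Telegraf substitutes ${VAR} references in its configuration from environment variables, so the token can be injected from a Secret. The Secret name nginx-hpa-influxdb, its key token, and the variable name INFLUX_TOKEN are hypothetical. The second document is only a fragment of the sidecar's container entry, meant to be merged into the pod template of the Deployment that runs Telegraf.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: nginx-hpa-influxdb          # hypothetical name
type: Opaque
stringData:
  token: secret-token               # replace with your real InfluxDB token
---
# Fragment: the telegraf sidecar entry under spec.template.spec.containers
containers:
  - name: telegraf
    image: telegraf:1.16
    env:
      # Telegraf reads this when it parses telegraf.conf
      - name: INFLUX_TOKEN
        valueFrom:
          secretKeyRef:
            name: nginx-hpa-influxdb
            key: token
    volumeMounts:
      - mountPath: /etc/telegraf/telegraf.conf
        name: influxdb-config
        subPath: telegraf.conf
```

With this in place, the output section of telegraf.conf would use token = "${INFLUX_TOKEN}" instead of the literal value.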

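Now I'll deploy nginx to the cluster. I chose nginx because it's easy to trigger scaling events with any of the many HTTP load-testing tools; I'll be using baton.

Our nginx manifest is below. Once again, please pull out the hardcoded values and use Secrets!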

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hpa
spec:
  selector:
    matchLabels:
      app: nginx-hpa
  template:
    metadata:
      labels:
        app: nginx-hpa
    spec:
      volumes:
        - name: influxdb-config
          configMap:
            name: nginx-hpa
      initContainers:
        - name: influxdb
          image: quay.io/influxdb/influxdb:2.0.0-beta
          volumeMounts:
            - mountPath: /etc/influxdb
              name: influxdb-config
          command:
            - influx
          args:
            - --host
            - http://influxdb.monitoring.svc:9999
            - --token
            - secret-token
            - pkg
            - --file
            - /etc/influxdb/buckets.yaml
            - -o
            - InfluxData
            - --force
            - "true"
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - mountPath: /etc/nginx/conf.d/default.conf
              name: influxdb-config
              subPath: default.conf
          ports:
            - containerPort: 80
        - name: telegraf
          image: telegraf:1.16
          volumeMounts:
            - mountPath: /etc/telegraf/telegraf.conf
              name: influxdb-config
              subPath: telegraf.conf

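Finally, let's look at the HorizontalPodAutoscaler manifest that completes our demo.

We add an annotation, metric-config.external.flux-query.influxdb/http_requests, which lets us specify the Flux query to execute to fetch the metric that determines whether this deployment should be scaled. Our Flux query pulls the waiting field from our nginx measurement; a value above zero is a strong indicator that we need to scale horizontally to handle the current flow of traffic.

Our goal is to keep the number of waiting connections as close to 0 or 1 as possible. We can also use another annotation, metric-config.external.flux-query.influxdb/interval, to define how often we want to check on traffic and scaling events. We'll use a 5-second interval.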

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  annotations:
    metric-config.external.flux-query.influxdb/interval: "5s"
    metric-config.external.flux-query.influxdb/http_requests: |
      from(bucket: "nginx-hpa")
        |> range(start: -30s)
        |> filter(fn: (r) => r._measurement == "nginx")
        |> filter(fn: (r) => r._field == "waiting")
        |> group()
        |> max()
        // Rename "_value" to "metricvalue" for letting the metrics server properly unmarshal the result.
        |> rename(columns: {_value: "metricvalue"})
        |> keep(columns: ["metricvalue"])
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-hpa
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: External
      external:
        metric:
          name: flux-query
          selector:
            matchLabels:
              query-name: http_requests
        target:
          type: Value
          value: "1"
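
To make the target value of 1 concrete: according to the Kubernetes documentation, the HPA controller derives the desired replica count from the ratio of the current metric value to the target value. The numbers in the worked example are illustrative assumptions, not measurements from this demo.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Assumed example: 1 replica running, the Flux query returns metricvalue = 3, target value = 1
\[
\left\lceil 1 \times \frac{3}{1} \right\rceil = 3
\]
```

In that scenario the HPA would ask for three replicas. Whatever the formula yields is then clamped to the minReplicas and maxReplicas set in the manifest (1 and 4 here).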

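That's it! Easy when you know how, right?

If you'd like to dig into this in more detail, or learn more about monitoring Kubernetes with InfluxDB, check out my examples repository, where there's plenty more for you to explore.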