Monitoring the Kubernetes Nginx Ingress with the Nginx InfluxDB Module
By Lorenzo Fontana / Product, Use Cases, Developer / Feb 26, 2018
In the process of moving some of our container workloads to Kubernetes, we deployed the ingress-nginx project to have an Ingress controller that can instrument Nginx for incoming traffic to exposed services.
The project is well crafted and met all the expectations we had for a project under the Kubernetes umbrella. Overall, we like the idea of a controlling daemon (the controller) that drives Nginx by managing, configuring and scaling it.
However, after a few weeks of usage we noticed that the whole thing was lacking one of the most important features in terms of observability: the ability to track, in real time, every incoming request, both the ones served locally and the proxied ones.
It was very easy for us to pull aggregated metrics on the status of the controller by combining the Prometheus endpoint with the Telegraf plugin. In certain situations, however, we found it very useful to keep track of every single request and push it directly to InfluxDB as soon as it happens.
We want to be able to do that for three main reasons:
- Spot any proxy backend error or unexpected status code as soon as possible;
- Understand how clients are connecting to our services: HTTP methods, type of connection, requested endpoints;
- Take action (with Kapacitor) on consistent streams of raw data. This is more effective because of the nature of the data itself. For example, I want an alert if requests are not processed and do not go back to the client that made them, and then, after the alert, I want to take action on that specific request.
When a bad request happens, being able to run a query like this is the ideal situation:
SELECT * FROM nginx_requests.threedays WHERE "status" = '502'
1518524349994255769 173 0 text/html 152 GET 35 myserver 502 /bad
1518524349994714916 173 0 text/html 152 GET 35 myserver 502 /bad
After some searching on the interwebs, we wrote a module (nginx-influxdb-module) that acts as a filter on each request in a non-blocking fashion and sends the processed data to an InfluxDB backend using UDP and the line protocol.
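To make the wire format concrete, here is a minimal Python sketch of how a single request could be rendered as one line protocol record. It is not the module's code (the module is written for Nginx itself), and the tag and field names used here (server_name, method, status, uri, bytes_sent) are assumptions chosen for illustration, not the module's actual schema.

```python
def escape_tag(text):
    """Escape commas, equals signs and spaces, as line protocol requires for tags and keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(text):
    """Measurement names only need commas and spaces escaped."""
    for ch in (",", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (suffix i), floats, or quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one record: measurement,tags fields timestamp (nanoseconds)."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical request data, loosely modeled on the query output above.
example = to_line(
    "nginx",
    {"server_name": "myserver"},
    {"method": "GET", "status": "502", "uri": "/bad", "bytes_sent": 173},
    1518524349994255769,
)
```

Each record is a single line of text, which is why it fits naturally into one UDP datagram per request.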
Kubernetes Ingress and Telegraf as Sidecar
After writing the Nginx module to serve our purpose, we needed a way to connect it to the Kubernetes Ingress Controller. To do so, we forked the ingress project to compile the module into its Nginx build.
Why fork? We needed to for a few reasons:
- Crafting a full pull request to deeply integrate the module (with the configmaps and everything) with the controller is an effort that requires a fork;
- Nginx supports dynamic modules, but has strict runtime requirements that do not allow just dropping in module shared objects compiled in some other environment.
To use the module in the Kubernetes Nginx ingress controller, you have two options:
- Plain usage with direct UDP connection
- Connection using Telegraf as sidecar proxy
Plain Usage with Direct UDP Connection
You can follow the official steps by just replacing the controller image in with-rbac.yml or without-rbac.yml with our fork’s controller.
Nota bene: Our fork is mirroring the actual tags of the official ingress controller. For each tag starting from nginx-0.10.2 you will find the equivalent ones here.
We don’t have a deeply integrated set of specific parameters for now; instead, we rely on the nginx.ingress.kubernetes.io/configuration-snippet annotation:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/configuration-snippet: |
  influxdb server_name=yourappname host=your-influxdb port=8089 measurement=nginx enabled=true;
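Because the directive is a flat list of key=value pairs terminated by a semicolon, it is easy to generate per application. The following Python sketch renders the annotation values; the helper names are hypothetical, and only the parameters shown in the snippet above are assumed to exist.

```python
def influxdb_directive(server_name, host, port, measurement="nginx", enabled=True):
    """Render the `influxdb` directive using only the keys shown in this post."""
    params = {
        "server_name": server_name,
        "host": host,
        "port": int(port),
        "measurement": measurement,
        "enabled": "true" if enabled else "false",
    }
    for key, value in params.items():
        text = str(value)
        # Reject empty values and characters that would end or split the directive.
        if not text or any(ch in text for ch in " \t\n;{}"):
            raise ValueError(f"invalid value for {key}: {value!r}")
    return "influxdb " + " ".join(f"{k}={v}" for k, v in params.items()) + ";"


def ingress_annotations(directive):
    """Return the two annotations an Ingress needs to enable the module."""
    return {
        "kubernetes.io/ingress.class": "nginx",
        "nginx.ingress.kubernetes.io/configuration-snippet": directive + "\n",
    }


annotations = ingress_annotations(
    influxdb_directive("yourappname", "your-influxdb", 8089)
)
```

The resulting dictionary can be merged into the metadata.annotations of an Ingress manifest before it is serialized.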
A full example using the annotation to configure the InfluxDB module would look like this:
---
apiVersion: v1
kind: Namespace
metadata:
  name: caturday
---
apiVersion: v1
kind: Service
metadata:
  name: caturday
  namespace: caturday
  labels:
    app: caturday
spec:
  selector:
    app: caturday
  ports:
  - name: caturday
    port: 8080
    protocol: TCP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      influxdb server_name=acceptance-ingress host=127.0.0.1 port=8094 measurement=nginx enabled=true;
  name: caturday
  namespace: caturday
spec:
  rules:
  - host: kittens.local
    http:
      paths:
      - backend:
          serviceName: caturday
          servicePort: 8080
        path: /
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: caturday
  namespace: caturday
  labels:
    app: caturday
spec:
  replicas: 3
  selector:
    matchLabels:
      app: caturday
  template:
    metadata:
      labels:
        app: caturday
    spec:
      containers:
      - name: caturday
        image: docker.io/fntlnz/caturday:latest
        resources:
          limits:
            cpu: 0.1
            memory: 100M
Connection Using Telegraf as Sidecar Proxy
This configuration differs from the official one because it deploys a Telegraf container as a sidecar proxy in every Nginx controller pod. To do this, we need a different controller definition, a configmap that configures Telegraf through environment variables, and a secret for the InfluxDB URLs, username, password and database.
The Controller
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      initContainers:
      - command:
        - sh
        - -c
        - sysctl -w net.core.somaxconn=32768; sysctl -w net.ipv4.ip_local_port_range="1024 65535"
        image: docker.io/alpine:3.6
        imagePullPolicy: IfNotPresent
        name: sysctl
        securityContext:
          privileged: true
      containers:
      - name: nginx-ingress-controller
        image: quay.io/fntlnz/nginx-ingress-controller:kubernetes-controller-8b30ff6
        args:
        - /nginx-ingress-controller
        - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --annotations-prefix=nginx.ingress.kubernetes.io
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
      - name: nginx-telegraf-collector
        image: docker.io/telegraf:1.5.2
        ports:
        - name: udp
          containerPort: 8094
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        - name: MONITOR_RETENTION_POLICY
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_retention_policy
        - name: MONITOR_USERNAME
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_username
        - name: MONITOR_PASSWORD
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_password
        - name: MONITOR_HOST
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_host
        - name: MONITOR_DATABASE
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: monitor_database
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
      volumes:
      - name: config
        configMap:
          name: telegraf
The Telegraf ConfigMap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: ingress-nginx
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      interval = "10s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "0s"
      precision = ""
      debug = false
      quiet = false
      logfile = ""
      hostname = "$HOSTNAME"
      omit_hostname = false
    [[outputs.influxdb]]
      urls = ["$MONITOR_HOST"]
      database = "$MONITOR_DATABASE"
      retention_policy = "$MONITOR_RETENTION_POLICY"
      write_consistency = "any"
      timeout = "5s"
      username = "$MONITOR_USERNAME"
      password = "$MONITOR_PASSWORD"
    [[inputs.socket_listener]]
      service_address = "udp://:8094"
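Since the module and the Telegraf sidecar talk over plain UDP, a quick way to check the path is to send a hand-written record to the listener port. The sketch below is a Python stand-in, not part of the module: send_line pushes one datagram, and loopback_check replaces Telegraf with a throwaway local socket so the code can be tried without a cluster. It assumes the listener accepts InfluxDB line protocol, and the record contents are made up.

```python
import socket


def send_line(line, host="127.0.0.1", port=8094):
    """Send one newline-terminated record as a single UDP datagram; returns bytes sent."""
    payload = line.encode("utf-8") + b"\n"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP is fire-and-forget: a successful send does not prove delivery.
        return sock.sendto(payload, (host, port))


def loopback_check(line, timeout=2.0):
    """Send a record to a temporary local listener and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        # Port 0 lets the OS pick a free port; this socket stands in for Telegraf.
        listener.bind(("127.0.0.1", 0))
        listener.settimeout(timeout)
        port = listener.getsockname()[1]
        send_line(line, "127.0.0.1", port)
        data, _addr = listener.recvfrom(65535)
        return data.decode("utf-8")


# Hypothetical record without a timestamp, so the receiver assigns one.
received = loopback_check('nginx,server_name=smoke-test status="200"')
```

Inside a controller pod, calling send_line with the default host and port would target the sidecar, because containers in the same pod share a network namespace.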
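The Telegraf configuration above reads its connection settings from environment variables, which the controller definition fills in from a secret named telegraf. Create that secret so that Telegraf can connect to the InfluxDB backend: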
kubectl create secret -n ingress-nginx generic telegraf \
--from-literal=env=acc \
--from-literal=monitor_retention_policy="threedays" \
--from-literal=monitor_username="" \
--from-literal=monitor_password="" \
--from-literal=monitor_host=http://your-influxdb:8086 \
--from-literal=monitor_database=nginx_ingress
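If you prefer to keep the secret in a manifest rather than creating it imperatively, the same keys can be written as a Secret object whose data values are base64-encoded. The Python sketch below builds such a manifest with the values from the command above; the function name is hypothetical, and the output is JSON, which kubectl apply -f also accepts.

```python
import base64
import json


def telegraf_secret(values, name="telegraf", namespace="ingress-nginx"):
    """Build a Secret manifest; Kubernetes expects base64-encoded values under `data`."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


# Same keys and placeholder values as the kubectl command above.
secret = telegraf_secret(
    {
        "env": "acc",
        "monitor_retention_policy": "threedays",
        "monitor_username": "",
        "monitor_password": "",
        "monitor_host": "http://your-influxdb:8086",
        "monitor_database": "nginx_ingress",
    }
)
manifest = json.dumps(secret, indent=2)
```

Keep in mind that base64 is an encoding, not encryption, so a manifest like this should not be committed with real credentials.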
The secret above uses “threedays” as the retention policy. You can leave the retention policy empty, but if you want to keep data for three days as in the example, you can create the policy with this query:
CREATE RETENTION POLICY threedays ON nginx_ingress DURATION 3d REPLICATION 1
Next Steps
The general plan is to stabilize the module’s API and tag a 1.0 release. While doing this, we will complete the integration with the Kubernetes Ingress and send a PR to the upstream project to allow people to optionally enable the module in their Ingress controllers.
Is This Ready for Production?
We’re using all of this in production; however, we don’t recommend that you do the same at this stage. If you have a staging/acceptance environment, you can test it there and contribute to the module, which will help us move forward with a 1.0 release and contribute to the kubernetes/ingress-nginx project.
Conclusions
Even if writing this kind of module is an easy pick, it still requires effort. Fortunately, the journey has been made easier thanks to the nginx/nginx-tests repo, which allowed us to spot bad and edge behaviors. Also, kudos to the Ingress controller, which is highly customizable and exposes the nginx.ingress.kubernetes.io/configuration-snippet annotation, which turned out to be really useful at this stage.