K8S Network Test Daemonset


GitHub Project: https://github.com/atkaper/k8s-network-test-daemonset


An on-premise K8S (Kubernetes) cluster needs a properly working virtual network to connect all masters and nodes to each other. In our situation, the host machines (VMware, Red Hat) are not all 100% the same, and cannot easily be wiped clean on new K8S and OS upgrades. Therefore we sometimes experienced issues in which the nodes or masters could not always reach each other. We used the flannel network, which often caused weird issues. We recently switched to Calico, which seems much more stable.

To detect whether the network is functioning correctly, we have created a piece of test software. Nothing fancy, just a simple shell script. This script runs as a daemonset in the cluster, on both masters and nodes. It tries to ping all members of the daemonset, pings the host machines, and tests whether the K8S nameservers are reachable. If all of this works, it is a good indication of the health of the cluster’s network.
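The per-member check can be sketched roughly like this. This is a hypothetical simplification, not the actual script from the repository: member names, ping flags, and the metric name are illustrative (the metric name matches the Prometheus query mentioned further down).

```shell
#!/bin/sh
# Sketch of the per-member ping check (assumption: the real script gathers
# pod/host addresses from the k8s API; here a single address is hard-coded).
ERRORS=0

check_member() {
  # one ping attempt per member, with a short timeout
  if ping -c 1 -W 2 "$1" > /dev/null 2>&1; then
    echo "Checking: $1 - OK"
  else
    echo "Checking: $1 - ERROR"
    ERRORS=$((ERRORS + 1))
  fi
}

check_member 127.0.0.1

# expose the result in Prometheus text format, as served on /prometheus.report
echo "networktest_total_error_count $ERRORS"
```

The real script additionally pings the node host machines and resolves the kube-dns service; each failed check increments the same error counter.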

You can look at the test results in the logs of each pod. The test runs every minute.

Another way to look at the results is to let Prometheus poll the data. The data is available in Prometheus format on URL /prometheus.report (port 8123). We have added prometheus.io annotations in the k8s.yml file, which trigger Prometheus to poll all daemonset pods for this data. This way you can create a graph or counter on your dashboard showing the network health. Note: this graph/counter is still on our TODO list, so there’s no example here. You should probably create an expression that calculates the number of kube-dns instances + 1, plus the number of nodes and masters (raised to the power of 2, since every pod checks every node), and subtract the dns/node-OK counters from that to get to zero errors on your dashboard. For now we use a query which adds all error counts: “sum(networktest_total_error_count)”. Disadvantage of this: if a test pod is down, it does not report errors itself. Of course all other test pods will mark it as an error, so it shows up anyway as a non-zero error count 😉
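The prometheus.io annotations in question typically go on the pod template of the daemonset, and would look something like this (the port and path are taken from the text above; treat this as a sketch, the actual k8s.yml in the repository may differ):

```yaml
# pod template metadata inside the daemonset spec (sketch)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8123"
    prometheus.io/path: "/prometheus.report"
```

This assumes your Prometheus is configured with Kubernetes service discovery and relabeling rules that honor these annotations, which is a common but not universal setup.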


docker build -t repository-url/k8s-network-test-daemonset:0.1 .
docker push repository-url/k8s-network-test-daemonset:0.1


In k8s.yml, replace “##DOCKER_REGISTRY##/k8s-network-test-daemonset:##VERSION##” with the image name/version. In our environment, the Jenkins build pipeline takes care of that. The set will run in the kube-system namespace. It also adds the needed RBAC (security) information. If you do not have RBAC enabled, you might need to strip down the k8s.yml file a bit.
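If you are not using a build pipeline, the placeholder substitution can be done by hand with sed, for example like this (the registry name and version are example values; pipe the result to kubectl apply -f - instead of echoing it):

```shell
#!/bin/sh
# Sketch: substitute the two placeholders before applying the manifest.
# "repository-url" and "0.1" are example values, not the real registry.
IMAGE_LINE='image: ##DOCKER_REGISTRY##/k8s-network-test-daemonset:##VERSION##'
echo "$IMAGE_LINE" | sed -e 's|##DOCKER_REGISTRY##|repository-url|' \
                         -e 's|##VERSION##|0.1|'
# -> image: repository-url/k8s-network-test-daemonset:0.1
```

In practice you would run the same sed over the whole k8s.yml file rather than a single line.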

kubectl apply -f k8s.yml

Example output

[master1]$ kubectl get pods -n kube-system -o wide | grep k8s-network-test-daemonset
k8s-network-test-daemonset-2c7mk   1/1   Running   0   39d   ahlt1828
k8s-network-test-daemonset-brxh6   1/1   Running   0   39d   ahlt1827
k8s-network-test-daemonset-k6s9b   1/1   Running   0   39d   ahlt1825
k8s-network-test-daemonset-kwsjp   1/1   Running   0   38d   ahlt1625
k8s-network-test-daemonset-l47w7   1/1   Running   1   39d   ahlt1826
k8s-network-test-daemonset-lsgn5   1/1   Running   1   39d   ahlt1627
k8s-network-test-daemonset-mzw2z   1/1   Running   0   39d   ahlt1799
k8s-network-test-daemonset-rwncl   1/1   Running   1   4d    ahlt1626
k8s-network-test-daemonset-tmbbt   1/1   Running   0   39d   ahlt1628
k8s-network-test-daemonset-tqxmx   1/1   Running   0   39d   ahlt1569
k8s-network-test-daemonset-tvh57   1/1   Running   0   39d   ahlt1632
k8s-network-test-daemonset-vzd4f   1/1   Running   0   39d   ahlt1798
k8s-network-test-daemonset-wgn9j   1/1   Running   0   39d   ahlt1630
k8s-network-test-daemonset-zvbfb   1/1   Running   0   39d   ahlt1629

[master1]$ kubectl logs --tail 30 k8s-network-test-daemonset-2c7mk -n kube-system
... chopped some lines ...
Tue Jan 16 11:57:01 CET 2018 Tests running on node: ahlt1828, host: k8s-network-test-daemonset-2c7mk
DNS: kube-dns.kube-system.svc.cluster.tst.local.
DNS: kube-dns-5977b8689-2qbmq.
DNS: kube-dns-5977b8689-xmh6h.
Testing 14 nodes
Checking: ahlt1828 Running - k8s-network-test-daemonset-2c7mk; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1827 Running - k8s-network-test-daemonset-brxh6; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1825 Running - k8s-network-test-daemonset-k6s9b; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1625 Running - k8s-network-test-daemonset-kwsjp; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1826 Running - k8s-network-test-daemonset-l47w7; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1627 Running - k8s-network-test-daemonset-lsgn5; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1799 Running - k8s-network-test-daemonset-mzw2z; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1626 Running - k8s-network-test-daemonset-rwncl; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1628 Running - k8s-network-test-daemonset-tmbbt; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1569 Running - k8s-network-test-daemonset-tqxmx; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1632 Running - k8s-network-test-daemonset-tvh57; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1798 Running - k8s-network-test-daemonset-vzd4f; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1630 Running - k8s-network-test-daemonset-wgn9j; host-ping: 0.00 pod-ping: 0.00 - OK
Checking: ahlt1629 Running - k8s-network-test-daemonset-zvbfb; host-ping: 0.00 pod-ping: 0.00 - OK
No status changes since previous test run.


This daemonset has been tested on Kubernetes 1.6.x (using flannel) and 1.8.4 (using Calico). The latter being much more stable than the former 😉


