How Pods access Kubernetes DNS in Docker EE, part one

Service discovery is one of the important benefits of using a container/Pod orchestrator. When you create a Service in Kubernetes, controllers running behind the scenes create an entry in Kubernetes DNS records. Then other applications deployed in the cluster can look up the Service using its name. Kubernetes also configures routing within the cluster to send traffic for the Service to the Service’s ephemeral endpoint Pods.

Understanding Kubernetes DNS configuration and related traffic flow will help you troubleshoot problems accessing the cluster’s DNS from Pods. This is part one of a two-part deep-dive into how Kubernetes does this under the hood. In part one of this blog, we will look at how Kubernetes sets up DNS resolution for containers in Pods. In part two, we will look at how network traffic flows from containers in Pods for user workloads to the Pods providing DNS functionality. We’re going to use Kubernetes running under Docker Enterprise Edition for our examples in this blog.

Kubernetes DNS in Docker EE

First, let’s look at how Docker EE UCP sets up the Kubernetes cluster’s DNS. Expanding the acronyms, that’s Docker Enterprise Edition Universal Control Plane, and you can find more information about it here. This example is based on Docker EE 2.1 with UCP 3.1.7. There are several things to be aware of regarding the Kubernetes DNS implementation in this environment:

  • Docker EE UCP uses a Kubernetes Deployment to create kube-dns replicas and keep them running. UCP creates a Deployment named kube-dns in the kube-system namespace. The kube-dns Deployment and its underlying ReplicaSet(s) create kube-dns Pods and keep the desired number of the Pods running. Technically, the Pods are named kube-dns-<REPLICASET_SUFFIX>-<RANDOM_STRING>.

  • The implementation uses a Kubernetes Service to access the ephemeral kube-dns Pods. UCP deploys a Service named kube-dns in the kube-system namespace to load-balance the Pods created by the kube-dns Deployment. UCP sets the VIP for the Service to 10.96.0.10 by default.

  • In this implementation, Kubernetes configures Pods to use kube-dns for DNS name resolution. When you create Pods for application workloads, the kubelet on the node where the Pod is scheduled sets up /etc/resolv.conf in the containers. In particular, it sets the value of nameserver to the VIP of the kube-dns Service. The kubelet takes that address from the --cluster-dns flag in its startup arguments, so the flag value must match the VIP of the kube-dns Service. Since the kubelet runs as a container in the Docker EE environment, you can see those arguments by inspecting the ucp-kubelet container with the docker container inspect <CONTAINER_NAME/ID> command.
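
As a minimal sketch of that last check, the Python snippet below pulls the --cluster-dns value out of an argument list, such as the arguments reported by docker container inspect. The find_flag helper and the sample argument list are assumptions made for illustration; they are not output captured from a real ucp-kubelet container.

```python
def find_flag(args, flag):
    """Return the value of '--flag=value' or '--flag value', or None if absent."""
    for i, arg in enumerate(args):
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith(flag + "="):
            return arg[len(flag) + 1:]
    return None

# Hypothetical excerpt of kubelet startup arguments (assumed for illustration).
sample_args = ["--cluster-domain=cluster.local", "--cluster-dns=10.96.0.10"]

cluster_dns = find_flag(sample_args, "--cluster-dns")
print(cluster_dns)  # 10.96.0.10
```

If the value printed here differs from the ClusterIP of the kube-dns Service, Pods on that node would be pointed at the wrong nameserver.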

A basic cluster with Kubernetes DNS

From a high-level conceptual perspective, a simple two-node cluster looks like this diagram before deploying any “user” workloads.

Conceptual cluster view for Kubernetes DNS

There are a few things worth pointing out here:

  • Docker EE implements Kubernetes control plane components such as the API Server, the Controller Manager, and the Scheduler as Docker containers. Docker EE UCP manages those control plane containers.
  • The kube-dns Deployment and ReplicaSet are virtual in nature. There are no containers for these objects; they are really just controller processes/threads running as part of the kube-controller-manager container.
  • The kube-dns Pod and its IP address are ephemeral. This means that the IP address will change whenever the kube-dns Pod dies and a new replica is created. That is why DNS resolution in containers points to the kube-dns Service, and not directly to the kube-dns Pods.
  • The kube-dns Service is virtual in nature. It is implemented using iptables entries (or using IPVS) on every node. Kube-proxy configures iptables entries based on the state of cluster resources that kube-proxy watches via the kube-apiserver. We’ll take a look at the data in the iptables and talk more about kube-proxy a little later.

Name resolution traffic from Pods

Next, let’s look at how Pods for a typical workload access kube-dns. We will use the following environment:

  • Docker EE 2.1, with UCP 3.1.x and its default Kubernetes configuration:
    • Calico CNI plugin
    • Default Pod CIDR of 192.168.0.0/16
  • Ubuntu 16.04
  • Using VMs on-prem or in AWS

Kubernetes under Docker EE UCP behaves similarly in most environments, except when installed on virtual machines in Azure. Azure does not allow IP-in-IP encapsulation, so in that case Docker EE sets up Azure networking and Azure IPAM instead of Calico networking and Calico IPAM. Docker EE still uses Calico to enforce Kubernetes NetworkPolicies when installed in Azure.

Deploy an example Pod

Let’s add a label to a node to make sure that the Pod we use for our demo work will run only on that node. We are only doing this to keep things consistent and easy to keep track of as we work through our demo and explanation:

kubectl label node ip-172-30-4-42.us-east-2.compute.internal project=dns-test

Now let’s create a Deployment that runs a Pod on that node. We’ll use the nicolaka/netshoot image so the container in the Pod includes some useful network utilities that we can use later. Our Deployment definition YAML file (netshoot-deploy.yaml) looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: netshoot
  name: netshoot-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      run: netshoot
  template:
    metadata:
      labels:
        run: netshoot
    spec:
      containers:
      - command: ["tail"]
        args: ["-f", "/dev/null"]
        image: nicolaka/netshoot
        name: netshoot-pod
      nodeSelector:
        project: dns-test

Create the Deployment:

kubectl apply -f netshoot-deploy.yaml

Verify that the resulting Pod is running on the correct node:

kubectl get po -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP                NODE                                          NOMINATED NODE
netshoot-deploy-5f8cfd5d94-xxlxp          1/1     Running   0          5s    192.168.200.156   ip-172-30-4-42.us-east-2.compute.internal     <none>

A simplified version of our diagram now looks like this:

Pod-deployed and Kubernetes DNS

Check the iptables for Kubernetes DNS references

Next, we will look at the iptables on the node where our Pod is running. We will dump the iptables to a file with the command sudo iptables-save > iptables.txt. Then we will search for the kube-dns IP address (10.96.0.10) in the iptables.txt file. For a good diagram of the control flow of packets using the Kubernetes iptables see this diagram in Google Docs.

Our first set of matches looks like this:

-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU

-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4

Let’s break down what these entries mean at a high level:

  • The first entry: This rule only applies to UDP packets that do not come from the IP addresses of Pods inside the cluster. It sends those packets to be processed by a rule that marks them for later masquerading. Masquerading is a variation of SNAT where packets will appear to use the IP address of their outbound interface as their source address. We are not interested in this rule for the purposes of this blog.
  • The second entry: Send UDP packets with a destination of the ClusterIP:port of the kube-dns Service to the KUBE-SVC-TCOU7JCQXEZGVUNU chain. We’ll look at that chain a little later.
  • The next two entries do the same thing for TCP packets, and again we are only interested in the second of these two entries. Note that TCP packets destined for the kube-dns Service are sent to a different target chain, KUBE-SVC-ERIFXISQEP7F7OF4. Again, we will look at that chain later.
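
To make the matching logic concrete, here is a rough Python sketch that parses the two UDP rules shown above and checks which of them a DNS packet would match. It is a simplified model written for this explanation, not how iptables itself is implemented. The parse_rule and matches helpers are hypothetical, and the node source address 172.30.4.42 is an assumption inferred from the node's hostname.

```python
import ipaddress
import shlex

# Options whose next token is a value we want to keep.
VALUE_OPTIONS = {"-A": "chain", "-s": "src", "-d": "dst",
                 "-p": "proto", "--dport": "dport", "-j": "target"}

def parse_rule(line):
    """Parse one iptables-save '-A' line into a dict of the fields used here."""
    tokens = shlex.split(line)  # keeps the quoted comment as a single token
    rule = {"src_negated": False}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "!" and i + 1 < len(tokens) and tokens[i + 1] == "-s":
            rule["src_negated"] = True
            i += 1
        elif tok in VALUE_OPTIONS:
            rule[VALUE_OPTIONS[tok]] = tokens[i + 1]
            i += 2
        else:
            i += 1  # skip '-m <module>', '--comment <text>', and so on
    return rule

def matches(rule, src, dst, proto, dport):
    """Return True if a packet with these properties satisfies the rule."""
    if "src" in rule:
        in_net = ipaddress.ip_address(src) in ipaddress.ip_network(rule["src"])
        if in_net == rule["src_negated"]:
            return False
    if "dst" in rule and ipaddress.ip_address(dst) not in ipaddress.ip_network(rule["dst"]):
        return False
    if "proto" in rule and rule["proto"] != proto:
        return False
    if "dport" in rule and int(rule["dport"]) != dport:
        return False
    return True

RULE_MASQ = parse_rule('-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ')
RULE_SVC = parse_rule('-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU')

# A DNS query from the netshoot Pod (source inside the Pod CIDR).
print(matches(RULE_MASQ, "192.168.200.156", "10.96.0.10", "udp", 53))  # False
print(matches(RULE_SVC, "192.168.200.156", "10.96.0.10", "udp", 53))   # True
# A DNS query from outside the Pod CIDR (assumed node address).
print(matches(RULE_MASQ, "172.30.4.42", "10.96.0.10", "udp", 53))      # True
```

The Pod's query skips the masquerade rule because its source address falls inside 192.168.0.0/16, and it matches the second rule, which jumps to the Service chain.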

How is traffic sent to the KUBE-SERVICES chain to start with?

You may be wondering about how and when packets get sent to the KUBE-SERVICES chain. If we search for rules that send traffic to the KUBE-SERVICES chain, we find:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

At a high level, this means that packets from local processes on the node (the OUTPUT chain in this case) and packets entering the node from the network (the PREROUTING chain in this case) will be sent to the KUBE-SERVICES chain.

Follow the traffic through jump targets in iptables

Getting back to tracking traffic from the netshoot Pod to the kube-dns Service, if we search for the jump targets KUBE-SVC-TCOU7JCQXEZGVUNU and KUBE-SVC-ERIFXISQEP7F7OF4, we find these entries:

-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-SKI5LDIQRRMBYDWW

-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-VMNICV4GUKMZMBOY

These rules just send the UDP and TCP packets destined for the kube-dns Service to the next step of processing by iptables.

Searching again for the next jump targets (KUBE-SEP-SKI5LDIQRRMBYDWW and KUBE-SEP-VMNICV4GUKMZMBOY), we find:

-A KUBE-SEP-SKI5LDIQRRMBYDWW -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 192.168.237.9:53

-A KUBE-SEP-VMNICV4GUKMZMBOY -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 192.168.237.9:53

These rules modify the UDP and TCP packets originally destined for the kube-dns Service so that their destination is port 53 on the IP address of the Pod [1] where kube-dns is running. Since Pods are ephemeral, this Pod can die and another Pod will be started to take its place. If that happens, kube-proxy will change the iptables rules to point to the IP address of the new Pod.

[1] Technically there can be multiple Pods running kube-dns, but in the small lab cluster used for this example there is only one kube-dns Pod.
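
The chain of jumps for UDP traffic can be traced with a short Python sketch. It reads the three UDP rules quoted in this post and follows the -j targets until it reaches the DNAT rule. The build_jump_table and trace helpers are hypothetical names, and the sketch assumes exactly one rule per chain, which holds for this lab cluster but not in general.

```python
import shlex

# The UDP rules from this post, in the order the packet meets them.
RULES = """\
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-VMNICV4GUKMZMBOY
-A KUBE-SEP-VMNICV4GUKMZMBOY -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 192.168.237.9:53
"""

def build_jump_table(text):
    """Map each chain to (target, dnat_destination or None).

    Simplification: assumes one rule per chain and ignores match conditions.
    """
    table = {}
    for line in text.splitlines():
        tokens = shlex.split(line)
        if not tokens or tokens[0] != "-A":
            continue
        chain = tokens[1]
        target = tokens[tokens.index("-j") + 1]
        dest = None
        if "--to-destination" in tokens:
            dest = tokens[tokens.index("--to-destination") + 1]
        table[chain] = (target, dest)
    return table

def trace(table, chain):
    """Follow jump targets from a starting chain until a DNAT rule is reached."""
    path = [chain]
    while chain in table:
        target, dest = table[chain]
        if dest is not None:
            path.append(f"{target} -> {dest}")
            break
        path.append(target)
        chain = target
    return path

for step in trace(build_jump_table(RULES), "KUBE-SERVICES"):
    print(step)
```

The last line printed is the DNAT step that rewrites the destination to 192.168.237.9:53.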

Here is our diagram with a simplified perspective of the iptables added. We’ll only look at UDP traffic in the diagram, but TCP traffic uses a similar set of entries in iptables:

iptables

Some more background on iptables

If you would like to see a bigger picture of how iptables is structured, here is a diagram from Wikipedia that provides an overview of iptables packet flow: iptables-packet-flow

How resolv.conf, iptables and the kube-dns Service work together

Our netshoot Pod that needs to access DNS services has an IP address of 192.168.200.156. If we look at the /etc/resolv.conf file in the Pod, we can see that the nameserver value is set to 10.96.0.10.

kubectl exec -it netshoot-deploy-5f8cfd5d94-xxlxp -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
options ndots:5

The netshoot container in the Pod uses the standard DNS port of 53 when doing name resolution. Thus, DNS queries will be sent to 10.96.0.10:53. Following the iptables rules we discussed earlier, we can see that after iptables processing, traffic from the Pod to 10.96.0.10:53 ends up being sent to 192.168.237.9:53.
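
The search and ndots lines in that resolv.conf also affect which names are actually queried. The Python sketch below parses the file contents shown above and lists the names a resolver would typically try for a short name. It is a simplified model: resolver libraries differ in the details, and the parse_resolv_conf and candidate_names helpers are hypothetical names used only for this illustration.

```python
RESOLV_CONF = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
options ndots:5
"""

def parse_resolv_conf(text):
    """Extract nameservers, search domains and the ndots option."""
    conf = {"nameservers": [], "search": [], "ndots": 1}  # ndots defaults to 1
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver":
            conf["nameservers"].append(fields[1])
        elif fields[0] == "search":
            conf["search"] = fields[1:]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    conf["ndots"] = int(opt.split(":", 1)[1])
    return conf

def candidate_names(name, conf):
    """List the names tried for a lookup, in order (simplified model)."""
    if name.endswith("."):
        return [name]  # already fully qualified
    if name.count(".") >= conf["ndots"]:
        # Tried as-is first; some resolvers then fall back to the search
        # list, which this sketch leaves out.
        return [name]
    # Fewer dots than ndots: walk the search list, then try the bare name.
    return [f"{name}.{domain}" for domain in conf["search"]] + [name]

conf = parse_resolv_conf(RESOLV_CONF)
for candidate in candidate_names("kube-dns.kube-system", conf):
    print(candidate)
```

For the short name kube-dns.kube-system, the second candidate, kube-dns.kube-system.svc.cluster.local, is the one that follows the usual Service naming pattern. Every one of these queries is sent to the nameserver at 10.96.0.10.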

But what modifies iptables to add the entries for 10.96.0.10 and 192.168.237.9? Kube-proxy does that work. It watches for creation and deletion of Services via the kube-apiserver, and it also watches for changes to the endpoints for those Services. The endpoints for a Service are the Pods in a ready state that are selected by the Service’s selector. When a Service’s endpoints change, the kube-proxy on each node modifies iptables to correctly direct traffic to the new endpoints. If you are curious, first create a Service or force changes to the endpoint Pods for an existing Service. Then, take a look at the logs of the ucp-kube-proxy container with the docker container logs <CONTAINER_NAME/ID> command.
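
Conceptually, reacting to an endpoint change comes down to comparing the previous set of endpoints with the new one. The Python sketch below illustrates only that comparison. It is not kube-proxy's actual code, the diff_endpoints helper is a hypothetical name, and the replacement Pod address 192.168.237.12 is made up for the example.

```python
def diff_endpoints(old, new):
    """Return (added, removed) endpoints between two observations of a Service."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)

# Endpoint seen in this post, followed by a hypothetical replacement Pod.
before = ["192.168.237.9:53"]
after = ["192.168.237.12:53"]  # assumed address, for illustration only

added, removed = diff_endpoints(before, after)
for endpoint in removed:
    print(f"remove DNAT rule pointing at {endpoint}")
for endpoint in added:
    print(f"add DNAT rule pointing at {endpoint}")
```

In this model, the rule that sends traffic to 192.168.237.9:53 would be replaced by one that targets the new Pod, which matches the behavior described above.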

Traffic finally gets to the Service endpoints

Let’s see what the current endpoints for the kube-dns Service are:

kubectl -n kube-system describe svc kube-dns
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","ku..."
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.237.9:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.237.9:53
Session Affinity:  None
Events:            <none>

There are two endpoints for the kube-dns Service in our cluster: one for TCP traffic and one for UDP traffic. However, both endpoints use the same IP address and port, 192.168.237.9:53.

So far, so good; the DNS traffic from our netshoot Pod will be sent to a Pod that is an endpoint for the kube-dns Service. But exactly how does the traffic get from our netshoot Pod to the kube-dns Pod from a networking infrastructure and routing perspective? We will look at the details of how that happens in part two of this blog.

Summary

The details we covered in this post should give you some ideas about how to check that DNS resolution on Pods for user workloads is correctly configured. You should also be able to check that Pods and Services providing DNS functionality are correctly configured.

If you have questions or feel like you need help with Kubernetes, Docker or anything related to running your applications in containers, get in touch with us at Capstone IT.

Dave Thompson
Solution Architect
Docker Accredited Consultant
Certified Kubernetes Administrator
Certified Kubernetes Application Developer
