How Pods access Kubernetes DNS in Docker EE, part two

This is part two of a two-part blog about Kubernetes DNS resolution and network access by Pods in Kubernetes. In part one we looked at internal Kubernetes DNS and how DNS resolution is configured for containers. In this part, we look at how network traffic gets from the containers in user workload Pods to Pods providing DNS functionality. We’re using Kubernetes running under Docker EE UCP (Docker Enterprise Edition Universal Control Plane) in this example. You can find more information about Docker EE here. Docker EE uses the Calico network plugin for Kubernetes, so some of the details are specific to Calico.

Continuing from part one

In part one of this blog we deployed a Pod using the nicolaka/netshoot image, and we examined the details of that Pod’s iptables and DNS resolution configuration. We saw how they work together to direct DNS traffic from containers in that Pod to a Pod providing DNS functionality. Now we will look at how that traffic actually traverses the network from the netshoot Pod to the DNS Pod. From part one of this blog, we know that the DNS Pod is a kube-dns Pod in the kube-system namespace.

In most production environments, user workloads run on worker nodes while kube-dns Pods run on master/manager nodes. This means that traffic from user workload Pods needs to cross at least three different network address spaces. It must first get from the network the Pod is in to the network connecting the nodes. Then it must cross that network to a node where a kube-dns Pod is running. Finally, it must get from that node into the network address space where the kube-dns Pod is running. We are using Pods running on Linux nodes in this example, so we will be looking at Linux networking. From a networking perspective, Pods are connected to nodes using links. To be more accurate, the Pod’s network namespace is connected to the node’s default network namespace using a link.
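
As an aside, if you want to see these network namespaces for yourself, the lsns utility from util-linux can list them on the node. This is just an illustrative command, available on most modern Linux distributions; the output will look different on your system:

# On the node: list network namespaces. Each running Pod (strictly, its
# "pause" container) has its own namespace, alongside the node's default one.
sudo lsns -t net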

We’ll use the following diagram for our discussion. While we will first look at the netshoot Pod on the worker node, the diagram also shows the master node and the kube-dns Pod that we will look at later. We can obtain the node and container IP addresses using commands like: kubectl get pods -o wide and kubectl get nodes -o wide.

Links

Our netshoot Pod connects to the node it is running on through a link whose node-side VETH (Virtual Ethernet) interface is caliea9ab1b00b2. If you are following along on your local system, your VETH name will be different. We can figure out which link is used and the VETH name using several approaches. Let’s look at two of those approaches here:

Identify links by correlating the interface indexes

We can execute a command in the Pod to list its interfaces. Next, we find the index of the interface on the node that the Pod’s eth0 interface is connected to. Finally, we identify the interface on the node with that index.
List the interfaces on the Pod:

kubectl exec -it netshoot-deploy-5f8cfd5d94-n5n8p -- ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default
    link/ether a2:06:8d:89:75:d6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
  • Note that eth0 is associated with interface index 34 on the node: 4: eth0@if34.
  • Also, note that eth0 has index 4 in the Pod.

On the node, list the interfaces:

ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:aa:53:dc:75:a4 brd ff:ff:ff:ff:ff:ff
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:af:2a:ed:f5 brd ff:ff:ff:ff:ff:ff
4: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:51:d9:b5:45 brd ff:ff:ff:ff:ff:ff
6: veth7c66c3c@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 72:86:7f:1c:12:8b brd ff:ff:ff:ff:ff:ff link-netnsid 0
9: veth788481f@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether f6:bf:5e:39:a7:e2 brd ff:ff:ff:ff:ff:ff link-netnsid 2
14: vethe7dd055@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default
    link/ether aa:1f:29:3b:a2:47 brd ff:ff:ff:ff:ff:ff link-netnsid 3
15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
33: veth9608edc@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 56:b7:94:ae:63:fa brd ff:ff:ff:ff:ff:ff link-netnsid 4
34: caliea9ab1b00b2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 5
  • Note that the interface with index 34 is caliea9ab1b00b2, which is associated with interface 4 in the Pod: 34: caliea9ab1b00b2@if4.

By correlating the respective indexes, we can deduce which link and interfaces connect the netshoot Pod and the node.
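
As a shortcut, the kernel also exposes the peer index of a veth interface under /sys/class/net, so you can read it directly instead of eyeballing the ip link output. A hedged sketch, using the Pod name from this example (substitute your own):

# Inside the Pod: print the interface index of eth0's peer on the node (34 here).
kubectl exec netshoot-deploy-5f8cfd5d94-n5n8p -- cat /sys/class/net/eth0/iflink

# On the node: find the interface that has that index.
ip link | grep '^34:'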

Identify links using the kubelet logs

Another approach is to look at some of the Pod creation details in the kubelet log. First, identify the Docker container where the kubelet is running. Next, we output its log to a file so that we can easily search the log data.

$ docker container ls |grep ucp-kubelet
2a8a6745fc06        docker/ucp-hyperkube:3.1.7   "/bin/kubelet_entryp…"   3 weeks ago         Up 11 hours

$ docker container logs 2a8a6745fc06  > kubelet_log.txt 2>&1
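
To narrow kubelet_log.txt down to the lines we care about, a simple grep works. This is just a sketch, and the exact log text may differ between Calico versions:

# Pull out the Calico CNI lines for our Pod: the IP assignment and the
# host-side veth name.
grep netshoot-deploy kubelet_log.txt | grep -E 'Calico CNI using IPs|host side veth'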

Searching in kubelet_log.txt we find a line showing the IP address assignment for eth0 in the Pod (192.168.200.156). We also find a line setting the host side VETH name to caliea9ab1b00b2:

2019-06-18 21:46:14.040 [INFO][28272] k8s.go 362: Calico CNI using IPs: [192.168.200.156/32]    ContainerID="7be2fdbecf847a2887665508ae4fb3ede5fa71ab721665aa3fe3250ab2dd249f" Namespace="default"    Pod="netshoot-deploy-5f8cfd5d94-xxlxp"   WorkloadEndpoint="ip--172--30--4--42.us--east--2.compute.internal-k8s-netshoot--deploy--5f8cfd5d94--xxlxp-eth0"
2019-06-18 21:46:14.040 [INFO][28272] network_linux.go 76: Setting the host side veth name to caliea9ab1b00b2     ContainerID="7be2fdbecf847a2887665508ae4fb3ede5fa71ab721665aa3fe3250ab2dd249f" Namespace="default"    Pod="netshoot-deploy-5f8cfd5d94-xxlxp"    WorkloadEndpoint="ip--172--30--4--42.us--east--2.compute.internal-k8s-netshoot--deploy--5f8cfd5d94--xxlxp-eth0"

To clear up some possible confusion if you are doing hands-on work while reading this blog: if you exec into the netshoot Pod and check the routing table, you will notice that the default route is default via 169.254.1.1 dev eth0. This works because Calico configures an ARP proxy on the node-side VETH of the link. Any ARP request sent through eth0 in the Pod gets back the MAC address of the VETH on the other end of the link, so all outgoing traffic from the Pod is sent to the other end of the link. That side of the link is in the node’s default network namespace. Traffic arriving there is forwarded based on the iptables rules, routing tables, and interfaces in the node’s default namespace.
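
You can verify both halves of this yourself. A hedged example, using the Pod and VETH names from this walkthrough (yours will differ):

# Inside the Pod: the default route points at the link-local address 169.254.1.1.
kubectl exec netshoot-deploy-5f8cfd5d94-n5n8p -- ip route

# On the node: Calico enables proxy ARP on the Pod's veth, so the node answers
# the Pod's ARP request for 169.254.1.1 with the veth's own MAC address.
cat /proc/sys/net/ipv4/conf/caliea9ab1b00b2/proxy_arp    # prints 1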

Traffic from the worker node to the master node

Next, the traffic from the netshoot Pod needs to get to the node where the kube-dns Pod is running. Let’s first look at the routing table on the worker node where the netshoot Pod is running:

$ ip route
default via 172.30.0.1 dev eth0
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
172.18.0.0/16 dev docker_gwbridge  proto kernel  scope link  src 172.18.0.1
172.30.0.0/20 dev eth0  proto kernel  scope link  src 172.30.4.42
192.168.27.192/26 via 172.30.45.243 dev tunl0  proto bird onlink
192.168.52.64/26 via 172.30.13.169 dev tunl0  proto bird onlink
192.168.142.128/26 via 172.30.16.55 dev tunl0  proto bird onlink
192.168.178.0/26 via 172.30.2.206 dev tunl0  proto bird onlink
blackhole 192.168.200.128/26  proto bird
192.168.200.155 dev cali2467a3ac419  scope link
192.168.237.0/26 via 172.30.15.184 dev tunl0  proto bird onlink

We can see that traffic to our kube-dns Pod (traffic with a destination of 192.168.237.9) will use the route: 192.168.237.0/26 via 172.30.15.184 dev tunl0 proto bird onlink. The tunl0 interface is an NBMA (Non-Broadcast Multi-Access) type tunnel with a wild-card remote endpoint. This means that each route must explicitly name the remote tunnel endpoint to which the encapsulated traffic is sent. In this case, that endpoint is 172.30.15.184.
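
You can inspect the tunnel interface itself with ip’s detail flag. On our worker node this should show an ipip tunnel whose local and remote endpoints are both “any”, which is why each route has to name the remote node explicitly:

# Show tunnel details for tunl0 (ipip mode, wildcard local/remote endpoints).
ip -d link show tunl0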

If we look at the node addresses in our cluster, we can see that 172.30.15.184 is the IP address of the master node in this example:

 kubectl get nodes -o wide
NAME                                          STATUS   ROLES    AGE   VERSION            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
ip-172-30-13-169.us-east-2.compute.internal   Ready    <none>   26d   v1.11.9-docker-1   172.30.13.169   13.58.202.228   Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6
ip-172-30-15-184.us-east-2.compute.internal   Ready    master   26d   v1.11.9-docker-1   172.30.15.184   18.188.233.44   Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6
ip-172-30-16-55.us-east-2.compute.internal    Ready    <none>   26d   v1.11.9-docker-1   172.30.16.55    13.58.108.209   Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6
ip-172-30-2-206.us-east-2.compute.internal    Ready    <none>   26d   v1.11.9-docker-1   172.30.2.206    13.58.129.67    Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6
ip-172-30-4-42.us-east-2.compute.internal     Ready    <none>   26d   v1.11.9-docker-1   172.30.4.42     3.17.174.253    Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6
ip-172-30-45-243.us-east-2.compute.internal   Ready    <none>   26d   v1.11.9-docker-1   172.30.45.243   18.222.17.244   Ubuntu 18.04.2 LTS   4.4.0-1084-aws   docker://18.9.6

And we can verify that the kube-dns Pod is running on that master node:

kubectl -n kube-system get po -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE                                          NOMINATED NODE
calico-kube-controllers-d875ff8d7-h7njn   1/1     Running   1          23d   172.30.15.184    ip-172-30-15-184.us-east-2.compute.internal   <none>
calico-node-5nvcg                         2/2     Running   2          23d   172.30.45.243    ip-172-30-45-243.us-east-2.compute.internal   <none>
calico-node-8pbqj                         2/2     Running   2          23d   172.30.15.184    ip-172-30-15-184.us-east-2.compute.internal   <none>
calico-node-9rg79                         2/2     Running   2          23d   172.30.13.169    ip-172-30-13-169.us-east-2.compute.internal   <none>
calico-node-bm9fq                         2/2     Running   2          23d   172.30.2.206     ip-172-30-2-206.us-east-2.compute.internal    <none>
calico-node-ks2z7                         2/2     Running   2          23d   172.30.4.42      ip-172-30-4-42.us-east-2.compute.internal     <none>
calico-node-z76dc                         2/2     Running   22         23d   172.30.16.55     ip-172-30-16-55.us-east-2.compute.internal    <none>
compose-5f54d6bf6b-crk6t                  1/1     Running   1          23d   192.168.237.11   ip-172-30-15-184.us-east-2.compute.internal   <none>
compose-api-d948f54d8-q2xfv               1/1     Running   1          23d   192.168.237.10   ip-172-30-15-184.us-east-2.compute.internal   <none>
kube-dns-6d96c4d9c6-xf7zs                 3/3     Running   3          23d   192.168.237.9    ip-172-30-15-184.us-east-2.compute.internal   <none>
ucp-metrics-gltfs                         3/3     Running   3          23d   192.168.237.12   ip-172-30-15-184.us-east-2.compute.internal   <none>

Tunneling Pod traffic between nodes

So how does the traffic from the netshoot Pod get to the node with the kube-dns Pod? The simple answer is that it travels “inside” a tunnel between the two nodes. Quoting Wikipedia: “IP tunnels are often used for connecting two disjoint IP networks that don’t have a native routing path to each other, via an underlying routable protocol across an intermediate transport network.” In this case, the two disjoint IP networks are the networks used for Pods on the two different nodes. One bit of information that is not shown in the diagram is the CIDR block used for the Pods on each node. We can view this information by running kubectl describe node <NODE_NAME> and looking for the PodCIDR line. If we ran that command against the two nodes, we would see that the two networks used for Pods are 192.168.237.0/24 and 192.168.200.0/24.
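
For example, using the two node names from this cluster (a jsonpath query gives the same value in a script-friendly form):

# Pod CIDR assigned to the worker node running the netshoot Pod.
kubectl describe node ip-172-30-4-42.us-east-2.compute.internal | grep PodCIDR

# The same information for the master node, via jsonpath.
kubectl get node ip-172-30-15-184.us-east-2.compute.internal -o jsonpath='{.spec.podCIDR}'

Note that in the routing tables shown earlier, Calico advertises smaller /26 blocks (for example 192.168.200.128/26 and 192.168.237.0/26) that, in this cluster, fall inside these per-node ranges.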

Traffic is sent across the tunnel by encapsulating IP datagrams destined for one IP address (the DNS Pod’s IP address) inside IP datagrams sent to a different IP address (the manager node’s IP address). The “inner” datagrams from the netshoot Pod are encapsulated on the worker node end of the tunnel and decapsulated on the manager/master node end.
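
If you want to watch the encapsulation happen, tcpdump on the worker node can show both views of the same traffic. An illustrative sketch (the addresses are the ones from this example; output will vary):

# Inner view: DNS queries entering the tunnel interface.
sudo tcpdump -ni tunl0 udp port 53

# Outer view: the same traffic wrapped in IPIP (IP protocol 4) datagrams
# addressed to the master node.
sudo tcpdump -ni eth0 ip proto 4 and host 172.30.15.184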

This is our diagram showing the IP-IP tunnel:

IP-IP tunnel

Traffic from the master node to the kube-dns Pod

At this point, the datagrams from the netshoot Pod have reached the master node where the kube-dns Pod is running. They still need to be delivered to the kube-dns Pod at IP address 192.168.237.9. If we look at the routing table on the master node, we see that traffic to 192.168.237.9 will be routed using the link: 192.168.237.9 dev cali2fdfb45461f scope link:

ip route
default via 172.30.0.1 dev eth0
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1
172.18.0.0/16 dev docker_gwbridge  proto kernel  scope link  src 172.18.0.1
172.30.0.0/20 dev eth0  proto kernel  scope link  src 172.30.15.184
192.168.27.192/26 via 172.30.45.243 dev tunl0  proto bird onlink
192.168.52.64/26 via 172.30.13.169 dev tunl0  proto bird onlink
192.168.142.128/26 via 172.30.16.55 dev tunl0  proto bird onlink
192.168.178.0/26 via 172.30.2.206 dev tunl0  proto bird onlink
192.168.200.128/26 via 172.30.4.42 dev tunl0  proto bird onlink
blackhole 192.168.237.0/26  proto bird
192.168.237.9 dev cali2fdfb45461f  scope link
192.168.237.10 dev cali0ab030104ba  scope link
192.168.237.11 dev cali9e9f97ae5bd  scope link
192.168.237.12 dev calia8afbdd294b  scope link

The corresponding line in the output from the ip link command on the master node is:

43: cali2fdfb45461f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 15

We can correlate this to the eth0 interface in the kube-dns Pod by executing a command against the Pod to list the interfaces.

kubectl -n kube-system exec kube-dns-6d96c4d9c6-xf7zs -- ip link
Defaulting container name to ucp-kubedns.
Use 'kubectl describe pod/kube-dns-6d96c4d9c6-xf7zs -n kube-system' to see all of the containers in this pod.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if43: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue state UP
    link/ether ee:ee:17:e5:d0:a3 brd ff:ff:ff:ff:ff:ff
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
  • Interface eth0 in the Pod has index 3 and is connected to the interface with index 43 on the node. This tells us that cali2fdfb45461f is the node interface that connects to the kube-dns Pod.

Thus, any traffic routed to the cali2fdfb45461f interface is sent to the kube-dns Pod. So in this example, the Kubernetes DNS query traffic that originated from the netshoot Pod on a worker node finally arrives at the kube-dns Pod on the manager node. Reply traffic from the kube-dns Pod uses the same basic mechanisms to get back to the netshoot Pod.
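
To tie the whole path together, you can trigger a lookup from the netshoot Pod while watching the far end of the path on the master node. A hedged example using the Pod and interface names from this post:

# On the master node: watch DNS traffic arriving on the kube-dns Pod's veth.
sudo tcpdump -ni cali2fdfb45461f udp port 53

# From a machine with kubectl access: run a lookup from the netshoot Pod.
kubectl exec netshoot-deploy-5f8cfd5d94-n5n8p -- nslookup kubernetes.default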

Summary

Usually Kubernetes networking “just works”. But sometimes misconfigurations in the cluster, firewall rules in the network fabric, or other configuration issues can cause connectivity problems. The details we covered in this post should give you some ideas about where to start troubleshooting. While we specifically talked about accessing the kube-dns Service, these concepts are applicable to any Service in Kubernetes.
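
When something does break, the commands we used in this post make a reasonable first checklist, roughly in the order the traffic flows:

kubectl get pods -o wide                            # which node is the Pod on, and what is its IP?
kubectl get nodes -o wide                           # node IPs, i.e. the tunnel endpoints
kubectl exec <POD_NAME> -- ip route                 # does the Pod have its 169.254.1.1 default route?
ip link                                             # on the node: are the cali* veth interfaces up?
ip route                                            # on the node: are the tunl0 routes to the other nodes present?
kubectl describe node <NODE_NAME> | grep PodCIDR    # which Pod network belongs to which node?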

If you have questions or feel like you need help with Kubernetes, Docker or anything related to running your applications in containers, get in touch with us at Capstone IT.

Dave Thompson
Solution Architect
Docker Accredited Consultant
Certified Kubernetes Administrator
Certified Kubernetes Application Developer
