This is part two of a two-part blog about Kubernetes DNS resolution and network access by Pods in Kubernetes. In part one we looked at internal Kubernetes DNS and how DNS resolution is configured for containers. In this part, we look at how network traffic gets from the containers in user workload Pods to Pods providing DNS functionality. We’re using Kubernetes running under Docker EE UCP (Docker Enterprise Edition Universal Control Plane) in this example. You can find more information about Docker EE here. Docker EE uses the Calico network plugin for Kubernetes, so some of the details are specific to Calico.
Service discovery is one of the important benefits of using a container/Pod orchestrator. When you create a Service in Kubernetes, controllers running behind the scenes create an entry in Kubernetes DNS records. Then other applications deployed in the cluster can look up the Service using its name. Kubernetes also configures routing within the cluster to send traffic for the Service to the Service’s ephemeral endpoint Pods.
Understanding Kubernetes DNS configuration and related traffic flow will help you troubleshoot problems accessing the cluster’s DNS from Pods. This is part one of a two-part deep-dive into how Kubernetes does this under the hood. In part one of this blog, we will look at how Kubernetes sets up DNS resolution for containers in Pods. In part two, we will look at how network traffic flows from containers in Pods for user workloads to the Pods providing DNS functionality. We’re going to use Kubernetes running under Docker Enterprise Edition for our examples in this blog.
In a previous post, What is Container Orchestration?, I explained container orchestration using some examples based on Docker Swarm. While Docker Swarm is undeniably easier to both use and explain, Kubernetes is by far the most prevalent container orchestrator today. So, I’m going to go through the same examples from that previous post but, this time, use Kubernetes. One of the great things about Docker Enterprise is it supports both Swarm and Kubernetes so I didn’t have to change my infrastructure at all.
Managing a Kubernetes cluster with one user is easy. Once you go beyond one user, you need to start using Role-Based Access Control (RBAC). I’ve delved into this topic several times in the past with posts on how to Create a Kubernetes User Sandbox in Docker Enterprise and Functional Kubernetes Namespaces in Docker Enterprise. But, once you get beyond a couple of users and/or teams and a few namespaces for them, it quickly becomes difficult to keep track of who can do what and where. And, as time goes on and more and more people have a hand in setting up your RBAC, it can get even more confusing. You can and should have your RBAC resource definitions in source control but it’s not easy to read and is hard to visualize. Enter the open source who-can kubectl plugin from the folks at Aqua Security. It gives you the ability to show who (subjects) can do what (verbs) to what (resources) and where (namespaces).
A colleague of mine was helping his client configure Interlock and wanted to know more about how to configure Interlock Service Clusters, so I referred him to my previous blog, Interlock Service Clusters. While that article helps someone understand the capabilities of Interlock conceptually, it does not show any working code examples.
Let’s review what Docker Enterprise UCP Interlock provides. Then I will show you how to configure Interlock to support multiple ingresses, each of which is tied to its own environment.
The Interlock ingress provides three services.
Interlock (I) – an overall manager of all things ingress and a listener to Swarm events. It spawns both the extension and proxy services.
Interlock Extension (IE) – When Interlock notices Swarm service changes it will then notify the Interlock Extension to create a new Nginx configuration file. That file is returned to the Interlock.
Interlock Proxy (IP) – the core ingress listener that routes traffic, based on the HTTP Host header, to the appropriate application services. It receives its Nginx configuration from Interlock whenever there are service changes that the Interlock Proxy needs to handle.
The Interlock services containers are represented in the diagram below as I for Interlock, IE for Interlock Extension, and IP for Interlock Proxy.
The shaded sections represent Docker Collections for dev, test, and prod environments, all managed within a single cluster. Integrating Interlock Service Clusters into this approach provides two benefits. First, it isolates problems to a single collection, which is much more fault tolerant and ensures that downstream test and prod ingress traffic is unaffected. Second, it provides greater ingress capacity for each environment: the production Interlock Proxies are dedicated to production use and therefore do not share their capacity with dev and test ingress traffic.
We will establish 3 Interlock Service Clusters and have each of them deploy one ucp-interlock-proxy replica to each node that has the label com.docker.interlock.service.cluster.
The overall process we work thru entails the following steps.
pulling down Interlock’s configuration toml
configuring three service clusters
uploading a new configuration with a new name
restarting the interlock service
The code that I will show you below is going to be applied to my personal cluster in AWS. In my cluster I have 1 manager, 1 DTR, and 3 worker nodes. Each worker node is assigned to one of 3 collections named /dev, /test, and /prod. I will set up a single dedicated Interlock proxy in each of these environments to segregate ingress traffic for dev, test, and prod.
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
ziskz8lewtzu7tqtmx ip-127-13-5-3.us-west-2.compute.internal Ready Active 18.09.7
5ngrzymphsp4vlwww7 ip-127-13-6-2.us-west-2.compute.internal Ready Active 18.09.7
qqrs3gsq6irn9meho2 * ip-127-13-7-8.us-west-2.compute.internal Ready Active Leader 18.09.7
5bzaa5xckvzi4w84pm ip-127-13-1-6.us-west-2.compute.internal Ready Active 18.09.7
kv8mocefffu794d982 ip-127-13-1-5.us-west-2.compute.internal Ready Active 18.09.7
Step 1 – Verify Worker Nodes in Collections
Let’s examine a worker node to determine its collection.
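One way to check this from the command line is to read the collection label off the node. The sketch below assumes that UCP records a node's collection in the com.docker.ucp.access.label node label; node_collection is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the UCP collection recorded on one Swarm node.
# Assumption: the collection path is stored in the
# com.docker.ucp.access.label node label.
node_collection() {
  docker node inspect \
    --format '{{ index .Spec.Labels "com.docker.ucp.access.label" }}' \
    "$1"
}

# Usage (hostname taken from the node listing above):
#   node_collection ip-127-13-5-3.us-west-2.compute.internal
```

Running it against each worker should print one of /dev, /test, or /prod if the nodes were assigned as described.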
The default configuration for Interlock is to have two interlock proxies running anywhere in the cluster. The proxy configuration resides in a section named Extensions.default. This is the heart of an interlock service cluster. We will duplicate this section two times for a total of three sections and then rename them to suit our needs.
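Before editing, the running configuration has to be pulled down into a file. A minimal sketch of that step follows, assuming the Interlock service is named ucp-interlock and that its first attached Swarm config is the Interlock TOML. The helper name fetch_interlock_config is hypothetical; the default output file name matches the config.orig.toml used in the next step.

```shell
# Hypothetical helper: save the active Interlock configuration to a file.
# Assumptions: the service is called ucp-interlock and its first attached
# config holds the Interlock TOML.
fetch_interlock_config() {
  # Name of the config currently attached to the service (kept for later use)
  CURRENT_CONFIG_NAME=$(docker service inspect \
    --format '{{ (index .Spec.TaskTemplate.ContainerSpec.Configs 0).ConfigName }}' \
    ucp-interlock) || return 1
  # Dump the config payload as plain text
  docker config inspect \
    --format '{{ printf "%s" .Spec.Data }}' \
    "$CURRENT_CONFIG_NAME" > "${1:-config.orig.toml}"
}

# Usage:
#   fetch_interlock_config    # writes config.orig.toml
```

The function leaves CURRENT_CONFIG_NAME set in the shell, which is handy because the service update at the end of the process needs it.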
Step 5 – Edit Interlock Configuration
Copy the config.orig.toml file to config.new.toml. Then, using your favorite editor (vi of course), duplicate the Extensions.default section two more times. Rename the three Extensions.default sections to Extensions.dev, Extensions.test, and Extensions.prod. Each Extensions.<env> section has sub-sections whose names include the section name plus a qualifier (e.g. Extensions.default.Config). These will need to be renamed too.
Now we have 3 named extensions for each of dev, test, and prod. Next, you will search for the PublishedSSLPort and change it to 8445 for dev, and 8444 for test and leave the value 8443 for prod. These 3 ports should be the values that the incoming load balancer uses in its back-end pools. For each environment specific VIP (dev, test, prod) the traffic will flow into the load balancer on port 443. The VIP used to access the load balancer will dictate how the traffic will be routed to the appropriate interlock proxy IP address and port.
Add a new property called ServiceCluster under each of the extensions sections and give it the name of dev, test, or prod.
You can also specify the constraint labels that will dictate where both the Interlock Extension and Interlock Proxies will run. Start by changing the Constraints and ProxyConstraints to use your new node labels.
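Those constraints only match once the worker nodes actually carry the label. A small sketch of labeling a node is shown here; label_service_cluster is a hypothetical helper name, the label key is the one named earlier in this post, and the values dev, test, and prod are assumed to mirror the environment names.

```shell
# Hypothetical helper: tag a worker with the service cluster it should serve.
# $1 = node hostname or ID, $2 = service cluster name (dev, test, or prod)
label_service_cluster() {
  docker node update \
    --label-add "com.docker.interlock.service.cluster=$2" \
    "$1"
}

# Usage:
#   label_service_cluster <dev-worker-hostname> dev
```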
ProxyReplicas indicates how many container replicas to run for the interlock proxy service. We will set ours to 1. ProxyServiceName is the name under which the proxy service is deployed into Swarm. We will name ours ucp-interlock-proxy-dev, which is specific to the environment it supports.
Of course you will do this for all three sections within the new configuration file. Below is a snippet of only the changes that I have made for the dev ingress configuration. You will want to repeat this for test and prod as well.
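As a hedged sketch of the dev changes, the fragment below collects the keys discussed above into a scratch file that can sit next to config.new.toml while editing. The service names and ports follow the service listing shown later in this post, and ProxyReplicas is 1 to match the 1/1 replica counts there. The exact constraint expressions and the scratch file name are assumptions, and every key not shown is assumed to stay as it was in Extensions.default.

```shell
# Write the dev-specific settings to a scratch file for reference.
# Only changed keys are listed; constraint expressions are assumptions.
cat > extensions-dev.snippet.toml <<'EOF'
[Extensions]
  [Extensions.dev]
    ServiceName = "ucp-interlock-extension-dev"
    ProxyServiceName = "ucp-interlock-proxy-dev"
    ProxyReplicas = 1
    ServiceCluster = "dev"
    PublishedPort = 8082
    PublishedSSLPort = 8445
    Constraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux", "node.labels.com.docker.interlock.service.cluster==dev"]
    ProxyConstraints = ["node.labels.com.docker.ucp.orchestrator.swarm==true", "node.platform.os==linux", "node.labels.com.docker.interlock.service.cluster==dev"]
    [Extensions.dev.Config]
      # remaining keys unchanged from Extensions.default.Config
EOF
```

For test and prod the same keys change, with the names, the ServiceCluster value, and the published ports swapped accordingly.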
Step 7 – Restart Interlock Service with New Configuration
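The update command in this step expects two shell variables, CURRENT_CONFIG_NAME and NEW_CONFIG_NAME, and a Swarm config that has already been created from the edited file. A minimal sketch of deriving the new name is shown here, assuming the current name ends in a dash followed by a number (for example com.docker.ucp.interlock.conf-1). The helper next_config_name is hypothetical.

```shell
# Hypothetical helper: bump the numeric suffix of a config name,
# e.g. com.docker.ucp.interlock.conf-1 -> com.docker.ucp.interlock.conf-2.
# Assumes the name ends in "-<number>".
next_config_name() {
  prefix=${1%-*}      # everything before the last dash
  version=${1##*-}    # numeric suffix after the last dash
  echo "${prefix}-$((version + 1))"
}

# Usage, assuming CURRENT_CONFIG_NAME holds the name of the active config:
#   NEW_CONFIG_NAME=$(next_config_name "$CURRENT_CONFIG_NAME")
#   docker config create "$NEW_CONFIG_NAME" config.new.toml
```

Swarm configs are immutable, which is why the edited file is uploaded under a new name instead of overwriting the old one.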
docker service update --update-failure-action rollback \
--config-rm $CURRENT_CONFIG_NAME \
--config-add source=$NEW_CONFIG_NAME,target=/config.toml \
ucp-interlock
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
Note: in the above scenario the service update worked smoothly. Other times, such as when there are errors in your configuration, the service will roll back. In those cases you will want to run docker ps -a | grep interlock and look for the recently exited docker/ucp-interlock container. Once you have its container id you can run docker logs <container-id> to see what went wrong.
Step 8 – Verify Everything is Working
We need to make sure that everything started up properly and is listening on the appropriate ports.
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y3jg0mka0w7b ucp-agent global 4/5 docker/ucp-agent:3.1.9
xdf9q5y4dev4 ucp-agent-win global 0/0 docker/ucp-agent-win:3.1.9
k0vb1yloiaqu ucp-auth-api global 0/1 docker/ucp-auth:3.1.9
ki8qeixu12d4 ucp-auth-worker global 0/1 docker/ucp-auth:3.1.9
nyr40a0zitbt ucp-interlock replicated 0/1 docker/ucp-interlock:3.1.9
ewwzlj198zc2 ucp-interlock-extension replicated 1/1 docker/ucp-interlock-extension:3.1.9
yg07hhjap775 ucp-interlock-extension-dev replicated 1/1 docker/ucp-interlock-extension:3.1.9
ifqzrt3kw95p ucp-interlock-extension-prod replicated 1/1 docker/ucp-interlock-extension:3.1.9
l6zg39sva9bb ucp-interlock-extension-test replicated 1/1 docker/ucp-interlock-extension:3.1.9
xkhrafdy3czt ucp-interlock-proxy-dev replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8082->80/tcp, *:8445->443/tcp
wpelftw9q9co ucp-interlock-proxy-prod replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8080->80/tcp, *:8443->443/tcp
g23ahtsxiktx ucp-interlock-proxy-test replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8081->80/tcp, *:8444->443/tcp
You can see there are 3 new ucp-interlock-extension-<env> services and 3 new ucp-interlock-proxy-<env> services. You can also verify that the proxies are listening on SSL ports 8443 thru 8445. A single replica each is fine for a demonstration, but you will more than likely want to set the replicas somewhere in the 2 to 5 range per environment. And of course you will determine that based on your traffic load.
NOTE: Often times after the update of the Interlock’s configuration you will still see the old ucp-interlock-extension and/or the ucp-interlock-proxy services still running. You can run the following command to remove these as they are no longer necessary.
docker service rm ucp-interlock-extension ucp-interlock-proxy
Step 9 – Deploy an Application
Now let’s deploy a demo service that we can route thru our new ingress. We’re going to take the standard docker demo application and deploy it to our dev cluster. Start by creating the following docker-compose.yml file:
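A minimal sketch of such a file, written here from the shell, could look like the following. The hosts and network label values and the image name come from this post. The com.docker.lb.port value, the com.docker.lb.service_cluster label, the replica count of 2, and the service name demo are assumptions chosen to line up with the stack output shown further down.

```shell
# Write a stack file for the demo app.
# Assumptions: the app listens on port 8080, the stack targets the "dev"
# service cluster, and the ingress_dev network already exists.
cat > docker-compose.yml <<'EOF'
version: "3.3"

services:
  demo:
    image: ehazlett/docker-demo:latest
    deploy:
      replicas: 2
      labels:
        com.docker.lb.hosts: ingress-demo.lab.capstonec.net
        com.docker.lb.network: ingress_dev
        com.docker.lb.port: "8080"
        com.docker.lb.service_cluster: dev
    networks:
      - ingress_dev

networks:
  ingress_dev:
    external: true
EOF
```

The network is declared external because it was created outside of the stack, so the stack attaches to it without trying to create it.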
Note that the com.docker.lb.network attribute is set to ingress_dev. I previously created this network outside of the stack. We will now utilize this network for all our ingress traffic from Interlock to our docker-demo container.
Also notice that the com.docker.lb.hosts attribute is set to ingress-demo.lab.capstonec.net. I logged into our DNS server and created a CNAME record with that name pointing to my AWS load balancer for the dev environment.
I also must configure my AWS load balancer to allow traffic to a Target Group of virtual machines. We can talk about your cloud configuration in another article down the road.
Let’s deploy that stack:
docker stack deploy -c docker-compose.yml demodev
Once the stack is deployed, we can verify that the services are running on the correct machine:
docker stack ps demodev
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
i3bght0p5d0j demodev_demo.1 ehazlett/docker-demo:latest ip-127-13-5-3.ec2.internal Running Running 10 hours ago
cyqfu0ormnn8 demodev_demo.2 ehazlett/docker-demo:latest ip-127-13-5-3.ec2.internal Running Running 10 hours ago
Finally, we should be able to open a browser to http://ingress-demo.lab.capstonec.net (which routes thru the dev interlock service cluster) and see the application running.
Well that was a decent amount of work but now you’re done. You’ve successfully implemented your first interlock service cluster which is highly available and segmented into three environments for dev, test, and prod!
As always if you have any questions or need any help please contact us.
Mark Miller Solutions Architect Docker Accredited Consultant
In the 1980s there was a funny television commercial for an insurance company that mocked many other insurance companies. These hideous competitors trained their agents to “Say NO, deny the Claim!” thereby denying customers the benefits of the insurance policy they had purchased. It always made me chuckle and I still remember the chant to this day. I want to show you how you can do this, “Say no, deny pod access!” in Kubernetes using NetworkPolicies applied to your application deployments.
Recently a customer who is quite new to Docker and the world of Kubernetes asked me how to isolate their applications from each other in a shared Kubernetes cluster.
In a previous blog post entitled Kubernetes Workload Isolation I discussed how customers have segmented their cluster by using a combination of VLAN’s, Collections, and Namespaces. But if you are not utilizing VLAN’s to segment your networking among VM’s and if you are not using Collections to separate VM’s into different RBAC groups then you will need a different approach.
NetworkPolicies to the Rescue
Kubernetes namespaces provide isolation for administration purposes but are not sufficient to prevent network traversal. Kubernetes NetworkPolicies, however, provide the guardrails needed to restrict East-West traffic between pods and services in the cluster as well as North-South traffic between the pod and external resources.
David Thompson posted how to use Kubernetes NetworkPolicies in Docker Enterprise Edition to isolate applications from each other. He provides a tutorial-based approach around complete application isolation, including intra-pod isolation, and moves towards a final solution that opens up traffic from your ingress controller. He talks about how to:
deny all traffic to or from your pods
allow traffic between pods inside a namespace
allow ingress traffic from external sources to your pods
I want to take that technical insight and discuss how we can apply that information to your environment and discuss a practice around applying NetworkPolicies from a governance perspective.
It gets interesting when the development team is ready to deploy their application as Kubernetes pods. Security folks want the application as secure as possible, including network access. Network folks want to ensure that networking follows enterprise standards and includes appropriate firewalls to isolate traffic. Meanwhile, developers just want their application to communicate properly amongst its various containers. I will now show you one approach to meeting these lofty goals with NetworkPolicies.
Step 1 – Deny All
Like the television commercial chant “Say no, deny the claim!”, I would generally recommend starting with the most restrictive network permissions. New application teams should have a default NetworkPolicy that denies all ingress and egress traffic to all pods within their namespace.
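A minimal sketch of such a default policy is written to a file here. The file name deny-all-np.yaml matches the command used in this step; the policy name deny-all and the ns-app1 namespace are assumptions borrowed from the later steps.

```shell
# Write a default-deny policy for every pod in the namespace.
# Assumptions: policy name "deny-all", namespace "ns-app1".
cat > deny-all-np.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: ns-app1
spec:
  podSelector: {}      # empty selector: applies to all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
  # no ingress or egress rules are listed, so nothing is allowed
EOF
```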
Using the previous NetworkPolicy specification you can achieve total isolation by invoking: kubectl apply -f deny-all-np.yaml.
Let’s break this down. In the previous NetworkPolicy, you will notice that the podSelector is an empty set, which means the policy applies to all pods. Kubernetes NetworkPolicies take a white-list approach. The policyTypes field lists both Ingress and Egress, but the policy does not specify any ingress or egress rules, which means that it is white-listing nothing, effectively denying everything.
Applied During On-Boarding
This kind of NetworkPolicy would typically be applied to the namespace as it is created during the team’s on-boarding process. The on-boarding process is ideally a self-service portal where a team leader can invoke an automated process which will set up the team members inside of Docker Enterprise with appropriate permissions to build, publish, and deploy their Kubernetes application. This should be controlled via automation or by an individual you trust to manually do the right thing. No apps are deployed at this time, but if someone subsequently does deploy an app then the default policy is to deny all traffic.
Step 2 – Allow Egress and Intra-Pod Access
In this second step, there are two parts. First, we remove the egress restriction because our applications will often need to access the outside world. So, in the following specification, you will see Egress has been removed from the list of policyTypes. This will grant your pods network access outside of the pod.
The second part of Step 2 is to allow intra-pod network access. Without this access your pods will not be able to communicate with each other in order to collaborate and produce the desired result. To do this we need to include some selectors under a new ingress: key. We are essentially allowing ingress traffic that matches both a namespace selector and a pod selector. This will allow the pods in the ns-app1 namespace to communicate with other pods in the same namespace, provided those pods have a label of project-app1.
Keep in mind that NetworkPolicies are cumulative and inclusive. So this second NetworkPolicy will ride on top of the first and only open up what is required through the white-listing of ingress rules.
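The Step 2 policy described above could be sketched as follows. Several details are assumptions made for illustration: the file and policy names, the reading of the project-app1 label as the key/value pair project: app1, and a name: ns-app1 label on the namespace, which would have to be added for the namespaceSelector to match.

```shell
# Write a policy that allows pod-to-pod traffic inside ns-app1.
# Assumptions: the namespace carries the label name=ns-app1 and the
# application pods carry the label project=app1.
cat > allow-intra-namespace-np.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace
  namespace: ns-app1
spec:
  podSelector: {}
  policyTypes:
    - Ingress            # Egress is intentionally not listed here
  ingress:
    - from:
        # One list entry holding both selectors: a source must match both
        - namespaceSelector:
            matchLabels:
              name: ns-app1
          podSelector:
            matchLabels:
              project: app1
EOF
```

One caveat worth checking in your own cluster: because policies only ever add allowances, an Egress entry that remains in the Step 1 policy keeps outbound traffic isolated, so that policy would need to be updated, or an explicit egress rule added, before pods can reach the outside world.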
Applied by Pipeline During Deployment
Intra-pod network access could be enabled by the pipeline prior to deployment of the application. It could also have been done as part of the on-boarding process if you know what all your namespaces will be. Either way, it is not in the developers’ hands to establish network policies. Rather, it is a security/network concern and it is typically implemented as part of an automation process.
Step 3 – Allow Ingress
The type of application you are developing will determine if you need ingress traffic from a source external to the cluster. A typical web application has front-end websites that need ingress traffic and the supporting back-end micro-services do not. A mobile application may need ingress access to the API but that would often be handled by an application gateway in which case only the gateway needs ingress traffic.
This NetworkPolicy is adding an inclusive white-list rule which allows traffic from the ingress-nginx namespace. This NetworkPolicy is very specific about the matchLabels of the podSelector to target an individual set of pods that represent the web app that should handle ingress traffic.
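A sketch of a policy along these lines is shown here. The ingress-nginx namespace comes from the text; the file and policy names, the name: ingress-nginx namespace label, and the project and tier pod labels are hypothetical placeholders for whatever labels identify your web pods.

```shell
# Write a policy that lets the ingress controller reach the web pods only.
# Assumptions: the ingress-nginx namespace carries the label
# name=ingress-nginx and the web pods carry project=app1 and tier=web.
cat > allow-ingress-np.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-web
  namespace: ns-app1
spec:
  podSelector:
    matchLabels:
      project: app1
      tier: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
EOF
```

Back-end pods that lack the web label are not selected by this policy, so they stay unreachable from the ingress controller.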
Applied by Pipeline or Portal
The sample NetworkPolicy is very specific to this application. It is not generally applied to all applications.
The application of this NetworkPolicy should be handled by either the CI/CD pipeline or by a self-service portal. In both scenarios, the template is pre-defined and controlled outside of developer hands. The pipeline/portal will fill in the appropriate namespace and project name into the template while the developers have the flexibility to specify the ingress namespace.
In this three-step process to achieve application network isolation, I have prescribed that in no way are the developers responsible for or have any control over the NetworkPolicy. Rather, they can have these pre-defined and trusted templates applied to their project in an automated fashion. In the end, the application is isolated and yet has the proper source for ingress traffic.
This will require security, network, firewall, operations, CI/CD, and development teams to all be involved in some coordinated effort to standardize on a corporate strategy for application network isolation. That strategy must involve automation using both CI/CD and self-service portal to apply these NetworkPolicies.
As always, Capstone IT is here to help you with your Kubernetes needs.
Mark Miller Solution Architect Docker Accredited Consultant
Over the last two or three years I’ve given a similar presentation on containers to operations groups at clients, potential clients, conferences and meetups. Generally, they’re just getting started with containers and are wondering what orchestration is and how it impacts them. In this post, I will talk about what container orchestration is and provide several videos with simple examples of what it means.
There are many images of ships with pin-wheel colored containers in a myriad of stacked configurations. In the featured image above you can clearly see three ships at dock loaded with containers. These ships have unique destination port cities across the globe, each one carrying a distinct set of products for a discrete set of customers. These containers carry a payload.
Our virtual docker containers carry a workload. So, the ships vary in what containers they carry, where they are transporting them, and to whom they belong. We will talk about how to get our virtual containers loaded into a particular ship and entertain one solution to VM and container isolation.
Over the years Capstone has worked in many vertical industries. Several of Capstone’s customers have extremely regulated environments such as the banking, insurance, and financial investment industries. These industry verticals typically need to comply with numerous governing standards and often have unique ways of interpreting and applying those regulations to their IT infrastructure. All of these regulations are aimed at restricting, or at least minimizing, covert intrusion.
Traditional Application Isolation
One traditional approach to thwarting intrusion is to create virtual local area networks (VLAN’s) which separate and isolate sets of virtual machines (VM’s) using firewall rules. These sets of VM’s are placed into VLAN’s based on business oriented Application Groups (AG). The diagram below shows three AG’s for the Test environment and three more for Staging environment. This is typically handled by the enterprise network, security, and firewall teams.
This approach helps ensure that if any particular VM is compromised by a bad actor, that actor cannot easily break into other important machines outside of the current VLAN. By using firewall rules and strictly opening only the necessary ports between the VLAN’s, you can achieve a high level of confidence in thwarting pervasive intrusion.
Docker Enterprise Collections
With the Docker Enterprise platform you can easily distribute work among worker VM’s using Docker Collections. Collections are a native enterprise feature that groups worker nodes under a name and supports role based access control (RBAC) for restricting which users can access and run workloads within each collection. This allows separation of workloads but does not necessarily guarantee network isolation.
This approach is similar to VLAN’s in that VM’s are separated into distinct groups. While the containerized applications are isolated and protected via RBAC, the VM’s are not isolated from each other. A rogue actor could still potentially hop from one VM to another across VLAN’s.
But we could combine the VLAN separation with the Collection separation and gain the benefits of both approaches.
There are other ways to slice and dice your platform. Your approach should follow your requirements. Some customers want isolation to happen at the environment level (e.g. dev versus test) to ensure that a breach in one environment does not affect another. In this example you might have 2 VLAN’s with 3 collections each. The collections still allow for individual AG ownership and placement.
Docker’s inclusion of Kubernetes into the Enterprise platform has a strong focus on integration. Kubernetes uses namespaces to organize deployments and pods while Swarm leverages Collections. Docker integrated its enterprise class RBAC model into Kubernetes for ensuring security amongst namespace scoped deployments. But namespaces do not directly allow targeting of any particular node or set of nodes in the cluster. Rather, namespaces are groupings of containers which potentially span across all the VM’s in a cluster.
However, if you want the benefits of Collections within the Kubernetes realm, Docker has the answer: linking a Kubernetes namespace to a collection. The following screen-shot shows how it is done via the UCP web interface. You simply navigate to your namespace and then choose “Link Nodes in Collection”. This effectively pins your namespace to the collection you choose and therefore pins the workload to the set of VM’s within that collection.
Now we can combine VLAN’s, Collections, and Namespaces all together into the cluster’s configuration to obtain firewall enforced isolation of VM’s, grouping of VM’s based on Collection, and Kubernetes namespaces linked with collections.
There are several benefits to this approach and they include the following:
VM isolation within the VLAN’s enforced by firewall
Container deployments to Collections enforce workload placement
Docker Enterprise supports the industry-acclaimed Kubernetes scheduler
Kubernetes namespace linking to Collections provides placement of pod deployments on Collection based VMs
Kubernetes RBAC enforces access to pods/applications
All of this sounds great until you want to implement it. We have great VM isolation thru VLANs, but UCP managers must be able to communicate with and manage each of the worker nodes in the cluster across all the VLAN’s. This means that firewall rules must open numerous docker ports between the UCP management VLAN and each of your AG VLAN’s. In addition, IP-in-IP traffic (IP protocol 4) must be enabled on your firewall between the management VLAN and each AG VLAN. These all must be factored into your rollout.
Using the Kubernetes orchestrator within the Docker Enterprise platform has great advantages including enterprise security and workload separation. In addition, you can apply traditional VLAN isolation of your VM’s in conjunction with Docker to enable VM isolation.
At Capstone we have a wide variety of experiences. But some of those experiences tend to have common architectural goals. Hopefully you have gained insight into how VLAN’s can be incorporated with your Docker Enterprise platform.
If you want more information or assistance you can contact me on LinkedIn or thru our Capstone site.
Mark Miller Solutions Architect Docker Accredited Consultant
Rather than using an external load balancer as the AWS and Azure cloud providers do for the LoadBalancer service type, an ingress uses an Ingress Controller to provide load balancing, SSL termination and other services within a Kubernetes cluster. A big advantage of using an ingress is its portability across all clusters regardless of the underlying infrastructure, i.e. cloud, virtualized or bare metal. Until recently, a disadvantage was an ingress only supported HTTP and HTTPS and you would need to use a NodePort service type for other protocols. However, NGINX has added support for other protocols to their ingress controller.
Recently I was troubleshooting a customer problem in their on-premises cluster, but I was not sure where the problem lay. So I switched over to using my colleague’s Docker Enterprise demo cluster that is running in Azure. This heterogeneous cluster has 1 Universal Control Plane (UCP) manager, 1 Docker Trusted Registry (DTR), 2 Windows workers, and 1 Linux worker.
I was attempting to reproduce my customer’s problem. However, what should have been easy turned into a problem; or else I wouldn’t be writing about it. I could not even get to my customer’s problem until I resolved an issue with simply building a linux image against a heterogeneous (Windows and Linux workers) cluster. At the time, it felt rather silly and frustrating all at the same time. All I could do was wring my hands and groan.
I had downloaded my client bundle and sourced it in my bash shell.
$ source env.sh
The next thing I needed was to build the docker image from my custom Dockerfile. The Dockerfile was based on nginx and had a custom nginx.conf loaded into the image.
$ cd ~/my-app
$ docker build -t my-app:1.0 .
Sending build context to Docker daemon 4.096kB
worker-win-2: Step 1/3 : FROM nginx:1.15.2
worker-win-2: Pulling from library/nginx
Failed to build image: no matching manifest for unknown in the manifest list entries
Ok, based on the last line of the log output it is not obvious what the issue is. However, if you look at the machine name that the build command was sent to, it becomes quite obvious what the problem is. I cannot build a linux based image on a windows machine. But how do I specify the target operating system on the command line?
I knew my friend Chuck had already encountered this problem, and this is what he told me to do: add the option --build-arg 'constraint:ostype==linux' to my build command.
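Put together with the earlier build command, that looks roughly like the sketch below. The wrapper name build_on_linux is hypothetical; the constraint string is the one Chuck suggested, and the image tag in the usage comment is the one from my failed build.

```shell
# Hypothetical helper: build an image on a Linux worker in a mixed cluster.
# $1 = image tag, $2 = build context (defaults to the current directory)
build_on_linux() {
  docker build --build-arg 'constraint:ostype==linux' -t "$1" "${2:-.}"
}

# Usage, from the directory containing the Dockerfile:
#   build_on_linux my-app:1.0 .
```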
In a heterogeneous cluster my builds are now targeting linux machines and not windows. Of course you can alternate the ostype to windows if that is your goal. Good luck and contact us at https://capstonec.com/contact-us/
Mark Miller Solutions Architect Docker Accredited Consultant