This is part two of a two-part blog about Kubernetes DNS resolution and network access by Pods in Kubernetes. In part one we looked at internal Kubernetes DNS and how DNS resolution is configured for containers. In this part, we look at how network traffic gets from the containers in user workload Pods to Pods providing DNS functionality. We’re using Kubernetes running under Docker EE UCP (Docker Enterprise Edition Universal Control Plane) in this example. You can find more information about Docker EE here. Docker EE uses the Calico network plugin for Kubernetes, so some of the details are specific to Calico.
Service discovery is one of the important benefits of using a container/Pod orchestrator. When you create a Service in Kubernetes, controllers running behind the scenes create an entry in Kubernetes DNS records. Then other applications deployed in the cluster can look up the Service using its name. Kubernetes also configures routing within the cluster to send traffic for the Service to the Service’s ephemeral endpoint Pods.
Understanding Kubernetes DNS configuration and related traffic flow will help you troubleshoot problems accessing the cluster’s DNS from Pods. This is part one of a two-part deep-dive into how Kubernetes does this under the hood. In part one of this blog, we will look at how Kubernetes sets up DNS resolution for containers in Pods. In part two, we will look at how network traffic flows from containers in Pods for user workloads to the Pods providing DNS functionality. We’re going to use Kubernetes running under Docker Enterprise Edition for our examples in this blog.
In a previous post, What is Container Orchestration?, I explained container orchestration using some examples based on Docker Swarm. While Docker Swarm is undeniably easier to both use and explain, Kubernetes is by far the most prevalent container orchestrator today. So, I’m going to go through the same examples from that previous post but, this time, use Kubernetes. One of the great things about Docker Enterprise is it supports both Swarm and Kubernetes so I didn’t have to change my infrastructure at all.
Managing a Kubernetes cluster with one user is easy. Once you go beyond one user, you need to start using Role-Based Access Control (RBAC). I’ve delved into this topic several times in the past with posts on how to Create a Kubernetes User Sandbox in Docker Enterprise and Functional Kubernetes Namespaces in Docker Enterprise. But, once you get beyond a couple of users and/or teams and a few namespaces for them, it quickly becomes difficult to keep track of who can do what and where. And, as time goes on and more and more people have a hand in setting up your RBAC, it can get even more confusing. You can and should have your RBAC resource definitions in source control but it’s not easy to read and is hard to visualize. Enter the open source who-can kubectl plugin from the folks at Aqua Security. It gives you the ability to show who (subjects) can do what (verbs) to what (resources) and where (namespaces).
I recently worked with a customer to customize all of the default Classless Interdomain Routing (CIDR) ranges used for IP address allocation by Docker Enterprise Edition 3.0 (Docker EE 3.0). The customer primarily wanted to document the customization process for future use. However, there is often a real need to change some of the default CIDR ranges to avoid conflicts with existing private IP addresses already in use within a customer’s network. Typically such a conflict will make it impossible for applications running in containers or pods to access external hosts in the conflicting CIDR range.
I once heard a story from someone who worked at ConAgra. They produce and sell a variety of food products that you and I eat all the time. The most notorious is the ButterBall turkey. ConAgra owned Butterball from 1990 to 2006. Every Thanksgiving holiday, so I am told, ButterBall would have to scale up their call-center as well as their website to a couple hundred web servers to handle the demand for “how to cook my turkey?” That’s a lot of hardware!
We are only a couple months away from Thanksgiving. So, what do you call a turkey on the day after Thanksgiving? Lucky. #dadjoke
So, I decided to look into how to automatically scale up my Docker worker nodes when the average CPU threshold of 80% utilization is crossed. It turns out to be pretty straight forward and can be accomplished with a three step process.
Create a Launch Template
Create an Auto Scaling Group
Create a cpu load greater than my threshold
Let’s be clear here. I am not talking about Kubernetes dynamically scaling out my pods. I am talking about measuring the average cpu utilization across all of my worker nodes and creating a brand new virtual machine (VM) that is added to my cluster. So when ALL my containers get a bit punchy, then it’s time to expand their world.
Before we get started with this 3 step approach, let’s talk about my cluster. Currently my demo cluster contains 1 UCP manager, 1 DTR, and 1 Worker. By the time we are done, I will show you how it dynamically grows by one worker node.
Create a Launch Template
Logging into my AWS account using the left navigation to find Instances and select the Launch Templates.
You can see in the diagram above that I already have a launch template created. Let’s see what it takes to create this. By selecting the Create Launch Template button you will get the following form to fill out. There are three critical things to fill out here. First you need to select an AMI by it’s ID and it should be a known AMI that you will use for all your Docker nodes. Second, you want to choose an appropriately sized VM that suits the needs of your worker nodes. I chose a t2.medium for demonstration purposes only. Third, be sure to include your key pair so that you can login to your VM. Next you will need to choose the appropriate security groups to apply to this VM so that you can SSH into the machine as needed and allow http/https traffic.
Next you will want to scroll down to the bottom of the form and open the Advanced Details. This section allows you to execute a one-time only post-create-vm script. This is the secret sauce of how my new VM joins the cluster.
This script includes the download and install of the docker engine as well as the command to join the VM to the Docker cluster. Copy the script from below.
This will immediately spin up 2 programs to stress each cpu on this VM. As you’ll notice below stress is running at a hefty pace on both cores for a total system load of 98.8% cpu utilization.
This load will certainly surpass our 75% threshold for our auto scaling group. And looking at the Activity History you can see the event triggered a new VM to be provisioned.
This is very exciting to me. Now we just need to wait until it is totally up and running and verify if the Docker engine was installed correctly and see if it properly joins the cluster.
Well, there you have it. I now have 2 Kubernetes worker nodes in my cluster!
The beauty of Auto Scaling Groups is that you can tune many parameters including the instance type of the VM you are creating. You can even choose to use Spot Instances and get up to 90% reduced cost for your AWS VM’s.
When I have a customer facing containerized web application that will get hammered over the holidays, then I now have a solution to automatically scale out my VM’s using an AWS feature called Auto Scaling Groups. There is so much more to talk about regarding scaling including placement of new vm’s, scaling policies, scaling back, etc.
My family told me to stop telling Thanksgiving jokes, but I told them I couldn’t quit “cold turkey”. Feel free to contact us at Capstone IT especially if you want more #dadjokes. See you next time.
Mark Miller Solutions Architect Docker Accredited Consultant
So a colleague of mine was helping his client configure Interlock and wanted to know more about how to configure Interlock Service Clusters. So I referred him to my previous blog – Interlock Service Clusters. While that article conceptually helps someone understand the capabilities of Interlock, it does not show any working code examples.
Let’s review what Docker Enterprise UCP Interlock provides. And then I will show you how to configure Interlock to support multiple ingresses each of which are tied to its own environment.
The Interlock ingress provides three services.
Interlock (I) – an overall manager of all things ingress and a listener to Swarm events. It spawns both the extension and proxy services.
Interlock Extension (IE) – When Interlock notices Swarm service changes it will then notify the Interlock Extension to create a new Nginx configuration file. That file is returned to the Interlock.
Interlock Proxy (IP) – the core ingress listener that routes traffic based on http host header to appropriate application services. It receives its Nginx configuration from Interlock whenever there are service changes that the Interlock Proxy needs to handle.
The Interlock services containers are represented in the diagram below as I for Interlock, IE for Interlock Extension, and IP for Interlock Proxy.
The shaded sections represent Docker Collections for dev, test, and prod environments; all managed within the single cluster. Integrating Interlock Service clusters into this approach provides a benefit in that of isolating problems to a single collection. This is a much more fault tolerant and ensures downstream test and prod ingress traffic is unaffected. The second benefit is that this provides greater ingress capacity for each environment. The production Interlock Proxies are dedicated for production use only and therefore does not share its capacity with dev and test ingress traffic.
We will establish 3 Interlock Service Clusters and have it deploy one ucp-interlock-proxy replica to each node that has the label of com.docker.interlock.service.cluster.
The overall process we work thru entails the following steps.
pulling down Interlock’s configuration toml
configuring three service clusters
upload a new configuration with a new name
restart the interlock service
The code that I will show you below is going to be applied to my personal cluster in AWS. In my cluster I have 1 manager, 1 dtr, and 3 worker nodes. Each worker node is assigned to one of 3 collections named /dev, /test, and /prod. I will setup a single dedicated interlock proxy on each of these environments to segregate ingress traffic for dev, test, and prod.
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
ziskz8lewtzu7tqtmx ip-127-13-5-3.us-west-2.compute.internal Ready Active 18.09.7
5ngrzymphsp4vlwww7 ip-127-13-6-2.us-west-2.compute.internal Ready Active 18.09.7
qqrs3gsq6irn9meho2 * ip-127-13-7-8.us-west-2.compute.internal Ready Active Leader 18.09.7
5bzaa5xckvzi4w84pm ip-127-13-1-6.us-west-2.compute.internal Ready Active 18.09.7
kv8mocefffu794d982 ip-127-13-1-5.us-west-2.compute.internal Ready Active 18.09.7
Step 1 – Verify Worker Nodes in Collections
Let’s examine the Let’s examine a worker node to determine its collection.
The default configuration for Interlock is to have two interlock proxies running anywhere in the cluster. The proxies configuration resides in a section named Extensions.default. This is the heart of an interlock service cluster. We will duplicate this section two times for a total of three sections and then rename them to suit our needs.
Step 5 – Edit Interlock Configuration
Copy the config.orig.toml file to config.new.toml. Then, using your favorite editor (vi of course) duplicate the Extensions.default section two more times. Rename each of the three Extension.defaults to Extension.dev, Extensions.test, and Extensions.prod. Each Extensions.<env> section has other sub-sections that include the same name plus a qualifier (e.g. Extensions.default.Config). These too will need to be renamed.
Now we have 3 named extensions for each of dev, test, and prod. Next, you will search for the PublishedSSLPort and change it to 8445 for dev, and 8444 for test and leave the value 8443 for prod. These 3 ports should be the values that the incoming load balancer uses in its back-end pools. For each environment specific VIP (dev, test, prod) the traffic will flow into the load balancer on port 443. The VIP used to access the load balancer will dictate how the traffic will be routed to the appropriate interlock proxy IP address and port.
Add a new property called ServiceCluster under each of the extensions sections and give it the name of dev, test, or prod.
You can also specify the constraint labels that will dictate where both the Interlock Extension and Interlock Proxies will run. Start by changing the Constraints and ProxyConstraints to use your new node labels.
The ProxyReplicas indicates how many container replicas to run for the interlock proxy service. We will set ours to 2. The ProxyServiceName is the name of the service as it is deployed into Swarm for this service. We will name ours ucp-interlock-proxy-dev which is specific to the environment it is supporting.
Of course you will do this for all three sections within the new configuration file. Below is a snippet of only the changes that I have made for the dev ingress configuration. You will want to repeat this for test and prod as well.
Step 7- Restart Interlock Service with New Configuration
docker service update --update-failure-action rollback \
--config-rm $CURRENT_CONFIG_NAME \
--config-add source=$NEW_CONFIG_NAME,target=/config.toml \
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
Note: in the above scenario the service update worked smoothly. Other times, such as when there are errors in your configuration, the service will rollback. In those cases you will want to do a docker ps -a | grep interlock and look for the recently exited docker/ucp-interlock container. Once you have its container id you can perform a docker logs <container-id> to see what went wrong.
Step 8 – Verify Everything is Working
We need to make sure that everything started up properly and are listening on their appropriate ports.
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y3jg0mka0w7b ucp-agent global 4/5 docker/ucp-agent:3.1.9
xdf9q5y4dev4 ucp-agent-win global 0/0 docker/ucp-agent-win:3.1.9
k0vb1yloiaqu ucp-auth-api global 0/1 docker/ucp-auth:3.1.9
ki8qeixu12d4 ucp-auth-worker global 0/1 docker/ucp-auth:3.1.9
nyr40a0zitbt ucp-interlock replicated 0/1 docker/ucp-interlock:3.1.9
ewwzlj198zc2 ucp-interlock-extension replicated 1/1 docker/ucp-interlock-extension:3.1.9
yg07hhjap775 ucp-interlock-extension-dev replicated 1/1 docker/ucp-interlock-extension:3.1.9
ifqzrt3kw95p ucp-interlock-extension-prod replicated 1/1 docker/ucp-interlock-extension:3.1.9
l6zg39sva9bb ucp-interlock-extension-test replicated 1/1 docker/ucp-interlock-extension:3.1.9
xkhrafdy3czt ucp-interlock-proxy-dev replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8082->80/tcp, *:8445->443/tcp
wpelftw9q9co ucp-interlock-proxy-prod replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8080->80/tcp, *:8443->443/tcp
g23ahtsxiktx ucp-interlock-proxy-test replicated 1/1 docker/ucp-interlock-proxy:3.1.9 *:8081->80/tcp, *:8444->443/tcp
You can see there are 3 new ucp-interlock-extension-<env> containers and 3 new ucp-interlock-proxy-<env> containers. You can also verify that they are listening on SSL port 8443 thru 8445. This is fine for a demonstration, but you will more than likely want to set the replica’s somewhere in the 2 to 5 range per environment. And of course you will determine that based on your traffic load.
NOTE: Often times after the update of the Interlock’s configuration you will still see the old ucp-interlock-extension and/or the ucp-interlock-proxy services still running. You can run the following command to remove these as they are no longer necessary.
docker service rm ucp-interlock-extension ucp-interlock-proxy
Step 9 – Deploy an Application
Now let’s deploy a demo service that we can route thru our new ingress. We’re going to take the standard docker demo application and deploy it to our dev cluster. Start by creating the following docker-compose.yml file:
Note that the com.docker.lb.network attribute is set to ingress_dev. I previously created this network outside of the stack. We will now utilize this network for all our ingress traffic from Interlock to our docker-demo container.
Also notice that the com.docker.lb.hosts attribute is set to ingress-demo.lab.capstonec.net. I logged into our DNS server and create a CNAME record with that name pointing to my AWS load balancer for the dev environment.
I also must configure my AWS load balancer to allow traffic to a Target Group of virtual machines. We can talk about your cloud configuration in another article down the road.
Let’s deploy that stack:
docker stack deploy -c docker-stack.yml demodev
Once the stack is deployed, we can verify that the services are running on the correct machine:
docker stack ps demodev
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
i3bght0p5d0j demodev_demo.1 ehazlett/docker-demo:latest ip-127-13-5-3.ec2.internal Running Running 10 hours ago
cyqfu0ormnn8 demodev_demo.2 ehazlett/docker-demo:latest ip-127-13-5-3.ec2.internal Running Running 10 hours ago
Finally we should be able to open a browser to http://ingress-demo.lab.capstonec.net which routes thru the dev interlock service cluster) and see the application running.
Well that was a decent amount of work but now you’re done. You’ve successfully implemented your first interlock service cluster which is highly available and segmented into three environments for dev, test, and prod!
As always if you have any questions or need any help please contact us.
Mark Miller Solutions Architect Docker Accredited Consultant
One of the customers I support is using Kubernetes under Docker EE UCP (Enterprise Edition Universal Control Plane) and has been very impressed with its stability and ease of management. Recently, however, a worker node that had been very stable for months started evicting Kubernetes pods extremely frequently, reporting inadequate CPU resources. Our DevOps team was still experimenting with determining resource requirements for many of their containerized apps, so at first, we thought the problem was caused by resource contention between pods running on the node.
In the 1980s there was a funny television commercial for an insurance company that was debauching many other insurance companies. These hideous competitors trained their agents to “Say NO, deny the Claim!” thereby denying customers the benefits of the insurance policy they had purchased. It always made me chuckle and I still remember the chant to this day. I want to show you how you can do this, “Say no, deny pod access!” in Kubernetes using NetworkPolicies applied to your application deployments.
Recently while working with a customer who is quite new to Docker and the world of Kubernetes, they were inquiring about how to isolate their applications from each other in a shared Kubernetes cluster.
In a previous blog post entitled Kubernetes Workload Isolation I discussed how customers have segmented their cluster by using a combination of VLAN’s, Collections, and Namespaces. But if you are not utilizing VLAN’s to segment your networking among VM’s and if you are not using Collections to separate VM’s into different RBAC groups then you will need a different approach.
NetworkPolicies to the Rescue
Kubernetes namespaces provide isolation for administration purposes but are not sufficient to prevent network traversal. Kubernetes NetworkPolicies, however, provide the guardrails needed to restrict East-West traffic between pods and services in the cluster as well as North-South traffic between the pod and external resources.
David Thompson posted how to use Kubernetes NetworkPolicies in Docker Enterprise Edition to isolate applications from each other. He provides a tutorial-based approach around complete application isolation, including intra-pod isolation, and moves towards a final solution that opens up traffic from your ingress controller. He talks about how to:
deny all traffic to or from your pods
allow traffic between pods inside a namespace
allow ingress traffic from external sources to your pods
I want to take that technical insight and discuss how we can apply that information to your environment and discuss a practice around applying NetworkPolicies from a governance perspective.
It gets interesting when the development team is ready to deploy their application as Kubernetes pods. Security folks want the application as secure as possible including network access. Network folks want to ensure that networking follows enterprise standards and includes appropriate firewalls to isolate traffic. While developers just want their application to communicate properly amongst its various containers. I will now show you one approach to solving these lofty goals with NetworkPolicies.
Step 1 – Deny All
Like the television commercial chant “Say no, deny the claim!”, I would generally recommend starting with the most restrictive network permissions. New application teams should have a default NetworkPolicy that denies all ingress and egress traffic to all pods within their namespace.
Using the previous NetworkPolicy specification you can achieve total isolation by invoking: kubectl apply -f deny-all-np.yaml.
Let’s break this down. In the previous NetworkPolicy, you will notice that the podSelector is an empty set which means it applies to all pods. Kubernetes NetworkPolicies take a white-list approach. The policyTypes list both Ingress and Egress but does not specify any other attributes which means that it is white-listing nothing; effectively denying everything.
Applied During On-Boarding
This kind of NetworkPolicy would typically be applied to the namespace as it is created during the team’s on-boarding process. The on-boarding process is ideally a self-service portal where a team leader can invoke an automated process which will setup the team members inside of Docker Enterprise with appropriate permission to build, publish, and deploy their Kubernetes application. This should be controlled via automation or by an individual you trust to manually do the right thing. No apps are deployed at this time, but if someone subsequently does deploy an app then the default policy is to deny all traffic.
Step 2 – Allow Egress and Intra-Pod Access
In this second step, there are two parts. First, we remove the egress restriction because our applications will often need to access the outside world. So, in the following specification, you will see Egress has been removed from the list of policyTypes. This will grant your pods network access outside of the pod.
The second part of Step 2 is to allow intra-pod network access. Without this access your pods will not be able to communicate with each other in order to collaborate and produce the desired result. To do this we need to include some selectors under a new label of ingress:. We are essentially allowing ingress traffic from: the namespace selector and the pod selector. This will allow the pods in ns-app1 namespace to communicate with other pods in the same namespace and these pods must have a label of project-app1.
Keep in mind that NetworkPolicies are cumulative and inclusive. So this second NetworkPolicy will ride on top of the first and only open up what is required through the white-listing of ingress rules.
Applied by Pipeline During Deployment
Intra-pod network access could be enabled by the pipeline prior to deployment of the application. It could also have been done as part of the on-boarding process if you know what all your namespaces will be. Either way, it is not in the developers’ hands to establish network policies. Rather, it is a security/network concern and it is typically implemented as part of an automation process.
Step 3 – Allow Ingress
The type of application you are developing will determine if you need ingress traffic from a source external to the cluster. A typical web application has front-end websites that need ingress traffic and the supporting back-end micro-services do not. A mobile application may need ingress access to the API but that would often be handled by an application gateway in which case only the gateway needs ingress traffic.
This NetworkPolicy is adding an inclusive white-list rule which allows traffic from the ingress-nginx namespace. This NetworkPolicy is very specific about the matchLabels of the podSelector to target an individual set of pods that represent the web app that should handle ingress traffic.
Applied by Pipeline or Portal
The sample NetworkPolicy is very specific to this application. It is not generally applied to all applications.
The application of this NetworkPolicy should be handled by either the CI/CD pipeline or by a self-service portal. In both scenarios, the template is pre-defined and controlled outside of developer hands. The pipeline/portal will fill in the appropriate namespace and project name into the template while the developers have the flexibility to specify the ingress namespace.
In this three-step process to achieve application network isolation, I have prescribed that in no way are the developers responsible for or have any control over the NetworkPolicy. Rather, they can have these pre-defined and trusted templates applied to their project in an automated fashion. In the end, the application is isolated and yet has the proper source for ingress traffic.
This will require security, network, firewall, operations, CI/CD, and development teams to all be involved in some coordinated effort to standardize on a corporate strategy for application network isolation. That strategy must involve automation using both CI/CD and self-service portal to apply these NetworkPolicies.
As always, Capstone IT is here to help you with your Kubernetes needs.
Mark Miller Solution Architect Docker Accredited Consultant
Over the last two or three years I’ve given a similar presentation on containers to operations groups at clients, potential clients, conferences and meetups. Generally, they’re just getting started with containers and are wondering what orchestration is and how it impacts them. In this post, I will talk about what container orchestration is and provide several videos with simple examples of what it means.