When configuring a Docker environment before a big rollout to production, it’s important to understand what “zero downtime deployment” really means and whether it delivers what it claims. This requires an understanding of the intricacies of how a rolling update works.
So let’s look at what “zero downtime deployment” actually means. Most people don’t want to schedule a maintenance window and kick users off just to roll out a new version of an application. Docker Enterprise Edition solves this problem with an approach referred to as “rolling updates.”
Consider a simple service that runs version 1.2.2 of a business application. The service runs three instances so that it can handle high traffic volume and survive the crash of one or two instances. A simple Docker stack file would look like the following:
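A minimal sketch of such a stack file is shown here. The service name `app`, the image name `example/business-app`, and the published port 8080 are assumptions for illustration and are not taken from the original application.

```yaml
version: "3.4"

services:
  app:
    # Hypothetical image name; substitute the real repository and tag.
    image: example/business-app:1.2.2
    ports:
      # Assumed port; 8080 is the Tomcat default.
      - "8080:8080"
    deploy:
      # Three instances to absorb traffic and tolerate one or two crashes.
      replicas: 3
```

A file like this would typically be deployed with `docker stack deploy --compose-file docker-stack.yml myapp`, where the file name and stack name are again placeholders.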
When a docker stack deploy command is run, Docker will do the following for each service in the stack and for each container in the service:
- Stop a container
- Remove the container
- Re-create the container with the new image
Concerns about timing and about sessions attached to the Tomcat server are addressed by the stop_grace_period option, which defaults to ten seconds. First, Docker stops routing traffic to the container and then sends a signal (SIGTERM by default) to the container's root process, telling it to shut down gracefully. Docker waits for the container to exit for up to the time specified in stop_grace_period. If that period elapses and the container has not exited, Docker forcibly terminates it. Once the container is no longer running, Docker starts a new container using the new image.
The new configuration, with a custom grace period, looks like the following:
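A sketch of what that could look like, assuming a 30-second grace period. The value, the service name `app`, the image name, and the port are illustrative assumptions; the right grace period depends on how long the application needs to drain its sessions.

```yaml
version: "3.4"

services:
  app:
    # Hypothetical image name, now pointing at the new version.
    image: example/business-app:1.2.3
    ports:
      - "8080:8080"
    # Wait up to 30 seconds for a graceful exit before forcing termination
    # (the default is 10 seconds).
    stop_grace_period: 30s
    deploy:
      replicas: 3
```

Note that stop_grace_period sits at the service level, not under the deploy key.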
Based on the simple configuration, this process is repeated two more times before all three containers have been upgraded.
If something more like a canary deployment is preferred, or if something goes wrong in the rollout, Docker provides several other configuration options, described in the Docker reference documentation.
To support a canary rollout, all that’s necessary is to specify a longer delay, such as an hour, between container updates. This allows traffic to proceed as usual to the two original containers as well as to the new one running business application version 1.2.3. If everything checks out, Docker continues after an hour and upgrades the next container.
The custom rollout configuration also specifies that Docker should roll back to the previous version (1.2.2) if a container fails its health check or exits upon startup.
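Put together, a canary-style configuration along these lines might look like the following sketch. The image name, the port, and the health check command are assumptions; in particular, the check presumes that `curl` is installed in the image and that the application answers on `/`.

```yaml
version: "3.4"

services:
  app:
    # Hypothetical image name for the new version.
    image: example/business-app:1.2.3
    ports:
      - "8080:8080"
    stop_grace_period: 30s
    healthcheck:
      # Assumed check; requires curl inside the image.
      test: ["CMD", "curl", "-f", "http://localhost:8080/"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      replicas: 3
      update_config:
        # Update one container at a time.
        parallelism: 1
        # Wait an hour before moving on to the next container.
        delay: 1h
        # Return to the previous service definition if an update fails.
        failure_action: rollback
```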
Below is an excerpt from the Docker reference documentation which can be found at:
- parallelism: The number of containers to update at a time.
- delay: The time to wait between updating a group of containers.
- failure_action: What to do if an update fails. One of continue, rollback, or pause (default: pause).
- monitor: Duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 0s).
- max_failure_ratio: Failure rate to tolerate during an update.
- order: Order of operations during updates. One of stop-first (old task is stopped before starting new one), or start-first (new task is started first, and the running tasks will briefly overlap) (default stop-first). Note: Only supported for v3.4 and higher.
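As a sketch of how the remaining options combine, the following variant starts each new container before stopping the old one, so the service does not drop below three running instances during the update. All values, along with the image and service names, are illustrative assumptions and not recommendations.

```yaml
version: "3.4"

services:
  app:
    # Hypothetical image name.
    image: example/business-app:1.2.3
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        # Start the replacement first; old and new tasks briefly overlap.
        order: start-first
        # Watch each updated task for 60 seconds before counting it as healthy.
        monitor: 60s
        # Tolerate up to 30% of task updates failing before failure_action applies.
        max_failure_ratio: 0.3
        # Stop the rollout and wait for an operator if the update fails.
        failure_action: pause
```

With start-first, the cluster needs enough spare capacity to run one extra container for the duration of each step.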
Hope this helps build a better understanding of “zero downtime deployments” using Docker’s rolling update capabilities.
Docker Accredited Consultant
Business Solutions Architect at Capstone Consulting