When configuring a Docker environment before a big rollout to production, it’s important to understand what “zero downtime deployment” really means and whether it lives up to the claim. That requires understanding the intricacies of how a rolling update works.
So let’s look at what “zero downtime deployment” actually means. Most people don’t want to arrange a maintenance window and kick users off just to roll out a new version of an application. This is the problem Docker Enterprise Edition solves with an approach referred to as “rolling updates.”
Consider a simple service that runs a business application at version 1.2.2, with three instances running both to handle a high traffic volume and to survive the crash of one or two instances. A simple Docker stack file would look like the following:
```yaml
version: '3.4'
services:
  busy:
    image: acme/busyapp:1.2.2
    deploy:
      replicas: 3
```
When a docker stack deploy command runs, then for each service in the stack, and for each container in the service, Docker will (a sample invocation follows the list):
- Stop a container
- Remove the container
- Re-create the container with the new image
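For reference, here is a minimal sketch of kicking off such a rolling update from the command line (the stack name busy and the file name docker-stack.yml are illustrative):

```sh
# Deploy (or update) the stack; for any service whose definition changed,
# Docker performs the stop/remove/re-create cycle described above.
docker stack deploy -c docker-stack.yml busy
```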
Questions about timing, and possible trouble from sessions still attached to the Tomcat server, are addressed by the stop_grace_period option. The default time frame is ten seconds. First, Docker stops routing traffic to the container, then sends a signal (SIGTERM) to the container’s root process telling it to shut down gracefully. Docker waits for the container to exit, up to the amount of time specified in stop_grace_period. If that period elapses and the container still has not exited, Docker terminates it (SIGKILL). Once the container is no longer running, Docker starts a new container using the new image.
The new configuration, with a custom grace period, looks like the following:
```yaml
version: '3.4'
services:
  busy:
    image: acme/busyapp:1.2.3
    stop_grace_period: 30s
    deploy:
      replicas: 3
      update_config:
        delay: 1h
        parallelism: 1
        failure_action: rollback
```
With parallelism set to 1, this process is repeated two more times until all three containers have been upgraded.
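One way to watch the rollout as it proceeds is sketched below (the service name busy_busy assumes the stack above was deployed as busy):

```sh
# List the service's tasks; during a rolling update you should see old
# tasks shutting down and new tasks starting one at a time.
docker service ps busy_busy

# Show the update state (prints null if no update has been run yet).
docker service inspect --format '{{json .UpdateStatus}}' busy_busy
```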
If something more like a canary deployment is preferred, or if something goes wrong during the rollout, Docker provides several other configuration options, which are covered in the Docker reference documentation.
To support a canary rollout, all that’s necessary is specifying a longer delay between container updates, such as an hour. This allows traffic to proceed as usual to the two original containers as well as to the new container running business application version 1.2.3. If everything checks out, Docker continues after the hour and upgrades the next container.
The custom rollout configuration also specifies that Docker should roll back to the previous version (1.2.2) if a container fails its health check or exits on startup.
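For the rollback to trigger on anything subtler than a crash on startup, the container needs a health check Docker can observe. Here is a minimal sketch, assuming the hypothetical busyapp exposes an HTTP health endpoint on port 8080 (the endpoint, port, and timings are illustrative, and curl must exist inside the image):

```yaml
version: '3.4'
services:
  busy:
    image: acme/busyapp:1.2.3
    healthcheck:
      # Hypothetical endpoint; substitute whatever your application exposes.
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s   # give the app time to boot before counting failures
```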
Below is an excerpt of the update_config options from the Docker reference documentation (a combined example follows the list):
- parallelism: The number of containers to update at a time.
- delay: The time to wait between updating a group of containers.
- failure_action: What to do if an update fails. One of continue, rollback, or pause (default: pause).
- monitor: Duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 0s).
- max_failure_ratio: Failure rate to tolerate during an update.
- order: Order of operations during updates. One of stop-first (old task is stopped before starting the new one) or start-first (new task is started first, and the running tasks briefly overlap) (default: stop-first). Note: only supported for compose file version 3.4 and higher.
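Putting several of these options together, here is a sketch of a more defensive update policy (the values are illustrative, not recommendations):

```yaml
version: '3.4'
services:
  busy:
    image: acme/busyapp:1.2.3
    deploy:
      replicas: 3
      update_config:
        parallelism: 1           # update one container at a time
        delay: 30s               # pause between containers
        order: start-first       # start the new task before stopping the old
        monitor: 60s             # watch each new task for 60s before moving on
        max_failure_ratio: 0.3   # tolerate up to 30% failed tasks
        failure_action: rollback # revert to the previous image on failure
```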
Hope this helps build a better understanding of “zero downtime deployments” using Docker’s rolling update capabilities.
Mark Miller
Docker Accredited Consultant
Business Solutions Architect at Capstone Consulting

Hi Mark, my question is:
I have 1 container running an app, but when I need to update my stack, the old one stops while the new one is ready but still starting the app. How can I make the old one go down only after xx seconds (the time needed for the new one to start the app)?
I have order: start-first, and my stop_grace_period: 50s, but even with version: '3.4' the stack deploy gave me: stop_grace_period Additional property stop_grace_period is not allowed. Thanks
We have a service running in global mode. If we add the “order: start-first” option to the stack file, how will both containers access the same port at the same time? As we understand it, once the new task or container becomes ready (in our case, once curl succeeds against that port), the old task or container will be shut down, so at some moment there would be a port conflict. Or does it work differently?
Interesting question. Deploying 3 replicas and deploying in global mode differ only in the count. In replica mode it is still possible for 2 replicas to end up on the same machine (though they are typically spread across machines). In global mode each machine gets exactly 1 container. That is, until a rolling update with order: start-first causes a new container to spin up before the old container is destroyed.
In both scenarios, each container is assigned its own unique IP address on the overlay network. With replicas you would have 3 IP addresses; in global mode you would have 1 IP address for each worker node. When you deploy a new image in global mode with order: start-first, the new container starts with its own new IP address and coexists for a short time with the old container on a separate IP address. Docker continues to manage the virtual IP (VIP) and routes traffic to both old and new containers until it removes the old container and, of course, updates the list of containers behind the VIP.
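To make that concrete, here is a minimal sketch of a global-mode service using start-first (the service name, image, and port are illustrative). The published port is handled by the ingress routing mesh and the VIP rather than bound directly by each container, so the brief overlap of old and new tasks on their separate overlay IPs does not cause a port conflict:

```yaml
version: '3.4'
services:
  web:
    image: acme/busyapp:1.2.3   # illustrative image
    ports:
      - "8080:8080"             # published through the ingress routing mesh
    deploy:
      mode: global
      update_config:
        order: start-first      # new task starts before the old one stops
```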