Lab 3
Overview
So far you have learned how to run applications using Docker on your local machine, but what about running dockerized applications in production? There are a number of problems that come with building an application for production: scheduling services across distributed nodes, maintaining high availability, implementing reconciliation, scaling, and logging... just to name a few.
There are several orchestration solutions out there that help you solve some of these problems. One example is the IBM Kubernetes Service, which uses Kubernetes to run containers in production.
Before we introduce you to Kubernetes, we will teach you how to orchestrate applications using Docker Swarm. Docker Swarm is the orchestration tool that comes built into the Docker Engine.
We will be using a few Docker commands in this lab. For full documentation on available commands, check out the official Docker documentation.
Prerequisites
In order to complete a lab about orchestrating an application that is deployed across multiple hosts, you need... well, multiple hosts. To make things easier, for this lab we will be using the multi-node support provided by Play with Docker. This is the easiest way to test out Docker Swarm without having to deal with installing Docker on multiple hosts yourself.
Step 1: Create your first swarm
In this step, we will create our first swarm using play-with-docker.
Navigate to Play with Docker
Click "add new instance" on the left-hand side three times to create three nodes
Our first swarm cluster will have three nodes.
Initialize the swarm on node 1
You can think of Docker Swarm as a special "mode" that is activated by the command `docker swarm init`. The `--advertise-addr` flag specifies the address that the other nodes will use to join the swarm.
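A minimal sketch of initializing the swarm on node1 (on Play with Docker, `eth0` is typically the interface to advertise; adjust for your environment):

```bash
# On node1: make this node the first manager of a new swarm.
docker swarm init --advertise-addr eth0
```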
The `docker swarm init` command generates a join token. The token makes sure that no malicious nodes join our swarm. We will need this token to join the other nodes to the swarm. For convenience, the output includes the full `docker swarm join` command, which you can simply copy and paste to the other nodes.
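The printed command has the following general shape; the token and manager address below are placeholders, so use the exact values from your own output:

```bash
# Run on node2 and node3 to join the swarm as workers.
# <TOKEN> and <MANAGER-IP> are placeholders; copy the real command from node1's output.
docker swarm join --token <TOKEN> <MANAGER-IP>:2377
```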
On both node2 and node3, copy and run the `docker swarm join` command that was output to YOUR console by the previous command. You now have a three-node swarm!
Back on node1, run `docker node ls` to verify your 3-node cluster. This command outputs the three nodes in our swarm. The `*` next to the ID of a node indicates the node that handled that specific command (`docker node ls` in this case).

Our swarm consists of 1 manager node and 2 worker nodes. Managers handle commands and manage the state of the swarm. Workers cannot handle commands and are simply used to run containers at scale. By default, managers are also used to run containers.
All `docker service` commands for the rest of this lab need to be executed on the manager node (Node1).

Note: While we will control the swarm directly from the node on which it is running, you can control a Docker swarm remotely by connecting to the Docker Engine of the manager via the remote API, or by activating a remote host from your local Docker installation (using the `$DOCKER_HOST` and `$DOCKER_CERT_PATH` environment variables). This will become useful when you want to control production applications remotely instead of ssh-ing directly into production servers.
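As a rough sketch of the remote-control approach, assuming a manager whose engine is reachable over TLS, you could point your local Docker CLI at it with environment variables (the host name and certificate path below are illustrative, not values from this lab):

```bash
# Illustrative example: drive a remote swarm manager from your local Docker CLI.
export DOCKER_HOST=tcp://swarm-manager.example.com:2376    # illustrative manager address
export DOCKER_TLS_VERIFY=1                                  # require TLS verification
export DOCKER_CERT_PATH=$HOME/.docker/swarm-manager-certs   # illustrative cert directory
docker node ls   # this now runs against the remote manager's engine
```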
Step 2: Deploy your first service
Now that we have our 3 node Swarm cluster initialized, let's deploy some containers. To run containers on a Docker Swarm, we want to create a service. A service is an abstraction that represents multiple containers of the same image deployed across a distributed cluster.
Let's do a simple example using Nginx. For now we will create a service with just 1 running container, but we will scale up later.
Deploy a service using Nginx
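A `docker service create` command along the following lines deploys the service described below; the exact flags used in the lab may differ, but this sketch matches the later references to the `nginx1` name, published port 80, the `--mount` hostname trick, and the `nginx:1.12` image:

```bash
# Create a service named nginx1 with a single replica, published on port 80.
# The bind mount serves the node's hostname as the Nginx index page, which makes it
# easy to see which node answered a request later in the lab.
docker service create --detach=true --name nginx1 --publish 80:80 \
  --mount source=/etc/hostname,target=/usr/share/nginx/html/index.html,type=bind,ro \
  nginx:1.12
```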
The above command is declarative, and Docker Swarm will actively try to maintain the state declared in it unless explicitly changed via another `docker service` command. This behavior comes in handy when nodes go down, for example: containers are automatically rescheduled onto other nodes. We will see a demonstration of that a little later in this lab.

The `--mount` flag is a neat trick to have Nginx print out the hostname of the node it's running on. This will come in handy later in this lab, when we start load balancing between multiple Nginx containers distributed across different nodes in the cluster and want to see which node in the swarm is serving the request.

We are using the Nginx tag "1.12" in this command. We will demonstrate a rolling update to version 1.13 later in this lab.

The `--publish` flag makes use of the swarm's built-in routing mesh. In this case, port 80 is exposed on every node in the swarm. The routing mesh will route a request coming in on port 80 to one of the nodes running the container.

Inspect the service
You can use `docker service ls` to inspect the service you just created.

Check out the running container of the service
To take a deeper look at the running tasks, you can use `docker service ps`. A task is yet another abstraction used in Docker Swarm that represents a running instance of a service. In this case, there is a 1-1 mapping between a task and a container. If you happen to know which node your container is running on (you can tell from the output of `docker service ps`), you can use `docker container ls` on that node to see the container running there.
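For reference, assuming the service is named `nginx1` as above, the inspection commands look like this:

```bash
docker service ls          # list services and how many replicas of each are running
docker service ps nginx1   # list the tasks of nginx1 and the nodes they are scheduled on
docker container ls        # run this on the node hosting a task to see the actual container
```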
Test the service
Because of the routing mesh, we can send a request to any node of the swarm on port 80. This request will be automatically routed to the one node that is running our nginx container.
Try this on each node:
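For example, from any of the three nodes:

```bash
# Request port 80 on this node; the routing mesh forwards it to the node running nginx1.
curl localhost:80
```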
Curling will output the hostname where the container is running. For this example, it is running on "node1", but yours might be different.
Step 3: Scale your service
In production we may need to handle large amounts of traffic to our application. So let's scale!
Update your service with an updated number of replicas
We are going to use the `docker service update` command to update the nginx1 service we created earlier to include 5 replicas; a sketch of the command follows the list below. This defines a new state for our service. As soon as this command is run, the following happens:
The state of the service is updated to 5 replicas (which is stored in the swarm's internal storage).
Docker Swarm recognizes that the number of replicas currently scheduled does not match the declared state of 5.
Docker Swarm schedules 4 more tasks (containers) in an attempt to meet the declared state for the service.
The swarm is actively checking whether the desired state is equal to the actual state, and will attempt to reconcile if needed.
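For reference, the update described above might look like the following (the service name `nginx1` matches the service created earlier; the exact flags in the original lab may differ):

```bash
# Declare a new desired state of 5 replicas for the nginx1 service.
docker service update --replicas=5 --detach=true nginx1
```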
Check the running instances
After a few seconds, you should see that the swarm did its job and successfully started 4 more containers. Notice that the containers are scheduled across all three nodes of the cluster. The default placement strategy used to decide where new containers run is "emptiest node", but that can be changed based on your needs.
Send a bunch of requests to localhost:80
The `--publish 80:80` flag is still in effect for this service; it was not changed when we ran `docker service update`. However, now when we send requests on port 80, the routing mesh has multiple containers to route requests to. The routing mesh acts as a load balancer for these containers, alternating where it routes requests.

Let's try it out by curling multiple times. Note that it doesn't matter which node you send the requests to. There is no connection between the node that receives the request and the node that the request is routed to.
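For example, a quick loop from any node (the request count here is just illustrative):

```bash
# Send 10 requests to port 80; the routing mesh spreads them across the nginx1 replicas.
for i in $(seq 1 10); do curl -s localhost:80; done
```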
You should see which node is serving each request because of the nifty `--mount` flag we used earlier.
Limits of the routing mesh

The routing mesh can only publish one service on port 80. If you want multiple services exposed on port 80, you can use an external application load balancer outside of the swarm to accomplish this.
Check the aggregated logs for the service
Another easy way to see which nodes those requests were routed to is to check the aggregated logs. We can get aggregated logs for the service using `docker service logs [service name]`. This aggregates the output from every running container, i.e. the output of `docker container logs [container name]`.
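Assuming the service name `nginx1` from earlier:

```bash
# Show the combined logs of every task (container) in the nginx1 service.
docker service logs nginx1
```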
Based on these logs, we can see that each request was served by a different container.
In addition to seeing whether the request was sent to node1, node2, or node3, you can also see which container on each node it was sent to. For example, `nginx1.5` means that the request was sent to the container with that same name, as indicated in the output of `docker service ps nginx1`.
Step 4: Rolling Updates
Now that we have our service deployed, let's demonstrate a release of our application. We are going to update the version of Nginx to version "1.13". To do this update, we are going to use the `docker service update` command.
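An update command along these lines performs the rolling update described here (exact flags may differ from the original lab):

```bash
# Roll the nginx1 service forward to the nginx:1.13 image, one task at a time by default.
docker service update --image nginx:1.13 --detach=true nginx1
```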
This will trigger a rolling update of the swarm. Quickly type `docker service ps nginx1` over and over to see the updates in real time.
You can fine-tune the rolling update using these options:
`--update-parallelism` dictates the number of containers to update at once (defaults to 1).
`--update-delay` dictates the delay between finishing the update of one set of containers and moving on to the next set.
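For example, to update two containers at a time with a 10-second pause between batches (the values here are illustrative):

```bash
docker service update --update-parallelism 2 --update-delay 10s --image nginx:1.13 nginx1
```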
After a few seconds, run `docker service ps nginx1` to see that all the tasks have been updated to nginx:1.13.
You have successfully updated your app to a newer version of Nginx!
Step 5: Reconciliation
In the previous step, we updated the state of our service using `docker service update`. We saw Docker Swarm in action as it recognized the mismatch between the desired state and the actual state, and attempted to resolve it.
The "inspect -> adapt" model of Docker Swarm enables it to perform reconciliation when something goes wrong. For example, when a node in the swarm goes down, it might take down running containers with it. The swarm will recognize this loss of containers and will attempt to reschedule containers on available nodes in order to achieve the desired state for that service.
We are going to remove a node and watch the tasks of our nginx1 service get rescheduled onto other nodes automatically.
For the sake of clean output, first create a brand new service by copying the line below. We will change the name and the publish port to avoid conflicts with our existing service. We will also add the `--replicas` flag to scale the service to 5 instances.
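A command along these lines matches that description; the new service name `nginx2` and published port 81 are illustrative choices rather than values confirmed by this lab:

```bash
# Create a second service with a different name and published port, at 5 replicas.
docker service create --detach=true --name nginx2 --replicas=5 --publish 81:80 \
  --mount source=/etc/hostname,target=/usr/share/nginx/html/index.html,type=bind,ro \
  nginx:1.12
```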
On Node1, use `watch` to follow the updates in the output of `docker service ps` for the new service.
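Assuming the new service is named `nginx2` as in the sketch above:

```bash
# Refresh the task list of nginx2 every second.
watch -n 1 docker service ps nginx2
```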
Note that `watch` is a Linux utility and might not be available on other platforms. This should result in a window that continuously refreshes with the current list of tasks for the service.
Click on Node3, and type the command to leave the swarm cluster.
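On a worker node, that command is:

```bash
# Remove this node from the swarm (run on Node3).
docker swarm leave
```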
This is the "nice" way to leave the swarm, but you can also kill the node and the following behavior will be the same.
Click on Node1 to watch the reconciliation in action. You should see that the swarm will attempt to get back to the declared state by rescheduling the containers that were running on node3 to node1 and node2 automatically.
Number of nodes
In this lab, our Docker Swarm cluster consists of one manager and two worker nodes. This configuration is not highly available. The manager node contains the necessary information to manage the cluster, so if this node goes down, the cluster will cease to function. For a production application, you will want to provision a cluster with multiple manager nodes to allow for manager node failures.
For manager nodes, you want at least 3 but typically no more than 7. Managers implement the Raft consensus algorithm, which requires that more than 50% of the nodes agree on the state being stored for the cluster. If you don't achieve more than 50% agreement, the swarm will cease to operate correctly. For this reason, the following can be assumed about node failure tolerance:
3 manager nodes tolerates 1 node failure
5 manager nodes tolerates 2 node failures
7 manager nodes tolerates 3 node failures
It is possible to have an even number of manager nodes, but it adds no value in terms of the number of node failures tolerated. For example, with 4 manager nodes a majority still means 3 nodes, so the cluster would only tolerate 1 node failure, which is the same tolerance as a 3-manager cluster. In addition, the more manager nodes you have, the harder it is to achieve consensus on the state of the cluster.
While you typically want to limit the number of manager nodes to no more than 7, you can scale the number of worker nodes much higher than that. Worker nodes can scale up into the thousands. Worker nodes communicate using the gossip protocol, which is optimized to perform well under heavy traffic and with a large number of nodes.
If you are using Play with Docker, you can easily deploy multiple manager node clusters using the built in templates. Click the templates icon in the upper left to see what templates are available.
Summary
In this lab, you got an introduction to the problems that come with running containers in production, such as scheduling services across distributed nodes, maintaining high availability, implementing reconciliation, scaling, and logging. We used Docker Swarm, the orchestration tool that comes built into the Docker Engine, to address some of these issues.
Key Takeaways:
Docker Swarm schedules services using a declarative model. You declare the desired state, and the swarm attempts to maintain and reconcile to make sure the actual state == desired state.
Docker Swarm is composed of manager and worker nodes. Only managers can maintain the state of the swarm and accept commands to modify it. Workers are highly scalable and are only used to run containers. By default, managers can run containers as well.
The routing mesh built into swarm means that any port that is published at the service level will be exposed on every node in the swarm. Requests to a published service port will be routed automatically to a container of the service that is running in the swarm.
Many tools exist to help solve the problems of orchestrating containerized applications in production, including Docker Swarm and the IBM Kubernetes Service.