
6 Months Of Kubernetes In Production

At Movio we have been running Kubernetes in production for more than six months. It promises to simplify management of your container deployments while providing a fast-growing set of features. In this blog post, I will summarize the main features we use in Kubernetes and assess their benefits. I won’t cover every part of Kubernetes, only the parts we use ourselves. Any comments or suggestions are welcome below.

Movio’s Docker journey started about two years ago when the first containerized applications and microservices started to land on our servers. My previous blog post, ‘Rundeck for Docker builds deployments’, laid out how we used to deploy containers. It’s no surprise that we were using battle-proven tools such as Puppet, Rundeck, and Jenkins, and it all worked fine. However, those tools are not primarily focused on container deployments, and as time progressed we encountered difficulties managing the growing number of containers. For example, when scaling up our deployments we had to manage lists of used ports on servers to avoid conflicts, our resource utilization wasn’t the best, and distributing containers across hosts became quite challenging.

This is when we started to investigate new container deployment tools. At Movio we prefer not to use hosted solutions, so AWS ECS and GCE were out of the game. After spending a few weeks testing Nomad and Kubernetes, Kubernetes came out as the winner. At that time (early 2016) Nomad didn’t provide features such as volume mounting or centralized logging. On the other hand, the simplicity of the Nomad setup was its biggest advantage. That is something you should be aware of: if you decide to host Kubernetes yourself, you will probably want to understand its internals. This is quite an extensive job, but the added value of Kubernetes features is worth it.

Let’s go through the main features as laid out on the Kubernetes website, discussing their ease of use and our experience with them.

Deploy your applications quickly and predictably

We use self-describing YAML (or JSON) files and the Kubernetes CLI tool (kubectl) to deploy all pieces of our applications in Kubernetes. This makes it easy to version control our deployments as well as integrate them with other tools. Kubernetes ships with a management UI, but making changes through it makes it harder to keep track of them. One nice feature of Kubernetes is that its CLI can return the exact specification of any running deployment in a format you specify, so you can easily check which version is currently deployed.
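
To make the idea of a self-describing deployment file concrete, here is a minimal sketch that builds a Deployment manifest in Python and prints it as JSON, which kubectl accepts alongside YAML. The application name, image, and port are hypothetical, and the apps/v1 API version is an assumption that matches current Kubernetes releases rather than the version we ran at the time.

```python
import json


def deployment_manifest(name, image, replicas=1, port=8080):
    """Return a Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: current API group, not the 2016 one
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application name and image, for illustration only.
manifest = deployment_manifest("hello-web", "example.org/hello-web:1.0.0", replicas=2)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Saving the output to a file and running kubectl apply -f on it would create the deployment, and kubectl get deployment hello-web -o yaml would return the specification that is currently running.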

Scale your applications on the fly

This builds on top of the previous feature. A simple change in your deployment file from replicas: 1 to replicas: 5 does the trick. No need to worry about port conflicts or load balancing traffic between the new replicas, as the Kubernetes service layer handles that for you. Of course, your application must be prepared for running multiple instances; for example, it must gracefully respond to changes in data storage or application state during restarts. You need to be aware of some restrictions when scaling up and down, especially if you are using volumes with your containers: some types of volumes cannot be mounted on more than one host at the same time. But once your application is ready to be scaled, this feature makes it really easy.
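
As a rough illustration of how small the change is, the sketch below takes a manifest dict and returns a copy with a different replica count. The helper name with_replicas and the sample manifest are hypothetical; only the spec.replicas field is part of Kubernetes itself.

```python
import copy


def with_replicas(manifest, replicas):
    """Return a copy of a Deployment manifest with a new replica count."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    scaled = copy.deepcopy(manifest)  # leave the original manifest untouched
    scaled["spec"]["replicas"] = replicas
    return scaled


# A trimmed-down, hypothetical manifest; a real one also needs a pod template.
original = {"kind": "Deployment", "metadata": {"name": "hello-web"}, "spec": {"replicas": 1}}
scaled = with_replicas(original, 5)
print(original["spec"]["replicas"], "->", scaled["spec"]["replicas"])
```

For a one-off change, kubectl scale deployment hello-web --replicas=5 has the same effect, but editing the file keeps the change in version control.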

Seamlessly roll out new features

As with scaling, rolling updates consist of a simple change in your deployment files (e.g. changing the image tag) and applying those changes against the Kubernetes cluster. The command to initiate a rolling update is simple: kubectl apply -f <your-file>.yaml. Kubernetes then takes care of gradually restarting your containers with the new version. You can adjust many settings of rolling updates, such as wait intervals between individual upgrades or the minimum number of running instances during a rolling update.
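
The two settings that bound a rolling update are maxSurge and maxUnavailable in the deployment's rolling update strategy. The sketch below works out what they mean for a given replica count, assuming the documented behaviour that percentage values round up for maxSurge and down for maxUnavailable, with 25% as the default for both. The function names are my own, and the edge cases Kubernetes handles internally are ignored.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the fewest available and the most total pods during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {"min_available": replicas - unavailable, "max_total": replicas + surge}


print(rollout_bounds(5))
print(rollout_bounds(10, max_surge=1, max_unavailable=0))
```

With five replicas and the defaults, at least four pods stay available and at most seven exist at once. Setting maxUnavailable to 0 trades a slower rollout for never dropping below the desired count.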

Optimize use of your hardware by using only the resources you need

This might get a little bit tricky. On our US cluster we are running over 160 containers on 8 hosts. So yes, we get much better density of running containers per host, and if you properly use resource requests and limits when specifying your deployments, Kubernetes will distribute the load evenly across your cluster and refuse to deploy new containers when there are not enough resources. However, with the increasingly ephemeral nature of hosts in cloud environments, you want to make sure there are always enough spare resources in your cluster to handle a host failure, or even better, a whole availability zone failure. But running so many spare resources quickly becomes expensive, so you will want to use some automated process to scale your Kubernetes cluster up and down based on resource utilization. You can achieve that through other tools and cloud providers’ services, but hopefully such a feature will soon make it into Kubernetes itself.
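
A back-of-the-envelope check for spare capacity can be sketched as follows. The host sizes and per-container CPU requests are invented for illustration (only the 8 hosts and 160 containers come from our cluster), and the check is a necessary condition only: it ignores memory and the fact that containers cannot be split across hosts.

```python
# Example of the resources section of a container spec (values are made up).
resources = {
    "requests": {"cpu": "150m", "memory": "256Mi"},
    "limits": {"cpu": "300m", "memory": "512Mi"},
}


def survives_largest_host_failure(host_capacities_m, container_requests_m):
    """Check that total CPU requests still fit after losing the biggest host.

    Values are in millicores. Passing this check is necessary but not
    sufficient, because real scheduling also has to fit whole containers.
    """
    total_requested = sum(container_requests_m)
    capacity_after_failure = sum(host_capacities_m) - max(host_capacities_m)
    return total_requested <= capacity_after_failure


hosts = [4000] * 8    # assumption: 8 hosts with 4 CPU cores each
light = [150] * 160   # 160 containers requesting 150m each: 24000m in total
heavy = [200] * 160   # 160 containers requesting 200m each: 32000m in total

print(survives_largest_host_failure(hosts, light))  # True: 24000 <= 28000
print(survives_largest_host_failure(hosts, heavy))  # False: 32000 > 28000
```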

Portable

In many features I mention in this post I am assuming Kubernetes will handle the integration with the cloud provider for me, providing host or disk provisioning, firewalls, etc. However, I still want to be able to move my Kubernetes cluster between these providers. These requirements seem contradictory, yet I think Kubernetes provides a very nice balanced approach, where the core functionality and most common features will work on any provider. Only a specific set of features require a supported provider, and you can always choose if you want to rely on these integrations in your deployments or not.

Extensible

The philosophy of replaceable and extensible parts of the system seems to work well. You can even write your own scheduler or controller and replace the default one. However, it’s more likely you will only be replacing, extending, or adjusting the add-ons you want to use within your cluster. New community add-ons appear every day, and we are using a bunch of them (DNS, monitoring, logging) which we have customized for our needs. For example, we customized the Fluentd log collector. By default, it collects logs from the containers’ stdout. We have customized it to also serve as a syslog server for applications which require a syslog server inside the Kubernetes cluster. All the logs now show up in one centralized place. Another significant customization we have done is on traffic ingestion from outside the cluster. Kubernetes offers several ways of handling incoming traffic, either by integrating with a cloud provider’s load balancers or through the built-in ingress module. However, we have found those options not flexible enough for our usage, so we have deployed our own traffic ingestion module based on Nginx, which plays nicely with other parts of the system.

Mounting storage

Kubernetes supports a wide range of storage solutions which you can mount into your containers as volumes (AWS EBS, GCE PD, NFS, GlusterFS, Git repo, etc.). Our clusters run on AWS, so we use EBS. The EBS integration works fine most of the time, but there are still some edge cases which need polishing, such as re-mounting an EBS volume from a failed node. I expect the integrations to mature over time, but for now I would suggest your applications don’t use local storage and instead leverage a distributed data store outside the cluster.

Distributing secrets

This feature is in its infancy. You can create “secrets” (e.g. your DB passwords) in Kubernetes and mount them into your containers as files or environment variables. However, the secrets are currently just base64 encoded, so anyone with access to the cluster can read the content of the secret and decode the value. I am sure this feature will evolve further soon.
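
To show why base64 encoding offers no protection, the sketch below encodes a made-up password the way the data field of a Secret manifest expects it, then decodes it again without any key. The secret name and value are hypothetical.

```python
import base64

password = "s3cr3t-password"  # made-up value, for illustration only

# Values in a Secret's data field are base64-encoded strings.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")

secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},  # hypothetical name
    "type": "Opaque",
    "data": {"password": encoded},
}

# Anyone who can read the manifest can reverse the encoding without a key.
decoded = base64.b64decode(secret_manifest["data"]["password"]).decode("utf-8")
print(decoded == password)
```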

Health checking and self-healing

The internal health checking feature inside the cluster works very well. You can currently choose between HTTP and TCP checks and adjust the frequency and other settings. Kubernetes will then periodically check your container and automatically restart it if a health check fails. If restarting doesn’t help, Kubernetes will try to restart again while increasing the wait time between restarts. The feature lets you see the reason for the failed health check, which helps debugging. When a whole node in the cluster fails, Kubernetes will redistribute all of its containers across the remaining nodes.
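
Here is a small sketch of both halves of this behaviour: an HTTP liveness probe as it would appear in a container spec, and the growing wait between restarts. The probe path, port, and timings are hypothetical. The back-off values follow the documented pattern of 10 seconds doubling up to a five-minute cap; treat the exact numbers as an assumption for your Kubernetes version.

```python
# An HTTP liveness probe as it would appear inside a container spec.
liveness_probe = {
    "httpGet": {"path": "/healthz", "port": 8080},  # hypothetical endpoint
    "initialDelaySeconds": 10,  # wait before the first check
    "periodSeconds": 5,         # how often to check
    "failureThreshold": 3,      # consecutive failures before a restart
}


def restart_delays(restarts, base_seconds=10, cap_seconds=300):
    """Return the wait before each restart: doubling, up to a cap."""
    delays = []
    delay = base_seconds
    for _ in range(restarts):
        delays.append(min(delay, cap_seconds))
        delay *= 2
    return delays


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```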

Load balancing

As mentioned in the scaling feature, Kubernetes introduces a virtual network layer inside the cluster called services. Services, together with the DNS add-on, create a powerful way of load balancing internal traffic between containers within the cluster. Containers can reference other containers with a service name (provided the containers are part of a service) without the need to change this reference when you scale the referenced service up and down.
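
In practice, referencing a service by name means building a stable DNS name instead of tracking container addresses. The sketch below assumes the default cluster domain cluster.local and uses hypothetical service and namespace names.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Return the fully qualified in-cluster DNS name of a service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service, namespace="default", port=80, scheme="http"):
    """Build a URL that stays the same however many replicas back the service."""
    return f"{scheme}://{service_dns_name(service, namespace)}:{port}"


# Hypothetical service "billing" in namespace "prod".
print(service_url("billing", namespace="prod", port=8080))
# http://billing.prod.svc.cluster.local:8080
```

Containers in the same namespace can usually use the short name (here just billing), since the DNS add-on resolves it relative to the caller's namespace.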

Load balancing traffic from outside the cluster is a different story. Kubernetes can automatically provision a load balancer for each of your services in your cloud environment. Yet in my opinion, creating a load balancer for every service in your cluster might become costly and hard to keep track of. That is why we ended up creating our own service inside the cluster which serves as an entry point into the cluster and routes the traffic to all other services in the cluster. I believe this part of Kubernetes should get more attention in the future and offer an easier way of getting traffic into the cluster, together with features like throttling requests or routing requests based on SSL certificates.

Resource monitoring

The default resource monitoring add-on using Heapster, InfluxDB, and Grafana works pretty well, although it doesn’t provide very detailed insights nor does it allow you to query the collected stats in many ways. Fortunately, as with all other parts of Kubernetes it is replaceable. Using a Prometheus exporter bundled with our containers and having a Prometheus instance running inside the cluster collecting those stats is working quite well for us.
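
To give a feel for what an exporter publishes, the sketch below renders counters in the Prometheus text exposition format by hand. This is an illustration only, not our exporter: the metric names and values are invented, and a real application would normally use an official Prometheus client library and serve this text over HTTP.

```python
def render_metrics(counters):
    """Render counters in the Prometheus text exposition format.

    `counters` maps a metric name to a (help text, value) pair.
    """
    lines = []
    for name, (help_text, value) in sorted(counters.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Invented metrics, for illustration only.
metrics = {
    "http_requests_total": ("Total HTTP requests served.", 1027),
    "http_errors_total": ("Total HTTP requests that failed.", 3),
}
print(render_metrics(metrics), end="")
```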

Log access and ingestion

The default logging add-on using Fluentd, Elasticsearch, and Kibana works well, although it takes some effort to customize it for your needs. Depending on your underlying infrastructure, you will need to tell Fluentd where to look for interesting log files and how to parse them. By default, all logs from the containers’ stdout are ingested. All the logs are also labeled with Kubernetes metadata (namespace, container name, etc.), which makes them easy to filter. Also, making your applications log in JSON format gives you the great advantage of Elasticsearch being able to parse your logs, which allows you to run detailed queries. If, for example, your application requires a syslog server to send logs to, you might want to extend the Fluentd container with a syslog plugin, to which you can then send your logs from anywhere in the cluster. As you can see, getting centralized logging up and running and fine-tuning it might become time-consuming. One feature you might also like is the kubectl logs command, which allows you to get the stdout logs of a container running anywhere in the cluster straight from your machine.
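
Logging in JSON needs very little application code. The following sketch uses only the Python standard library to emit one JSON object per log line. The field names are my own choice, not a format required by Fluentd or Elasticsearch, and the example writes to an in-memory buffer so it can be inspected.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


buffer = io.StringIO()
log = build_logger("orders", buffer)  # hypothetical logger name
log.info("order %s accepted", 42)
line = buffer.getvalue().strip()
print(line)
```

In a container you would pass sys.stdout instead of the buffer, so that the log collector picks the lines up.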

Debugging

Debugging issues in an ever-changing environment might prove challenging. Getting resource monitoring and centralized logging running helps a lot. Kubernetes provides you with a couple of other options for debugging your containers. One of them is the kubectl exec command, which allows you to run commands inside your containers remotely from your machine, much like docker exec does. In addition, the kubectl proxy and kubectl port-forward commands let you access your applications directly from your workstation.
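
If you script these debugging steps, it helps to build the kubectl argument lists in one place. The sketch below only constructs and prints the commands; it does not run them. The pod and namespace names are hypothetical.

```python
import shlex


def kubectl_exec(pod, command, namespace="default"):
    """Build the argument list for running a command inside a pod."""
    return ["kubectl", "exec", "-n", namespace, pod, "--", *command]


def kubectl_port_forward(pod, local_port, remote_port, namespace="default"):
    """Build the argument list for forwarding a local port to a pod port."""
    return ["kubectl", "port-forward", "-n", namespace, pod, f"{local_port}:{remote_port}"]


def kubectl_logs(pod, namespace="default", follow=False):
    """Build the argument list for reading a pod's stdout logs."""
    args = ["kubectl", "logs", "-n", namespace, pod]
    if follow:
        args.append("--follow")
    return args


# Hypothetical pod and namespace names.
for argv in (
    kubectl_exec("hello-web-abc12", ["ls", "/tmp"], namespace="prod"),
    kubectl_port_forward("hello-web-abc12", 8080, 80, namespace="prod"),
    kubectl_logs("hello-web-abc12", namespace="prod", follow=True),
):
    print(shlex.join(argv))
```

Passing such a list to subprocess.run would execute the command without going through a shell, which avoids quoting problems.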

Identity and authorization

This feature is paying the price for Kubernetes being a relatively young product. Authentication based on client certificates when accessing the cluster works fine. Yet authorization is still in an experimental phase. This is probably fine for small teams where the separation of permissions doesn’t need to be so strict, but might be an issue for larger companies.

Summary

Having gone through the main features of Kubernetes, I think it delivers on most of them. As with every new and complex product, there are many areas which need improvement, but the benefits in faster “time to production” are significant. If you want to learn more about using Kubernetes and see some real-world examples and live demos, come to my talk at the DevOps Days conference in Wellington on 29 September.

Find out more about joining the Movio Dev Team.

