What does a production Kubernetes (EKS) cluster look like?

Jeewan Sooriyaarachchi
7 min read · Jul 2, 2022


Having worked on AWS EKS K8s production systems at a few companies, I would like to share my experience with this interesting technology.

I have tried to include all the components in a single diagram, and it does look complex. Perhaps K8s is complex :) but it has more upsides than the other platforms around.

The main purpose of this article is to give an overview of all the components glued together to make a production Kubernetes platform. It focuses on an AWS EKS managed K8s cluster, but most of it applies to other platforms as well. It is more of an introduction to all the tools running on the cluster than a deep dive into any of them.

Architecture Diagram


Deploying Infrastructure

IaC

You can use any IaC tool for creating AWS infrastructure. I have used Terraform in this case to bring up the following AWS resources:

  • Application Load Balancer
  • Network Load Balancer
  • Node Groups with Autoscaling groups and Target groups
  • IAM Roles for K8s service accounts
  • EKS cluster
  • KMS Key

This is a typical way of bringing up AWS infrastructure using Terraform, which I am not going to explain in detail here. Apart from the above infrastructure, it is ideal to deploy ArgoCD and the SOPS operator as part of the cluster bootstrap, because ArgoCD is the K8s deployment tool used for creating all the other K8s resources within EKS. Getting ArgoCD up and running during the cluster bootstrap will greatly ease the deployment of everything else. The ArgoCD and SOPS operator K8s manifest files can be converted to Terraform HCL format using tools like k2tf and then deployed with Terraform.

External Interfaces

  • All the web applications are exposed to the internet through CloudFront
  • AWS Web Application Firewall (WAF) is attached to CloudFront for flood protection, IP whitelisting and other security features
  • All API calls are routed through the AWS API Gateway, which again has WAF attached for the security features mentioned above
  • There is an Application Load Balancer behind CloudFront which connects to the node groups of the EKS cluster
  • Similarly, there is a Network Load Balancer behind the API Gateway which also connects to the node groups of the EKS cluster

Node Groups

You will notice multiple node groups represented at the bottom of the diagram.

  • Edge nodes — Run ingress services such as the Istio ingress daemonset
  • System nodes — Host supporting services for the entire cluster, such as the autoscaler, monitoring, etc.
  • Worker nodes — This is where most of the application workloads run, typically as K8s deployments
  • GPU nodes — If you need K8s nodes with GPU acceleration, you can create a separate node group for them
  • Storage nodes — Having a separate node group for statefulsets is not strictly necessary, but it has some benefits. For example, you don’t have to run the EBS CSI driver on all the node groups, and you can restrict which node groups can access the storage node group.

Each node group is created across 3 availability zones (AZs) for high availability. However, EBS volumes cannot be mounted across zones, so the storage node group is configured with a single AZ.

  • Services running on one node group cannot access other node groups unless we specifically allow it using security groups
  • The ALB and NLB are connected only to the Edge node group, so the other node groups stay hidden from the load balancers for security reasons

If you need further isolation between frontend and backend services, you can create separate node groups for that as well. It all depends on your workloads and how far you want to fine-tune things. If you don’t have big workloads, you might want to run everything in a single node group to keep it simple.
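To make this isolation stick on the scheduling side, each node group can be tainted and only the intended workloads given matching tolerations and affinity rules. Below is a minimal sketch assuming the storage node group carries a hypothetical node-role=storage taint and label; the key, value and image are placeholders, not names from a real cluster.

```yaml
# Hypothetical example: the storage node group is tainted at node-registration
# time (e.g. node-role=storage:NoSchedule), so only pods that tolerate the
# taint and select the matching label land there.
apiVersion: v1
kind: Pod
metadata:
  name: example-db
spec:
  tolerations:
    - key: node-role            # assumed taint key on the storage node group
      operator: Equal
      value: storage
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role  # assumed node label set by the node group
                operator: In
                values: [storage]
  containers:
    - name: app
      image: postgres:14        # placeholder image
```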

Persistent Storages

It is recommended to use a fully managed datastore like RDS instead of running databases within the K8s cluster. However, there are scenarios where you may want to host datastores within the K8s cluster for performance or cost-saving reasons. In that case, you will have to install the EBS CSI driver or EFS CSI driver to make AWS EBS or EFS storage available to the K8s cluster. Those datastores should run as statefulsets, hosted on a separate node group if required.
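As a rough sketch of what such a datastore looks like, here is a statefulset that requests an EBS-backed volume per replica. The ebs-gp3 storage class name is an assumption (a matching StorageClass example appears in the EBS CSI Driver section below), as are the app name and image.

```yaml
# Hypothetical StatefulSet backed by EBS via the CSI driver; each replica
# gets its own PersistentVolumeClaim from volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7            # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-gp3   # assumed storage class name
        resources:
          requests:
            storage: 10Gi
```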

ArgoCD

This is the continuous delivery tool playing a major role within the K8s cluster. It continuously monitors an SCM/Git repository and deploys the K8s manifests in it to the K8s cluster. All you have to do is update the Git repository, and the rest is taken care of by ArgoCD. As mentioned before, that is why it is important to bring Argo up during the cluster bootstrap, so that the rest of the tools can be installed using Argo.

Argo is configured to install all the supporting tools like Istio, Cluster Autoscaler and monitoring, as well as all the other applications running on the cluster.
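A typical ArgoCD Application manifest looks roughly like this; the repository URL, path and destination namespace below are placeholders, not values from a real setup.

```yaml
# Hypothetical ArgoCD Application: Argo keeps the "addons" path of the Git
# repo in sync with the cluster, pruning removed resources and self-healing
# manual drift.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests.git  # placeholder
    targetRevision: main
    path: addons
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```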

Helm

Helm is the most popular templating tool for arranging K8s manifests in a simpler, more developer-friendly format. You can use other templating tools such as Kustomize or jsonnet as well.
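To illustrate the idea, here is a hypothetical Helm template fragment (templates/deployment.yaml in a chart) whose values come from values.yaml; the value names are made up for this example.

```yaml
# Fragment of a templated Deployment; {{ .Values.* }} placeholders are
# filled in from values.yaml at render time, e.g.:
#   replicaCount: 2
#   image: { repository: example.com/web, tag: "1.0.0" }
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```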

Supporting tools

These are the supporting tools running within the cluster, which you can see on top of the system nodes in the diagram above. Although they are drawn on the system nodes, some of them run as daemonsets on all the node groups.

Metrics server

Collects the CPU and memory utilization of pods and nodes in the K8s cluster and exposes it through the metrics API, which the Horizontal Pod Autoscaler and kubectl top rely on.

Cluster Autoscaler

Cluster Autoscaler watches for pods that cannot be scheduled due to insufficient resources and adjusts the number of nodes in the cluster accordingly. It is configured with an IAM role that has permission to change the desired capacity of the AWS autoscaling groups. It will increase and decrease the number of nodes in each autoscaling group based on the cluster workloads.
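The relevant part of the cluster-autoscaler Deployment looks roughly like this; the cluster name my-eks and the version tag are placeholders, and auto-discovery of ASGs by tag is an alternative to listing autoscaling groups by name.

```yaml
# Container spec fragment from a hypothetical cluster-autoscaler Deployment.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Spread nodes evenly across similar node groups (e.g. per-AZ ASGs)
      - --balance-similar-node-groups
      # Discover ASGs tagged for this cluster instead of naming them explicitly
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks
```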

EBS CSI Driver

This is a daemonset that has to run on the node groups that will host statefulsets. It gives pods running in the cluster access to AWS EBS volumes. The equivalent for EFS is the EFS CSI driver.
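A StorageClass ties the driver to workloads; the one below matches the ebs-gp3 name assumed in the statefulset example earlier. WaitForFirstConsumer delays volume creation until the pod is scheduled, so the EBS volume is created in the pod's availability zone.

```yaml
# StorageClass for the AWS EBS CSI driver, provisioning gp3 volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
```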

Node Termination handler

If you are planning to run AWS Spot instances to save cost, you will need a way to gracefully shut down pods before AWS reclaims the Spot instances. AWS provides a 2-minute notice to drain a Spot instance, and this tool does exactly that. It is a daemonset running on all the node groups.

Monitoring tools

This is another important component, providing observability of the EKS workloads. It can be Grafana, Datadog, Splunk or any other tool. It basically forwards logs and metrics to a dashboard. The agent is typically a daemonset running on all the node groups.

SOPS Operator

Since we were following GitOps for all the deployments and configurations within the EKS cluster, we wanted to use the same approach for managing the secrets/credentials of applications running in the cluster. SOPS allows you to manage secrets the GitOps way and check them into a Git repository without exposing the actual values. SOPS uses AWS KMS for encrypting and decrypting the secrets.
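For a sense of what lands in Git, here is a sketch of a sops-encrypted Secret: the values are ciphertext, and sops appends a metadata block pointing at the KMS key. The KMS ARN and ciphertext below are placeholders, and the metadata is abbreviated.

```yaml
# A K8s Secret after "sops --encrypt": safe to commit, since only the
# ENC[...] ciphertext and key references are stored.
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
type: Opaque
stringData:
  DB_PASSWORD: ENC[AES256_GCM,data:placeholder,iv:placeholder,tag:placeholder,type:str]
sops:
  kms:
    - arn: arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000  # placeholder
  encrypted_regex: ^(data|stringData)$   # only encrypt the secret payload, not metadata
  version: 3.7.3
```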

Istio

Install istiod, the Istio control plane, first so that the rest of the Istio configuration can be applied, along with the Istio ingress gateway proxy that runs on the Edge nodes listening for incoming traffic. Istio can also be used to enable encryption (mutual TLS) for pod-to-pod communication within the cluster, which helps build a zero-trust network.
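Turning on that pod-to-pod encryption is a one-resource change: applying a PeerAuthentication policy in Istio's root namespace (istio-system here) makes STRICT mTLS the mesh-wide default.

```yaml
# Mesh-wide mutual TLS: sidecars reject plaintext traffic between pods.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```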

Nvidia device plugin

This driver should be installed on the nodes that have GPUs. It allows pods to access the GPUs on those nodes.

Application workloads

Application workloads are configured to run on the worker and GPU nodes. The diagram illustrates an example of a single application running on the cluster, together with all the other K8s resources required to run it.

Deployments

Typically application workloads run as K8s deployments.

Services

Deployments are exposed internally using ClusterIP services, as in the sketch below.
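Here is a minimal sketch of a deployment and its ClusterIP service; the orders-api name, image and ports are placeholders. Resource requests and limits are included, as recommended in the Gotchas section later.

```yaml
# Hypothetical application Deployment plus the ClusterIP Service exposing it
# inside the cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: example.com/orders-api:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: orders-api
spec:
  type: ClusterIP
  selector:
    app: orders-api
  ports:
    - port: 80
      targetPort: 8080
```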

Virtual Services

Configure this if the service has to be exposed externally. It can carry TCP or HTTP traffic.

Destination rule

Groups the different versions of the same application for canary deployments. This is part of the Istio ingress configuration.

Gateway

The Istio gateway that listens for incoming traffic for a particular service or set of services. It is part of the Istio ingress configuration; the sketch below shows how the Gateway, Virtual Service and Destination Rule fit together.
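The following is a hedged example tying the three resources above to the orders-api service from earlier; the host name, subsets and canary weights are illustrative. Traffic enters via the Gateway running on the Edge nodes, the VirtualService routes it, and the DestinationRule defines the version subsets used for canary releases.

```yaml
# Gateway: accepts inbound HTTP on the ingress gateway pods (Edge nodes).
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: orders-gateway
spec:
  selector:
    istio: ingressgateway     # matches the Istio ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - orders.example.com  # placeholder host
---
# VirtualService: splits traffic 90/10 between two versions (canary).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
    - orders.example.com
  gateways:
    - orders-gateway
  http:
    - route:
        - destination:
            host: orders-api
            subset: v1
          weight: 90
        - destination:
            host: orders-api
            subset: v2        # canary version
          weight: 10
---
# DestinationRule: defines the v1/v2 subsets by pod label.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: orders-api
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```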

HPA

The Horizontal Pod Autoscaler lets you configure the minimum and maximum number of pods that should be running for each deployment/application. HPA then automatically increases and decreases the number of pods based on CPU and memory utilization.
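An autoscaling/v2 HPA for the example deployment above would look like this; the min/max counts and the 70% CPU target are illustrative numbers.

```yaml
# HPA scaling the orders-api Deployment between 2 and 10 replicas, targeting
# 70% average CPU utilization (metrics come from metrics-server).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```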

GPU workloads

These are application workloads that require GPUs. They should run on nodes that have GPUs and the GPU driver (nvidia-device-plugin) installed.
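A GPU workload is just a pod that requests the nvidia.com/gpu resource advertised by the device plugin; the scheduler then places it on a GPU node. The image and the assumed GPU node taint below are placeholders.

```yaml
# Hypothetical GPU pod: the nvidia.com/gpu limit triggers placement on a
# node where nvidia-device-plugin has advertised GPUs.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  tolerations:
    - key: nvidia.com/gpu       # assumes GPU nodes are tainted this way
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:23.05-py3   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1     # GPUs are requested via limits only
```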

Gotchas

  • Make sure you have enough workloads to host on the EKS cluster. As explained above, there are lots of supporting tools to bring up and maintain for an EKS cluster. The more workloads you have, the more that overhead is amortized and the more benefit you get from EKS.
  • Make sure pod resource requests and limits are configured for all the workloads, including the supporting tools. If you don’t do this, K8s cannot schedule pods based on their resource requirements and you will run into CPU/memory pressure on the nodes.
  • Be cautious about daemonsets running on the cluster, because if you run many daemonsets on all the nodes, you will waste resources on them. For example, if you have only 5 statefulsets that require EBS storage but run the EBS CSI driver daemonset on all the nodes (e.g. 40 nodes), it is best to create a separate node group with around 3–5 nodes and run the CSI driver daemonset only on those nodes.
  • Configure taints for the different node groups, then add node affinity rules and tolerations to make sure pods run on the intended node groups.
  • An EBS-backed storage class is only accessible within a single availability zone, so try to use fully managed, highly available solutions like RDS for datastores.
  • You can save a lot by running Spot instances, but it depends on your workloads too. It is better to use a mix of on-demand and Spot instances to balance cost against availability.
  • Continuously monitor the cluster, adjust the pod resource requests and limits, and fine-tune them. Also decide the best instance types for each node group to optimize cost.
  • Configure tags and budget alerts to ensure you stay within the budget limits.
  • Apart from the AWS costs, take other costs into consideration, such as the cost of the monitoring tool. Some monitoring tools charge by the number of nodes, and that can be more expensive than the AWS instance cost.

That concludes the overview of a production EKS cluster setup. Please note that there are other tools and other ways to achieve the same goal. I would like to hear your feedback, alternatives and suggestions on this topic. Thanks for reading.
