After setting up several Kubernetes clusters, I would like to share how we do it. I hope this helps people get started with Kubernetes. I'm also keen to read your feedback and improvement proposals.

Terraform examples updated in May 2021

Versions used:
Terraform v0.15.3 
provider registry.terraform.io/hashicorp/aws v3.40.0
provider registry.terraform.io/hashicorp/cloudinit v2.2.0
provider registry.terraform.io/hashicorp/kubernetes v2.2.0
...

Why Terraform

From time to time I explore Terraform, over the long term and since its first appearance (see my exploration on AWS automation). Basically, I think this tool (like many others from HashiCorp) just had the right idea at the right time. Let me explain: the nature of IT infrastructure is more or less static. No surprise that the declarative approach suits it very well. The Terraform creators seem to have a clear vision here, as they evolve the Terraform language design and elaborate the tooling around it. Terraform has become a favorite tool for cloud resource provisioning in many teams.

Meanwhile the concept of “state” has evolved further and found a place in the new HashiCorp Terraform Cloud (with a free tier for small or mid-size projects). It’s very convenient and improves teamwork. By the way, I’m not affiliated with HashiCorp.

Bootstrapping

OK, how to start with it? Well, I usually start with an AWS sub-account for the project. It makes sense anyway, especially if there is no connection to other projects or parts of your systems; see AWS Organizations. A new organization has at least one user with enough rights. However, for Terraform I create an additional ‘terraform’ user that has sufficient rights to create my environments. Basically, I give it AdministratorAccess.
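If you prefer to keep even this bootstrap user in code (run from the management account), here is a minimal sketch; the resource names are my own assumptions, and creating the user manually in the IAM console works just as well:

# Dedicated IAM user for Terraform runs
resource "aws_iam_user" "terraform" {
  name = "terraform"
}

# Broad rights for bootstrapping; narrow this down later if possible
resource "aws_iam_user_policy_attachment" "terraform_admin" {
  user       = aws_iam_user.terraform.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Access key whose credentials end up in Terraform Cloud later
resource "aws_iam_access_key" "terraform" {
  user = aws_iam_user.terraform.name
}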

From this point on we can bootstrap Terraform.

Terraform config

# See the reference  https://learn.hashicorp.com/tutorials/terraform/cloud-workspace-configure
provider "aws" {
  region = "us-west-1"

  # Default tags are applied to every resource this provider manages
  default_tags {
    tags = {
      managedby = "terraform"
      owner     = "CrazyServiceTeam"
    }
  }
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
...

This is basically everything for bootstrapping.

This setup assumes you have the AWS CLI installed and configured on your development machine, meaning your local Terraform executions are able to connect to the AWS API.
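To sanity-check that Terraform really reaches the AWS API with the intended identity, a tiny data source and output can help (a sketch; the output name is my own):

# Resolves to the account and IAM identity Terraform is running as
data "aws_caller_identity" "current" {}

output "current_aws_identity" {
  value = data.aws_caller_identity.current.arn
}

After a plan or apply, the ARN shown should belong to your ‘terraform’ user.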

Terraform Cloud

Secondly, you have to manage the Terraform state. In this config it will be hosted in Terraform Cloud via backend "remote". The provided configuration turns your local Terraform workspace into a remote one.

To be more specific, Terraform Cloud can be operated in two ways:

  1. Hosting only your Terraform state
  2. Hosting the state, but also being the single point of change and keeping the history of those changes (full remote operations)

Here I’m talking about the second scenario, where terraform apply is possible from the cloud UI only. Please also keep in mind that, when using Terraform Cloud, terraform plan and other commands will use variable values from the associated Terraform Cloud workspace. So it’s Terraform Cloud where you should configure access to your AWS account, with the secret key of your Terraform IAM user or role.

To sum this up: you need an account at https://app.terraform.io. You can then enable team members not only to participate in committing Terraform code but also to provision the infrastructure, without sharing and maintaining admin credentials for the AWS cloud.

The last step is to set up a workspace and connect your Git repository to it. Now you can provision the infrastructure. In the end, your runs of terraform plan and terraform apply will show up in the Terraform Cloud UI.

The final configuration will then look like this:

# See docu https://learn.hashicorp.com/tutorials/terraform/cloud-workspace-configure
provider "aws" {
  region = "us-west-1"

  # Default tags are applied to every resource this provider manages
  default_tags {
    tags = {
      managedby = "terraform"
      owner     = "CrazyServiceTeam"
    }
  }
}

# https://www.terraform.io/docs/backends/types/remote.html
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }

  # Connection to Terraform Cloud happens here
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "YourOrga"

    workspaces {
      name = "your-aws-infa-workspace"
    }
  }
}

Provisioning Network

The very first step to provision is the networking layer. Here is an example of how to do so:

# VPC for Kubernetes and all other cluster related resources

resource "aws_vpc" "main" {
  cidr_block           = "10.100.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

# I've started to make use of default tags for all AWS resources
  tags = {
    Name      = "main"
    managedby = "terraform"
  }
}

The subnets. The number of subnets should correspond to the number of AZs in the region.

## ==== Kubernetes subnets =====

#us-west-2a
resource "aws_subnet" "eks_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.100.1.0/24"
  availability_zone = "us-west-2a"
  map_public_ip_on_launch = true

  tags = {
    Name                                               = "EKS, AZ a"
    "kubernetes.io/cluster/${local.cluster_name}"      = "shared"
    "kubernetes.io/role/elb"                           = "1"
    "kubernetes.io/role/internal-elb"                  = "1"
  }
}

#us-west-2b
resource "aws_subnet" "eks_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.100.2.0/24"
  availability_zone = "us-west-2b"
  map_public_ip_on_launch = true

  tags = {
    Name                                               = "EKS, AZ b"
    "kubernetes.io/cluster/${local.cluster_name}"      = "shared"
    "kubernetes.io/role/elb"                           = "1"
    "kubernetes.io/role/internal-elb"                  = "1"
  }
}

#us-west-2c
resource "aws_subnet" "eks_c" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.100.3.0/24"
  availability_zone = "us-west-2c"
  map_public_ip_on_launch = true

  tags = {
    Name                                               = "EKS, AZ c"
    "kubernetes.io/cluster/${local.cluster_name}"      = "shared"
    "kubernetes.io/role/elb"                           = "1"
    "kubernetes.io/role/internal-elb"                  = "1"
  }
}

This is pretty straightforward; for details consult the Terraform documentation for the aws_subnet resource. For the Kubernetes cluster, the provided tags are of interest. They are used by AWS EKS to figure out where to put automatically requested load balancers: EKS requires the subnets to carry the kubernetes.io/cluster/<cluster-name> tag as well as the kubernetes.io/role/elb (or kubernetes.io/role/internal-elb) role tags. The rest is up to you, and there are not many pitfalls here except map_public_ip_on_launch = true. This is needed because in this scenario I use public subnets. EKS master nodes are managed by AWS and are deployed outside of my VPC, while the workers inside my VPC need to reach their masters, so they need public IP addresses. Fine-grained access to the worker nodes is defined by security groups later. If this is not suitable for you, there is a more defensive option: private workers in private subnets (not covered here).
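As an aside, instead of spelling out one resource per AZ, the subnets could also be generated from the available AZs. This is only a sketch under my own assumptions (three AZs, the same /24 layout, a resource name of my choosing), not what the rest of the article uses:

# Discover the AZs of the current region
data "aws_availability_zones" "available" {
  state = "available"
}

# One public EKS subnet per AZ: 10.100.1.0/24, 10.100.2.0/24, ...
resource "aws_subnet" "eks" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.100.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name                                          = "EKS, AZ ${count.index}"
    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                      = "1"
    "kubernetes.io/role/internal-elb"             = "1"
  }
}

The EKS module's subnets argument would then reference aws_subnet.eks[*].id instead of the three individual resources.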

And since we are establishing communication via public addresses, in the AWS universe we need an Internet Gateway.

resource "aws_internet_gateway" "igw1" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw1"
  }
}

resource "aws_route" "route_to_igw1" {
  route_table_id            = "rtb-someId" #haven't found better way than hard coding so far.
  destination_cidr_block    = "0.0.0.0/0"
  gateway_id                =  aws_internet_gateway.igw1.id
  
}
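Alternatively, a dedicated route table with explicit subnet associations keeps the main route table untouched. A sketch, with resource names of my own choosing:

# Public route table sending all outbound traffic through the IGW
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw1.id
  }

  tags = {
    Name = "main-public"
  }
}

# Associate each EKS subnet with the public route table
resource "aws_route_table_association" "eks_a" {
  subnet_id      = aws_subnet.eks_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "eks_b" {
  subnet_id      = aws_subnet.eks_b.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "eks_c" {
  subnet_id      = aws_subnet.eks_c.id
  route_table_id = aws_route_table.public.id
}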

With that, basic networking is in place and it's time for Kubernetes.

Provisioning EKS Cluster

We have had good experiences with the official Terraform EKS module. Later versions of it utilize the dedicated Terraform Kubernetes provider for provisioning cluster users and roles, so my examples show that as well.


## 1 Cluster Module starts here.

module "eks_cluster" {
  source                 = "terraform-aws-modules/eks/aws"
  version                = "15.2.0"
  cluster_name           = local.cluster_name
  cluster_version        = "1.19"
  subnets                = [aws_subnet.eks_a.id, aws_subnet.eks_b.id, aws_subnet.eks_c.id]
  vpc_id                 = aws_vpc.main.id
  cluster_create_timeout = "30m" # need to increase module defaults
  write_kubeconfig       = false # Disabled permanent writing of config files
  providers = {
    # Reference to the Kubernetes provider, see below
    kubernetes = kubernetes.eks_cluster
  }

  manage_aws_auth = true //TODO enable it https://github.com/terraform-aws-modules/terraform-aws-eks/issues/699

  node_groups_defaults = {
    ami_type  = "AL2_x86_64" #alternative is e.g. AL2_x86_64_GPU
    disk_size = 50
  }
 
 
  node_groups = {
    # EKS managed Nodes group with name prefix "ram"
    ram = {
      desired_capacity = 3
      max_capacity     = 10
      min_capacity     = 1

      public_ip = true

      instance_types = ["r5.large"]
      #Labels for nodes and tags
      k8s_labels = {
        node_type = "default"
      }
      # Resource tags are not labels
      additional_tags = {
        environment = "dev"
        managedby   = "terraform"
      }
    }
  }

  #Users
  # Can be checked with: kubectl describe configmap -n kube-system aws-auth
  map_users = [
    {
      userarn  = aws_iam_user.my_addtionaluser.arn
      username = aws_iam_user.my_addtionaluser.name
      groups   = ["system:masters"]
    }
  ]

}

## Configuration of the Kubernetes provider starts here
data "aws_eks_cluster" "eks-cluster" {
  name = module.eks_cluster.cluster_id
}

data "aws_eks_cluster_auth" "eks-cluster" {
  name = module.eks_cluster.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks-cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks-cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.eks-cluster.token
  alias                  = "eks_cluster"
}

I think this configuration is more or less self-explanatory. It starts with the EKS module, which takes a list of arguments like the cluster version, a list of users, and a list of node groups with details about the machines inside each group. A reference to the Kubernetes provider is also present. The configuration of the Kubernetes provider has to be placed here as well, but it basically just refers back to the cluster. See the EKS module documentation for details.
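To later point kubectl at the new cluster, it can help to expose a few values as Terraform outputs. A small sketch; the output names are my own, while the referenced attributes are the ones already used above:

output "cluster_name" {
  value = module.eks_cluster.cluster_id
}

output "cluster_endpoint" {
  value = data.aws_eks_cluster.eks-cluster.endpoint
}

With the cluster name at hand, aws eks update-kubeconfig can generate the local kubeconfig entry for it.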

EKS module alternatives

Of course, there are alternatives to the EKS module. Meanwhile, AWS EKS has grown (and become a bit simpler), and the native Terraform resource aws_eks_cluster has evolved as well. I’m interested in your experience with it… Please share if you use it as-is, without the EKS module.
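For comparison, here is a minimal sketch of the native resource. The IAM role (eks_cluster_role) is an assumption of mine and would need the usual EKS service policies attached; node groups and the aws-auth ConfigMap would have to be handled separately:

# Control plane only; node groups and user mapping are not included here
resource "aws_eks_cluster" "main" {
  name     = local.cluster_name
  version  = "1.19"
  role_arn = aws_iam_role.eks_cluster_role.arn # hypothetical role with the EKS cluster policies

  vpc_config {
    subnet_ids = [aws_subnet.eks_a.id, aws_subnet.eks_b.id, aws_subnet.eks_c.id]
  }
}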

Updating EKS cluster versions

I’m pretty sure that I’ve been on EKS with the above approach since Kubernetes 1.16, or maybe even 1.15. Currently I’m working with the latest version available on EKS (1.19). Upgrades are done with the help of the following step-by-step checklist. It works for me and has already been useful to my colleagues.

  1. First, consult the AWS EKS upgrade guide - additional steps may arise from time to time.
  2. Check for deprecated Kubernetes APIs in your deployments. If needed, change your YAML files to be compliant with the new APIs.
  3. If you have stages, start with DEV, continue with BETA, and do LIVE last.
  4. Update the EKS cluster resource (control plane) version via Terraform (see the sketch after this list)
    • First, check whether a new version of the EKS Terraform module is available.
    • Check the changelogs! Update the EKS module without changing the Kubernetes version first. Fix potential problems.
    • Now change the Kubernetes version, apply, and wait until it's done.
  5. Upgrade the CNI plugin if needed (see 1.)
  6. Upgrade the kube-proxy version if needed (see 1.)
  7. Update CoreDNS if needed (see 1.)
  8. Upgrade the node groups. You can do this via the AWS console, or create a new node group in Terraform and delete the old one.
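For step 4, the change is confined to two arguments of the module block shown earlier. A sketch, with the target versions as placeholders you would look up in the changelogs:

module "eks_cluster" {
  source  = "terraform-aws-modules/eks/aws"
  # Step 4a: bump the module version first (check the changelog), apply, fix issues
  version = "15.2.0" # -> latest released module version

  # Step 4b: then bump the Kubernetes version, apply and wait
  cluster_version = "1.19" # -> e.g. "1.20"

  # ... all other arguments stay unchanged
}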

Recapitulation

Thank you for reading. Now you have some idea of how to install a Kubernetes cluster on AWS (AWS EKS). You know how to prepare Terraform for it and which major AWS resources to create. You even have working code that will provision a cluster, provided you define local.cluster_name and the additional user correctly (see the sketch below)… I'm happy to read about your experiences and improvement proposals!
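These two definitions are referenced throughout the examples but never shown; here is a minimal sketch of what they could look like (the concrete values are mine):

locals {
  cluster_name = "my-eks-cluster"
}

# Additional human user that gets mapped into the cluster via map_users above
resource "aws_iam_user" "my_addtionaluser" {
  name = "my-additional-user"
}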