Overcoming Difficulties with a Google Kubernetes Engine (GKE) Cluster in Terraform

Google Kubernetes Engine (GKE) is a powerful platform for deploying and managing containerised applications. It provides a managed Kubernetes environment in which you can run, manage, and scale your applications. With Terraform, you can automate the creation and management of your GKE clusters, making it faster and easier to get started.

However, working with a GKE cluster in Terraform can be challenging, especially if you are new to either tool or are trying to do something more advanced. In this post, we’ll walk you through some common difficulties with a GKE cluster in Terraform and help you overcome them.

Deploying a GKE cluster with Terraform

The Terraform configuration for deploying a simple GKE cluster is relatively straightforward. You’ll need to define the Google provider, point it at a GCP project, and declare resources for the GKE cluster and its node pool. Here’s an example configuration of a simple GKE cluster in Terraform:

Google Terraform Provider Config

provider "google" {
    project = "my-gcp-project-id"
    region  = "europe-west6"
    zone    = "europe-west6-a"
}

GKE Cluster

resource "google_container_cluster" "test-cluster" {
    name                     = "gke-ch-zh-test-cluster"
    location                 = "europe-west6-a"
    remove_default_node_pool = true
    initial_node_count       = 1
}

resource "google_container_node_pool" "gke-test-cluster-e2m" {
    name       = "test-cluster-node-e2m"
    location   = "europe-west6-a"
    cluster    = google_container_cluster.test-cluster.name
    node_count = 3

    node_config {
        machine_type = "e2-medium"
        oauth_scopes = [
            "https://www.googleapis.com/auth/cloud-platform"
        ]
    }
}

Difficulties with the GCP Terraform provider

kubeconfig

The GCP Terraform provider is quite limited in some respects, partly due to limitations in GCP itself, and this is where difficulties can arise.

Firstly, the google_container_cluster resource cannot output a kubeconfig file, unlike, for example, the azurerm_kubernetes_cluster resource for Azure Kubernetes Service. This can be annoying for several reasons, most often when you want to configure a GitOps tool such as ArgoCD or FluxCD as part of your Terraform code, or to use the Terraform kubectl, helm and kubernetes providers.

The way around this is to use the gke_auth Terraform module, which can export a kubeconfig file for your GKE cluster. However, the credentials generated by gke_auth are only valid for an hour, which leads to further issues when Terraform tries to refresh its state after the kubeconfig authentication has expired. You can configure the gke_auth module as follows:

module "gke_auth" {
    source               = "terraform-google-modules/kubernetes-engine/google//modules/auth"
    version              = "24.0.0"
    project_id           = "my-gcp-project-id"
    cluster_name         = google_container_cluster.test-cluster.name
    location             = "europe-west6-a"
    depends_on           = [
        google_container_cluster.test-cluster
    ]
}
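
If you want the kubeconfig written to disk, one option is to dump the module’s kubeconfig output into a file with the local_file resource. Here’s a minimal sketch, assuming the module exposes a kubeconfig_raw output (check the output names for the module version you are using):

resource "local_file" "kubeconfig" {
    # kubeconfig_raw is assumed here; verify the exact output name
    # for the module version you are using.
    content         = module.gke_auth.kubeconfig_raw
    filename        = "${path.module}/kubeconfig"
    file_permission = "0600"
}

Bear in mind that the token embedded in this file still expires after an hour, so it has the same lifetime problem described above.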

You can then configure the kubectl, kubernetes and helm Terraform providers using the attributes exported by the gke_auth module, as follows:

provider "kubectl" {
    host                   = module.gke_auth.host
    cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
    token                  = module.gke_auth.token
    load_config_file       = false
}

provider "kubernetes" {
    cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
    host                   = module.gke_auth.host
    token                  = module.gke_auth.token
}

provider "helm" {
    kubernetes {
        cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
        host                   = module.gke_auth.host
        token                  = module.gke_auth.token
    }
}
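
Note that kubectl is a community provider, so Terraform needs to be told where to source it from. A required_providers block along these lines should work; the version constraint is illustrative rather than prescriptive:

terraform {
    required_providers {
        # The kubectl provider is community-maintained and lives under
        # the gavinbunney namespace rather than hashicorp.
        kubectl = {
            source  = "gavinbunney/kubectl"
            version = ">= 1.14"
        }
        kubernetes = {
            source = "hashicorp/kubernetes"
        }
        helm = {
            source = "hashicorp/helm"
        }
    }
}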

If you need the kubeconfig file for anything other than configuring other Terraform providers or resources, it may be easier to create one using the gcloud CLI (for example, gcloud container clusters get-credentials gke-ch-zh-test-cluster --zone europe-west6-a) rather than dealing with the expiring kubeconfig file created by gke_auth. However, you can renew a gke_auth-provided kubeconfig file by simply running terraform apply again.

Load balancer

Unless stated otherwise, a GKE cluster deployed with Terraform also deploys a load balancer on GCP. However, there is no way of getting information about that load balancer from the GKE cluster it is connected to. This can be especially frustrating if, for example, you want Terraform to update a DNS record with the load balancer’s external IP address. In fact, there isn’t even a single load balancer resource within the GCP Terraform provider, so you cannot easily create a load balancer separately from the GKE cluster and configure the cluster to use it.

The best way around this that we have found, which is unfortunately quite hacky, is to deploy the ingress-nginx Helm chart to the GKE cluster using Terraform, and then read the external load balancer’s IP address into Terraform from ingress-nginx’s Kubernetes Service resource, as follows:

resource "helm_release" "ingress-nginx" {
    name             = "ingress-nginx"
    repository       = "https://kubernetes.github.io/ingress-nginx"
    chart            = "ingress-nginx"
    version          = "4.2.5"
    namespace        = "ingress-nginx"
    create_namespace = true
    wait             = true
}

data "kubernetes_service" "ingress-nginx-controller" {
    metadata {
        name      = "ingress-nginx-controller"
        namespace = "ingress-nginx"
    }
    depends_on = [
        helm_release.ingress-nginx
    ]
}

The load balancer’s external IP address can then be used in other Terraform resources, such as DNS records:

resource "google_dns_record_set" "subdomain-wildcard" {
    name         = "*.${data.google_dns_managed_zone.test.dns_name}"
    managed_zone = data.google_dns_managed_zone.test.name
    type         = "A"
    ttl          = 300
    rrdatas      = [ data.kubernetes_service.ingress-nginx-controller.status[0].load_balancer[0].ingress[0].ip ]
}
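
For completeness, the snippet above references a google_dns_managed_zone data source that isn’t shown. A minimal sketch, assuming a managed zone named my-test-zone already exists in the project:

data "google_dns_managed_zone" "test" {
    # The zone name here is a placeholder for your actual managed zone.
    name = "my-test-zone"
}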

Although this approach works, it would be much better if the GCP Terraform provider allowed you to read information about the associated load balancer from a google_container_cluster resource.

Other Common Difficulties Encountered

Authentication Issues: One of the most common difficulties when deploying a GKE cluster with Terraform is authentication, especially when using GCP service accounts. Ensure you have the correct credentials for your Google Cloud account and that Terraform is set up to use them, as shown in the sketch below.
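
For example, you can point the Google provider at a service account key file explicitly. This is a minimal sketch; the key file name is a placeholder for wherever your key actually lives:

provider "google" {
    # Load the service account key explicitly. Alternatively, leave this out
    # and set the GOOGLE_APPLICATION_CREDENTIALS environment variable instead.
    credentials = file("sa-key.json")
    project     = "my-gcp-project-id"
    region      = "europe-west6"
}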

Incorrect Configuration: Another common issue is an incorrect Terraform configuration. Run terraform validate and terraform plan to double-check your configuration, and make sure that all the required parameters are set correctly.

Resource Limit Exceeded: If you exceed the quotas for your Google Cloud project, such as the regional CPU or in-use IP address quotas, you may encounter errors when deploying the GKE cluster. Ensure you have enough quota available for your cluster.

GCP APIs: You may be prompted to enable various GCP APIs (such as the Kubernetes Engine API) through your browser after running terraform apply; see the sketch below for enabling them from Terraform instead.
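
If you would rather manage this from Terraform than click through the browser prompts, the google_project_service resource can enable APIs for you. A minimal sketch for the Kubernetes Engine API (whether to disable the API again on destroy is a matter of preference):

resource "google_project_service" "container" {
    # Enable the Kubernetes Engine API for the provider’s project.
    service            = "container.googleapis.com"
    disable_on_destroy = false
}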

Networking Issues: Networking issues can also cause difficulties when deploying a GKE cluster with Terraform. Check that your network is configured correctly and that no firewalls block access to the Google Cloud APIs.

In this blog post, we’ve walked you through some common issues encountered when deploying a GKE cluster with Terraform and shown you how to overcome them. If you know of any better workarounds, or want to tell us about other issues you have encountered, let us know on Twitter at @cloudcoverch.