Overcoming Difficulties with a Google Kubernetes Engine (GKE) Cluster in Terraform
Google Kubernetes Engine (GKE) is a powerful platform for managing and deploying containerised applications. It provides a managed Kubernetes environment, making it easy to run, manage, and scale your applications. With Terraform, you can automate the creation and management of your GKE cluster, making it easier and faster to get started.
However, working with a GKE cluster in Terraform can be challenging, especially if you are new to either tool or trying to do something more advanced. In this post, we'll walk you through some common difficulties with managing a GKE cluster in Terraform and help you overcome them.
Deploying a GKE cluster with Terraform
The Terraform configuration itself for deploying a simple GKE cluster is relatively straightforward. You'll need to configure the Google provider with your GCP project and define resources for the GKE cluster and its node pool. Here's an example configuration of a simple GKE cluster in Terraform:
Google Terraform Provider Config
provider "google" {
project = "my-gcp-project-id"
region = "europe-west6"
zone = "europe-west6a"
}
GKE Cluster
resource "google_container_cluster" "test-cluster" {
name = "gke-ch-zh-test-cluster"
location = "europe-west6a"
remove_default_node_pool = true
initial_node_count = 1
}
resource "google_container_node_pool" "gke-test-cluster-e2m" {
name = "test-cluster-node-e2m"
location = "europe-west6a"
cluster = google_container_cluster.test-cluster.name
node_count = 3
node_config {
machine_type = "e2-medium"
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
}
}
Difficulties with the GCP Terraform provider
kubeconfig
The GCP Terraform provider is quite limited in some respects, partly due to limitations in GCP itself, and this is where most difficulties tend to crop up.
Firstly, the google_container_cluster resource itself cannot output a kubeconfig file, unlike, for example, an Azure Kubernetes Service Terraform resource. This can be annoying for multiple reasons, particularly if you want to configure a GitOps tool such as ArgoCD or FluxCD as part of your Terraform code, or use the Terraform kubectl, helm and kubernetes providers.
The way around this is to use the gke_auth Terraform module, which is able to export a kubeconfig file for your GKE cluster. However, the kubeconfig file created by gke_auth is only valid for an hour, leading to further issues when Terraform tries to refresh its state and the kubeconfig authentication has expired. You can configure the gke_auth module as follows:
module "gke_auth" {
source = "terraform-google-modules/kubernetes-engine/google//modules/auth"
version = "24.0.0"
project_id = "my-gcp-project-id"
cluster_name = google_container_cluster.test-cluster.name
location = "europe-west6"
depends_on = [
google_container_cluster.test-cluster
]
}
You can then configure Terraform providers such as kubectl, kubernetes and helm using attributes from the gke_auth module as follows:
provider "kubectl" {
host = module.gke_auth.host
cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
token = module.gke_auth.token
load_config_file = false
}
provider "kubernetes" {
cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
host = module.gke_auth.host
token = module.gke_auth.token
}
provider helm {
kubernetes {
cluster_ca_certificate = module.gke_auth.cluster_ca_certificate
host = module.gke_auth.host
token = module.gke_auth.token
}
}
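With these providers wired up to the gke_auth module, you can manage in-cluster resources from the same Terraform code. As a rough sketch, this is how you might install a GitOps tool such as ArgoCD via the helm provider (the chart name, repository URL and namespace below are assumptions based on the public argo-helm chart, not something specific to this setup):
resource "helm_release" "argocd" {
  # Sketch only: installs the community argo-cd chart into its own namespace
  name             = "argocd"
  repository       = "https://argoproj.github.io/argo-helm"
  chart            = "argo-cd"
  namespace        = "argocd"
  create_namespace = true
}
Because the helm provider is configured from the gke_auth outputs, Terraform should create the cluster and fetch credentials before attempting the release.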
If you need the kubeconfig file for anything other than configuring other Terraform providers or resources, it may be easier to create one using the gcloud CLI tool rather than dealing with the expiring kubeconfig file created by gke_auth. However, you can renew a gke_auth-provided kubeconfig file by running terraform apply again.
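If you do want Terraform itself to write the short-lived kubeconfig to disk, the auth module also exposes a kubeconfig_raw output that can be fed into the local provider. A minimal sketch, assuming that output name and the hashicorp/local provider:
resource "local_sensitive_file" "kubeconfig" {
  # Written from the gke_auth module's kubeconfig_raw output; note that the
  # embedded token still expires after roughly an hour.
  content  = module.gke_auth.kubeconfig_raw
  filename = "${path.module}/kubeconfig"
}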
Load balancer
Unless you configure things otherwise, a GKE cluster deployed with Terraform also ends up with a load balancer on GCP, for example as soon as you expose a Service of type LoadBalancer or an Ingress. However, there is no way of getting information about that load balancer from the GKE cluster to which it is connected. This can be especially frustrating if you want to use Terraform to update a DNS record containing the load balancer's external IP address, for example. In fact, there isn't even a single, standalone load balancer resource within the GCP Terraform provider! So you cannot easily create a load balancer separately from the GKE cluster and configure the GKE cluster to use it.
The best way that we have found around this, which is unfortunately quite hacky, is to deploy the ingress-nginx Helm chart to the GKE cluster using Terraform, and then load the external load balancer's IP address into Terraform from ingress-nginx's Kubernetes service resource, as follows:
resource "helm_release" "ingress-nginx" {
name = "ingress-nginx"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
version = "4.2.5"
namespace = "ingress-nginx"
create_namespace = true
wait = true
}
data "kubernetes_service" "ingress-nginx-controller" {
metadata {
name = "ingress-nginx-controller"
namespace = "ingress-nginx"
}
depends_on = [
helm_release.ingress_nginx
]
}
The load balancer's external IP address can then be used in other Terraform resources, such as DNS records:
resource "google_dns_record_set" "subdomain-wildcard" {
name = "*.${data.google_dns_managed_zone.test.dns_name}"
managed_zone = data.google_dns_managed_zone.test.name
type = "A"
ttl = 300
rrdatas = [ data.kubernetes_service.ingress-nginx-controller.status.0.load_balancer.0.ingress.0.ip ]
}
Although this approach works, it would be much better if the GCP Terraform provider allowed you to read information about the associated load balancer from a google_container_cluster resource.
Other Common Difficulties Encountered
Authentication Issues: One of the common difficulties you may encounter when deploying a GKE cluster with Terraform is authentication, especially when using GCP Service Accounts. Ensure you have the correct credentials for your Google Cloud account and that Terraform is set up to use them, for example via the GOOGLE_APPLICATION_CREDENTIALS environment variable or the google provider's credentials argument.
Incorrect Configuration: Another common issue is an incorrect Terraform configuration. Double-check your configuration and make sure that all the required parameters are set correctly.
Resource Limit Exceeded: If you exceed the resource quotas for your Google Cloud project, you may encounter errors when deploying the GKE cluster. Ensure you have enough quota available in the target region (for example, CPUs and in-use IP addresses) for your cluster.
GCP APIs: You may be prompted to enable various APIs for GCP (such as the Kubernetes Engine API) through your browser after running terraform apply. You can also enable the required APIs from Terraform itself; see the sketch after this list.
Networking Issues: Networking issues can also cause difficulties when deploying a GKE cluster with Terraform. Check that your network is configured correctly and that no firewalls block access to the Google Cloud APIs.
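As a sketch of the API point above, you can enable the services a GKE cluster typically needs with the google_project_service resource. Which services you actually need is an assumption that depends on your setup:
resource "google_project_service" "container" {
  # Kubernetes Engine API
  project            = "my-gcp-project-id"
  service            = "container.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "compute" {
  # Compute Engine API, used for nodes and load balancers
  project            = "my-gcp-project-id"
  service            = "compute.googleapis.com"
  disable_on_destroy = false
}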
In this blog post, we've walked you through some common issues encountered when deploying a GKE cluster with Terraform and shown the workarounds that have worked for us. If you know of any better workarounds or want to tell us about other common issues you have encountered, let us know on Twitter at @cloudcoverch.