Provision a secure Amazon EKS cluster using Terraform and GitHub Actions

Amazon EKS is a managed Kubernetes service from AWS that closely follows the open-source Kubernetes release cycle and removes the operational overhead of running control plane components. AWS handles the control plane infrastructure, scaling, and high availability, while cloud engineers keep full control over worker nodes and applications and decide when version updates occur.

In this note, I demonstrate how to build an Amazon EKS cluster using Terraform. The solution deploys a production-ready EKS cluster within a secure, multi-AZ VPC architecture. Worker nodes operate in private subnets with outbound access through a NAT gateway, while VPC endpoints keep AWS API traffic off the public internet. Custom security groups control communication between the control plane and worker nodes, and customer-managed KMS keys encrypt data across all services — EKS secrets, CloudWatch logs, ECR images, and SSM parameters. An ECR repository provides secure container image storage, and infrastructure outputs are bundled into an encrypted SSM parameter for consumption by subsequent platform stacks.

Solution Overview

At a high level, this use case can be divided into the following steps:
1. Create the network stack to host the EKS cluster
2. Create security groups to enable secure communication
3. Create IAM roles
4. Create a CloudWatch Log group for control plane logs
5. Create an EKS cluster
6. Create EKS worker nodes to host the container workloads
7. Configure EKS add-ons (VPC CNI, CoreDNS, kube-proxy, EBS CSI driver, Pod Identity agent)
8. Create an ECR repository for container images
9. Store infrastructure outputs in SSM Parameter Store

You can find the complete implementation in my GitHub repository: kunduso-org/aws-eks-terraform (branch: create-eks-cluster). The code includes Terraform configurations, GitHub Actions CI/CD, and security scanning with Checkov.

Prerequisites

This use case requires the following prerequisites:
PreReq-1: An AWS account with an IAM role configured for GitHub Actions via OIDC, which eliminates long-lived credentials. I covered this setup in a separate note. Store the role ARN as a GitHub secret named IAM_ROLE.
PreReq-2: An Amazon S3 bucket for Terraform remote state
(Optional) PreReq-3: An Infracost API key stored as a GitHub secret named INFRACOST_API_KEY, which is used for cost estimation on pull requests
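
For PreReq-2, a minimal sketch of what the remote state configuration could look like. The bucket name, state key, and region below are placeholders I made up for illustration, not values from the repository.

```hcl
terraform {
  backend "s3" {
    # All values are hypothetical placeholders; substitute your own.
    bucket  = "my-terraform-state-bucket"
    key     = "aws-eks-terraform/terraform.tfstate"
    region  = "us-east-2"
    encrypt = true # server-side encryption for the state object
  }
}
```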

Implementation

I will now walk through these steps in detail.

Step 1: Create the network stack to host the EKS cluster
An EKS cluster runs within an Amazon Virtual Private Cloud (VPC), which provides network isolation and security.

1.1 Create a Virtual Private Cloud for the EKS Cluster

I used my VPC module to create the foundational networking components. These are:

– Two public subnets spread over two availability zones for NAT gateways and load balancers
– Two private subnets spread over two availability zones for EKS worker nodes (security best practice)
– Internet Gateway to enable outbound internet access
– NAT Gateway to allow private subnet resources to reach the internet, and
– Route tables to direct traffic between subnets and gateways.

Amazon EKS also requires specific subnet tags to ensure proper integration with AWS Load Balancer Controller and other EKS services. This is also managed via the module. Please read Subnet requirements for nodes for more information.
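
As an illustration of the tagging, here is a small sketch of the subnet tags that the AWS Load Balancer Controller looks for when it discovers subnets. The local names are hypothetical; in this solution the VPC module applies the tags, so its actual inputs may differ.

```hcl
locals {
  # Hypothetical local names; the tag keys and values are the ones
  # used for load balancer subnet discovery.

  # Public subnets: candidates for internet-facing load balancers.
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  # Private subnets: candidates for internal load balancers.
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
```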

1.2 Create VPC Endpoints for secure AWS service access
After creating the VPC, I added VPC endpoints to enable secure, private communication with AWS services. Without VPC endpoints, worker nodes in private subnets would need to route all AWS API calls (ECR pulls, CloudWatch logs, etc.) through the NAT Gateway and public internet, which increases both costs and attack surface. I added the following endpoints to the VPC:

– EC2, for EC2 API calls made from the nodes (for example, by the VPC CNI plugin when it manages network interfaces)
– ECR API & DKR, for container image registry access
– S3, for pulling container images and artifacts
– STS, for IAM token exchange and Pod Identity
– CloudWatch Logs, for centralized logging
– SSM, for parameter store and session management, and
– ELB, for load balancer management

Here is a screenshot of the ec2 and ecr_api VPC endpoints.

Along with the endpoints, I created a dedicated security group that allows inbound HTTPS traffic (port 443) from the VPC and permits all outbound traffic for the endpoints to function properly.

These endpoints reduce NAT Gateway costs and improve security by keeping AWS API calls within the AWS network backbone.
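
The endpoint setup described above can be sketched roughly as follows. This is not the repository's code: the variable names and resource labels are assumptions, and I show S3 as a gateway endpoint, which is one common choice rather than something the text confirms.

```hcl
# Sketch only: variable names and resource labels are assumptions.
variable "region" {
  type    = string
  default = "us-east-2" # placeholder region
}

variable "vpc_id" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "private_route_table_ids" {
  type = list(string)
}

# Security group shared by the interface endpoints.
resource "aws_security_group" "endpoint" {
  name        = "vpc-endpoint-sg"
  description = "HTTPS from within the VPC to interface endpoints"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTPS from the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }

  egress {
    description = "All outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

locals {
  # Service name suffixes for the interface endpoints listed above.
  interface_services = [
    "ec2", "ecr.api", "ecr.dkr", "sts", "logs", "ssm", "elasticloadbalancing",
  ]
}

# One interface endpoint per service, placed in the private subnets.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_services)

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.endpoint.id]
  private_dns_enabled = true
}

# Assumption: S3 as a gateway endpoint attached to the private route tables.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```

Enabling private DNS on the interface endpoints lets the standard AWS service hostnames resolve to the endpoint addresses, so the nodes need no SDK or CLI changes.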

Step 2: Create security groups to enable secure communication
This use case requires two security groups, one for the EKS cluster control plane and one for the worker nodes, in addition to the VPC endpoint security group created in step 1.

The EKS cluster security group controls traffic to and from the control plane. It allows inbound HTTPS (port 443) from the worker nodes so they can communicate with the Kubernetes API server. For outbound traffic, it permits HTTPS to 0.0.0.0/0 so the control plane can reach AWS APIs, and ports 443-65535 to the worker nodes for kubelet communication, webhook callbacks, log collection, and kubectl exec sessions.

The worker node security group controls traffic to and from the data plane. It allows all inbound traffic between nodes (required for pod-to-pod networking and cluster DNS). It also allows inbound traffic on ports 443-65535 from the cluster security group so the control plane can reach the kubelet API and admission webhooks running on the nodes. For outbound traffic, it permits all protocols to 0.0.0.0/0. While the VPC endpoints from step 1 handle most AWS API calls over private connectivity, the broad egress rule covers any traffic not routed through an endpoint.
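
Here is a rough sketch of those rules, assuming AWS provider v5 or later for the standalone rule resources. Defining the rules separately from the groups avoids a dependency cycle, since each group references the other. The group names and the eks_nodes label are hypothetical.

```hcl
# Sketch only: names and labels are assumptions.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "eks_cluster" {
  name        = "eks-cluster-sg"
  description = "EKS control plane"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "eks_nodes" {
  name        = "eks-nodes-sg"
  description = "EKS worker nodes"
  vpc_id      = var.vpc_id
}

# Control plane: HTTPS in from the worker nodes (Kubernetes API server).
resource "aws_vpc_security_group_ingress_rule" "cluster_from_nodes" {
  security_group_id            = aws_security_group.eks_cluster.id
  referenced_security_group_id = aws_security_group.eks_nodes.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
  description                  = "API server access from worker nodes"
}

# Control plane: HTTPS out to AWS APIs.
resource "aws_vpc_security_group_egress_rule" "cluster_to_aws_apis" {
  security_group_id = aws_security_group.eks_cluster.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
  description       = "HTTPS to AWS APIs"
}

# Control plane: kubelet, webhooks, logs, and exec traffic to the nodes.
resource "aws_vpc_security_group_egress_rule" "cluster_to_nodes" {
  security_group_id            = aws_security_group.eks_cluster.id
  referenced_security_group_id = aws_security_group.eks_nodes.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 65535
  description                  = "Control plane to worker nodes"
}

# Nodes: all traffic between nodes (pod-to-pod networking, cluster DNS).
resource "aws_vpc_security_group_ingress_rule" "nodes_self" {
  security_group_id            = aws_security_group.eks_nodes.id
  referenced_security_group_id = aws_security_group.eks_nodes.id
  ip_protocol                  = "-1"
  description                  = "Node to node"
}

# Nodes: kubelet API and admission webhooks reachable from the control plane.
resource "aws_vpc_security_group_ingress_rule" "nodes_from_cluster" {
  security_group_id            = aws_security_group.eks_nodes.id
  referenced_security_group_id = aws_security_group.eks_cluster.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 65535
  description                  = "Control plane to kubelet and webhooks"
}

# Nodes: all outbound traffic.
resource "aws_vpc_security_group_egress_rule" "nodes_all" {
  security_group_id = aws_security_group.eks_nodes.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
  description       = "All outbound"
}
```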

Step 3: Create IAM roles
EKS uses IAM roles at two levels: service-level roles for AWS services to manage cluster resources, and pod-level roles for applications to access AWS services securely.

I created three IAM roles with specific purposes:

EKS Cluster Role to allow the EKS control plane to manage AWS resources on your behalf, enabled via the managed IAM policy AmazonEKSClusterPolicy (manages security groups, load balancers, and related EC2 resources for the cluster). Details at AmazonEKSClusterPolicy.

EKS Node Role to enable worker nodes to join the cluster and pull container images using the following managed policies: AmazonEKSWorkerNodePolicy (allows nodes to describe EC2 resources required for cluster bootstrapping and enables Pod Identity credential retrieval), AmazonEKS_CNI_Policy (allows the VPC CNI plugin to manage elastic network interfaces and assign private IP addresses to pods), AmazonEC2ContainerRegistryReadOnly (grants read-only access to pull container images from Amazon ECR). Details at AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy and AmazonEC2ContainerRegistryReadOnly.

EBS CSI Driver Role, which uses the Pod Identity mechanism to allow the CSI driver pods to manage EBS volumes for persistent storage. Attached policy: AmazonEBSCSIDriverPolicy (create/attach/delete EBS volumes). Details at AmazonEBSCSIDriverPolicy.

The cluster role is assumed by the EKS service, the node role is assumed by EC2 worker instances (nodes), and the EBS CSI role is assumed by the EBS CSI driver pods to manage persistent volumes through Pod Identity authentication.
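
The three trust relationships can be sketched like this. The role names and most resource labels are hypothetical; the service principals and managed policy ARNs are the standard AWS ones.

```hcl
# Sketch only: role names and resource labels are assumptions.

# Trust policy: the EKS service assumes the cluster role.
data "aws_iam_policy_document" "cluster_trust" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "eks_cluster" {
  name               = "eks-cluster-role"
  assume_role_policy = data.aws_iam_policy_document.cluster_trust.json
}

resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Trust policy: EC2 instances (worker nodes) assume the node role.
data "aws_iam_policy_document" "node_trust" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "eks_node" {
  name               = "eks-node-role"
  assume_role_policy = data.aws_iam_policy_document.node_trust.json
}

resource "aws_iam_role_policy_attachment" "eks_node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.eks_node.name
  policy_arn = each.value
}

# Trust policy: the EKS Pod Identity service principal assumes the CSI role.
data "aws_iam_policy_document" "pod_identity_trust" {
  statement {
    actions = ["sts:AssumeRole", "sts:TagSession"]
    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ebs_csi" {
  name               = "eks-ebs-csi-role"
  assume_role_policy = data.aws_iam_policy_document.pod_identity_trust.json
}

resource "aws_iam_role_policy_attachment" "ebs_csi" {
  role       = aws_iam_role.ebs_csi.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
```

Unlike IAM Roles for Service Accounts, the Pod Identity trust policy does not reference an OIDC provider, so the same role definition can be reused across clusters.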

Step 4: Create a CloudWatch Log group for control plane logs
When control plane logging is enabled, EKS writes to the log group /aws/eks/<cluster-name>/cluster and creates it automatically if it doesn’t already exist, but with default settings of indefinite retention and no customer-managed KMS encryption.

I pre-created the log group under that name to set a defined retention period (365 days) and enable KMS encryption for security compliance.
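
A minimal sketch of such a log group, assuming hypothetical variable names for the cluster name and the KMS key:

```hcl
# Sketch only: variable names and the resource label are assumptions.
variable "cluster_name" {
  type = string
}

variable "logs_kms_key_arn" {
  type = string
}

resource "aws_cloudwatch_log_group" "eks_cluster" {
  # EKS writes control plane logs to this fixed naming pattern.
  name              = "/aws/eks/${var.cluster_name}/cluster"
  retention_in_days = 365
  kms_key_id        = var.logs_kms_key_arn
}
```

The KMS key policy also has to allow the CloudWatch Logs service principal for the region to use the key; otherwise creating the log group fails.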

Step 5: Create an EKS cluster
In this step, I created the Amazon EKS cluster — the Kubernetes control plane. I attached the eks_cluster IAM role from step 3 so the control plane can manage AWS resources on your behalf. Under vpc_config, I provided the eks_cluster security group from step 2 and the private subnet IDs from step 1. EKS uses these subnets to place elastic network interfaces (ENIs) for control plane-to-node communication. I chose private subnets per AWS recommendations, since the VPC endpoints created in step 1 already provide the necessary connectivity to AWS services.

Setting both endpoint_private_access and endpoint_public_access to true means the Kubernetes API server is reachable from within the VPC and from the internet. Public access isn’t required to build the cluster itself; it’s needed later so that external tools or CI/CD pipelines can deploy workloads into the cluster. Note that the public endpoint is currently open to all traffic (0.0.0.0/0). For production environments, restrict it to specific CIDRs using the public_access_cidrs argument under vpc_config.

The encryption_config specifies a KMS key for encrypting Kubernetes Secrets at rest in etcd, and enabled_cluster_log_types defines which control plane log types are sent to the CloudWatch log group created in step 4. I also pinned the cluster to Kubernetes version 1.30 and added explicit depends_on references to ensure the IAM policy attachment and CloudWatch log group are created before the cluster.
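
Putting the pieces of this step together, the cluster resource might look roughly like the sketch below. The variable names and the resource label are assumptions, and the list of log types is my guess at a full set, since the text does not say which types are enabled.

```hcl
# Sketch only: variable names and the resource label are assumptions.
variable "cluster_name" {
  type = string
}

variable "cluster_role_arn" {
  type = string
}

variable "cluster_security_group_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "secrets_kms_key_arn" {
  type = string
}

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn
  version  = "1.30"

  vpc_config {
    subnet_ids              = var.private_subnet_ids
    security_group_ids      = [var.cluster_security_group_id]
    endpoint_private_access = true
    endpoint_public_access  = true
    # To restrict the public endpoint, set public_access_cidrs here,
    # for example ["203.0.113.0/24"] (a documentation-only range).
  }

  # Envelope encryption of Kubernetes Secrets with a customer-managed key.
  encryption_config {
    provider {
      key_arn = var.secrets_kms_key_arn
    }
    resources = ["secrets"]
  }

  # Assumption: all five control plane log types are enabled.
  enabled_cluster_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler",
  ]

  # In the full configuration, depends_on would list the IAM policy
  # attachment and the CloudWatch log group, as described above.
}
```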

Step 6: Create EKS worker nodes to host the container workloads
This step creates the worker nodes where Kubernetes schedules pods. The configuration uses two Terraform resources — aws_launch_template and aws_eks_node_group — which separate instance-level settings from cluster-level node management.

The aws_launch_template defines the instance configuration: the worker node security group from step 2, encrypted gp3 EBS volumes, and metadata options with IMDSv2 enforced (http_tokens = "required") to prevent SSRF-based credential theft.

The aws_eks_node_group ties everything together — it references the launch template, attaches the node IAM role from step 3, and places the nodes in private subnets from step 1. The scaling configuration starts with 2 t3.medium ON_DEMAND instances and allows scaling between 1 and 3 nodes. EKS manages the underlying Auto Scaling Group, which launches and terminates EC2 instances based on this configuration.
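
The split between the two resources can be sketched as follows. The variable names, resource labels, root device name, and volume size are assumptions for illustration; the instance type, capacity type, and scaling numbers come from the description above.

```hcl
# Sketch only: variable names, labels, device name, and volume size are assumptions.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "node_security_group_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

# Instance-level settings.
resource "aws_launch_template" "eks_nodes" {
  name_prefix            = "eks-nodes-"
  vpc_security_group_ids = [var.node_security_group_id]

  block_device_mappings {
    device_name = "/dev/xvda" # assumed root device of the node AMI

    ebs {
      volume_size           = 20 # GiB, assumed
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }

  # Enforce IMDSv2: requests without a session token are rejected.
  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required"
  }
}

# Cluster-level node management.
resource "aws_eks_node_group" "this" {
  cluster_name    = var.cluster_name
  node_group_name = "default"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = ["t3.medium"]
  capacity_type   = "ON_DEMAND"

  scaling_config {
    desired_size = 2
    min_size     = 1
    max_size     = 3
  }

  launch_template {
    id      = aws_launch_template.eks_nodes.id
    version = aws_launch_template.eks_nodes.latest_version
  }
}
```

Referencing latest_version means a change to the launch template triggers a rolling update of the node group on the next apply.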

Step 7: Configure EKS add-ons (VPC CNI, CoreDNS, kube-proxy, EBS CSI driver, Pod Identity agent)
EKS add-ons are AWS-managed Kubernetes components that extend the cluster’s capabilities. AWS handles versioning, security patches, and compatibility for these components.

The five add-ons are:
– vpc-cni: Pod networking within the VPC to manage ENIs and assign IP addresses to the pods
– coredns: DNS resolution for Kubernetes services and pods
– kube-proxy: Network proxy that maintains network rules for Kubernetes services
– aws-ebs-csi-driver: Persistent volume support using Amazon EBS, authenticated via the Pod Identity role created in step 3
– eks-pod-identity-agent: Agent that enables the Pod Identity authentication mechanism for pods to access AWS services

All add-ons depend on the node group from step 6 since they need running nodes to schedule onto. The EBS CSI driver additionally depends on its Pod Identity association being created first.
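
A sketch of how the add-ons and the Pod Identity association could be declared. The variable names and resource labels are hypothetical. The namespace and service account shown are the defaults the EBS CSI driver add-on uses for its controller.

```hcl
# Sketch only: variable names and resource labels are assumptions.
variable "cluster_name" {
  type = string
}

variable "ebs_csi_role_arn" {
  type = string
}

locals {
  core_addons = ["vpc-cni", "coredns", "kube-proxy", "eks-pod-identity-agent"]
}

resource "aws_eks_addon" "core" {
  for_each = toset(local.core_addons)

  cluster_name = var.cluster_name
  addon_name   = each.value

  # In the full configuration, depends_on would reference the node group.
}

# Maps the CSI controller's service account to the IAM role.
resource "aws_eks_pod_identity_association" "ebs_csi" {
  cluster_name    = var.cluster_name
  namespace       = "kube-system"
  service_account = "ebs-csi-controller-sa"
  role_arn        = var.ebs_csi_role_arn
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name = var.cluster_name
  addon_name   = "aws-ebs-csi-driver"

  # The association must exist before the driver pods start.
  depends_on = [aws_eks_pod_identity_association.ebs_csi]
}
```

Omitting addon_version lets EKS pick the default version for the cluster's Kubernetes version; pinning it is an option when upgrades need to be explicit.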

Step 8: Create an ECR repository for container images
Amazon Elastic Container Registry (ECR) is a managed container registry for storing Docker images. When pods are scheduled, the worker nodes pull container images from this repository using the AmazonEC2ContainerRegistryReadOnly policy attached to the node role in step 3, along with the ECR VPC endpoints created in step 1.

The repository is configured with immutable image tags to prevent overwriting existing images — once a tag is pushed, it cannot be reused. This ensures deployments are reproducible and prevents accidental or malicious image replacement. Images are encrypted at rest using a dedicated KMS key, and scan-on-push is enabled to automatically scan images for known vulnerabilities when they are pushed to the repository.

A lifecycle policy manages image retention with two rules: untagged images are removed after 7 days, and only the latest 10 tagged images are kept. This prevents unbounded storage growth while retaining enough history for rollbacks.
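
The repository settings and the two lifecycle rules might be expressed like this. The variable names and the resource label are assumptions, and matching all tags with a "*" pattern is one way to express "tagged images", not necessarily how the repository does it.

```hcl
# Sketch only: variable names and the resource label are assumptions.
variable "repository_name" {
  type = string
}

variable "ecr_kms_key_arn" {
  type = string
}

resource "aws_ecr_repository" "app" {
  name                 = var.repository_name
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = var.ecr_kms_key_arn
  }
}

resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Keep only the 10 most recent tagged images"
        selection = {
          tagStatus      = "tagged"
          tagPatternList = ["*"]
          countType      = "imageCountMoreThan"
          countNumber    = 10
        }
        action = { type = "expire" }
      },
    ]
  })
}
```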

Step 9: Store infrastructure outputs in SSM Parameter Store
The final step bundles key infrastructure outputs into a single SSM parameter so that subsequent Terraform stacks can consume them without direct cross-stack references. The parameter is stored as a SecureString encrypted with a dedicated KMS key, since it contains sensitive values, such as the cluster endpoint and certificate authority data.

The JSON payload includes the cluster name, endpoint, and CA data needed to authenticate with the Kubernetes API, along with VPC and subnet IDs, the ECR repository URL, and KMS key ARNs. This creates a clean handoff point — a platform stack (covered in a future article) can read this single parameter to get everything it needs to deploy Helm charts and configure Kubernetes components on top of this infrastructure.
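
A minimal sketch of the handoff parameter. The parameter path matches the one used in this note. The variable names and the shape of the payload object are assumptions; in the real stack the values would come from resource attributes such as the cluster's endpoint and certificate_authority[0].data rather than from a variable.

```hcl
# Sketch only: variable names and the payload shape are assumptions.
variable "name" {
  type = string
}

variable "ssm_kms_key_arn" {
  type = string
}

variable "infra_output" {
  description = "Values to publish for downstream stacks"
  type = object({
    cluster_name       = string
    cluster_endpoint   = string
    cluster_ca_data    = string
    vpc_id             = string
    private_subnet_ids = list(string)
    ecr_repository_url = string
    kms_key_arns       = map(string)
  })
}

resource "aws_ssm_parameter" "infra_output" {
  name   = "/${var.name}/output"
  type   = "SecureString"
  key_id = var.ssm_kms_key_arn
  value  = jsonencode(var.infra_output)
}
```

A consuming stack could read the value with the aws_ssm_parameter data source and pass it through jsondecode to get the individual fields back. Terraform treats the decoded value as sensitive.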

Deployment

The infrastructure is deployed via GitHub Actions using the workflow defined in .github/workflows/terraform.yml. The workflow runs terraform apply only when changes are merged into the main branch, using OIDC authentication to obtain secure, temporary AWS credentials. This ensures that all infrastructure modifications are reviewed via pull requests before deployment.

This repository also includes a code-scanning pipeline (.github/workflows/code-scan.yml) using Checkov to scan Terraform configurations against security best practices before deployment. For detailed implementation of Checkov with GitHub Actions, see: automate-terraform-configuration-scan-with-checkov-and-github-actions.

Validation

After deployment, validate that the cluster is operational by confirming the following in the AWS console:

The EKS cluster status shows as Active with the expected Kubernetes version (1.30).

The node group shows the desired number of nodes in Ready status.

All five add-ons (VPC CNI, CoreDNS, kube-proxy, EBS CSI driver, Pod Identity agent) show as Active.

The ECR repository is created with scan-on-push and encryption enabled.

The SSM parameter at /${var.name}/output contains the JSON payload with cluster details.

Conclusion

In this note, I walked through provisioning a secure Amazon EKS cluster using Terraform — from the VPC and security groups through the control plane, worker nodes, add-ons, ECR, and SSM parameter handoff. Every service uses customer-managed KMS encryption, worker nodes run in private subnets with VPC endpoints for AWS API access, and the infrastructure outputs are bundled into a single SSM parameter for consumption by subsequent stacks.

This infrastructure forms the foundation for running containerized workloads on Kubernetes. In the next article, I’ll build on this by deploying platform components using Helm and a multi-stack Terraform pattern — starting with the AWS Load Balancer Controller.

If you have any questions or suggestions, feel free to comment or get in touch.

Published by sourav kundu

Discover more from My Devops Journal