High Availability EC2 Infrastructure on AWS with Terraform from GitHub Actions

Leveraging Terraform and a CI/CD pipeline with GitHub Actions, we can automate and streamline the provisioning of resources on cloud platforms. Instead of using AWS access keys and secret keys, I stored the ARN of the trusted entity (an IAM role tied to an OpenID Connect provider) as a secret in this project’s GitHub repository and accessed it via the secrets.AWS_ROLE construct, to enhance security.

OpenID Connect Provider + Assign Role + Permissions

For the automation process to run without errors, I created an OpenID Connect identity provider in AWS Identity and Access Management (IAM) that has a trust relationship with this project’s GitHub repository. The GitHub documentation on configuring OpenID Connect in Amazon Web Services walks through the steps in detail.

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::YOUR_ACCOUNT_NUMBER:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": "repo:YOUR_GITHUB_USERNAME/YOUR_REPO_NAME:*"
                }
            }
        }
    ]
}
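The same setup can also be managed with Terraform itself rather than by hand in the console. A minimal sketch, assuming placeholder values for the role name, thumbprint, and repository (the trust policy mirrors the JSON above):

```hcl
# GitHub's OIDC identity provider (the thumbprint shown is a placeholder;
# verify the current one before use)
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

# IAM role that GitHub Actions assumes via AssumeRoleWithWebIdentity
resource "aws_iam_role" "github_actions" {
  name = "github-actions-terraform" # placeholder name
  assume_role_policy = jsonencode({
    Version = "2008-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:YOUR_GITHUB_USERNAME/YOUR_REPO_NAME:*"
        }
      }
    }]
  })
}
```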

Ensure that the IAM role (trusted entity) whose credentials are assumed in this configuration has permission to access the S3 backend where the Terraform state will be stored, as well as EC2, VPC, and Elastic Load Balancing access, so it can create and manage all the resources that make up the EC2 infrastructure.
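For illustration, the AWS managed policies below would cover those services. This is a sketch assuming the role is named github-actions-terraform (a placeholder); for production, prefer a least-privilege custom policy over full-access managed policies:

```hcl
# AWS managed policies covering the services this project touches
locals {
  ci_policies = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",             # Terraform state backend
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess",            # EC2 and security groups
    "arn:aws:iam::aws:policy/AmazonVPCFullAccess",            # VPC, subnets, routing
    "arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess", # ALB and target groups
    "arn:aws:iam::aws:policy/AutoScalingFullAccess",          # Auto Scaling Group
  ]
}

# Attach each policy to the CI role (placeholder role name)
resource "aws_iam_role_policy_attachment" "ci" {
  for_each   = toset(local.ci_policies)
  role       = "github-actions-terraform"
  policy_arn = each.value
}
```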

GitHub Secrets

We will create some secrets in the project’s GitHub repository for GitHub Actions to use during execution of the Terraform code.

AWS_BUCKET_NAME: Your Backend Bucket Name

AWS_BUCKET_KEY_NAME: Path For Terraform State

AWS_REGION: Your Region

AWS_ROLE: ARN Of the IAM Role (Trusted Entity)
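These secrets can be set in the repository settings UI. If you prefer to manage them as code as well, the integrations/github Terraform provider can set them; a hedged sketch (all values are placeholders, and the provider needs a token that can administer the repository):

```hcl
# Requires the integrations/github provider to be configured separately
resource "github_actions_secret" "aws_role" {
  repository      = "YOUR_REPO_NAME"
  secret_name     = "AWS_ROLE"
  plaintext_value = "arn:aws:iam::YOUR_ACCOUNT_NUMBER:role/github-actions-terraform"
}

resource "github_actions_secret" "aws_region" {
  repository      = "YOUR_REPO_NAME"
  secret_name     = "AWS_REGION"
  plaintext_value = "us-east-2"
}
```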

Terraform Configuration Files

To set up a robust, highly available EC2 infrastructure on AWS, we will leverage the power of Infrastructure as Code (IaC) using Terraform. Create an S3 backend to store Terraform state and write Terraform configuration files to provision:

1. Virtual Private Cloud (VPC) with two subnets (where the EC2 instances will reside) in different Availability Zones for redundancy.

2. Attach an Internet Gateway (IGW) to the VPC to enable internet traffic and create a Route Table to manage this traffic, associating it with the subnets and adding a route to the IGW.

3. Launch Template with the necessary settings for the instances, including the Amazon Machine Image (AMI), instance type, key name, user data, and security groups.

4. Auto Scaling Group using the Launch Template, to ensure that the desired number of EC2 instances are always running. These instances are automatically launched in the subnets as per the configuration defined in the Launch Template.

5. Application Load Balancer to handle incoming traffic and associate the ALB with a Target Group, which includes the instances managed by the Auto Scaling Group. This setup allows the traffic to be efficiently distributed across the instances.

6. Separate Security Groups for both the Launch Template and the Application Load Balancer. The Security Groups act as a virtual firewall, controlling the inbound and outbound traffic for the instances and the load balancer.

As a best practice in Terraform, create a module from these configuration files to encapsulate and reuse the infrastructure setup. This not only makes our code more organized and manageable, but it also allows us to easily replicate the setup in different environments or across multiple projects.
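Once the configuration files are grouped under a module directory, consuming them could look like the sketch below (the module path, variable names, and output are illustrative and assume the module exposes them):

```hcl
# Illustrative module call; adjust source and inputs to your layout
module "ha_ec2" {
  source = "./modules/ha-ec2"

  cidr_vpc      = "10.0.0.0/16"
  cidr_subnet_1 = "10.0.1.0/24"
  cidr_subnet_2 = "10.0.2.0/24"
  instance_type = "t2.micro"
  key_name      = "ec2login"
}

# Re-export the load balancer DNS name from the module
output "dns_name" {
  value = module.ha_ec2.dns_name
}
```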

We will configure all these resources using Terraform and automate the provisioning process with GitHub Actions. This enables us to apply changes to the infrastructure in a consistent and repeatable manner, reducing the possibility of human error and ensuring a robust and highly available EC2 infrastructure on AWS.
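Note that the S3 backend bucket must already exist before terraform init runs, so it is usually created outside this configuration. A sketch of that bootstrap step, applied separately (the bucket name is a placeholder and must be globally unique):

```hcl
# Bootstrap configuration for the Terraform state bucket
resource "aws_s3_bucket" "tf_state" {
  bucket = "your-terraform-state-bucket" # placeholder
}

# Versioning lets you recover earlier state files if something goes wrong
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
```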

                         #### provider.tf ####

terraform {
  required_providers {
    aws = {
      version = ">= 4.0"
      source = "hashicorp/aws"
    }
  }
  backend "s3" {
  }
}
provider "aws" {

}
#### vars.tf ####

variable "cidr_subnet_1" {
  type        = string
  description = "variable subnet 1"
  default     = "10.0.1.0/24"
}

variable "cidr_subnet_2" {
  type        = string
  description = "variable subnet 2"
  default     = "10.0.2.0/24"
}

variable "cidr_open" {
  type        = string
  description = "Allow traffic over the internet"
  default     = "0.0.0.0/0"
}

variable "cidr_vpc" {
  type        = string
  description = "Allow traffic locally"
  default     = "10.0.0.0/16"
}


variable "instance_type" {
  type        = string
  description = "variable instance type"
  default     = "t2.micro"
}

variable "ami_id" {
  type        = string
  description = "variable AMI ID"
  default     = "ami-011ab7c70f5b5156a"
}

variable "a_z_a" {
  type    = string
  default = "us-east-2a"
}

variable "a_z_b" {
  type    = string
  default = "us-east-2b"
}

variable "key_name" {
  type = string
  default = "ec2login"
}
#### network.tf #######

#Create VPC and two public subnets
resource "aws_vpc" "vpc" {
  cidr_block = var.cidr_vpc
  enable_dns_hostnames = true
  enable_dns_support = true
  instance_tenancy = "default"

  tags = {
    Name = "app-vpc"
  }
}

resource "aws_subnet" "pub_sub_1" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = var.cidr_subnet_1
  availability_zone = var.a_z_a

  tags = {
    Name = "subnet_1"
  }
}

resource "aws_subnet" "pub_sub_2" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = var.cidr_subnet_2
  availability_zone = var.a_z_b

  tags = {
    Name = "subnet_2"
  }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.vpc.id

  tags = {
    Name = "app-vpc-igw"
  }
}

resource "aws_route_table" "pub_rt" {
  vpc_id = aws_vpc.vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name = "pub_rt"
  }
}

resource "aws_route_table_association" "sub_1" {
  subnet_id = aws_subnet.pub_sub_1.id
  route_table_id = aws_route_table.pub_rt.id
}

resource "aws_route_table_association" "sub_2" {
  subnet_id = aws_subnet.pub_sub_2.id
  route_table_id = aws_route_table.pub_rt.id
}
 #### security-alb.tf ####

#Create security group for Application LB to allow http and https 
resource "aws_security_group" "alb_sg" {
  name        = "alb-sg"
  vpc_id      = aws_vpc.vpc.id

  ingress {
    description      = "http"
    from_port        = 80
    to_port          = 80
    protocol         = "tcp"
    cidr_blocks      = [var.cidr_open]
  }

  ingress {
    description      = "https"
    from_port        = 443
    to_port          = 443
    protocol         = "tcp"
    cidr_blocks      = [var.cidr_open]
  }

  egress {
    description      = "Outgoing"
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = [var.cidr_open]
  }

  tags = {
    Name = "alb-sg"
  }
}
#### security-lt.tf ####

#Create security group for launch template to allow http and ssh 
resource "aws_security_group" "lt_sg" {
  name        = "launchtemp-sg"
  vpc_id      = aws_vpc.vpc.id

  ingress {
    description      = "http"
    from_port        = 80
    to_port          = 80
    protocol         = "tcp"
    cidr_blocks      = [var.cidr_open]
  }

  ingress {
    description      = "ssh"
    from_port        = 22
    to_port          = 22
    protocol         = "tcp"
    cidr_blocks      = [var.cidr_open]
  }

  egress {
    description      = "Outgoing"
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = [var.cidr_open]
  }

  tags = {
    Name = "launchtemp-sg"
  }
}
#### lt-asg-alb.tf ####

#Create launch template with Amazon Linux 2 and run script
resource "aws_launch_template" "launch_template" {
  name = "lt-asg"
  image_id = var.ami_id
  instance_type = var.instance_type
  key_name = var.key_name
  user_data = filebase64("${path.root}/userdata/userdata.tpl")

  # Note: provisioners are not supported inside aws_launch_template,
  # so the instance's web page is set up by the user data script instead.

  network_interfaces {
    associate_public_ip_address = true
    delete_on_termination = true
    security_groups = [aws_security_group.lt_sg.id]
  }

}

#Create auto scaling group 
resource "aws_autoscaling_group" "asg" {
  name = "app-asg"
  max_size = 5
  min_size = 2
  desired_capacity = 2
  vpc_zone_identifier = [aws_subnet.pub_sub_1.id, aws_subnet.pub_sub_2.id]
  target_group_arns = [aws_lb_target_group.alb_tg.arn]

  launch_template {
    id = aws_launch_template.launch_template.id
    version = "$Latest"
  }
}


#Create application load balancer
resource "aws_alb" "app_lb" {
  name = "app-lb"
  internal = false
  load_balancer_type = "application"
  security_groups = [aws_security_group.alb_sg.id]
  subnets = [aws_subnet.pub_sub_1.id, aws_subnet.pub_sub_2.id]
}


#Create and configure the listener for load balancer
resource "aws_alb_listener" "lb_listener" {
  load_balancer_arn = aws_alb.app_lb.arn
  port = 80
  protocol = "HTTP"

  default_action {
    type = "forward"
    target_group_arn = aws_lb_target_group.alb_tg.arn
  }
}


#Create the application load balancer target group
resource "aws_lb_target_group" "alb_tg" {
  name = "alb-tg"
  target_type = "instance"
  port = 80
  protocol = "HTTP"
  vpc_id = aws_vpc.vpc.id
}


#Display application load balancer dns name
output "dns_name" {
  description = "the DNS name of the alb"
  value = aws_alb.app_lb.dns_name
}
#### userdata.tpl ####

#!/bin/bash
# User data runs as root on first boot, so sudo/su is not needed
yum update -y
yum install -y httpd.x86_64
yum install -y jq
systemctl start httpd.service
systemctl enable httpd.service
# Write a simple page so the ALB health checks and demo requests succeed
echo "<h1>Served from $(hostname -f)</h1>" > /var/www/html/index.html

Setup GitHub Actions

Create a file in the project’s GitHub repository: “.github/workflows/terraform.yml”

In GitHub Actions, a workflow can be triggered by various events, such as a push to a branch, a pull request being opened, or a release being published. These events are specified in the on section of the workflow file.

 #### terraform.yml ####


name: Automate HA EC2 Infrastructure Provisioning

on:
  push:
    branches:
      - main
  pull_request:

permissions:
      id-token: write # for aws oidc connection
      contents: read   # for actions/checkout
      pull-requests: write # for GitHub bot to comment PR
env:
  TF_LOG: INFO
  AWS_REGION: ${{ secrets.AWS_REGION }}      

jobs:
  deploy:
    name: Terraform
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: .
    steps:

    - name: Checkout Repo
      uses: actions/checkout@v4

    - name: Configure AWS credentials from AWS Account
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ secrets.AWS_ROLE }}
        aws-region: ${{ secrets.AWS_REGION }}
        role-session-name: GitHub-OIDC-TERRAFORM

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
      with:
        terraform_version: 1.5.5

    - name: Terraform fmt
      id: fmt
      run: terraform fmt

    - name: Terraform Init
      id: init
      env:
        AWS_BUCKET_NAME: ${{ secrets.AWS_BUCKET_NAME }}
        AWS_BUCKET_KEY_NAME: ${{ secrets.AWS_BUCKET_KEY_NAME }}
      run: |
        rm -rf .terraform
        terraform init -backend-config="bucket=${AWS_BUCKET_NAME}" -backend-config="key=${AWS_BUCKET_KEY_NAME}" -backend-config="region=${AWS_REGION}"

    - name: Terraform Validate
      id: validate
      run: terraform validate -no-color 

    - name: Terraform Plan
      id: plan
      run: terraform plan -no-color

    - name: Terraform Apply
      id: apply
      run: terraform apply -auto-approve -input=false
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'

After a push to main, the workflow runs and the resources appear in the AWS console:

  • GitHub Actions showing runs of the Terraform commands

  • GitHub Actions showing the terraform apply run completing

  • Resources provisioned on AWS

  • Launch Template contains the EC2 configuration

  • Target Groups reference the EC2 instances where traffic is forwarded

  • Auto Scaling Group has configurations for high availability (HA)

  • Application Load Balancer forwards traffic to the defined targets

It’s important to note that managing AWS resources can incur costs, so it’s crucial to keep track of what’s being provisioned and to decommission resources when they’re no longer needed.

We can further enhance the functionality and security of the high availability EC2 infrastructure by enabling HTTPS with SSL/TLS certificates, and by creating Auto Scaling policies with CloudWatch alarms to scale the instances based on CPU utilization.
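The CPU-based scaling could be sketched as a target-tracking Auto Scaling policy, which creates and manages its own CloudWatch alarms (the 50% target is an arbitrary example, not a recommendation):

```hcl
# Keeps the group's average CPU near the target by adding/removing instances
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0 # example target, tune for your workload
  }
}
```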

Some Challenges

Load Balancer AuthFailure: AWS was not able to validate the provided access credentials. I used IAM Access Advisor to check the service permissions granted to the IAM role and when those services were last accessed. The report showed that the role had insufficient permissions to manage an Application Load Balancer (ALB) and its target groups, which is why I encountered the AuthFailure and a subsequent 502 Bad Gateway.

Base64 Encoding for User Data Script: User data must be base64-encoded when passed to an instance. This can be tricky if you’re not familiar with base64 encoding. In Terraform, you can use the filebase64() function, giving it the path to the script, to encode your user data.
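For instance, these two fragments are equivalent ways to pass an encoded script (the path is illustrative):

```hcl
# Read the file and base64-encode it in one step
user_data = filebase64("${path.root}/userdata/userdata.tpl")

# Equivalent: read the file as a string, then encode it; this form is
# handy if you later switch to templatefile() for variable substitution
user_data = base64encode(file("${path.root}/userdata/userdata.tpl"))
```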

Reminder

Thank you for following along to this point. I hope you found the article helpful; consider showing your support! 👏👏👏

I look forward to connecting with you! https://www.linkedin.com/in/chenwingu/