Deploying Highly Available Compute Instances in Your Akamai Connected Cloud VLAN
Overview
One of the most common use cases for cloud-based workloads is a multi-tier web application accessible as a website over a domain such as www.example.com. A security best practice is to build your N-tier application into segments, with routing rules and firewalls between public and private subnets. Akamai Cloud offers different networking constructs to achieve private networking, including VPCs and VLANs. This article describes some considerations for choosing a VLAN over a VPC, and provides a GitHub repo that demonstrates how you can set up your VLAN network segments with high availability should you choose a VLAN as your private networking construct.
Deployment Scenario
To further illustrate the use case we are trying to solve for, let's look at what a typical multi-tier web application might look like, and then discuss the components and decision points for choosing a VLAN for our deployment scenario. The diagram below represents our multi-tier web application. In this example, the network is segmented into multiple subnets with routing and firewall rules between the distinct subnets. We'll keep this illustration in mind as we work through the rest of this article and discuss why you would choose a VLAN over a VPC, as well as how to maintain high availability between VLAN network segments.
Comparing VLAN and VPC in Akamai Cloud
Before diving into the use case and code for our example, let's quickly touch on some differences between VPC and VLAN to motivate our decision to focus on VLAN for this use case. Here is a quick table comparing some capabilities of the two products as of this writing (summarized from the discussion below; the linked docs have current details):
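| Capability | VPC | VLAN |
| --- | --- | --- |
| Number of subnets per network | Multiple IPv4 CIDR subnets; ACLs managed with Cloud Firewall | One layer 2 segment per VLAN; create multiple VLANs and route between them with Linux instances |
| Number of regions available | Subset of regions | All regions |
| NAT gateway capability | No managed NAT gateway; NAT 1:1 per instance or a Layer 7 reverse proxy as alternatives | No managed NAT gateway; build your own on a Linux instance |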
Here are some helpful links that go into deeper discussions about the use case considerations as well as regional availability of these networking features:
- https://www.linode.com/docs/products/platform/get-started/guides/choose-a-data-center/
- https://www.linode.com/docs/products/networking/vpc/
- https://www.linode.com/docs/products/networking/vlans/
At this point you’re probably wondering what we mean by some of the terminology used for comparison in the table above, so I will quickly explain some of the key differences to motivate our use case:
- Number of Subnets Per Network — A VPC allows you to create subnets with IPv4 CIDR blocks, and you use Cloud Firewall to manage ACLs between subnets. A VLAN by itself is just a single logical layer 2 subnet, so you would create multiple VLANs and use Linux compute instances to manage routing and firewall rules between your VLANs.
- Number of Regions Available — VLAN is currently available in all regions, so workloads that need to run in regions where VPC is not yet supported may be a reason to choose VLAN.
- NAT Gateway Capability — As of this writing, neither VPC nor VLAN has a cloud NAT gateway service that you can attach to these networks. In the case of VPC, you can choose NAT 1:1 so that each instance maps to a unique external IP for outbound and inbound connectivity and then manage access with Cloud Firewall, or alternatively you can create a Layer 7 reverse proxy in your public subnet to facilitate HTTP(S) outbound communication for servers in the private subnet. With VLAN, you can build a NAT gateway in your public subnet using Linux compute instances as a router and firewall, as sketched below.
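For a rough idea of what that Linux NAT gateway amounts to, here is a minimal sketch using iptables. The interface names are assumptions for illustration: eth0 faces the public internet and eth1 faces the VLAN.

```bash
# Enable packet forwarding between interfaces (persist via /etc/sysctl.conf)
sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic leaving via the public interface (eth0 assumed)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Allow forwarding from the VLAN (eth1 assumed) out to the internet,
# and only established/related traffic back in
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
```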
This comparison is not an exhaustive list by any means, just something designed to illustrate where key differences exist in order to make a decision on which networking construct is suitable for our use case.
When to Select a VLAN
Given the above information, when might you want to select a VLAN over a VPC? If your workload has these characteristics, then a VLAN may make more sense for your use case:
- You need private networking in regions where VPC is not yet supported.
- You have (non-HTTP) TCP or UDP protocols you need to support via the NAT gateway and desire a common egress path for all traffic from your private networking construct.
- You want more control and customization over your network segments and are comfortable managing routes and network rules with Linux native software.
- You are using a service such as LKE and want to connect your Kubernetes cluster to additional services on your account via private networking.
For the sake of this article we’ll assume that you have elected to use a VLAN to create your private networking setup on your workload.
Getting Your Feet Wet With VLAN for Private Networking
A colleague of mine, Brent Eiler, went into additional detail in this article about how to set up a VLAN to create a multi-tier isolated network construct for your workload, which also provides a great overview of how you can set up subnets and NAT gateway functionality.
The goal of the remainder of this article is to provide an example of how you can make your private network highly available. The core components that enable network communication in this architecture are the Linux instances that sit at the boundary points of the network segments. If they are deployed as standalone instances without redundancy, they become a single point of failure, which is undesirable for a highly available production web application.
We will now demonstrate how you can use IP failover with keepalived within VLANs to create resiliency in your setup. Along the way, this will also expose you to a number of integration tools and capabilities of Akamai Cloud compute that you may or may not be familiar with, including the Linode Terraform provider, cloud-init User Data scripts (and their Jinja templating features), and keepalived.
Please note that the repository provided here has plenty of room for improvement and security hardening depending on the context in which it is used. It is meant as an inspirational example, and thus may not be suitable for a production deployment without additional hardening, logging, error handling, etc., so proceed with this in mind.
Prerequisites Before Deploying
Here is a checklist of things you will need in order to complete the exercise in this article:
- Clone the GitHub repository here — https://github.com/patnordstrom/vlan-ip-failover
- You will need a Linode account. There’s a link on this page to register and get free credits to start — https://www.linode.com/docs/products/platform/get-started/
- You will need to have Terraform installed and configured with a Personal Access Token from your Linode account to deploy the infrastructure. You can follow the Linode provider getting started guide on how to configure Terraform with your Personal Access Token. For my setup I have a configuration file on my local computer at `~/.config/linode` with my personal access token.
- You should have an SSH key created on your cloud account for accessing the nodes after they are deployed. You will provide your username to the script as part of the deployment, but the SSH keys need to be set up before deploying.
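As a point of reference, a minimal Terraform setup for the Linode provider might look like the sketch below. This passes the token through a variable; the provider can also pick the token up from the `LINODE_TOKEN` environment variable if none is set in the configuration.

```hcl
terraform {
  required_providers {
    linode = {
      source = "linode/linode"
    }
  }
}

# The token can come from this variable, the LINODE_TOKEN environment
# variable, or a config file such as ~/.config/linode.
provider "linode" {
  token = var.linode_token
}

variable "linode_token" {
  type      = string
  sensitive = true
}
```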
Exploring the GitHub Repo
Before we deploy, I want to give a quick overview of the repo, which contains 2 folders:
- The `terraform` folder contains the code that will deploy 2 instances labeled as "ha_nodes". These nodes represent your internal routing / firewall servers within your VLAN network boundaries. Note that this pattern can be used for any service where you want redundancy in a primary / failover fashion within your VLAN (i.e. not just for routing servers). The other instance, labeled as "test_client_node", is deployed within your VLAN and can be used to validate the high availability of your nodes.
- The `cloud-init` folder contains the User Data scripts that are used to configure the instances during deployment. The `ha-server-config.yaml` file installs Nginx and Keepalived and configures the "ha_nodes" in a primary / failover setup. Nginx is installed just for demo purposes so that we can test the failover in a visual way. The `test-client-config.yaml` file just creates a simple curl script that we will use to validate the high availability setup as part of this exercise.
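To give a sense of what the cloud-init script sets up, a keepalived VRRP configuration for this kind of primary / failover pair generally looks something like the sketch below. The interface name, virtual IP, and priorities are illustrative, not the repo's exact values.

```
# /etc/keepalived/keepalived.conf (illustrative values)
vrrp_instance VI_1 {
    state MASTER              # BACKUP on ha-node-2
    interface eth1            # VLAN-facing interface (assumed name)
    virtual_router_id 51
    priority 150              # set lower (e.g. 100) on ha-node-2
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme    # demo only; see the security notes later
    }
    virtual_ipaddress {
        10.0.0.10/24          # shared virtual IP on the VLAN (illustrative)
    }
}
```

Because the primary holds the higher priority and preemption is keepalived's default behavior, the virtual IP moves to the backup when the primary's heartbeat disappears and moves back once the primary is healthy again.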
The goal of this repo is to give you a basic working example of VLAN high availability with minimal configuration, simply by running a `terraform apply` command.
Minimum Configuration Requirements
Here are the minimum configuration requirements to deploy the repo:
- You will need to know your username within Linode Cloud Manager that already has SSH keys set up against it. You can find your username by going to your profile page. Be sure that you also have at least one SSH key set up on your account.
- You will need to know the public IPv4 address that you will be accessing the nodes from. You can find your IP address by using Akamai's User Agent Tool.
You can supply your configuration at runtime when you execute `terraform apply`, or you can create a `terraform.tfvars` file with those values in the `terraform` directory.
Here is an example of the format of the configuration file (note the trailing /32 for the IP address; the variable names below are illustrative, so check the repo's `variables.tf` for the exact names):
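```hcl
# terraform.tfvars — variable names are illustrative; check variables.tf
# in the repo for the exact names the deployment expects.
authorized_user = "your-cloud-manager-username"
source_ip       = "203.0.113.10/32"
```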
Deploying the Demo Solution
To deploy the solution all you really need to do is download the GitHub repo, navigate to the `terraform` directory, and then run `terraform apply`. If you configured a `terraform.tfvars` file as demonstrated above, all you need to do is approve the deployment. Otherwise it will ask you to supply the required variables. Once your deployment has completed successfully, you will need to fetch the IP for the `test-client-node` to use in the next step.
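Put together, the deployment is just a few commands. The final `terraform output` step assumes the repo defines an output for the test client IP; if it doesn't, you can look the IP up in Cloud Manager instead.

```bash
git clone https://github.com/patnordstrom/vlan-ip-failover.git
cd vlan-ip-failover/terraform

terraform init
terraform apply    # approve the plan, or supply variables when prompted

# Fetch the test client IP (assumes an output is defined; otherwise
# look it up in Cloud Manager or via the Linode CLI)
terraform output
```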
Testing IP Failover for HA Nodes
Now that the solution is up and running, we will test the failover on the VLAN by accessing the `test-client-node` over SSH. Assuming you supplied the correct username that has at least one SSH key associated, you can SSH in via `ssh -i /path/to/your/private/key root@[ip_address]`, then navigate to the `/tmp` directory, where there is a script labeled `ping_virtual_ip.sh`.
You can run this script in one terminal and open a separate terminal where you can tail the log file it generates at `/tmp/ping_virtual_ip.log`. By default, you should see that the log includes a timestamp and the message "Hello from ha-node-1".
Now, to test the failover, reboot `ha-node-1` within Cloud Manager or via the Linode CLI (see the sketch below). After several seconds the message should change and you should see "Hello from ha-node-2". What has happened is that `keepalived`, whose heartbeat detected that the primary node is no longer available (because you rebooted it), has seamlessly moved the virtual IP over to the backup node. After another 30 seconds or so, when the reboot has completed, you will see the message switch back to "Hello from ha-node-1".
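If you prefer the Linode CLI to Cloud Manager for the reboot, it looks something like this (the instance label depends on the Terraform code, so list your instances first to find the ID):

```bash
# Find the Linode ID of the primary node
linode-cli linodes list

# Reboot the primary node by ID to trigger the failover
linode-cli linodes reboot 12345678
```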
The configuration set up in `ha-server-config.yaml` designates `ha-node-1` as the "primary node", so it takes back over serving traffic from the backup after it becomes healthy again. Your testing setup should look similar to the below:
Conclusions, Potential Improvements, and Next Steps
This setup has demonstrated a basic method that allows you to create a segmented network with VLANs while also making the setup highly available at your network boundaries. This ensures that you can keep the network up and running during patches and maintenance as well as minimize downtime for any other issues that might arise within your deployment. Here are some areas for potential improvement with this deployment:
- There are a number of hardening and security configurations that can be applied within your cloud-init script that weren’t included in this demo.
- The demo makes use of the Jinja template features of cloud-init, but for more complex configuration management requirements, it is recommended to use a dedicated configuration management tool such as Ansible, Chef, Puppet, etc.
- We used the "password" authentication method for Keepalived, and the password in this case is stored within the repo itself, which you wouldn't do in a realistic deployment. Integrating a secrets vault or using an alternate security setup with Keepalived would be recommended for a production deployment.
Thanks for reading, and I hope that at a minimum this article has inspired you to explore a creative solution to your next challenge, whatever that may be!