Private Networking for LKE Clusters and Dependent Systems Using VLANs on Akamai Cloud Compute
Overview
Organizations deploying applications on Kubernetes commonly run dependent systems such as databases, caches, and messaging systems separately on their own VMs (or even bare metal instances). When those systems live in the cloud, you need a networking construct and configuration that lets your Kubernetes cluster reach them over a private network, both to minimize latency and to strengthen your security posture. This article walks through how to set up a managed Kubernetes cluster on Linode Kubernetes Engine (LKE) so that nodes automatically join a VLAN, allowing your application and service pods to communicate with those dependent systems.
The figure below illustrates the goal of this article. By the end of the article we will have demonstrated how to connect your LKE cluster to a VLAN, scale the cluster and see new nodes connect automatically, and test connectivity by having pods reach services that sit outside of the cluster but within the VLAN.
Please note that this article describes an unofficial capability, and the design and execution leave plenty of room for improvement and security hardening depending on the context in which they are used. It is meant as an inspirational example and may not be suitable for a production deployment without additional hardening, logging, error handling, and so on, so proceed with that in mind.
Prerequisites Before Deploying
Here is a checklist of things you will need in order to complete the exercise in this article:
- Clone the GitHub repository here — https://github.com/patnordstrom/lke-vlan
- You will need a Linode account. There’s a link on this page to register and get free credits to start — https://www.linode.com/docs/products/platform/get-started/
- Once your account is created, you will need to spin up an LKE cluster with at least 2 nodes for testing — https://www.linode.com/docs/products/compute/kubernetes/get-started/
- You will also need the Kubernetes CLI (kubectl) installed locally to interact with the cluster. The guide above covers how to set this up.
- The DaemonSet we are deploying to the cluster uses the Linode API, so you will need a Personal Access Token (PAT) to add to the cluster as a secret — https://www.linode.com/docs/products/tools/api/guides/manage-api-tokens/. NOTE: The only permission this script needs is Read/Write on the Linodes object. A quick way to check that the token works is shown after this list.
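If you would like to sanity check the token before wiring it into the cluster, a call like the one below (assuming the token is exported in your shell as LINODE_API_TOKEN, which is just an example name) should return the compute instances the token can see:
# Verify the PAT against the Linode API; replace the placeholder with your own token
[pnordstrom]$ export LINODE_API_TOKEN="<your-personal-access-token>"
[pnordstrom]$ curl -s -H "Authorization: Bearer $LINODE_API_TOKEN" https://api.linode.com/v4/linode/instances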
This article assumes some working knowledge of Kubernetes and the kubectl command line, as well as of Docker and building containers. None of this knowledge is strictly required to complete the steps laid out in the exercise, but you will get more out of it with prior experience with containerization solutions.
Exploring the GitHub Repo
Before we deploy, here is a quick overview of the repo. The “dev” folder contains the container definition and Bash script so that the script can be developed and changed locally on a machine with Docker. It hydrates variables that would normally be provided by ConfigMap or Secret resources via vars.sh. To use it, build and run the container from the Dockerfile, exec into the container with a shell, and then execute the script as you develop, change, or extend it; a rough sketch of that loop follows.
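As a rough illustration, the commands below assume the Dockerfile sits in the dev folder, that the script inside the container is named main.sh and lives in the working directory, and that the image tag is arbitrary; adjust the names and paths to match the repo as you have it checked out.
# Build the dev image from the repo's dev folder (the image tag here is only an example)
[pnordstrom]$ docker build -t lke-vlan-dev:latest ./dev
# Run the container with an interactive shell, overriding its default command
[pnordstrom]$ docker run -it --rm lke-vlan-dev:latest /bin/sh
# Inside the container, run the script as you iterate on it
# (vars.sh supplies the values that the ConfigMap and Secret provide in-cluster)
./main.sh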
The “prod” folder contains the artifacts required to run this script as a DaemonSet within an LKE cluster. The main.sh script differs from the “dev” version only in that it doesn’t source the variables (that is the only line of code that is different). The deployment.yaml contains some values that you will need to update in the ConfigMap and Secret so that it applies to your deployment scenario. The DaemonSet also references the container name and version on my DockerHub account, which you can change if you build and host the container yourself; an example of doing so is shown below.
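If you do host the image yourself, the usual Docker Hub flow looks roughly like this; the repository name, tag, and build context are placeholders, so adjust them to however you lay out the Dockerfile and the prod script.
# Build, tag, and push the controller image to your own Docker Hub account
[pnordstrom]$ docker build -t <your-dockerhub-user>/vlan-join-controller:1.0 .
[pnordstrom]$ docker login
[pnordstrom]$ docker push <your-dockerhub-user>/vlan-join-controller:1.0
# Then point the DaemonSet's image reference in deployment.yaml at the pushed image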
Deploying the DaemonSet
Before we begin, ensure you have a running LKE cluster with at least 2 nodes (mine has 3 nodes in this example) and that your Kubernetes CLI can reach the cluster.
✅ Kubernetes Cluster is running
✅ Kubernetes CLI is connected
[pnordstrom]$ kubectl get nodes
NAME                            STATUS   ROLES    AGE   VERSION
lke177234-257574-5667fb420000   Ready    <none>   20d   v1.29.2
lke177234-257574-5fa4b98a0000   Ready    <none>   8d    v1.29.2
lke177234-257574-64a26f810000   Ready    <none>   20d   v1.29.2
Updating the Deployment YAML
Next we need to update the deployment.yaml with the following:
- REQUIRED — Add the Personal Access Token you generated earlier to the Secret resource. The value needs to be the base64 encoding of the token (see the example below). NOTE: the access token shown below was revoked before this article was published.
[pnordstrom]$ echo -n c50cc51a2a8a65c6150f7608433d03378553670de466f0d9f905db45a6906eab | base64
YzUwY2M1MWEyYThhNjVjNjE1MGY3NjA4NDMzZDAzMzc4NTUzNjcwZGU0NjZmMGQ5ZjkwNWRiNDVhNjkwNmVhYg==
- OPTIONAL — Update the ConfigMap with the VLAN name and the RFC 1918 CIDR block you want to use. NOTE: The default CIDR in the deployment.yaml isn’t necessarily the best choice: the Linode private IP space used for communication between nodes and services within a region uses the CIDR 192.168.128.0/17, so any overlap with that range could cause networking conflicts. For the test deployment detailed in this article I used 10.0.10.0/24 as my vlan_cidr value.
- OPTIONAL — Update the container image reference in the DaemonSet to point to your own image location if you built the container yourself (you can keep the default image for testing without changing this if you like).
Here’s a summary of where you can make the changes to the DaemonSet YAML definition:
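To make those spots concrete, here is a trimmed sketch of the relevant pieces of deployment.yaml. The resource names match the objects created later in this article, but the ConfigMap keys, the Secret key holding the token, and the wiring of those values into the container (omitted here) are assumptions on my part, so treat the file in the repo as the source of truth and only change the values called out in the comments.
# Illustrative sketch only; follow the structure of the repo's deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vlan-join-controller-config
data:
  vlan_label: "lke-private-vlan"    # the VLAN name your nodes should join
  vlan_cidr: "10.0.10.0/24"         # an RFC 1918 range that avoids 192.168.128.0/17
---
apiVersion: v1
kind: Secret
metadata:
  name: linode-api
type: Opaque
data:
  token: <base64-encoded-PAT>       # output of the echo -n ... | base64 command above
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vlan-join-controller
spec:
  selector:
    matchLabels:
      app: vlan-join-controller
  template:
    metadata:
      labels:
        app: vlan-join-controller
    spec:
      containers:
        - name: vlan-join-controller
          image: <your-dockerhub-user>/vlan-join-controller:1.0   # change if you host your own image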
Deploy the DaemonSet
Once you have made the changes, deploy the DaemonSet; you can then check the logs of one of the pods to see the key points of the script being hit and echoed.
# Apply the YAML
[pnordstrom]$ kubectl apply -f deployment.yaml
configmap/vlan-join-controller-config created
secret/linode-api created
daemonset.apps/vlan-join-controller created
# Get the list of pods
[pnordstrom]$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
vlan-join-controller-hqc9q   1/1     Running   0          5s
vlan-join-controller-l6fpk   1/1     Running   0          5s
vlan-join-controller-tffvw   1/1     Running   0          5s
# Check the logs from one of the pods
[pnordstrom]$ kubectl logs vlan-join-controller-hqc9q
All init variables exist, starting script
Adding IP: 10.0.10.99/24 to list of existing VLAN IPs in use
The IP chosen is: 10.0.10.190/24
Node successfully added to VLAN
Node is rebooting
Once the script has updated the pool and the nodes have rebooted, the DaemonSet will simply report every minute whether the node is joined to the VLAN. See the example below.
# Check for pods after the nodes have rebooted
[pnordstrom]$ kubectl get pods
NAME                         READY   STATUS    RESTARTS        AGE
vlan-join-controller-fgshr   1/1     Running   0               6m56s
vlan-join-controller-vx8jh   1/1     Running   0               4m52s
vlan-join-controller-xrskb   1/1     Running   1 (7m42s ago)   8m22s
# Check the logs for one of the pods to see that it has reported it is connected to the VLAN
[pnordstrom]$ kubectl logs vlan-join-controller-fgshr
All init variables exist, starting script
compute instance ID not found
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Node currently exists in the VLAN lke-private-vlan
Since this deploys as a DaemonSet, a pod is scheduled onto each new node that is added to the pool as you scale out. While not demonstrated here, you can easily test this by scaling the pool and verifying that the new nodes are registered to the VLAN.
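For reference, the pool count can be changed in Cloud Manager or with a call to the Linode API along these lines; the cluster and pool IDs are placeholders you would look up for your own cluster, and it is worth double-checking the endpoint shape against the current API documentation.
# Scale the LKE node pool to 4 nodes (replace <cluster-id> and <pool-id> with your own values)
[pnordstrom]$ curl -s -X PUT \
    -H "Authorization: Bearer $LINODE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"count": 4}' \
    https://api.linode.com/v4/lke/clusters/<cluster-id>/pools/<pool-id>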
Testing Connectivity Between LKE Cluster and Other VMs
Our original goal was to establish a private network that connects our dependent services to the application and service pods in our Kubernetes cluster. Let's test that connectivity now that our nodes have joined the VLAN. For this test I'm deploying a simple NodeJS app that listens on port 8080 and replies with the message Response from [container_name] at [date_time]. You can choose whatever simple app you like for connectivity tests. In this example, I have joined the test app to the VLAN with an IP of 10.0.10.200. To connect your compute instance to a VLAN using Cloud Manager, see our guide here. My example compute instance connected to the VLAN is shown below.
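If you don't have a container handy for this, any simple listener on the VLAN-attached VM works for a raw connectivity check. For example, Python's built-in web server gives you something to fetch on port 8080, though it returns a directory listing rather than the custom message shown in the responses below.
# Quick stand-in listener on the VLAN-attached VM (not the NodeJS app used in this article)
python3 -m http.server 8080 --bind 10.0.10.200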
The test below uses the busybox image to fetch the URL, and the command also tells us which node it ran on. We run it a couple of times as a quick spot check. Since the tests returned the expected response from the web service on port 8080 via the VLAN IP, we can confirm connectivity is working as expected!
# This example shows a successful response from our NodeJS server on node lke177234-257574-5667fb420000
[pnordstrom]$ kubectl run busybox --image=busybox -it --restart=Never -- wget -qO- 10.0.10.200:8080 && kubectl get pods busybox -o=custom-columns=NODE:.spec.nodeName && kubectl delete pod busybox
Response from 2f8677b0f6bf at Wed May 29 2024 16:16:33 GMT+0000 (Coordinated Universal Time)
NODE
lke177234-257574-5667fb420000
pod "busybox" deleted
# This example shows a successful response from our NodeJS server on node lke177234-257574-5fa4b98a0000
pnordstrom@kube:/kube$ kubectl run busybox --image=busybox -it --restart=Never -- wget -qO- 10.0.10.200:8080 && kubectl get pods busybox -o=custom-columns=NODE:.spec.nodeName && kubectl delete pod busybox
Response from 2f8677b0f6bf at Wed May 29 2024 16:16:40 GMT+0000 (Coordinated Universal Time)
NODE
lke177234-257574-5fa4b98a0000
pod "busybox" deleted
Conclusions, Potential Improvements, and Next Steps
To wrap this up, let’s recap what we’ve accomplished in this exercise:
- We have deployed a managed Kubernetes cluster on Linode with at least 2 nodes.
- We have set up an access token and configured it within our deployment.yaml with a VLAN name and CIDR.
- We have applied our deployment.yaml to our cluster and observed the DaemonSet at work connecting the nodes to the VLAN.
- We have tested our setup by spot checking that more than one node in the pool can connect to our web service that sits outside of the cluster but within the VLAN.
- While not shown in this exercise, you can optionally scale up your node pool by following this guide and also confirm that new nodes introduced to the pool will be automatically added to the VLAN.
While we’ve already covered that this method of deployment has room for improvement from a security and operational perspective, there are also some potential limitations if we build on this example for future scalability. Areas to consider include:
- While a DaemonSet works well for ensuring that the VLAN registration state is maintained across all nodes, the method of fetching existing IP addresses and the overall IP address management may not scale for larger clusters. For a cluster of size N, each node that isn't yet registered to the VLAN makes at least N-1 requests to the configuration profile endpoint to fetch the existing IPs, which could become unwieldy and inefficient in a larger cluster. One suggestion would be to track claimed IPs in an intermediate store in a more efficient and performant manner. Using a DHCP server within the VLAN is another possibility, but it is more complex to secure and configure than the current approach.
- The current approach leverages a Bash script in an “infinite loop” to provide controller-like behavior. An improved approach would be to build a proper controller using the Kubernetes Operator pattern and a Kubernetes client SDK to hook into the event model and APIs more granularly. This would improve resilience and observability, among other things.
Thanks for reading, and I hope this article has at least inspired you to explore a creative solution for your next challenge, whatever that may be!