ReadWriteMany (RWX) Volumes on Akamai Cloud Compute LKE Clusters
Overview
While Kubernetes is often associated with stateless workloads and ephemeral storage, deploying stateful applications in Kubernetes continues to gain traction and popularity, especially with the availability of Helm charts that help operationalize this pattern. Storage use cases in Kubernetes often focus on highly available database clusters, messaging services, caches, etc. These topologies leverage volumes in a ReadWriteOnce (RWO) pattern where each pod reads and writes its own volume, and data is replicated to other volumes by the implementation itself (often through transaction log replication over the network). In this case, while the same data might be accessible from a client that queries any given node, the access pattern of the underlying storage is still 1:1 between the pod and the volume. Some applications instead require a traditional shared storage approach to accessing volumes. This might be a legacy application being rearchitected for cloud-native deployments, or a web CMS that uses a database to store content metadata and a filesystem to store media.
WordPress is a typical example that uses both types of storage. If you deploy WordPress in a highly available fashion on Kubernetes, by default it requires access to a ReadWriteMany (RWX) volume for storing media. Popular applications that have been ported to Kubernetes usually offer options for storing media in a cloud-native medium such as object storage. At the same time, some deployments require consistent capabilities in a hybrid cloud/on-prem approach, some are internal-facing and used at lower volume, and some are legacy systems where upgrading every integration is difficult. Whatever the reason, there are plenty of applications in the wild that require access to RWX volumes in Kubernetes. This article is part of a multi-part series providing implementation examples of RWX volumes using Kubernetes on Akamai Cloud Compute.
Deployment Scenario
This article demonstrates an example implementation, shown in the diagram below, which is an analog to the web CMS example described earlier. In this scenario, the deployment creates the following:
- An NFS server with an export available to be consumed by the Kubernetes NFS Subdir External Provisioner
- A three-node Linode Kubernetes Engine (LKE) cluster with the NFS Subdir Provisioner installed via its Helm chart, along with a simple Python Flask app that allows users to upload and view uploaded files. The deployment leverages the NFS server as a ReadWriteMany volume for persisting and accessing the files, and the web app is exposed via a LoadBalancer Service.
As part of demonstrating this, we will also expose you to a number of integration tools and capabilities with Akamai Cloud that you may or may not be familiar with, including:
- Terraform with the Linode provider for deploying infrastructure
- cloud-init user data scripts for configuring compute instances at deploy time
- Cloud Firewalls for restricting traffic between the LKE cluster and the NFS server
- NodeBalancers provisioned automatically through Kubernetes LoadBalancer Services
Please note that the provided repository has plenty of room for improvement and security hardening depending on the context in which it is used. It is meant to be an inspirational example, and it may not be suitable for a production deployment without additional hardening, logging, error handling, etc., so proceed with this in mind.
Prerequisites Before Deploying
Here is a checklist of things you will need in order to complete the exercise in this article:
- Clone the GitHub repository here — https://github.com/patnordstrom/lke-rwx-examples
- You will need a Linode account. There’s a link on this page to register and get free credits to start — https://techdocs.akamai.com/cloud-computing/docs/getting-started
- You will need to have Terraform installed and configured with a Personal Access Token from your Linode account to deploy the infrastructure. You can follow the Linode Provider getting started guide on configuring Terraform with your Personal Access Token. For my setup, I have a configuration file on my local computer at ~/.config/linode with my personal access token; an environment variable alternative is sketched after this list.
- This guide also uses kubectl and Helm to deploy the NFS Subdir Provisioner. See our guide here for getting started with Helm.
- You should have an SSH key created on your cloud account for accessing the nodes after they are deployed. You will provide your username to the script as part of the deployment, but you will need the SSH keys set up before deploying.
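As an alternative to a config file, the Linode Terraform provider can also read your Personal Access Token from the LINODE_TOKEN environment variable, so you can export it in your shell before running Terraform:
[pnordstrom]$ export LINODE_TOKEN=<your-personal-access-token>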
Exploring the GitHub Repo
Before we deploy, I wanted to give a quick overview of the repo. The repo contains four folders:
- The terraform folder contains the code that deploys a compute instance and configures it as the NFS server. It also deploys a three-node LKE cluster and two cloud firewalls. The cloud firewalls help ensure that only the required ports are open for communication between the LKE cluster and the NFS server. Additionally, port 22 to the NFS server is made available from your IP address (which you can configure as part of the Terraform variables) in case you want to explore the server and see how the storage class provisions on the NFS export.
- The cloud-init folder contains user data scripts that are used to configure the NFS server during deployment.
- The flask-app folder contains the sample application that we will deploy into the Kubernetes cluster. It has a simple homepage that allows you to upload files and shows which LKE node is serving the app. It also has a directory page that lists the files you have uploaded and allows you to download them. There is a Dockerfile in the directory in case you want to build the container from source instead of using my container from Docker Hub.
- The k8s-yaml folder contains sub-folders with definitions for deploying the app into the cluster. These include a Deployment with 3 replicas and anti-affinity rules that spread the pods across the 3 nodes, a PersistentVolumeClaim that leverages the NFS provisioner to create an RWX volume (see the sketch after this list), and a Service definition that deploys a NodeBalancer so you can access the Flask app on a public IP on port 80.
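For orientation, an RWX claim against the provisioner looks roughly like the sketch below. This is a minimal illustration rather than the exact manifest from the repo; nfs-client is the default storage class name created by the provisioner's Helm chart, and the size is arbitrary.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-rwx
spec:
  storageClassName: nfs-client   # default storage class created by the provisioner's Helm chart
  accessModes:
    - ReadWriteMany              # RWX: every replica mounts the same volume read-write
  resources:
    requests:
      storage: 1Gi               # illustrative size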
The goal of this repo is to give you a basic working example of an app using an RWX volume within LKE with minimal configuration, simply by running the few commands detailed in this document.
Minimum Configuration Requirements
Here are the minimum configuration requirements to deploy the repo:
- You will need to know your username within Linode Cloud Manager, the one your SSH keys are set up against. You can find your username by going to your profile page. Be sure that you have at least one SSH key set up on your account as well.
- You will need to know your public IPv4 address that you will be accessing the nodes from. You can find your IP address by using Akamai’s User Agent Tool.
- You will need to specify a valid Kubernetes version based on what is supported by the platform. As of this writing, 1.30 is the latest version supported for LKE. The simplest way to determine the latest supported version is to go into Cloud Manager and click the "Create Cluster" button within the Kubernetes panel. It will default to the latest version as shown below.
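Alternatively, you can query the Linode API for the list of supported LKE versions. Including your Personal Access Token in the request is the safe option:
[pnordstrom]$ curl -s -H "Authorization: Bearer $LINODE_TOKEN" https://api.linode.com/v4/lke/versions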
You can supply your configuration at runtime when you execute terraform apply, or you can create a terraform.tfvars file with those values in the terraform directory. Here is an example of the format of the Terraform configuration file (note the trailing /32 for the IP address):
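The sketch below shows the general shape of such a file. The variable names here are hypothetical; check variables.tf in the repo for the exact names.
username           = "my-cloud-manager-username"   # hypothetical name: your Cloud Manager username
authorized_ip      = "203.0.113.45/32"             # hypothetical name: your public IP, note the trailing /32
kubernetes_version = "1.30"                        # hypothetical name: a version supported by LKE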
NOTE: There are some variables within variables.tf that can be changed from their default values, but that isn't necessary for the purposes of the demo. The only required config elements without defaults are those stated above.
Deploying the Demo Solution
Step 1 — Deploy the Infrastructure With Terraform
To deploy the solution, the first step is to clone the GitHub repo, navigate to the terraform directory, and then run terraform apply. If you configured a terraform.tfvars file as demonstrated above, all you need to do is approve the deployment. Otherwise, it will ask you to supply the required variables.
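Starting from scratch, the sequence looks roughly like this (terraform init is needed once before the first apply):
[pnordstrom]$ git clone https://github.com/patnordstrom/lke-rwx-examples.git
[pnordstrom]$ cd lke-rwx-examples/terraform
[pnordstrom]$ terraform init
[pnordstrom]$ terraform apply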
Step 2 — Configure LKE Cluster with NFS Subdir Provisioner
Once your infrastructure is deployed, you can download your kubeconfig file from Cloud Manager and verify that you can access the cluster.
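One way to point kubectl at the downloaded file is the KUBECONFIG environment variable (the filename below is a placeholder; use whatever Cloud Manager gives you):
[pnordstrom]$ export KUBECONFIG=~/Downloads/my-lke-cluster-kubeconfig.yaml
The example cluster I deployed for this article is shown below.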
[pnordstrom]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lke213891-308979-1110eb390000 Ready <none> 43s v1.30.3
lke213891-308979-2027dae00000 Ready <none> 39s v1.30.3
lke213891-308979-2e120f590000 Ready <none> 49s v1.30.3
Once you have verified connectivity, follow the README here for installing the NFS Subdir Provisioner. The one thing you will need to change is the nfs.server IP address. You can fetch the address by going into Cloud Manager, finding the server labeled test-lke-rwx-nfs-server, and checking the Networking tab to get the private IPv4 address (see example below).
Once you have the IP address you can run the commands below in your command terminal where you have access to kubectl and helm. See example commands and final output below.
[pnordstrom]$ helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
[pnordstrom]$ helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=192.168.156.137 \
    --set nfs.path=/nfs/lke
NAME: nfs-subdir-external-provisioner
LAST DEPLOYED: Fri Aug 16 16:26:56 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
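You can optionally confirm that the provisioner registered its storage class; the chart creates one named nfs-client by default:
[pnordstrom]$ kubectl get storageclass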
Step 3 — Deploy the Flask App
You can now navigate to the k8s-yaml/flask-app-nfs-storage-class folder within the repo and apply the deployment.yaml. The command and example output are shown below.
[pnorstrom]$ kubectl apply -f deployment.yaml
deployment.apps/flask-file-app created
persistentvolumeclaim/nfs-rwx created
service/flask-app-service created
You can now validate that everything you've deployed is running with the command below; the output should look similar to the following.
[pnordstrom]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
flask-file-app-74b57f78cf-b5rvq 1/1 Running 0 37s
flask-file-app-74b57f78cf-jm6sk 1/1 Running 0 37s
flask-file-app-74b57f78cf-rknbl 1/1 Running 0 37s
nfs-subdir-external-provisioner-5967d8fff5-xdxs4 1/1 Running 0 2m40s
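It is also worth confirming that the claim bound against the NFS-backed storage class; nfs-rwx should show a STATUS of Bound and an ACCESS MODES of RWX:
[pnordstrom]$ kubectl get pvc nfs-rwx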
Testing and Validating the Deployment
The goal of this deployment is to validate that all of our replicas can access the shared storage, with the ability to both read and write to it. Now that the application is deployed, we can access it through the public IP. We can get the IP by running the command below and grabbing the external IP for the flask-app-service.
[pnordstrom]$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
flask-app-service LoadBalancer 10.128.86.186 172.233.210.98 80:30753/TCP 74s
kubernetes ClusterIP 10.128.0.1 <none> 443/TCP 8m23s
When we load the IP in the browser, we see a simple form that allows us to select a local file and upload it. One thing to notice as you use this app is that it tells you which application instance is responding to your requests. You can refresh the page several times and see that your requests are distributed round-robin to instances residing on different LKE nodes.
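You can also observe this from the command line. A quick sketch, using the external IP from the kubectl get service output above; the grep pattern is a guess at how the app renders the instance name, so adjust it as needed:
[pnordstrom]$ for i in 1 2 3 4 5; do curl -s http://172.233.210.98/ | grep -i host; done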
The last step to validate that the setup is working as expected is to upload some files, view them, and then download them from the web app. You can choose images you have locally or download some placeholder images as shown below.
[pnordstrom]$ curl -o 800x600.png https://via.placeholder.com/800x600.png
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 6663 100 6663 0 0 11284 0 --:--:-- --:--:-- --:--:-- 11293
[pnordstrom]$ curl -o 1280x720.png https://via.placeholder.com/1280x720.png
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8801 100 8801 0 0 16018 0 --:--:-- --:--:-- --:--:-- 16001
Once I upload these images, I can navigate to the directory page and see them listed. I can also refresh this page and see it served from different LKE nodes. When you click on the links, the files are downloaded, which you can verify as well.
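If you opened port 22 from your IP as described earlier, you can also SSH into the NFS server and inspect the export directly. A sketch, assuming a root login (adjust the user and IP for your setup); the provisioner creates a subdirectory per claim, named after the namespace, PVC, and PV, and your uploaded files should appear inside it:
[pnordstrom]$ ssh root@<nfs-server-public-ip>
[root@nfs-server]# ls -lR /nfs/lke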
Conclusions, Potential Improvements, and Next Steps
This setup has demonstrated a basic method for creating RWX persistent volume claims within LKE using an external NFS server and a storage class implementation that dynamically provisions volumes on top of the shared storage. Here are some areas for potential improvement with this deployment:
- There are a number of hardening and security configurations that could be applied within your cloud-init script that weren't included in this demo.
- This example does not deploy a highly available NFS server. For a production use case, you would want to leverage a clustered topology with replication, and possibly volumes on the compute nodes themselves backed by something like LVM, in order to expand storage, create incremental snapshots for offsite backup, etc.
- RWX volumes as deployed in this example may not be suitable for performance-sensitive applications that require high IOPS. Be sure to understand your requirements and test thoroughly before committing to an NFS-backed RWX implementation like the one shown here.
Thanks for reading, and I hope that at a minimum this article has inspired you to explore a creative solution to your next challenge, whatever that may be!