Exploration of Akamai Cloud Pulse Metrics and Observability for Managed Databases
Overview
One of the aspects of my role at Akamai is helping customers understand the capabilities of platform features. If you are like me, it’s one thing to read about a product feature or capability but another thing entirely to dive in and get your hands dirty testing it yourself. In my experience (and I’m confident many technologists agree), there’s no substitute for running a test or experiment to see for yourself how something works in action. No matter how simple something looks on the surface, or how much documentation you consume, you always learn something new, and often unexpected, when you work with it hands-on. Akamai recently launched a new managed database solution powered by Aiven that supports MySQL and PostgreSQL (with more database engines coming soon). In tandem, we have also released the initial version of our managed metrics and monitoring solution, Akamai Cloud Pulse (in beta as of this writing), which provides observability for the managed databases you deploy on your account (more cloud primitives will be added to the monitoring solution in the future).
One of the challenges of exploring a monitoring solution and testing its capabilities is that you need to generate activity and traffic on the system you are monitoring in order to see results (otherwise your visualizations are flat and uninteresting). If you want to test alerting capabilities, you also need a way to generate load on the system. Fortunately, there are existing tools that do this for databases. We’ll be exploring a PostgreSQL deployment and generating load on it using pgbench. The next challenge when it comes to experimenting is setting up the environment that enables you to execute your tests. Most often this involves setting up compute, storage, networking, applications, libraries, scripts, etc. In this article we’ll explore a GitHub repo that provides a starting point for getting up and running quickly with testing our managed database solution and exploring metrics within Akamai Cloud Pulse.
Deployment Scenario
This article provides a baseline toolset that will allow you to explore the features and capabilities of Akamai’s Cloud Pulse monitoring solution by deploying a compute instance that can generate traffic against a managed PostgreSQL database. The diagram below provides an overview of the elements demonstrated in this exercise.
As part of demonstrating this we will also expose you to a number of integration tools and capabilities of Akamai Cloud that you may or may not be familiar with, including:
- Linode Terraform Provider
- Deploying a One-Click Marketplace App
- Metadata Service and User Data with Cloud-Init
Please note that the provided repository and its execution flow have plenty of room for improvement and security hardening depending on the context in which they are used. The repo is meant to provide an inspirational example, and it may not be suitable for a production deployment without additional hardening, logging, error handling, etc., so proceed with this in mind.
Prerequisites Before Deploying
Here is a checklist of things you will need in order to complete the exercise in this article:
- Clone the GitHub repository here — https://github.com/patnordstrom/aclp-dbaas-pg-test-harness
- You will need a Linode account. There’s a link on this page to register and get free credits to start — https://techdocs.akamai.com/cloud-computing/docs/getting-started
- You will need to have Terraform installed and configured with a Personal Access Token from your Linode account to deploy the infrastructure. You can follow the Linode Provider getting started guide on how to configure Terraform with your Personal Access Token. For my setup I have a configuration file on my local computer at ~/.config/linode with my personal access token (see the example after this list for an alternative way to supply the token).
- You will need to have a PostgreSQL managed database deployed on your account. For my test I deployed a PostgreSQL v17 three-node cluster on Shared CPU (Linode 2GB plan). This test will work with a one-node plan as well. NOTE: as of this writing, our Terraform provider is in the process of being updated to support deploying managed databases; otherwise I would have included the ability to deploy the managed database within the Terraform code.
- You should have an SSH key created on your cloud account for accessing the nodes after they are deployed. You will provide your username to the script as part of the deployment but will need the SSH keys set up before deploying.
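As an alternative to a local configuration file, the Linode Terraform provider can also pick up your token from the LINODE_TOKEN environment variable, so for a quick start you could simply export it in the shell where you run Terraform (the value below is a placeholder):

# Replace the placeholder with your actual Personal Access Token
$ export LINODE_TOKEN="<your-personal-access-token>"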
Exploring the GitHub Repo
Before we deploy, I wanted to give a quick overview of the repo. The repo contains 3 folders:
- The terraform folder contains the code that will deploy a compute instance and download the scripts that will be used for testing. One of the unique aspects of this deployment code is that it makes use of our Marketplace One-Click Apps (OCAs) as well as cloud-init for additional configuration. Port 22 is exposed via the cloud firewall from your IP address (which you configure as part of the Terraform variables) so that you can SSH into the server and run the scripts that are provisioned.
- The cloud-init folder contains the User Data scripts that are used to configure the compute instance during deployment.
- The scripts folder contains a docker-compose manifest for running pgbench and an accompanying wrapper shell script that orchestrates the docker-compose commands and initializes the environment variables used.
The goal of this repo is to give you a basic working test harness so that you can start generating load on your managed Postgres with minimal configuration, simply by running a few commands detailed in this document.
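For orientation, the layout of the repo looks roughly like this (the file names under scripts/ are the ones referenced later in this article; other details are approximate):

aclp-dbaas-pg-test-harness/
├── terraform/     # compute instance, cloud firewall, and Marketplace app deployment code
├── cloud-init/    # user data scripts applied during deployment
└── scripts/       # docker-compose manifest plus the main.sh and vars.sh wrapper scripts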
Minimum Initial Configuration Requirements
Here are the minimum configuration requirements to deploy the repo:
- You will need to know your Linode Cloud Manager username that has SSH keys already set up against it. You can find your username on your profile page. Be sure that you have at least one SSH key set up on your account as well.
- You will need to know your public IPv4 address that you will be accessing the nodes from. You can find your IP address by using Akamai’s User Agent Tool.
- You will need to specify the region where you want the test harness to be deployed. You will want to use the same region where your database is deployed, as pgbench works best when latency between the server generating the load and the database is minimized.
You can supply your configuration at runtime when you execute terraform apply, or you can create a terraform.tfvars file with those values in the terraform directory. Here is an example of the format of the Terraform configuration file (note the trailing /32 for the IP address):
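(The variable names below, aside from authorized_user which appears later in this article, are illustrative; check variables.tf in the repo for the exact names.) One way to create the file from your shell, run from inside the terraform directory:

$ cat > terraform.tfvars <<'EOF'
# authorized_user is referenced later in this article; the other variable names are illustrative
authorized_user = "spongebob"
allowed_ip      = "100.20.30.40/32"   # your public IPv4 address with a trailing /32
region          = "us-ord"            # use the same region as your managed database
EOF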
NOTE: There are some variables within variables.tf that can be changed from default values, but it’s not necessary for the purposes of the demo. The only required config elements that do not have defaults are as stated above.
Once you deploy the compute instance and access it via SSH, there is one final configuration step, but this will be covered in the deployment instructions below.
Deploying the Demo Solution
Step 1 — Deploy the Infrastructure With Terraform
To deploy the solution, the first step is to download the GitHub repo, make sure you have configured all the prerequisites specified in the sections above, navigate to the terraform directory, initialize the directory via terraform init, and then run terraform apply. If you configured a terraform.tfvars file as demonstrated above, all you need to do is approve the deployment. Otherwise, Terraform will ask you to supply the required variables.
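Putting that together, the local command sequence looks roughly like this (assuming git and Terraform are already installed on your workstation):

$ git clone https://github.com/patnordstrom/aclp-dbaas-pg-test-harness.git
$ cd aclp-dbaas-pg-test-harness/terraform
$ terraform init
$ terraform apply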
Step 2 — Validate Compute Instance Deployment and Configure the pgbench Script
ℹ️ Before we get to the details of this step, I wanted to take a moment to highlight a specific element of this infrastructure-as-code (IaC) configuration. As I mentioned earlier in this article, we are leveraging the Linode Marketplace One-Click App platform component as part of this deployment. A natural question would be: “What does this mean and why is it useful?” A quick bullet-point summary:
- Marketplace apps are essentially automation templates, powered by StackScripts and the Linode Ansible Collection, which deploy a server on initial startup with software, configurations, and security hardening elements. All the technical details and repos for our Marketplace apps can be found in the Akamai Compute Marketplace GitHub repo.
- Linode users can deploy Marketplace apps from the Cloud Manager UI or by specifying the StackScript ID associated with the app in API calls or IaC automation tools. This is what we did in the Terraform code used in this article (see main.tf and variables.tf).
- By using the Docker Marketplace App as part of our deployment, we are able to incorporate IaC that is already written and tested and that applies additional security hardening to our compute instance. For instance, in the example used in this article we are taking advantage of configuration options that create a non-root user and disable root access over SSH to the compute instance.
Now that we understand we are deploying a Marketplace app and that it will execute configurations using Ansible, before we can use our instance we want to be sure that the IaC automation has finished. There are a couple of ways to observe this, but one of the simplest is to open the LISH Console. You can do this by clicking “Launch LISH Console” in the upper right of your compute instance within Cloud Manager. If you open the console right after your Terraform run has finished you will probably see something like the output below.
The above is the standard output of Ansible running the playbook, and in this case it is configuring the machine with Docker and the security settings we discussed above. When the configurations have finished you will see a final message that reads “Installation Complete” and a prompt to login.
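If you would rather check from a shell than watch the console, one option once you are able to SSH in is to query cloud-init directly; note that this reflects the user-data portion of the provisioning rather than the full Marketplace/Ansible run:

$ cloud-init status --wait
$ sudo tail -n 50 /var/log/cloud-init-output.log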
LISH Console uses “root” to log in, but our Terraform script generates a random password, so we will be logging in over SSH using our private key. You can close this window and proceed to SSH into the compute instance using the authorized_user that you specified in your Terraform variable configuration in Step 1. If the IP of your compute instance is 100.20.30.40 and your username is spongebob, then you would SSH by doing the following:
$ ssh -i ~/.ssh/yoursshkey spongebob@100.20.30.40
Once you log in to the server you can find the scripts for testing at /test-artifacts/aclp-dbaas-pg-test-harness-main/scripts. Within that directory is a vars.sh file that is, for the most part, set up to be used with minimal modification. Specifically, you will want to edit this file so that the database connection parameters match what is set up on your managed DB. There are default values that should already map closely to what is configured on the Akamai platform by default. You can fill in the additional values as shown below so that your vars.sh contains the values for your DB host and password. NOTE: You can also remove these entries from this configuration file and export them directly in your shell session if you like.
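The exact variable names in the repo’s vars.sh are not reproduced here; the snippet below uses libpq-style names purely as placeholders, so match them to whatever entries the file actually defines:

# Placeholder names and values for illustration; align these with the entries in vars.sh
PGHOST="your-db-hostname.example.net"
PGPORT="5432"
PGUSER="your-db-username"
PGPASSWORD="your-db-password"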
The other parameters in the script will control aspects of how pgbench will run the load test. The settings I have here are a good starting point for generating data that we can visualize within Cloud Pulse. Once you save vars.sh you are almost ready to run the main script.
Step 3 — Configure ACL to Allow the pgbench Compute Instance to Communicate with the DB
When you initially deployed the managed DB you had the option to set allowed IPs that can communicate with your DB server. We need to enable this new compute instance to communicate with the DB. Go to the “Settings” tab and select “Manage Access” to add the public IPv4 and IPv6 global unicast address for your compute instance (you can use /32 for the IPv4 and /128 for the IPv6).
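If you want to look the addresses up from the command line instead of Cloud Manager, the following works on the deployed instance (eth0 is assumed as the interface name, which is typical for Linode compute instances):

$ ip -4 addr show eth0                 # public IPv4 address (add it with a /32 prefix)
$ ip -6 addr show eth0 scope global    # IPv6 global unicast address (add it with a /128 prefix)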
Testing and Validating the Deployment
In the SSH session for your “pg-test-harness” compute instance, within the scripts directory you can now run the main.sh script with the pgbench option. NOTE: In order to run the command with sudo you will be asked to enter a password. When an instance is deployed via the Marketplace and you’ve implemented a non-root user (as in this case), the default password is randomly generated and stored in ~/.credentials. You can find the password there and use it as is, or run the appropriate commands to change your password.
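For example, to read the generated password and then optionally set one of your own (standard Linux commands, nothing specific to this harness):

$ cat ~/.credentials
$ passwd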
The script execution and expected output example is shown below.
$ sudo ./main.sh pgbench
Creating network "scripts_default" with the default driver
Creating scripts_pgbench-init_run ... done
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 2.74 s (drop tables 0.03 s, create tables 0.00 s, client-side generate 1.98 s, vacuum 0.31 s, primary keys 0.41 s).
Creating scripts_pgbench_run ... done
pgbench (17.2 (Debian 17.2-1.pgdg120+1))
starting vacuum...end.
progress: 1.0 s, 40.0 tps, lat 3.123 ms stddev 1.128, 0 failed, lag 0.147 ms
progress: 2.0 s, 54.0 tps, lat 3.204 ms stddev 1.310, 0 failed, lag 0.075 ms
progress: 3.0 s, 53.0 tps, lat 3.301 ms stddev 1.281, 0 failed, lag 0.088 ms
progress: 4.0 s, 53.0 tps, lat 2.786 ms stddev 0.698, 0 failed, lag 0.129 ms
progress: 5.0 s, 55.0 tps, lat 2.549 ms stddev 0.523, 0 failed, lag 0.082 ms
progress: 6.0 s, 40.0 tps, lat 2.474 ms stddev 0.529, 0 failed, lag 0.076 ms
... (truncated for brevity) ...
progress: 299.0 s, 43.0 tps, lat 3.926 ms stddev 1.707, 0 failed, lag 0.089 ms
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
duration: 300 s
number of transactions actually processed: 15155
number of failed transactions: 0 (0.000%)
latency average = 2.997 ms
latency stddev = 0.818 ms
rate limit schedule lag: avg 0.103 (max 9.560) ms
initial connection time = 124.037 ms
tps = 50.537861 (without initial connection time)
The script will output metrics every second until it has finished. Once it has finished, you should be able to explore the Cloud Pulse dashboards to see its effects.
Exploring the Akamai Cloud Pulse Dashboards in Cloud Manager
The image below provides a summary of some of the features and functionality of the monitoring dashboards, the aggregation functions and time windows we can use to explore the data, and the types of information we can observe from standard operational activity (e.g. backups) as well as the synthetic testing we have performed. In your case you can view data in a short time window to see the effects of the pgbench test that you ran, but in the screenshot below I have been experimenting over the course of a week, so I am showing a longer time frame.
You can find a more detailed list of features and capabilities within our product documentation if you want to explore further.
Conclusions, Potential Improvements, and Next Steps
This setup has demonstrated a starting point for exploring our managed database solution and its associated managed monitoring capabilities within Akamai Cloud Pulse. Here are some areas for potential improvement, or further areas to explore, with this deployment example:
- We did not explore testing pgbench against the secondary (i.e. read-only) database servers within the cluster. You could update the script parameters to execute read-only queries and change the connection parameters to target the secondary cluster members, then see their metrics in the dashboard as well (see the sketch after this list).
- We did not explore the alerting capabilities of the managed monitoring solution. You could tune the pgbench parameters and/or deploy a smaller database cluster with fewer resources if you wanted to test the alerting thresholds and functionality.
- The main.sh script doesn’t do much other than initialize the environment variables and orchestrate the docker commands. There’s an opportunity to extend this harness to test other aspects, such as database point-in-time-recovery capabilities, for example by adding conditional logic to the script so that it can be called from a cron task to insert records regularly or do other things that generate load and traffic on the system. The script was designed so that it could be easily extended in this manner.
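As an illustration of the first point above, pgbench ships with a built-in select-only workload that issues read-only queries. Run directly, it might look roughly like the following (the host, user, and database names are placeholders; within this harness you would instead fold the equivalent options into vars.sh and the docker-compose manifest):

$ PGPASSWORD="your-db-password" pgbench -b select-only -h your-replica-hostname -p 5432 \
    -U your-db-username -c 10 -j 2 -T 300 -P 1 your-db-name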
Thanks for reading, and I hope that at a minimum this article has inspired you to explore a creative solution to your next challenge, whatever that may be!