The trend of declaring infrastructure as code has been picking up steam over the last few years, offering a way for DevOps teams to transparently manage and scale cloud infrastructure. Why should the way we manage monitoring be any different? In this article, we address this point and illustrate it with a practical example of Monitoring-as-Code on Checkly.
Monitoring as Code is a transformative approach to maintaining the reliability and performance of modern web applications by integrating monitoring processes directly into the software development lifecycle. This method is fully compatible with Infrastructure as Code (IaC) and configuration management tools, allowing your monitoring solution to be managed alongside other infrastructure components.
Seamless Integration: Checkly allows developers to define and manage their monitoring configurations alongside their application code. This means monitoring scripts and configurations can be version-controlled, reviewed, and tested just like any other part of the codebase.
CI/CD Pipeline Support: With Checkly, you can embed monitoring checks directly into your CI/CD pipelines. This ensures that every deployment is automatically verified against your monitoring criteria, reducing the risk of undetected issues reaching production and enabling hassle-free continuous delivery.
Infrastructure as Code (IaC) Compatibility: Checkly’s API and CLI tools make it easy to programmatically set up and manage monitoring. This aligns with the principles of IaC, allowing teams to define their monitoring infrastructure using code, ensuring consistency and reproducibility across environments.
Programmable Monitoring Checks: Define advanced monitoring checks using JavaScript and TypeScript. This flexibility enables you to create highly specific and complex checks tailored to your application’s unique requirements.
Let’s dive deeper into Checkly’s Monitoring as Code solution powered by the Checkly CLI.
We’ve released our brand new CLI and recommend giving it a try for the best Monitoring as Code experience.
Please visit our documentation if you want to learn more about the pros and cons of the Terraform provider vs. the CLI for automating Monitoring as Code.
Historically, IT infrastructure has been provisioned manually, both on-premises and in the cloud. This presented several challenges, including fragmented workflows, lack of transparency, and scalability issues. In response to these problems, the last few years have seen a shift to the Infrastructure-as-Code (IaC) paradigm, in which large-scale systems are declared in configuration files. Monitoring as code adopts the same method.
A new generation of tools has emerged to serve this use case, the most notable example of which is HashiCorp Terraform. Terraform provides a CLI workflow that allows users to specify the desired final infrastructure setup, and it then handles the intermediate steps needed to achieve it. Monitoring as code builds on the same declarative principle.
Terraform can provision infrastructure on many cloud vendors thanks to its provider ecosystem. Each provider maps to the vendor’s API and exposes its resources in HCL (HashiCorp Configuration Language), the domain-specific language in which Terraform configurations are written.
Setting up monitoring tools, and monitoring in general, can present some of the same issues as provisioning infrastructure. This becomes apparent when scaling beyond the initial rollout or proof-of-concept phase, when both the scope of what is monitored and the effort of maintaining it grow.
Monitoring-as-Code learns from IaC and brings your monitoring config closer to your application and your development workflows. How? By having it also declared as code, much like you would do with any kind of IT infrastructure.
What does one gain when moving from a manual process to a Monitoring-as-Code approach? The main advantages are:
Users who have just started out will be familiar with creating checks, groups, alert channels and other resources through the Checkly UI. However, the official Terraform provider for Checkly allows these elements to be declared as code, streamlining the provisioning and deployment of active monitoring setups.
You can find the Checkly Terraform provider on the official Terraform registry.
To see Monitoring-as-Code in practice, we will set up a small monitoring configuration for our demo e-commerce website using Terraform and Playwright scripts.
For our example we will be creating browser checks using Playwright scripts we have previously written as part of our Playwright guides.
Let’s start off by creating a brand new folder:
mkdir checkly-terraform-example && cd $_
To keep things easy, we create a subdirectory…
mkdir scripts
…and copy all our scripts from above into separate files, for example login.spec.js.
Next up, we want to create our main.tf file and include the basic configuration as follows:
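A minimal sketch of what such a basic configuration might look like. It assumes the provider is pulled from the Terraform registry under the source checkly/checkly, and that the API key is passed in through a Terraform variable named checkly_api_key (chosen to match the TF_VAR_checkly_api_key environment variable used later in this guide). No provider version is pinned here, and depending on the provider version and the kind of API key in use, further arguments may be required.

```hcl
# main.tf (sketch; the variable name is an assumption, see above)

# Declared without a default, so Terraform either prompts for the value
# or reads it from the TF_VAR_checkly_api_key environment variable.
variable "checkly_api_key" {
  type      = string
  sensitive = true
}

terraform {
  required_providers {
    checkly = {
      source = "checkly/checkly"
      # Assumption: in a real project, add a version constraint here.
    }
  }
}

# Configure the Checkly provider with the API key from the variable above.
provider "checkly" {
  api_key = var.checkly_api_key
}
```

Keeping the key in a variable rather than hardcoding it means main.tf can be committed to version control without exposing credentials.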
We are ready to initialise our project and have the Checkly Terraform provider set up for us. That is achieved by running:
terraform init
After a few seconds, you should see a message similar to the following:
In the same file, right below our initial instructions, we can now add resources one after the other. They will be browser checks based on the Playwright scripts we previously stored in the scripts directory. Here is what each resource could look like:
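As an illustration, a browser check resource for the login script might be declared roughly like this. The resource label, check name, frequency and locations below are assumed values for the sake of the sketch, not settings taken from the original setup.

```hcl
# Sketch of a browser check; all concrete values are illustrative.
resource "checkly_check" "login" {
  name      = "Login Flow"
  type      = "BROWSER"
  activated = true
  frequency = 10 # interval between runs, in minutes

  # Data center locations the check runs from.
  locations = [
    "us-west-1",
    "eu-central-1",
  ]

  # Load the Playwright script from the scripts directory created earlier.
  script = file("${path.module}/scripts/login.spec.js")
}
```

Reading the script with file() keeps the Playwright code in its own file, so it can be edited and run locally without touching the Terraform configuration.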
Now that our Terraform project has been initialised and we have added some resources, we can generate a Terraform plan by running terraform plan.
Terraform will determine all the changes needed to replicate our monitoring configuration on Checkly. In doing so, we will be asked for our Checkly API key, which we can find under our account settings. Not on Checkly yet? Register a free account and enjoy your free monthly checks!
We can expose this as an environment variable in order to spare developers from having to copy-paste it all the time: export TF_VAR_checkly_api_key=<YOUR_API_KEY>.
We can now finally apply our changes with terraform apply. We might be asked for one final confirmation in the command prompt, after which we will be greeted by the following confirmation message:
Logging in to our Checkly account, we will see the dashboard has been populated with data from our three checks, which will soon start executing on their set schedules.
Browser checks are now there to keep us informed on the status of our key website flows. What about our APIs, though? Whether they make up the foundation of our service or they are consumed directly by the customer, we need to ensure our endpoints are working as expected. This is easily achieved by setting up API check resources.
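A rough sketch of such an API check resource is shown below. The URL is a placeholder rather than the demo site's real endpoint, and the resource label, name, frequency and assertion are assumptions made for illustration.

```hcl
# Sketch of an API check; all concrete values are illustrative.
resource "checkly_check" "list_books" {
  name      = "List Books API"
  type      = "API"
  activated = true
  frequency = 1 # interval between runs, in minutes

  locations = [
    "eu-central-1",
    "us-west-1",
  ]

  request {
    # Placeholder URL: replace with the endpoint you want to monitor.
    url              = "https://api.example.com/books"
    follow_redirects = true

    # The check fails unless the endpoint answers with HTTP 200.
    assertion {
      source     = "STATUS_CODE"
      comparison = "EQUALS"
      target     = "200"
    }
  }
}
```

Further assertion blocks can be added inside request to validate more of the response than the status code alone.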
We can now once more run terraform plan, followed by terraform apply, to see the new check on Checkly:
Now that we have our checks in place, we want to set up alerting to ensure we are informed as soon as a failure takes place. Alert channels can be declared as resources, just like the checks. Let’s add the following to our main.tf file:
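For instance, an email alert channel could be sketched as follows. The resource label and the email address are hypothetical, and email is only one of the channel types the provider offers.

```hcl
# Sketch of an email alert channel; label and address are placeholders.
resource "checkly_alert_channel" "alert_email" {
  email {
    address = "alerts@example.com"
  }

  # Notify both when a check starts failing and when it recovers.
  send_failure  = true
  send_recovery = true
}
```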
We are setting up things so that we are alerted when our check starts failing, as well as when it recovers. But we still need to decide which checks will subscribe to this channel, and therefore be able to trigger the alerts. This is done by adding the following inside the resource declaration of each check, e.g.:
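As a sketch, assuming an alert channel declared as checkly_alert_channel.alert_email and a check with the resource label login (both hypothetical names), the subscription block could look like this:

```hcl
resource "checkly_check" "login" {
  # ... the existing arguments of the check stay unchanged ...

  # Subscribe this check to the alert channel by referencing its ID.
  alert_channel_subscription {
    channel_id = checkly_alert_channel.alert_email.id
    activated  = true
  }
}
```

Because channel_id references the alert channel resource, Terraform creates the channel before any check that subscribes to it.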
Going through the usual terraform plan and terraform apply sequence will apply the changes on our Checkly account:
We are now fully up and running with our monitoring-as-code setup. Our checks will run on a schedule, informing us promptly if anything critical were to go wrong. Rapidly getting to know about failures in our API and key website flows will allow us to react fast and mitigate impact on our users, ensuring a better experience with our product.
You can find the complete setup described in this guide on our dedicated repository.
As our setup expands, we might want to deploy additional tools to make our lives easier. We could: