Software Deployment Best Practices for Modern Engineering Teams

Adopting best practices for software deployment is essential to maintaining a high standard of quality, minimizing downtime, and ensuring that your applications meet user expectations. Here are five best practices to help you deploy your software more securely and reliably.

Consider a Canary Deployment Strategy

A canary deployment strategy rolls out new code to a small, controlled subset of users before a full-scale release. This minimizes risk by limiting the exposure of untested changes, allowing you to monitor performance on a smaller scale. If issues arise, they can be caught early and addressed before they impact your entire user base, ensuring a more controlled and safer deployment.

To implement a canary deployment, segment your user base by factors such as geography or user demographics, then gradually widen the rollout one segment at a time. Monitor the canary group closely. If everything runs smoothly, proceed to the next segment. If not, pause, fix the issues, and re-test. This approach limits the impact of a bad release to the users already in the canary group.
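
As a rough illustration of the gradual rollout described above, the sketch below assigns each user to a stable bucket by hashing a user ID, so raising the rollout percentage only ever adds users to the canary group. The function names, the choice of the FNV-1a hash, and the stage percentages are assumptions made for this example and are not tied to any particular deployment tool.

```typescript
// FNV-1a 32-bit hash; any stable hash would work here.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Returns true when the user falls inside the current rollout percentage (0 to 100).
// The same user ID always maps to the same bucket.
function isInCanary(userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(userId) % 100; // 0..99
  return bucket < rolloutPercent;
}

// Hypothetical rollout stages, in percent of users.
const stages = [1, 5, 25, 50, 100];

// Advance to the next stage only while the canary group stays healthy;
// otherwise hold the rollout at its current percentage.
function nextStage(current: number, canaryHealthy: boolean): number {
  if (!canaryHealthy) return current;
  const next = stages.find((s) => s > current);
  return next ?? 100;
}
```

Because the bucket depends only on the user ID, a user who saw the new version at 5% still sees it at 25%, which keeps the canary group's experience consistent while you monitor it.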

The July 2024 CrowdStrike incident underscores the importance of this strategy. A staged rollout would likely have surfaced the problem within a smaller group of machines before it turned into a widespread outage. By validating your code in a real-world environment on a limited scale, you protect the stability and reliability of your application and build confidence in your deployment process.

Maintain a Robust Rollback Plan

A robust rollback plan is essential for any deployment strategy. No matter how thorough your testing and planning, issues can still arise. A rollback plan allows you to quickly revert to a stable version if something goes wrong, minimizing downtime and preserving user trust.

Key elements of a solid rollback plan include version control for easy reversion and a deployment process that supports rapid rollbacks. Automation plays a crucial role here, enabling you to execute a rollback swiftly with minimal manual intervention, often with just a click.
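
To make the idea of an automated rollback more concrete, here is a minimal sketch of how a script might pick the version to revert to. The Release shape and the findRollbackTarget function are hypothetical; a real pipeline would read this history from its deployment tool or version control tags.

```typescript
// A release record as a deployment tool might store it (hypothetical shape).
interface Release {
  version: string;
  deployedAt: number; // Unix epoch milliseconds
  healthy: boolean;   // set by post-deploy checks
}

// Pick the most recent release deployed before the current one that was marked healthy.
// Returns undefined when the current version is unknown or no healthy predecessor exists.
function findRollbackTarget(history: Release[], currentVersion: string): Release | undefined {
  const newestFirst = [...history].sort((a, b) => b.deployedAt - a.deployedAt);
  const currentIndex = newestFirst.findIndex((r) => r.version === currentVersion);
  if (currentIndex === -1) return undefined;
  return newestFirst.slice(currentIndex + 1).find((r) => r.healthy);
}
```

Skipping releases that were never marked healthy matters in practice: reverting to the immediately preceding version is not helpful if that version was itself rolled back.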

Clear communication protocols are also vital. Every team member should know the rollback procedures and their specific role in the process. Regular drills ensure everyone is prepared and the plan can be executed smoothly under pressure. A well-maintained rollback plan not only safeguards your application but also reinforces your commitment to reliability and user satisfaction.

Create Focused Runbooks

While comprehensive documentation might seem like a good idea, it often becomes outdated and cumbersome, with large portions rarely, if ever, used. Instead, focus on creating targeted, up-to-date runbooks that provide clear, actionable instructions for handling critical situations.

Your on-call team needs precise guidance when an alert triggers in the middle of the night, not pages of unnecessary details. A well-crafted runbook should be their go-to resource, offering step-by-step instructions for managing emergencies, such as rolling back a deployment or addressing a critical system failure.

By streamlining your documentation to focus on the most crucial aspects of your processes, you ensure that your team has the tools they need when it matters most. These focused runbooks are easier to maintain, ensuring they stay relevant and accurate as your systems evolve.

This approach reduces the burden of maintaining extensive documentation and enhances your team’s ability to respond effectively during high-pressure situations. Prioritizing runbooks over exhaustive documentation makes your deployment process more agile, reliable, and resilient.

Make Monitoring Part of Your CI/CD

Integrating monitoring directly into your CI/CD pipeline is essential for ensuring your applications are healthy and performing as expected from the moment they are deployed. Monitoring should be a core component of your deployment process, not an afterthought. By embedding it into your workflows, you can catch issues early, often before they reach production, helping to maintain the stability and reliability of your applications.

This integration allows you to automatically validate key functionalities with every deployment. If issues arise during these checks, they can be addressed immediately, reducing the risk of bugs or performance problems in your live environment. This not only improves software quality but also accelerates the feedback loop, enabling your team to respond to issues more quickly.
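
One way to picture such an automated validation step is a small gate that probes the freshly deployed application a few times before the pipeline continues. The sketch below is tool-agnostic: postDeployGate, Probe, and GateResult are names invented for this example, and the probe is passed in so that it could wrap an HTTP health check, a smoke test, or any other signal you trust.

```typescript
// A probe resolves to true when the deployment looks healthy.
type Probe = () => Promise<boolean>;

interface GateResult {
  passed: boolean;
  attempts: number;
}

// Run the probe up to maxAttempts times, waiting delayMs between attempts.
// A probe that throws is treated the same as a probe that returns false.
async function postDeployGate(
  probe: Probe,
  maxAttempts = 3,
  delayMs = 1000,
): Promise<GateResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await probe()) return { passed: true, attempts: attempt };
    } catch {
      // Fall through and retry.
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { passed: false, attempts: maxAttempts };
}
```

A pipeline step could fail the build, or trigger a rollback, whenever the returned passed flag is false. Retrying a few times avoids failing a deployment because of a single slow start.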

Furthermore, monitoring within your CI/CD pipeline supports continuous improvement. By analyzing deployment data, you can identify patterns, diagnose root causes, and refine your monitoring strategies over time. This proactive approach ensures your monitoring adapts to the evolving needs of your applications, leading to more reliable deployments and reduced downtime.

Meet Monitoring as Code

One of the most effective ways to integrate monitoring into your CI/CD pipeline is to adopt Monitoring as Code.

Monitoring as Code is the practice of defining and managing monitoring configurations and policies through version-controlled code. It enables automated, consistent, and scalable monitoring across environments.

Some benefits of Monitoring as Code include:

  1. Consistency and Reproducibility: Monitoring configurations are versioned alongside the application code, ensuring that changes are tracked and can be rolled back if necessary.
  2. Collaboration: Engineers and operations teams can collaborate more effectively, as monitoring configurations are treated as code and reviewed through pull requests.
  3. Automation: Automated checks and alerts can be integrated into CI/CD pipelines, providing real-time feedback on the application's health and performance before it reaches production.

With Checkly’s CLI, you can integrate Monitoring as Code into your CI/CD pipeline in minutes.

The Checkly CLI gives you a JavaScript/TypeScript-native workflow for building and maintaining monitors at scale, from your code repository. There are two core commands: test, which runs your monitoring checks as tests in CI or on your local machine, and deploy, which pushes your resources to the Checkly cloud and runs them around the clock.

We’re going to pretend we are working on adding a feature to a web application that also requires some updates to our API backend. We will assume we already bootstrapped our repository with a Checkly CLI project using:

npm create checkly

This command sets up all the basics to kickstart your Monitoring as Code (MaC) workflow in your repo.

In your project directory, you will find a folder named “__checks__” containing the following check templates:

|__checks__
    |- api.check.ts
    |- heartbeat.check.ts
    |- homepage.spec.ts
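
For orientation, a check definition such as api.check.ts could look roughly like the sketch below. It uses the ApiCheck and AssertionBuilder constructs exported by checkly/constructs; the logical ID, name, and URL are placeholders chosen for this example, and the template generated in your project may differ.

```typescript
import { ApiCheck, AssertionBuilder } from 'checkly/constructs'

// Placeholder ID, name, and URL: replace these with values for your own API.
new ApiCheck('example-api-check-1', {
  name: 'Example API health',
  activated: true,
  request: {
    method: 'GET',
    url: 'https://api.example.com/health',
    assertions: [
      // The check fails unless the endpoint answers with HTTP 200.
      AssertionBuilder.statusCode().equals(200),
    ],
  },
})
```

Because the check lives in a file next to your application code, changes to it go through the same review and version history as any other change.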

Once this setup is complete, log in to your Checkly account via the CLI using the following command:

npx checkly login

You can choose to log in from the browser or in your terminal. After logging in, you'll be able to update Checkly Checks from your local machine as long as you're connected to the internet.

Write Your First Monitoring Script

In your development environment, write JavaScript/TypeScript tests for your code updates, similar to unit tests. We typically use the Playwright testing framework in the .spec.ts or .check.ts file.

Consider a scenario where you want to verify the title of a Checkly documentation page and take a screenshot of it. To do this, replace the code in homepage.spec.ts with the following:

import { test, expect } from '@playwright/test';

test('Checkly Docs', async ({ page }) => {
  const response = await page.goto('https://www.checklyhq.com/docs/browser-checks/');
  
  // Ensure the page is loaded successfully
  expect(response?.status()).toBeLessThan(400);

  // Check if the page title is as expected
  const pageTitle = await page.title();
  const expectedTitle = 'Introduction to Checkly | Checkly';
  expect(pageTitle).toBe(expectedTitle);

  // Optionally, you can take a screenshot if needed
  await page.screenshot({ path: 'homepage.jpg' });
});

This test uses the page.goto method to navigate to the specified URL (https://www.checklyhq.com/docs/browser-checks/). The method returns a response object, which is stored in the response variable.

Then we use the expect function to assert that the HTTP status code of the response is less than 400. This is a way to ensure that the page is loaded successfully without any HTTP errors.

page.title() retrieves the title of the page and compares it with the expected title ('Introduction to Checkly | Checkly') using the expect function. This ensures that the page title matches the expected value.

Finally, we take a screenshot of the page and save it as 'homepage.jpg'.

We could deploy this monitor right away, or we could first add another layer of validation to make sure it's reliable. The next step covers how.

Run Your E2E Tests Locally

With Monitoring as Code in place, one significant advantage is the ability to run your end-to-end (E2E) tests locally.

Running your E2E tests locally is crucial for ensuring the reliability and accuracy of your application before it reaches production. By testing locally, you can identify and resolve potential issues early in the development process, such as broken user flows, faulty integrations, or performance bottlenecks. This proactive approach allows you to refine your application in a controlled environment, ensuring it functions as intended when deployed.

Additionally, local E2E testing helps you validate that your application is aligned with your current infrastructure and meets business requirements. By simulating real-world scenarios on your local machine, you can catch discrepancies and edge cases that might otherwise be overlooked. This thorough testing process leads to a more stable and resilient application, reducing the risk of downtime and improving overall performance once the application goes live.

Running E2E tests locally not only enhances your development workflow but also provides a higher level of confidence in your application’s quality and readiness for production.

Debugging Test Results

Once you start testing, debugging and keeping track of test results efficiently can become a challenge. Traditional methods often involve manually parsing extensive logs, which is time-consuming and error-prone, especially when you are trying to identify the root cause of a failure in a complex system. Without streamlined, accessible debugging information, development cycles slow down and software quality becomes harder to maintain.

Checkly's test sessions provide detailed test results. By passing the --record flag to the checkly test command, you can generate URLs that provide comprehensive summaries of each test session. This is particularly useful in CI/CD pipelines, where parsing logs can be cumbersome: instead of raw logs, you get a UI that displays the debugging information in one place.

[Screenshot: Checkly's UI showing test session results]

In the test session overview, you can access all recorded sessions, view details such as the test execution locations, associated git information, and even access trace files for in-depth analysis.

Moreover, you can find a specific set of test sessions based on various criteria, such as deployments related to the product you're working on or the team you belong to. This is crucial when trying to understand why a particular check, such as an API or Playwright check, failed.

The test sessions overview in the Checkly UI ensures that all test results are easily accessible, stored securely, and available for future reference, making it an essential tool for you as a developer or DevOps professional aiming to maintain robust monitoring and testing processes.

Test Your Monitors Directly from Your CI/CD

You can use the Checkly CLI to record your tests. This workflow uses the best practices from standard testing frameworks like Playwright and Jest and extends them so you can deploy your checks to Checkly’s global infrastructure and run them as monitors.

As mentioned, the CLI gives you two powerful commands: test and deploy.

After setting up your first checks inside your repo, you can run them using the test command:

npx checkly test --record

This runs your checks on our global platform, reports the results in your terminal and records a test session.

Running 5 checks in eu-west-1.

src/__checks__/group.check.ts
  ✔ Homepage - fetch stats (43ms)
src/__checks__/home.check.ts
  ✔ 404 page (7s)
  ✔ Homepage (7s)
src/services/api/api.check.ts
  ✔ Homepage - fetch stats (50ms)
src/services/docs/__checks__/docs-search.spec.ts
  ✔ docs-search.spec.ts (11s)

5 passed, 5 total

After validating that your checks are correct, you can deploy them to Checkly, turning them into monitors. You can add alert channels such as email, Slack, or PagerDuty to notify you when things break.

npx checkly deploy

Once the deployment is complete, you'll see a success message in your terminal, indicating that the project has been deployed to your Checkly account.
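
Alert channels can be declared in code alongside your checks as well. The following is a minimal sketch, assuming the EmailAlertChannel construct from checkly/constructs and the alertChannels property on a check; the logical IDs, email address, and URL are placeholders made up for this example.

```typescript
import { ApiCheck, AssertionBuilder, EmailAlertChannel } from 'checkly/constructs'

// Placeholder address: replace with your team's alerting inbox.
const emailChannel = new EmailAlertChannel('email-channel-1', {
  address: 'alerts@example.com',
})

// Attach the channel to a check so failures trigger an email.
new ApiCheck('example-api-alerting-1', {
  name: 'Example API with alerting',
  alertChannels: [emailChannel],
  request: {
    method: 'GET',
    url: 'https://api.example.com/health',
    assertions: [AssertionBuilder.statusCode().equals(200)],
  },
})
```

Defining the channel once and referencing it from several checks keeps alert routing consistent across monitors.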

Find out more information about testing with the Checkly CLI in our docs.

Start Detecting Issues Faster

These are general best practices for software deployment, but every organization is unique. Your specific infrastructure, team dynamics, and business goals will influence how you implement them.

Tailor these practices to fit your particular setup, continuously refining them as your organization grows and evolves. Stay adaptable, learn from each deployment, and integrate tools and processes that align with your workflow.

Ready to take the next step?

Start by exploring how the Checkly CLI can integrate monitoring into your CI/CD pipeline, helping you catch issues early and maintain a high standard of quality throughout your deployment process. Install the Checkly CLI and customize your approach to fit your organization’s unique needs.
