DOES Cache Rule Everything Around Me? — Using Compression for our Prometheus Cache

Checkly is a key part of a professional developer’s workflow, making it easy to know whether your service is up or down and to measure its performance. Because we integrate with almost any development workflow, we also offer Prometheus endpoints so you can use the popular Grafana stack to keep track of your checks’ status. As our large enterprise users’ usage grew, their check performance data grew in parallel, and our endpoint started returning occasional 429 status codes. This ‘Too Many Requests’ error code signals that a client is hitting request limits, and we even had a user in our Slack community comment to us: “if you’re getting 429s, put a cache on there!”

This is the story of how we optimized serving Prometheus responses and ended up adding compression at a surprising layer of our stack (tl;dr: at the Node compute layer).

Just get a cache!!

Prometheus response metrics can be pretty big! Since Prometheus deals in time series data and events, a request covering a large time window with lots of events can produce a huge response; we had users generating 10MB outputs, and that’s tough for any naive system to handle at a high rate. Further, with many large responses, we were worried about consuming a huge amount of storage space in any cache. We run Redis on an affordable Heroku plan and didn’t want to massively upgrade it. We introduced rate limiting, but that was both a temporary solution and not ideal from a business perspective.

Just an idea: introducing rate limiting, or degrading large requests (for example by introducing paging and requiring multiple follow-up requests), can seem like a smart move because you’re discouraging the requests that cost you the most to fulfill. However, it’s worth looking into who is making those requests, because the users making the most expensive requests are often also your most valuable customers. You don’t want to be in the position of treating your most valuable users worse than your average user!

In any case, rate limiting wasn’t a permanent solution, and we didn’t want engineers integrating Checkly to have to find hacks around our restrictive rate limits.

Prometheus responses are a great candidate for compression

With labels, tags, and transaction names repeated several times in any response, Prometheus output contains so much repeated text that compression should be not only possible but highly effective. If we implemented compression along with a cache, we could deliver better performance to our users without breaking the bank on caching costs.

From some previous experience in the AWS Lambda environment, we liked the Node.js implementation of the Brotli compression algorithm. Brotli works great with web artifacts, so we were sure it could work here. After testing Brotli.js on a local build, it reduced the content size by 95% and was super fast. Hooray, problem solved! Or so we thought, until our first deploy to production.
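To get a feel for the numbers, here’s a minimal sketch (not our production code) that compresses a synthetic Prometheus-style payload with Node’s built-in Brotli bindings and prints the size reduction. The metric name and labels are made up for the example; real payloads will compress differently, but the repetition is what does the work.

```js
// Measure how well Brotli shrinks a repetitive, Prometheus-style text payload.
const zlib = require('zlib');

// A toy payload: the exposition format repeats metric names and label sets constantly.
// (Hypothetical metric/label names, purely for illustration.)
const sample = Array.from({ length: 5000 }, (_, i) =>
  `example_check_status{check_name="homepage",region="eu-west-1",activated="true"} ${i % 2}`
).join('\n');

const compressed = zlib.brotliCompressSync(Buffer.from(sample));

console.log('original bytes:  ', Buffer.byteLength(sample));
console.log('compressed bytes:', compressed.length);
console.log('reduction:        ' +
  (100 * (1 - compressed.length / Buffer.byteLength(sample))).toFixed(1) + '%');
```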

Why isn’t compression working on Heroku?

As we started to preview our new compression and caching setup, we noticed a problem: on Heroku, our response times increased significantly. The 95th percentile of responses took over 10 seconds, longer than we could expect users to wait. As mentioned above, these responses were likely going to our biggest, best customers, so making them wait more than 10 seconds wasn’t going to work. We had expected that compression on a standard Heroku dyno might be a bit slower than on our laptops, but compression time jumped from 50ms to 10 seconds.

Just an idea: with modern containerization, you might be surprised by some of the environments running your code in production. These containers are optimized to do a few tasks very well, and may well have lower specs than the laptop currently heating your legs. This is why development testing, unit testing, and any other simulation tool can only take you so far: you must observe your code running in a real production environment. That’s why we made Checkly, to observe your service in the real world.

Sure enough, Heroku dynos, mainly used as access points for datastores and provisioned with plenty of memory and bandwidth but little compute, were struggling to run our Brotli.js compression. Part of our goal was to avoid significantly increasing infrastructure costs, so vertically scaling our dynos wasn’t our first answer. How could we get the benefits of compression without a massive increase in compute time?

Compression isn’t a binary

I recommend that everyone implement some compression at least once. You’re not going to improve on the experts, but it will help you understand what compression algorithms fundamentally do: how they can trade quality for size or, more significantly in this context, how they can trade CPU for bandwidth. Since we’d used Brotli before, I was curious how Brotli.js was configured in Node.js. To my surprise, the default was the highest Brotli compression setting, appropriately enough labelled level 11. That meant we were using the maximum amount of compute power to get the most possible compression.

At level 11, our response time for the highest 5% of requests shot through the roof
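If you want to check this yourself, here’s a quick sketch against Node’s built-in zlib bindings, which is where these Brotli constants are exposed (this assumes you’re using the built-in support rather than a separate userland package).

```js
// Node's built-in Brotli bindings default to the maximum quality level.
const zlib = require('zlib');

console.log(zlib.constants.BROTLI_DEFAULT_QUALITY); // 11
console.log(zlib.constants.BROTLI_MAX_QUALITY);     // 11

// So a plain call like this one compresses at level 11 unless you override it:
const out = zlib.brotliCompressSync(Buffer.from('some Prometheus payload'));
console.log(out.length);
```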

We wanted compression, but there was no need to max out our compute to get it. All we had to do was tune the parameters and find a sweet spot. Thankfully, Matt Bullock over at Cloudflare has answered this exact question in a blog post: he found that level 4 gave great results without significant compute use. After some testing, we settled on a quality level of 5. You can see here the spike around using level 11, and how level 5 brought things back down to normal response times.

It’s quite difficult to differentiate the CPU usage on the left (no compression) from those on the right (compression set down to quality level 5)
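Here’s a hedged sketch of what dialing the quality down can look like in Node, with a hypothetical compressMetrics helper and a rough timing loop. Level 5 is where we landed for our payloads; your sweet spot may differ, so benchmark against your own data.

```js
// Compare compression cost at different Brotli quality levels.
const zlib = require('zlib');

// Stand-in for a large, repetitive Prometheus response (illustrative only).
const bigPrometheusResponse =
  'example_check_status{check_name="homepage",region="eu-west-1"} 1\n'.repeat(50000);

function compressMetrics(text, quality = 5) {
  return zlib.brotliCompressSync(Buffer.from(text), {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: quality,
      // Hinting the input size lets the encoder pick better internal settings.
      [zlib.constants.BROTLI_PARAM_SIZE_HINT]: Buffer.byteLength(text),
    },
  });
}

for (const quality of [4, 5, 11]) {
  const start = process.hrtime.bigint();
  const out = compressMetrics(bigPrometheusResponse, quality);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`quality ${quality}: ${ms.toFixed(1)}ms, ${out.length} bytes`);
}
```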

The final size of responses using Brotli.js

Our Redis usage only increased by 2MB by the end of this process! As mentioned above, Prometheus output is perfect for compression, as it’s extremely verbose and the keywords for check runs are repetitive. A 2.2MB response ended up as only 52KB.

We weren’t using caching at all before this project; we adopted caching and compression at the same time, and our overall cache usage barely increased. Our users were happy, no longer getting errors when accessing our Prometheus endpoint, and our CTO was happy that we didn’t need to increase our infrastructure budget.
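For completeness, here’s a simplified sketch of what a cache path with compression folded in can look like, assuming an ioredis client, a hypothetical renderPrometheusMetrics helper, and an illustrative TTL; none of the names or values here are our production ones.

```js
// Cache compressed Prometheus responses in Redis, decompressing on a cache hit.
const zlib = require('zlib');
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);
const CACHE_TTL_SECONDS = 60; // illustrative TTL, not a production value

// Hypothetical stand-in for however you build the Prometheus text response.
async function renderPrometheusMetrics(accountId) {
  return `example_check_status{account="${accountId}"} 1\n`;
}

async function getMetricsForAccount(accountId) {
  const key = `prometheus:${accountId}`;

  const cached = await redis.getBuffer(key);
  if (cached) {
    // Decompression is far cheaper than compression, even on a small dyno.
    return zlib.brotliDecompressSync(cached).toString('utf8');
  }

  const body = await renderPrometheusMetrics(accountId);
  const compressed = zlib.brotliCompressSync(Buffer.from(body), {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 5 },
  });

  await redis.set(key, compressed, 'EX', CACHE_TTL_SECONDS);
  return body;
}
```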

Conclusions

There were some interesting takeaways from this process, which resulted in a configuration that most people wouldn’t try by default (with compression happening at the compute layer and in Node.js):

  • You must always observe your code working in production to know if it’s solving users’ problems — we were surprised by the fact that our dynos had less compute than a laptop, though it made sense in retrospect, and no amount of pre-deploy testing on our local environment would have found the problem.
  • The fastest solution to implement isn’t always the best — naive caching would have had a real infrastructure cost impact, and the same optimization work would only have been put off for another day.
  • If only some users are affected by an issue, try to figure out what makes them different — It’s a very different thing to say “a small percentage of our users” when that small percentage are your heaviest users representing most of your ARR!

a graph showing rate limiting dropping to zero

Our biggest clients no longer have to deal with rate limits on our Prometheus endpoints since we’re handling them so much more efficiently.
