Skip to main content
This page describes a new V2 version of the Prometheus exporter. For information about the old Prometheus exporter, see the Prometheus V1 docs.
If you are using Prometheus for monitoring and the popular Grafana stack for dashboarding, you can expose Checkly’s core metrics on a dedicated, secured endpoint.

Activation

Activating this integration is simple.
  1. Navigate to the integrations tab on the account screen and click the ‘Create Prometheus endpoint’ button. Prometheus integration step 1
  2. We directly create an endpoint for you and provide its URL and the required Bearer token. Prometheus integration step 2
  3. Create a new job in your Prometheus prometheus.yml config and set up a scraping interval. The scrape interval should be above 60 seconds. Add the URL (divided into metrics_path, scheme and target) and bearer_token. Here is an example
# prometheus.yml
- job_name: 'checkly'
  scrape_interval: 60s
  metrics_path: '/accounts/<your account ID>/v2/prometheus/metrics'
  bearer_token: '<your bearer token>'
  scheme: https
  static_configs:
  - targets: ['api.checklyhq.com']
Now restart Prometheus and you should see metrics coming in.
The Prometheus metrics endpoint has a rate limit of 50 requests per minute. The responses from this endpoint are cached during 60 seconds. Any request made to this endpoint within 60 seconds of the initial request will receive the cached response. We recommend using a scrape interval of 60 seconds.

Check Metrics

The Prometheus exporter exposes several metrics you can use to monitor the status of your checks, as well as to inspect detailed information such as Web Vitals. The following metrics are available to monitor checks:
MetricTypeDescription
checkly_check_statusGaugeIndicates whether a given check is currently passing, degraded, or failing.
checkly_check_result_totalCounterThe number of passing, degraded, and failing check results.
checkly_browser_check_web_vitals_secondsHistogramThe Web Vitals timings.
checkly_browser_check_duration_secondsHistogramThe total check duration. This includes all pages visited and any waits.
checkly_browser_check_errorsHistogramThe errors encountered during a full browser session.
checkly_api_check_timing_secondsHistogramThe response time for the API request, as well as the duration of the different phases.
checkly_url_monitor_timing_secondsHistogramThe response time for the HTTP request, as well as the duration of the different phases.
checkly_tcp_monitor_timing_secondsHistogramThe response time for the TCP request, as well as the duration of the different phases.
checkly_tcp_check_timing_secondsHistogramDeprecated: use checkly_tcp_monitor_timing_seconds instead. Emits identical values.
checkly_icmp_monitor_timing_secondsHistogramThe timing phases of the ICMP monitor, including DNS resolution and latency measurements (avg, min, max).
checkly_icmp_monitor_packet_loss_ratioGaugeThe average packet loss ratio (0-1) for the ICMP monitor.
checkly_dns_monitor_timing_secondsHistogramThe total duration of the DNS monitor.
checkly_multistep_check_duration_secondsHistogramThe total check duration. This includes all requests done and any waits.
checkly_heartbeat_last_success_timestamp_secondsGaugeThe Unix timestamp of the last successful heartbeat ping. A value of 0 means no successful ping has ever been received.
checkly_heartbeat_expected_interval_secondsGaugeThe configured maximum interval between heartbeat pings, in seconds. Derived from the heartbeat monitor’s period and period unit settings.
checkly_time_to_ssl_expiry_secondsGaugeThe amount of time remaining before the SSL certificate of the monitored domain expires. See the SSL certificate expiration docs for more information on monitoring SSL certificates with checks.
The checkly_check_status and checkly_check_result_total metrics contain a status label with values passing, failing, and degraded. The checkly_check_status gauge is 1 when the check has the status indicated by the status label and is 0 otherwise. For example, if a check is passing the result will be:
checkly_check_status{name="Passing Browser Check",status="passing"} 1
checkly_check_status{name="Passing Browser Check",status="failing"} 0
checkly_check_status{name="Passing Browser Check",status="degraded"} 0
checkly_check_status can be useful for viewing the current status of a check, whereas checkly_check_result_total can be useful for calculating overall statistics. The metrics checkly_browser_check_web_vitals_seconds, checkly_browser_check_errors, checkly_api_check_timing_seconds, checkly_tcp_monitor_timing_seconds, checkly_icmp_monitor_timing_seconds, and checkly_dns_monitor_timing_seconds contain a type label. This label indicates the different Web Vitals, error types, and timing phases being measured. For checkly_tcp_monitor_timing_seconds, the type label has the following values: dns, connection, data, and total. For checkly_icmp_monitor_timing_seconds, the type label has the following values: dns (DNS resolution time), latency_avg, latency_min, and latency_max. For checkly_dns_monitor_timing_seconds, the type label has the value total. checkly_time_to_ssl_expiry_seconds contains a domain label giving the domain of the monitored SSL certificate. In addition, the check metrics all contain the following labels:
LabelDescription
nameThe name of the check.
check_idThe unique UUID of the check.
check_typeThe type of check: api, browser, multi_step, playwright, heartbeat, tcp, icmp, dns, url.
mutedWhether the check is muted, configured to not send alerts.
activatedWhether the check is activated. Deactivated checks aren’t be run.
groupThe name of the check group.
tagsThe tags of the check.
You can set key:value tags in your checks and groups and they will be exported as custom labels in Prometheus. For instance the tag env:production will be exposed as a custom label env="production". You can disable this by adding the query param disableTagParsing=true. Please note that Prometheus label names may only contain ASCII letters, numbers, as well as underscores (see the official docs). Tags containing other characters in the label name will be sanitized.
The counter and histogram metrics are reset every hour. These resets can be handled in Prometheus by using the rate or increase functions.

Check Run Location Label

To avoid creating a high volume of metrics, by default the metrics don’t include a label for the check run location. It is possible to enable this by adding the query param locationLabelEnabled=true to your API request. This will add a location label giving the location where the checks ran. Since check status and SSL days remaining is only tracked on a per-check basis rather than by location, checkly_check_status and checkly_time_to_ssl_expiry_seconds do not have the location label included. Heartbeat metrics (checkly_heartbeat_last_success_timestamp_seconds, checkly_heartbeat_expected_interval_seconds, and heartbeat entries in checkly_check_result_total) also do not have a location label, since heartbeat monitors receive pings rather than running from specific locations. All other check metrics will have the location label added. Here is an example for how to set this in your prometheus.yml config:
# prometheus.yml
- job_name: 'checkly'
  scrape_interval: 60s
  metrics_path: '/accounts/<your account ID>/v2/prometheus/metrics'
  bearer_token: '<your bearer token>'
  scheme: https
  static_configs:
  - targets: ['api.checklyhq.com']
  params:
    locationLabelEnabled: ['true']

PromQL Examples

This section contains a few PromQL queries that you can use to start working with the Prometheus data.

Currently failing checks

To graph whether checks are passing or failing, use the query:
checkly_check_status{status="passing"}
Passing checks will have the value 1 while failing and degraded checks will have the value 0. This can be used to build a Grafana table of currently failing checks.

Failure percentage

To calculate the percentage of check runs that failed in the last 24 hours, use:
increase(checkly_check_result_total{status="failing"}[24h]) / ignoring(status) sum without (status) (increase(checkly_check_result_total[24h]))

Histogram averages

The different histogram metrics can all be used to compute averages. For example, query the average web vitals times for a check using:
sum by(type) (rate(checkly_browser_check_web_vitals_seconds_sum{name="Check Name"}[30m])) / sum by(type) (rate(checkly_browser_check_web_vitals_seconds_count{name="Check Name"}[30m]))

Heartbeat Metrics

Heartbeat monitors are passive checks that wait for your scheduled jobs and automated processes to send a ping. The Prometheus exporter includes heartbeat checks in the standard checkly_check_status and checkly_check_result_total metrics, plus two heartbeat-specific gauges designed for dead man’s switch alerting.
  • checkly_heartbeat_last_success_timestamp_seconds — The Unix timestamp of the last successful ping, searched across the entire event history (no time window). A value of 0 means no successful ping has ever been received — this immediately triggers a dead man’s switch alert since time() - 0 produces a very large value.
  • checkly_heartbeat_expected_interval_seconds — The maximum expected time between pings, derived from the heartbeat monitor’s configured period and period unit (e.g., 5 minutes = 300 seconds). Use this as a threshold for alerting.

How heartbeats map to checkly_check_result_total

Heartbeat checks contribute to checkly_check_result_total using a 1-hour count window, the same as other check types. The status label values map to heartbeat event states as follows:
status labelHeartbeat states counted
passingRECEIVED, EARLY, GRACE, LATE — any ping that arrived
failingFAILING — no ping received before the grace period expired
degradedAlways 0 — heartbeat monitors do not have a degraded state

Dead Man’s Switch PromQL Examples

Alert when a heartbeat ping is overdue

Compare the current time against the last success timestamp. If the difference exceeds the expected interval, the job has missed its window:
time() - checkly_heartbeat_last_success_timestamp_seconds > checkly_heartbeat_expected_interval_seconds

Alert with a safety margin

Add a grace period multiplier (e.g., 1.5x) to avoid false alarms from minor delays:
time() - checkly_heartbeat_last_success_timestamp_seconds > 1.5 * checkly_heartbeat_expected_interval_seconds

Grafana table of heartbeat health

Show how long ago each heartbeat was last seen:
time() - checkly_heartbeat_last_success_timestamp_seconds

Private Location Metrics

The Prometheus exporter also contains metrics for monitoring Private Locations. These metrics can be used to ensure that your Private Locations have enough Checkly Agent instances running to execute all of your checks. The following metrics are available to monitor Private Locations:
MetricTypeDescription
checkly_private_location_queue_sizeGaugeThe number of check runs scheduled to the Private Location and waiting to be executed. A high value indicates that checks are becoming backlogged and that you may need to scale your Checkly Agents.
checkly_private_location_oldest_scheduled_check_runGaugeThe age in seconds of the oldest check run job scheduled to the Private Location queue. A high value indicates that checks are becoming backlogged.
checkly_private_location_agent_countGaugeThe number of agents connected for the Private Location.
The Private Location metrics all contain the following labels:
LabelDescription
private_location_namethe name of the Private Location.
private_location_slug_namethe Private Location’s human readable unique identifier.
private_location_idthe Private Location’s UUID.