Uptime.com custom checks help monitor periodic jobs and processes, issuing alerts when a reported action signals a failure or when an expected heartbeat is not detected. These check types are useful for monitoring cron tasks, scripts, agents, workers, daemons, or any form of automation that powers your underlying infrastructure.
Custom checks, combined with our external checks such as HTTP(S) or API monitoring, can simplify a diagnosis with data from inside your systems.
This article will walk you through creating and utilizing these check types, and provide some use cases and ideas.
Table of Contents
- Adding Your First Heartbeat Check
- Adding Your First Incoming Webhook Check
- Failed Checks
- Finalizing Your Custom Check Monitor
- Advanced Options
- Use Cases for Custom Checks for Process Monitoring
- Using Heartbeat to Phase Out Old Deployments
- Backup Example with Webhook
- Backup Example with Webhook and Heartbeat
To add a new Heartbeat Check, click Monitoring followed by Checks, and then Add New. Select Heartbeat from the Check Type drop-down menu.
A Heartbeat check receives alerts and metrics from an external source. There are two attributes that it can relay:
- “state_is_up”: can be “true” or “false”. Controls the alert state of the heartbeat.
- “response_time”: can relay performance metrics for the heartbeat (in seconds).
A Heartbeat check expects to receive a request to a unique URL at a default interval of 5 minutes, or an arbitrary interval of 1 minute or greater. You can specify minutes, hours, or days. It is also possible to use floating point numbers, but the heartbeat check will convert these numbers and/or round up to the next unit as necessary. Some examples are below:
- 1.5 minutes will become 2 minutes.
- 1.5 hours will become 90 minutes.
- 1.5 days will become 36 hours.
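The conversions above can be sketched in a few lines of Python. This is only an illustration that reproduces the three examples; the function name `normalize_interval` and the rule it applies (step down to the next smaller unit until the value is whole, and round up once minutes are reached) are assumptions, not the documented Uptime.com implementation.

```python
import math
from fractions import Fraction

# Assumed conversion chain: each unit is re-expressed in the next smaller one.
SMALLER_UNIT = {"days": ("hours", 24), "hours": ("minutes", 60)}


def normalize_interval(value, unit):
    """Return (whole_number, unit) for a possibly fractional interval."""
    amount = Fraction(str(value))  # exact decimal value, avoids float noise
    # Step down to a smaller unit while the amount is not a whole number.
    while amount.denominator != 1 and unit in SMALLER_UNIT:
        unit, factor = SMALLER_UNIT[unit]
        amount *= factor
    if unit == "minutes":
        # Minutes are the smallest unit: round up, never below 1 minute.
        amount = max(1, math.ceil(amount))
    return int(amount), unit


for value, unit in [(1.5, "minutes"), (1.5, "hours"), (1.5, "days")]:
    print(value, unit, "->", normalize_interval(value, unit))
```

Values that fall outside the three documented examples (1.1 days, for instance) may be handled differently by the actual check, so confirm the saved interval in the check settings.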
It may be helpful to think about a Heartbeat check in contrast to a Ping ICMP check. The Ping check is an external monitor that tells you whether a system is up and running. It does not communicate the status of the processes that system governs. Use a Ping ICMP check to confirm the server running a process is alive, while the heartbeat check tells you the process itself is active. Heartbeat checks are also useful in systems that do not have an interface or otherwise expose status.
When you have saved your Heartbeat check, you will see three additional fields that include the following:
- A URL to send data to, which keeps the heartbeat monitor UP
- A sample cURL POST request to test that the check is online, or for general use
- A sample cURL POST request that includes response time data (0.8 is 800 ms, while 1.0 is 1000 ms or 1 second)
The heartbeat check will go down and trigger an alert to the Contact(s) assigned to your check if it does not receive a request within the specified time period.
Please note: Heartbeat requests are throttled to once per minute. Multiple requests within a single minute will trigger a rate limit error.
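As a rough illustration of sending a heartbeat from a script, the Python sketch below builds the POST request and adds a client-side guard for the once-per-minute throttle. The URL is a placeholder, the JSON body format is an assumption (copy the exact format from the sample cURL request shown on your saved check), and the helper names `build_heartbeat`, `may_send`, and `send` are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the unique URL shown on your saved check.
HEARTBEAT_URL = "https://example.invalid/your-heartbeat-url"
MIN_GAP_SECONDS = 60  # heartbeat requests are throttled to once per minute


def build_heartbeat(url, response_time=None):
    """Build a POST request; response_time is in seconds (0.8 is 800 ms)."""
    payload = {} if response_time is None else {"response_time": response_time}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # assumed JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def may_send(last_sent, now, min_gap=MIN_GAP_SECONDS):
    """Client-side guard so we do not hit the rate limit error."""
    return last_sent is None or now - last_sent >= min_gap


def send(request, timeout=10):
    """Deliver the request; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A job would typically call `send(build_heartbeat(HEARTBEAT_URL, 0.8))` once per run, tracking the last send time with `time.monotonic()` if the job can fire more than once a minute.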
To add a new Incoming Webhook Check, click Monitoring followed by Checks, and then Add New. Select Incoming Webhook from the Check Type drop-down menu.
An Incoming Webhook receives alerts and metrics from an external source. There are two attributes that it can relay:
- “state_is_up”: can be “true” or “false”. Controls the alert state of the webhook.
- “response_time”: can relay performance metrics for the webhook (in seconds).
An Incoming Webhook allows you to remain proactive in your approach to internal process monitoring. A webhook takes action when something else occurs (or doesn’t), offering instantaneous alerting for a process that is no longer responding.
When you have saved your Incoming Webhook, you will see three additional fields that include the following:
- A URL to send data for the webhook
- A sample cURL POST request to trigger the incoming webhook as DOWN
- A sample cURL POST request that includes both DOWN state and response time data (0.8 is 800 ms, while 1.0 is 1000 ms or 1 second)
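To make the two attributes concrete, here is a minimal Python sketch that encodes them as a request body. The JSON format and the helper name `webhook_payload` are assumptions; the sample cURL request on your saved check shows the exact format to send.

```python
import json


def webhook_payload(state_is_up, elapsed_ms=None):
    """Encode the two webhook attributes as a JSON body (format assumed)."""
    if not isinstance(state_is_up, bool):
        raise TypeError("state_is_up must be True or False")
    payload = {"state_is_up": state_is_up}
    if elapsed_ms is not None:
        # response_time is reported in seconds, so 800 ms becomes 0.8.
        payload["response_time"] = elapsed_ms / 1000
    return json.dumps(payload)


# A DOWN state with an 800 ms response time.
print(webhook_payload(False, elapsed_ms=800))
```

Converting from milliseconds in one place helps avoid accidentally reporting 800 seconds instead of 0.8.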
Incoming webhooks are used in response to an event. Heartbeat monitoring is continuous, applied at specific intervals. Combining these two check types provides a great deal of infrastructure observability.
There are several ways to view reporting related to a check. The first contact, and preferred notification method, was defined when you created your Uptime.com account. You were asked to enter the email and extended contact information for the individuals to notify on your team. If a check fails, these contacts will receive notifications of failure by default.
You can also review the check from your Dashboard, or review alert details on your Alert History page. You will also receive an alert notification when a heartbeat check fails.
Before saving, every Heartbeat check must have the following:
- Defined interval
- Contacts - Choose a specific list of contacts that should be notified
Before saving, every Incoming Webhook must have the following:
- Contacts - Choose a specific list of contacts that should be notified
Some advanced options provide added functionality to Heartbeat checks and Incoming Webhooks that DevOps teams may find valuable.
For a detailed breakdown of these options, please visit the Field Explanation support article.
Custom check use cases are numerous and heavily reliant on the design of your system.
Heartbeat and Webhook process monitoring provide an internal indication that a specific process is working. Here are a few ideas that may help you consider their usage.
When you deploy a new feature, there can be an open question of who uses the new feature versus the old one. A practical use of heartbeat monitoring involves phasing out a system that has not run for a specified time period, with minimal impact on your user base.
Heartbeat monitoring provides insight here with an alert that indicates it has been X days since the process last ran. Once the process has been safely isolated, you can deprecate it without impacting your user base. You can even notify your users before or during the deprecation, in time to allow for any data migration.
Backups are essential to any functioning business, from the smallest to the enterprise. Losing backup functionality for a single day may not be a big deal, but an outage that goes undetected could be catastrophic.
Server A is in charge of regular backups, and we want to know when these backups fail so we can act on them. In the script that manages our job, we can call our webhook to relay a true or false state depending on the outcome. If the backup fails, the webhook receives the failure state and we are alerted.
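A script along these lines might look like the Python sketch below. It is a sketch under assumptions: `run_backup` is a hypothetical helper, the backup command is a stand-in, and `report` is any callable you supply that delivers the state to your webhook URL (here it simply prints).

```python
import subprocess
import sys
import time


def run_backup(command, report):
    """Run the backup command and relay its outcome through `report`."""
    started = time.monotonic()
    try:
        succeeded = subprocess.run(command, check=False).returncode == 0
    except OSError:
        succeeded = False  # the command could not be started at all
    elapsed = round(time.monotonic() - started, 3)
    # True keeps the webhook UP; False marks it DOWN and triggers the alert.
    report({"state_is_up": succeeded, "response_time": elapsed})
    return succeeded


if __name__ == "__main__":
    # Stand-in for a real backup command; it exits with a failure code.
    fake_backup = [sys.executable, "-c", "raise SystemExit(1)"]
    run_backup(fake_backup, report=print)
```

Passing `report` in as a parameter keeps the backup logic testable without touching the network; in production it would POST the dictionary to the URL shown on the Incoming Webhook check.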
We can take the above example a step further by combining these two check types. In the course of our webhook implementation above, we discovered that Server A is just prone to failure. If the server stops responding in the middle of the job, we will never know if the job was successful or failed because that status will not output properly. We are looking into that, but in the meantime we want to be sure the system attempted a backup as well as whether it was successful.
We will need two custom checks to help us determine the state of our server and backup process: a heartbeat and a webhook.
The Incoming Webhook built into our script tells us that the script is running and whether the backups are succeeding.
A Heartbeat Check set to every 12 hours tells us the job managing that backup is alive and well. When our heartbeat monitor fails, we develop a more specific idea of what went wrong.
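One way to read the two signals together is as a small decision table, sketched here in Python. The function `diagnose` and its wording are our own interpretation of this scenario, not output produced by Uptime.com.

```python
def diagnose(heartbeat_up, webhook_up):
    """Map the state of both checks to a likely cause (interpretation assumed)."""
    if not heartbeat_up:
        # No check-in within the interval: the job never ran or never finished.
        return "job did not check in: server or scheduler may be down"
    if not webhook_up:
        return "job ran, but the backup reported a failure"
    return "job ran and the backup succeeded"


for states in [(True, True), (True, False), (False, True), (False, False)]:
    print(states, "->", diagnose(*states))
```

The heartbeat is evaluated first because a stale webhook state means little if the job that updates it has stopped running.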
Tip: Use the Notes section for both check types to provide more detail about the process in question so your team can make a diagnosis faster.