Custom Checks for Process Monitoring

Uptime.com custom checks help monitor periodic jobs and processes, issuing alerts when a reported action fails or an expected heartbeat is not detected. These check types are useful for monitoring cron tasks, scripts, agents, workers, daemons, or any other form of automation that powers your underlying infrastructure.

Custom checks, combined with our external checks such as HTTP(S) or API monitoring, can simplify a diagnosis with data from inside your systems. 

This article will walk you through creating and utilizing these check types, and provide some use cases and ideas. 

Adding Your First Heartbeat Check

To add a new Heartbeat Check, click Monitoring followed by Checks, then Add New. Select Heartbeat from the Check Type drop-down menu. Press the Save button to reveal the Heartbeat URL for your check.

 

A Heartbeat check receives alerts and metrics from an external source. There is one attribute that it can relay:

  • “response_time”: can relay performance metrics for the heartbeat (using seconds)

When viewing the Custom Time Series Data for a Heartbeat check, response times are displayed in milliseconds (ms). When response time is less than 1ms, times are reported in microseconds (μs):

heartbeat_microseconds.png

A Heartbeat check expects to receive a request to a unique URL at a default interval of 5 minutes, or at a custom interval of 1 minute or greater. You can specify the interval in minutes, hours, or days. Floating point values are accepted, but the heartbeat check will convert them to a smaller unit or round up to the next whole unit as necessary. Some examples are below:

  • 1.5 minutes will become 2 minutes.
  • 1.5 hours will become 90 minutes.
  • 1.5 days will become 36 hours.
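
The three examples above are consistent with a simple rule: keep whole numbers, convert a fractional value to the next smaller unit, and round up once minutes are reached. The sketch below implements that rule. The rule itself is inferred from the examples and is an assumption, not documented Uptime.com behavior, and `normalize_interval` is a hypothetical helper name.

```python
import math

# Conversion to the next smaller unit. This rule is an assumption inferred
# from the three examples above, not Uptime.com's documented logic.
SMALLER_UNIT = {"days": ("hours", 24), "hours": ("minutes", 60)}


def normalize_interval(value, unit):
    """Return (whole_number, unit) for a possibly fractional interval."""
    while value != int(value) and unit in SMALLER_UNIT:
        unit, factor = SMALLER_UNIT[unit]
        # Rounding guards against floating point noise such as 26.400000000000002.
        value = round(value * factor, 6)
    if unit == "minutes":
        # Minutes are the smallest unit, so fractions round up (minimum 1).
        value = max(1, math.ceil(value))
    return int(value), unit
```

Calling `normalize_interval(1.5, "hours")` under this rule yields 90 minutes, matching the second example.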

It may be helpful to think about a Heartbeat check in contrast to a Ping ICMP check. The Ping check is an external monitor that tells you whether a system is up and running. It does not communicate the status of the processes that the system governs. Use a Ping ICMP check to confirm the server running a process is alive, while the heartbeat check tells you the process itself is active. Heartbeat checks are also useful in systems that do not have an interface or otherwise expose status. 

When you have saved your Heartbeat check, you will see three additional fields:

  1. The URL to which requests must be sent to keep the heartbeat monitor UP
  2. A sample cURL POST request for testing that the check is online, or for general use
  3. A sample cURL POST request that includes response time data in seconds (0.8 is 800 ms, while 1.0 is 1000 ms, or 1 second)
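
The same requests can be sent from a script instead of cURL. The following is a minimal Python sketch, not the official client. `HEARTBEAT_URL` is a placeholder for the URL shown on your saved check, and the JSON body encoding is an assumption; if the sample cURL request on your check uses a different encoding, follow that instead.

```python
import json
import urllib.request

# Placeholder: paste the Heartbeat URL shown on your saved check.
HEARTBEAT_URL = "https://example.invalid/your-heartbeat-url"


def build_heartbeat_request(url, elapsed_ms=None):
    """Build a POST request; optionally attach response_time in seconds."""
    if elapsed_ms is None:
        # Assumption: a POST with no body is enough to keep the check UP.
        return urllib.request.Request(url, method="POST")
    # The check expects seconds, so 800 ms is sent as 0.8.
    body = json.dumps({"response_time": elapsed_ms / 1000}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


def send_heartbeat(url, elapsed_ms=None, timeout=10):
    """Send the heartbeat and return the HTTP status code."""
    request = build_heartbeat_request(url, elapsed_ms)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A job that took 800 ms would call `send_heartbeat(HEARTBEAT_URL, elapsed_ms=800)` as its last step.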

The heartbeat check will go down and trigger an alert to the Contact(s) assigned to your check if it does not receive a request within the specified time period.

Please note: Heartbeat checks allow 2 requests per minute, plus up to 6 “bonus” requests every 15 minutes. Requests beyond this limit will trigger a rate limit error.
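
A sender that may fire more often than expected can guard itself against this limit on the client side. The sketch below is one possible interpretation: 2 sends in any 60-second window, with up to 6 extra sends drawn from a separate 15-minute window. How the service actually counts “bonus” requests is not specified here, so treat the accounting as an assumption. `HeartbeatBudget` is a hypothetical name.

```python
from collections import deque


class HeartbeatBudget:
    """Client-side guard: 2 sends per 60 s plus 6 bonus sends per 900 s."""

    def __init__(self, per_minute=2, bonus=6, bonus_window=900):
        self.per_minute = per_minute
        self.bonus = bonus
        self.bonus_window = bonus_window
        self.regular_sends = deque()  # timestamps counted against the minute
        self.bonus_sends = deque()    # timestamps counted against the bonus

    def allow(self, now):
        """Return True and record the send if it fits within the budget."""
        # Drop timestamps that have aged out of their windows.
        while self.regular_sends and now - self.regular_sends[0] >= 60:
            self.regular_sends.popleft()
        while self.bonus_sends and now - self.bonus_sends[0] >= self.bonus_window:
            self.bonus_sends.popleft()
        if len(self.regular_sends) < self.per_minute:
            self.regular_sends.append(now)
            return True
        if len(self.bonus_sends) < self.bonus:
            self.bonus_sends.append(now)
            return True
        return False
```

Pass `time.monotonic()` as `now` in real use, and skip the request whenever `allow` returns False.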

Adding Your First Incoming Webhook Check

To add a new Incoming Webhook Check, click Monitoring followed by Checks, and then Add New. Select Incoming Webhook from the Check Type drop-down menu. Press the Save button to reveal the Webhook URL for your check.

 

 

An Incoming Webhook receives alerts and metrics from an external source. There are two attributes that it can relay:

  • “state_is_up”: can be “true” or “false”; controls the alert state of the webhook
  • “response_time”: can relay performance metrics for the webhook (using seconds)

An Incoming Webhook allows you to remain proactive in your approach to internal process monitoring. A webhook takes action when something else occurs (or doesn’t), offering instantaneous alerting for a process that is no longer responding. 

Please note: Incoming Webhooks have a rate limit of 1 request per minute, per Webhook ID.

When you have saved your Incoming Webhook, you will see three additional fields:

  1. The URL to which webhook data must be sent
  2. A sample cURL POST request that triggers the incoming webhook as DOWN
  3. A sample cURL POST request that includes both the DOWN state and response time data in seconds (0.8 is 800 ms, while 1.0 is 1000 ms, or 1 second)
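
The two attributes can be assembled into a request body as sketched below. The JSON encoding and the helper name `build_webhook_payload` are assumptions for illustration; the sample cURL requests shown on your check are the authoritative reference for the expected format.

```python
import json


def build_webhook_payload(state_is_up, response_time_s=None):
    """Return a JSON body for an Incoming Webhook request.

    Assumption: the webhook accepts a JSON object with these two keys.
    """
    payload = {"state_is_up": bool(state_is_up)}
    if response_time_s is not None:
        # response_time is expressed in seconds (0.8 means 800 ms).
        payload["response_time"] = float(response_time_s)
    return json.dumps(payload)
```

For example, `build_webhook_payload(False, 0.8)` describes a DOWN state with an 800 ms response time.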

Incoming webhooks are used in response to an event. Heartbeat monitoring is continuous, applied at specific intervals. Combining these two check types provides a great deal of infrastructure observability. 

Please note: Webhook checks allow 2 requests per minute, plus up to 6 “bonus” requests every 15 minutes.

Failed Checks

Uptime.com offers multiple options for viewing reports related to a check. Your first contact and preferred notification method were defined when you created your Uptime.com account, when you were asked to enter the email and extended contact information for the individuals on your team to notify. If a check fails, these contacts will receive failure notifications by default.

You can also review the check from your Dashboard, or review alert details on your Alert history page. You will also receive an alert notification when a heartbeat check fails:

custom-check-alert.png

Finalizing Your Custom Check Monitor

Before saving, every Heartbeat check must have the following:

  • Name
  • Defined interval
  • Contacts - Choose a specific list of contacts that should be notified

Before saving, every Incoming Webhook must have the following:

  • Name
  • Contacts - Choose a specific list of contacts that should be notified

Advanced Options

Some advanced options provide added functionality to Heartbeat checks and Incoming Webhooks that DevOps teams may find valuable.

  • Notes

For a detailed breakdown of this parameter, please visit the Field Explanation support article.

Use Cases for Custom Checks for Process Monitoring 

Custom check use cases are numerous and depend heavily on the design of your system.

Heartbeat and Webhook process monitoring provide an internal indication that a specific process is working. Here are a few ideas that may help you consider their usage.

 

Alerting for Disruptions in Scheduled Deployments

For scheduled builds of a product, such as nightly builds deployed to a testing environment, a heartbeat check can be combined with a cron job to confirm that each build succeeds and to send alerts through the Uptime.com platform if a build fails.

To automate this, a cron job can run every night (or at whatever interval you need) to deploy to the environment. When the job finishes, it sends a heartbeat to the Uptime.com check to indicate that the deployment succeeded. If the heartbeat isn’t received, an alert is triggered and sent to the contacts set up through Uptime.com, notifying the relevant personnel that the build did not complete successfully.
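
A small wrapper script run by cron can tie the heartbeat to the outcome of the deployment. The sketch below is illustrative only: `deploy.py` and `DEPLOY_COMMAND` are hypothetical, and `send_heartbeat` stands for whatever function posts to your Heartbeat URL.

```python
import subprocess
import sys

# Hypothetical deploy script; replace with your real build or deploy command.
DEPLOY_COMMAND = [sys.executable, "deploy.py"]


def run_and_report(command, send_heartbeat):
    """Run command; call send_heartbeat() only if it exits with status 0."""
    result = subprocess.run(command)
    if result.returncode == 0:
        send_heartbeat()
        return True
    # No heartbeat is sent, so the check goes DOWN once its interval passes.
    return False
```

With this arrangement, the check interval should be at least as long as the cron schedule, so that one missed heartbeat corresponds to one failed or skipped build.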

 

Alerting for Failures in an E-Commerce System

In addition to using HTTP(S) checks and transaction checks to ensure that basic workflows like checking out with an item are working correctly for an e-commerce store, a custom webhook check could be used to ensure that backend systems like payment processing and customer notification emails are functioning correctly.

For example, when a customer places an order in an online store, several systems are triggered, including one that adjusts inventory based on the order and one that sends an email to the customer. Once these steps are complete, a webhook can notify the Uptime.com account that the order has run through the full process successfully. If any step fails, the webhook can instead report a DOWN state, which sends an alert through Uptime.com so that the relevant teams can investigate.
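
One way to wire this up is to run the backend steps in sequence and report a single state at the end. In the sketch below, the step names, `process_order`, and `report_state` are all hypothetical; `report_state` stands for a function that posts “state_is_up” to your Incoming Webhook URL.

```python
def process_order(order, steps, report_state):
    """Run each backend step in order and report the overall outcome.

    steps is a list of (name, callable) pairs; report_state is a callable
    that relays state_is_up to the Incoming Webhook.
    """
    for name, step in steps:
        try:
            step(order)
        except Exception as error:
            # Any failed step marks the whole order flow as DOWN.
            print(f"step {name!r} failed: {error}")
            report_state(False)
            return False
    report_state(True)
    return True
```

Because Incoming Webhooks are rate limited, a busy store would likely need to report only failures or a periodic summary instead of every order.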

 

Using Heartbeat to Phase Out Old Deployments

When you deploy a new feature, it is often unclear who uses the new feature and who still uses the old one. A practical use of heartbeat monitoring is phasing out a system that has not run for a specified time period, with minimal impact on your user base.

Heartbeat monitoring provides insight through an alert indicating that it has been X number of days since the process last ran. Once the process has been safely isolated, you can deprecate it without impacting your user base. You can also notify your users before or during the deprecation, in time to allow for any data migration.

Backup Example with Webhook

Backups are essential to any functioning business, from the smallest to the enterprise. Losing this functionality for a single day may be tolerable, but an outage that goes undetected could be catastrophic.

Server A is in charge of regular backups, and we want to know when these backups fail so we can act on them. In the script that manages our job, we can call our webhook to relay a true or false state depending on the outcome. If the backup fails, the webhook receives the failure state and Uptime.com alerts us.
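
The backup script can also time the job and relay the duration as “response_time”. A minimal sketch follows, in which `backup` and `report` are hypothetical callables: `backup` performs the backup and raises on failure, and `report` posts the state and elapsed seconds to the Incoming Webhook URL.

```python
import time


def run_backup(backup, report, clock=time.monotonic):
    """Run backup() and report (state_is_up, elapsed_seconds) to the webhook."""
    started = clock()
    try:
        backup()
        succeeded = True
    except Exception:
        # Any exception from the backup is relayed as a DOWN state.
        succeeded = False
    elapsed = clock() - started
    report(succeeded, elapsed)
    return succeeded
```

The `clock` parameter exists only so the timing can be replaced in tests; production use can leave the default.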

Backup Example with Webhook and Heartbeat

We can take the above example a step further by combining these two check types. In the course of our webhook implementation above, we discovered that Server A is just prone to failure. If the server stops responding in the middle of the job, we will never know if the job was successful or failed because that status will not output properly. We are looking into that, but in the meantime we want to be sure the system attempted a backup as well as whether it was successful.

We will need two custom checks to help us determine the state of our server and backup process: a heartbeat and a webhook. 

The Incoming Webhook built into our script tells us the script is working and whether the backups are succeeding.

A Heartbeat Check set to every 12 hours tells us the job managing that backup is alive and well. When our heartbeat monitor fails, we develop a more specific idea of what went wrong.
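
One possible arrangement, sketched below, sends the heartbeat as soon as the job starts and the webhook state once the backup ends. Sending the heartbeat at the start is a design assumption made here so that an attempt is recorded even if Server A stops responding mid-job. `send_heartbeat` and `report_state` are hypothetical callables that post to the Heartbeat URL and the Incoming Webhook URL.

```python
def run_monitored_backup(backup, send_heartbeat, report_state):
    """Signal that the job started, then report how the backup ended."""
    # Heartbeat first: it records the attempt even if the server dies mid-backup.
    send_heartbeat()
    try:
        backup()
    except Exception:
        report_state(False)
        return False
    report_state(True)
    return True
```

Read together, a missing heartbeat suggests the job never started, while a heartbeat followed by a false webhook state points to a backup that ran and failed.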

Tip: Use the Notes section for both check types to provide more detail about the process in question so your team can make a diagnosis faster.
