With AppFormix Alarms, a user may configure an alarm to be generated when a condition is met in the infrastructure. The AppFormix Agent watches metrics at each host, analyzing the raw data for conditions of Alarms that apply to that host and the instances running on that host. By analyzing data at the source of collection, AppFormix efficiently scales with your infrastructure.
Each Alarm applies to a specified "scope" that identifies the type of entity to monitor for a condition. Entity type may be "Host," "Instance," or "Service." For a particular entity type, a user selects a set of entities to which an alarm rule applies. For example, when selecting the "Host" scope, a user may configure an Alarm to apply to "All Hosts" in the infrastructure or to all hosts in a specified Host Aggregate.
AppFormix offers two types of thresholds for alarms:
- Static threshold: compare measurements to a fixed threshold.
- Dynamic threshold: compare measurements against historical trends for a set of resources.
How Alarms Work
AppFormix Agent continuously collects measurements of metrics for a host and its instances. For a particular alarm, Agent aggregates the samples according to a user-specified function (average, standard deviation, min, max, sum) and produces a single measurement for each user-specified interval. Agent compares each measurement to a threshold. For an alarm with a static threshold, a measurement is compared to a fixed value using a user-specified comparison function (above, below, equal). For dynamic thresholds, a measurement is compared with a value learned by AppFormix over time.
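The aggregate-then-compare step for a static threshold can be sketched as follows. This is a minimal illustration, not AppFormix code: the function name `evaluate_interval` and the dictionary keys are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

# Hypothetical lookup tables mirroring the functions named in the text.
AGGREGATIONS = {
    "average": statistics.fmean,
    "std_dev": statistics.pstdev,  # assumption: population standard deviation
    "min": min,
    "max": max,
    "sum": sum,
}

COMPARISONS = {
    "above": lambda measurement, threshold: measurement > threshold,
    "below": lambda measurement, threshold: measurement < threshold,
    "equal": lambda measurement, threshold: measurement == threshold,
}


def evaluate_interval(samples, aggregation, comparison, threshold):
    """Reduce one interval's samples to a single measurement, then
    compare that measurement to a fixed (static) threshold."""
    measurement = AGGREGATIONS[aggregation](samples)
    condition_met = COMPARISONS[comparison](measurement, threshold)
    return measurement, condition_met


# Three CPU samples (percent) collected during one interval.
measurement, condition_met = evaluate_interval(
    [88.0, 92.0, 96.0], "average", "above", 90.0
)
```

Here the three samples average to 92.0, so the "above 90" condition is met for that interval.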
Dynamic thresholds enable outlier detection in resource consumption based on historical trends. Resource consumption may vary significantly at various hours of the day and days of the week. This makes it difficult to set a static threshold for a metric. For example, 70% CPU usage may be considered normal for Monday mornings between 10:00 AM and 12:00 PM, but the same amount of CPU usage may be considered abnormally high for Saturday nights between 9:00 PM and 10:00 PM.
With dynamic thresholds, AppFormix learns trends in metrics across all resources in the scope to which an alarm applies. For example, if an alarm is configured for a host aggregate, AppFormix learns a baseline from metric values collected for hosts in that aggregate. Similarly, an alarm with a dynamic threshold configured for a project learns a baseline from metric values collected for instances in that project. Agent then generates an alarm when a measurement deviates from the baseline value learned for a particular time period.
When creating an alarm with a dynamic threshold, a user selects a metric, a period of time over which to establish a baseline, and the sensitivity to measurements that deviate from the baseline. The sensitivity may be configured as ‘High’, ‘Medium’ or ‘Low’. Higher sensitivity will report smaller deviations from baseline and vice versa.
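The idea of a time-dependent baseline with a sensitivity setting can be illustrated with a small sketch. The text does not describe how AppFormix actually learns a baseline, so everything here is an assumption made for illustration: the `Baseline` class, the bucketing by (weekday, hour), and the mapping from sensitivity to a number of standard deviations are all hypothetical.

```python
import statistics
from collections import defaultdict

# Assumed mapping: higher sensitivity tolerates a smaller deviation.
SENSITIVITY_SIGMAS = {"high": 1.0, "medium": 2.0, "low": 3.0}


class Baseline:
    """Hypothetical baseline keyed by (weekday, hour) time buckets."""

    def __init__(self):
        self._history = defaultdict(list)

    def learn(self, weekday, hour, value):
        # Record a past measurement for this time bucket.
        self._history[(weekday, hour)].append(value)

    def is_outlier(self, weekday, hour, measurement, sensitivity):
        history = self._history.get((weekday, hour), [])
        if len(history) < 2:
            return False  # not enough data yet to judge
        mean = statistics.fmean(history)
        spread = statistics.pstdev(history)
        allowed = SENSITIVITY_SIGMAS[sensitivity] * spread
        return abs(measurement - mean) > allowed


baseline = Baseline()
for value in (68.0, 70.0, 72.0):   # Monday, 10:00 hour: around 70% is normal
    baseline.learn("Mon", 10, value)
for value in (18.0, 20.0, 22.0):   # Saturday, 21:00 hour: around 20% is normal
    baseline.learn("Sat", 21, value)

normal_on_monday = baseline.is_outlier("Mon", 10, 70.0, "high")
abnormal_on_saturday = baseline.is_outlier("Sat", 21, 70.0, "low")
```

With this sample history, 70% CPU usage is not flagged for Monday morning but is flagged for Saturday night, even at low sensitivity.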
An alarm also has a mode: alert or event. When configured as an alert, AppFormix Agent sends a notification on the message bus whenever the state of the alert changes. The alert is initially in the "learning" state until AppFormix Agent has collected enough data to evaluate the conditions of the alert. An alert is "active" when the conditions of the alarm are met, and "inactive" when the conditions are not met.
When configured as an event, AppFormix Agent will send notifications on the message bus for each interval in which the conditions of the alarm are met.
As an example, consider an alarm for average CPU usage above 90% over an interval of 60 seconds. If the alarm mode is 'alert', then a notification will be sent when the alarm becomes 'active' at time T1. When CPU usage drops below 90% at time T5, a notification will be sent that the alert is 'inactive'.
If the same alarm is configured in 'event' mode, then a notification will be sent for each of the five intervals in which the CPU load exceeds 90%.
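The difference between the two modes can be sketched as a small function over per-interval results. This is an illustrative model only: the function name `notifications` and the tuple format are hypothetical, and it assumes that any change away from the initial "learning" state also counts as a state change.

```python
def notifications(mode, interval_results):
    """Return the (interval_index, message) notifications that would be
    sent for a sequence of per-interval results, where each result is
    True if the alarm condition was met in that interval."""
    sent = []
    state = "learning"  # initial state before any interval is evaluated
    for index, condition_met in enumerate(interval_results):
        if mode == "event":
            # Event mode: one notification per interval that meets the condition.
            if condition_met:
                sent.append((index, "event"))
        else:
            # Alert mode: notify only when the state changes.
            new_state = "active" if condition_met else "inactive"
            if new_state != state:
                sent.append((index, new_state))
                state = new_state
    return sent


# Five intervals above the threshold followed by one below it.
results = [True, True, True, True, True, False]
alert_notifications = notifications("alert", results)
event_notifications = notifications("event", results)
```

For this sequence, alert mode produces two notifications (one 'active', one 'inactive'), while event mode produces five, one for each interval in which the condition is met.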
Each alarm becomes part of the monitoring policy applied to resources in the infrastructure. When configuring an alarm, a user chooses the scope to which the alarm applies: Host, Instance, or Service. Further, for a particular scope type, a subset of the resources of that type may be selected. When the scope is Host, a user may select All Hosts or the hosts that belong to a specified Host Aggregate. When the scope is Instance, a user must select a Project for which the alarm will be configured. Any new resource that matches the scope will have the alarm automatically configured.
For example, a user configures an alarm with Instance scope for a given project. Afterward, when an instance is created in that project, Controller will configure the alarm for the new Instance.
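Scope matching for a newly created resource can be sketched as below. The `Resource` and `Alarm` classes and the `alarms_for_new_resource` helper are hypothetical names used only to illustrate the rule; they do not represent how Controller is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    scope_type: str  # "Host" or "Instance"
    group: str       # host aggregate for a Host, project for an Instance


@dataclass(frozen=True)
class Alarm:
    name: str
    scope_type: str
    # None stands for "All Hosts"; an Instance alarm always names a project.
    group: Optional[str] = None

    def applies_to(self, resource):
        if resource.scope_type != self.scope_type:
            return False
        return self.group is None or resource.group == self.group


def alarms_for_new_resource(alarms, resource):
    """Names of the alarms that should be configured for a new resource."""
    return [alarm.name for alarm in alarms if alarm.applies_to(resource)]


alarms = [
    Alarm("host-cpu", "Host"),            # all hosts
    Alarm("web-memory", "Instance", "web"),
    Alarm("db-memory", "Instance", "db"),
]
new_instance = Resource("vm-17", "Instance", "web")
matched = alarms_for_new_resource(alarms, new_instance)
```

In this sketch, the new instance in project "web" picks up only the alarm scoped to that project.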
Configuring an Alarm
To configure an alarm, select 'Alarms' in the left-hand navigation menu, then select 'Add alarm' in the upper-right. The user is presented with an input panel (see figure).
The basic configuration settings for an alarm are:
- Name: a name that identifies the alarm. Name is displayed in the Dashboard and is the user-facing identifier for external notification systems;
- Scope: type of resource to which an alarm applies: "Host" or "Instance";
- Aggregate: set of resources to which the alarm applies;
- Mode: "Alert" or "Event";
- Metric: metric that will be monitored on resources;
- Aggregation Function: how Agent will combine samples during each measurement interval (Average, Max, Min, Sum, Standard Deviation);
- Comparison Function: how to compare a measurement to the threshold (Above, Below, Equal);
- Threshold: a value against which to compare a metric measurement. Units for the threshold are determined by the metric type;
- Interval: duration of the measurement interval in seconds.
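The settings listed above can be pictured as a single record. The sketch below is purely illustrative: the `AlarmSettings` class, its field names, and the metric name `cpu.usage` are hypothetical and do not represent the AppFormix API or its payload format. The allowed values are taken from the list above.

```python
from dataclasses import dataclass

SCOPES = {"Host", "Instance"}
MODES = {"Alert", "Event"}
AGGREGATION_FUNCTIONS = {"Average", "Max", "Min", "Sum", "Standard Deviation"}
COMPARISON_FUNCTIONS = {"Above", "Below", "Equal"}


@dataclass
class AlarmSettings:
    name: str
    scope: str
    aggregate: str
    mode: str
    metric: str
    aggregation_function: str
    comparison_function: str
    threshold: float
    interval_seconds: int

    def problems(self):
        """Return a list of validation problems (empty if none found)."""
        found = []
        if not self.name:
            found.append("name must not be empty")
        if self.scope not in SCOPES:
            found.append(f"unknown scope: {self.scope}")
        if self.mode not in MODES:
            found.append(f"unknown mode: {self.mode}")
        if self.aggregation_function not in AGGREGATION_FUNCTIONS:
            found.append(f"unknown aggregation function: {self.aggregation_function}")
        if self.comparison_function not in COMPARISON_FUNCTIONS:
            found.append(f"unknown comparison function: {self.comparison_function}")
        if self.interval_seconds <= 0:
            found.append("interval must be a positive number of seconds")
        return found


# The example alarm from earlier: average CPU usage above 90% over 60 seconds.
settings = AlarmSettings(
    name="high-cpu",
    scope="Host",
    aggregate="All Hosts",
    mode="Alert",
    metric="cpu.usage",  # hypothetical metric name
    aggregation_function="Average",
    comparison_function="Above",
    threshold=90.0,
    interval_seconds=60,
)
issues = settings.problems()
```

Checking the values before submitting them catches mistakes such as a misspelled mode or a non-positive interval.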