Default policies are created by Netuitive and intended to provide recommendations for ways to monitor the behavior of the elements in your environment. Default policies can be found on the Policies page and are marked as Netuitive in the Created By column. You can edit default policies as needed to suit the behavior of your environment. When new default policies are provisioned to your account, Netuitive will not overwrite any changes you made to existing default policies. Furthermore, any new default policies added to your account will be disabled by default.
Table 1-5 below describes the default policies for AWS Auto Scaling Group (ASG) elements.
Before reading about the EBS default policy, it is important to understand the following Netuitive computed metrics. For more information about computed metrics, see Computed metrics.
- Average Latency: The average amount of time that it takes for a disk operation to complete.
- Queue Length Differential: The difference between the actual disk queue length and the "ideal" disk queue length. The ideal queue length is based on Amazon's rule of thumb that for every 200 IOPS you should have a queue length of 1. In theory, a well-optimized volume should have a queue length differential that hovers around 0. In practice, we have seen volumes with extremely low latency (< 0.0001) have queue length differentials that are higher than 0; presumably this is because the latency is much lower than Amazon assumes for its rule of thumb. Even in these cases, the differential stays fairly steady.
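The two computed metrics above can be illustrated with a minimal Python sketch. It is not Netuitive's implementation: the function names, the argument names, and the sign convention (actual minus ideal) are assumptions made for illustration; only the 200 IOPS rule of thumb comes from the description above.

```python
def average_latency(total_operation_seconds: float, operation_count: int) -> float:
    """Average time per disk operation; returns 0.0 when no operations ran."""
    if operation_count == 0:
        return 0.0
    return total_operation_seconds / operation_count


def ideal_queue_length(iops: float, iops_per_queue_unit: float = 200.0) -> float:
    """Rule of thumb: a queue length of 1 for every 200 IOPS."""
    return iops / iops_per_queue_unit


def queue_length_differential(actual_queue_length: float, iops: float) -> float:
    """Assumed convention: actual queue length minus the ideal queue length."""
    return actual_queue_length - ideal_queue_length(iops)
```

Under these assumptions, a volume serving 1000 IOPS with an actual queue length of 5 has a differential of 0, which matches the "well-optimized" case described above.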
Table 1-5 below describes the default policy for EBS elements.
Table 1-6 below describes the default policies for AWS EC2 elements.
Table 1-7 below describes the default policies for ELB elements.
|AWS Lambda - Elevated Invocation Count||30 minutes||
||WARNING||The number of calls to the function (invocations) has been greater than expected for at least the last 30 minutes.|
|AWS Lambda - Depressed Invocation Count||10 minutes||
||WARNING||The number of calls to the function (invocations) has been lower than expected for at least the last 10 minutes.|
|AWS Lambda - Elevated Latency||30 minutes||
||WARNING||The average duration per function call (latency) has been higher than expected for at least the past 30 minutes.|
Table 1-8 below describes the default policies for RDS elements.
Table 1-6 below describes the default policies for AWS Simple Queue Service (SQS) elements.
|AWS SQS - Queue Falling Behind||2 hours||
netuitive.aws.sqs.arrivalrate has a > netuitive.aws.sqs.completionrate
|CRITICAL||The arrival rate for the queue has been greater than the completion rate for at least 2 hours. This is an indication that processing of the queue is falling behind.|
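The Queue Falling Behind policy combines a comparison with a duration: the condition must hold continuously for 2 hours. A minimal Python sketch of that idea follows. The function name, the tuple layout, and the 5-minute sampling interval are assumptions for illustration and do not reflect how Netuitive evaluates policies internally.

```python
from typing import Sequence, Tuple


def queue_falling_behind(
    samples: Sequence[Tuple[float, float]],
    interval_minutes: int = 5,      # assumed sampling interval
    duration_minutes: int = 120,    # policy duration: 2 hours
) -> bool:
    """Return True when every (arrival_rate, completion_rate) sample in the
    most recent duration window has arrival_rate > completion_rate."""
    needed = duration_minutes // interval_minutes
    if len(samples) < needed:
        # Not enough history to cover the full duration yet.
        return False
    recent = samples[-needed:]
    return all(arrival > completion for arrival, completion in recent)
```

With a 5-minute interval, 24 consecutive samples must satisfy the condition; a single sample where the completion rate catches up resets the evaluation.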
|Policy name||Metrics Required||Duration||Conditions||Category||Description|
|Azure VM - CPU Threshold Exceeded||Boot Diagnostics||15 minutes||
Processor.PercentProcessorTime has a > 95%
|WARNING||The CPU on the Azure Virtual Machine has exceeded 95% for at least 15 minutes.|
|Azure VM - Elevated CPU Activity (Normal Network Activity)||Boot Diagnostics||30 minutes||
|INFO||Increases in CPU activity are not uncommon when there is a rise in network activity. Increased traffic to a server means more work for that server to do. This policy is designed to catch cases where CPU activity is higher than normal and that behavior cannot be explained by a corresponding increase in network traffic. It may or may not represent a problem, but it is useful to know about. This policy will not fire if CPU utilization is less than 20%.|
|Azure VM - Elevated Disk Activity||Boot Diagnostics||30 minutes||
|INFO||Disk activity has been higher than expected for at least 30 minutes.|
|Azure VM - Elevated Memory Utilization||Basic Metrics||15 minutes||
||WARNING||The memory utilization on the Azure Virtual Machine is higher than expected.|
|Azure VM - Elevated Network Activity||Boot Diagnostics||30 minutes||
|INFO||Network activity has been higher than expected for at least 30 minutes.|
|Azure VM - Heavy Disk Load||Basic Metrics||5 minutes||
||WARNING||Average disk queue length is greater than expected, which could indicate a problem with heavy disk load.|
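The Azure VM - Elevated CPU Activity (Normal Network Activity) policy in the table above combines three checks: CPU is above its expected range, network activity is not, and CPU utilization is at least 20%. A minimal Python sketch of that logic follows; the function name, the parameter names, and the idea of passing precomputed upper baseline values are assumptions for illustration, not Netuitive's implementation.

```python
def elevated_cpu_without_network_cause(
    cpu_percent: float,
    cpu_upper_baseline: float,        # hypothetical: top of the expected CPU range
    network_activity: float,
    network_upper_baseline: float,    # hypothetical: top of the expected network range
    min_cpu_percent: float = 20.0,    # the policy does not fire below 20% CPU
) -> bool:
    """Return True when CPU is elevated, network activity is within its
    expected range, and CPU utilization is at least min_cpu_percent."""
    cpu_elevated = cpu_percent > cpu_upper_baseline
    network_elevated = network_activity > network_upper_baseline
    return cpu_elevated and not network_elevated and cpu_percent >= min_cpu_percent
```

The 20% floor keeps the check quiet on mostly idle machines, where a small absolute rise in CPU can still exceed a narrow baseline.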
|Cassandra - Depressed Key Cache Hit Rate||30 minutes||
||WARNING||The hit rate for the key cache is lower than expected and is less than 85%. This condition has been persisting for at least the past 30 minutes.|
|Cassandra - Elevated Node Read Latency||30 minutes||
cassandra.Keyspace.ReadLatency.OneMinuteRate has an
|WARNING||The overall keyspace read latency on this Cassandra node has been higher than expected for at least 30 minutes.|
|Cassandra - Elevated Node Write Latency||30 minutes||
cassandra.Keyspace.WriteLatency.OneMinuteRate has an
|WARNING||The overall keyspace write latency on this Cassandra node has been higher than expected for at least 30 minutes.|
|Cassandra - Elevated Number of Pending Compaction Tasks||15 minutes||
cassandra.Compaction.PendingTasks has an
|WARNING||The number of pending compaction tasks has been higher than expected for at least the past 15 minutes. This could indicate that the node is falling behind on compaction tasks.|
|Cassandra - Elevated Number of Pending Thread Pool Tasks||15 minutes||
cassandra.ThreadPools.*.PendingTasks has an
|WARNING||For at least the past 15 minutes, the number of pending tasks for one or more thread pools has been higher than expected. This could indicate that the pools are falling behind on their tasks.|
|Cassandra - Unavailable Exceptions Greater Than Zero||5 minutes||
cassandra.*Unavailables.OneMinuteRate has a > 0
|CRITICAL||The required number of nodes were unavailable for one or more requests.|
Table 1-10 below describes the default policies for collectd elements.
Table 1-11 below describes the default policies for Diamond and Linux elements.
This is done by setting percore to FALSE (it is TRUE by default) and normalize to TRUE (it is FALSE by default) in your configuration file. After adjusting these settings, save the configuration file and restart the agent to apply the changes. See the Linux or Diamond agent documentation for more information.
|Docker Container - CPU Throttling||15 minutes||
netuitive.docker.cpu.container_throttling_percent has a > 0
|The Docker container has had its CPU usage throttled for at least the past 15 minutes.|
|Docker Container - Elevated CPU Utilization||30 minutes||
||INFO||CPU usage on the Docker container has been higher than expected for 30 minutes or longer.|
|Docker Container - Elevated Memory Utilization||30 minutes||
||INFO||Memory usage on the Docker container has been higher than expected for 30 minutes or longer.|
|Docker Container - Extensive CPU Throttling||1 hour 5 minutes||
netuitive.docker.cpu.container_throttling_percent has a > 0
|CRITICAL||The Docker container has had its CPU usage throttled for over an hour.|
|Elevated CPU Activity||15 minutes||
elasticsearch.process.cpu.percent has an
|WARNING||This policy generates a warning event when the Elasticsearch CPU activity is higher than expected.|
|Elevated JVM Heap Usage||15 minutes||
elasticsearch.jvm.mem.heap_used_percent has an
|WARNING||This policy generates a warning event when the Elasticsearch JVM's heap usage is higher than expected.|
|Elevated JVM Threads||15 minutes||
elasticsearch.jvm.threads.count has an
|WARNING||This policy generates a warning event when the number of threads used by the Elasticsearch JVM is higher than expected.|
|Elevated Processing Time||15 minutes||
elasticsearch.indices._all.*time_in_millis has an
|WARNING||This policy generates a warning event if any of the "time in millis" metrics on the "_all" index deviate above the baseline for 15 minutes or more.|
|Reject Count Greater Than Zero||5 minutes||
elasticsearch.thread_pool.*.rejected has a > 0
|WARNING||This policy generates a warning event if any of the Elasticsearch thread pools has a "rejected" count greater than 0.|
Table 1-1 below describes the default policies for Windows elements.