Alert Types

This section describes the specific system-defined and user-defined alerts you can configure on your appliance.

Each alert produces two messages: one with an OPEN status when the triggering event occurs, and one with a CLOSED status when that event is resolved, such as when the database or a compute blade comes back online. An alert is considered active from its OPEN message until its CLOSED message. Test alerts also produce OPEN and CLOSED messages, which follow each other within about 10 seconds.

Alert messages provide specific details about the error conditions that triggered the alert. For example, for a Database State alert you may see one of the following messages:
The database stopped running: too many missing compute nodes
The database is degraded due to: rebuilding. Performance may be affected.
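
The OPEN and CLOSED lifecycle described above can be sketched as a small tracker on the receiving side. This is a minimal illustration only: the AlertTracker class, its handle method, and the idea of keying alerts by resource ID are assumptions made for this example, not part of the appliance or its message format.

```python
from dataclasses import dataclass, field


@dataclass
class AlertTracker:
    """Hypothetical helper: tracks active alerts, keyed by resource ID."""

    active: dict = field(default_factory=dict)

    def handle(self, resource_id: str, status: str, message: str = "") -> None:
        """Record an OPEN message, or clear the alert on a CLOSED message."""
        if status == "OPEN":
            self.active[resource_id] = message
        elif status == "CLOSED":
            # A CLOSED message resolves the matching OPEN alert, if there is one.
            self.active.pop(resource_id, None)
        else:
            raise ValueError(f"unexpected status: {status!r}")


tracker = AlertTracker()
tracker.handle(
    "database:state",
    "OPEN",
    "The database is degraded due to: rebuilding. Performance may be affected.",
)
tracker.handle("chassis0:blade10", "OPEN")
tracker.handle("database:state", "CLOSED")
print(sorted(tracker.active))
```

After the CLOSED message for database:state arrives, only the compute blade alert remains active.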

System-Defined Alerts

The following system-defined alerts are supported. (You cannot create new alert types.)

Cluster Quiesce
  Rule: Alert when the database is quiesced or an attempt to quiesce it fails. When the database is quiesced, active queries are cancelled. Queries that were queued start running when the database comes back online.
  Resource ID in alert messages: database:event

Compute Blade
  Rule: Alert when a compute blade changes state. For example, a blade may be powered off, causing an alert.
  Resource ID in alert messages: Chassis number and blade number. For example: chassis0:blade10

Compute Blade Reset
  Rule: Alert when a compute blade restarts.
  Resource ID in alert messages: Chassis number and blade number. For example: chassis2:blade14

Database State
  Rule: Alert when the database changes state. For example, the database may be degraded because a compute node is offline.
  Resource ID in alert messages: database:state

Database Row Store
  Rule: Alert when the database row store changes state.
  Resource ID in alert messages: database:rowstore

Fan
  Rule: Alert when a fan changes state. For example, a fan may have failed or have been removed from the appliance.
  Resource ID in alert messages: Chassis number and fan number. For example: chassis0:fan2

Manager Node Drive Not Detected
  Rule: Alert when a manager node drive is not detected. For example, a specific drive may not be installed.
  Resource ID in alert messages: Manager node number and drive ID. For example: manager1:drive:nvme4n1

Manager Node HA State
  Rule: Alert when a manager node changes state. For example, one of the manager nodes may be offline, and failover is temporarily not supported.
  Resource ID in alert messages: database:ha_state

Network Switch
  Rule: Alert when a network switch changes state.
  Resource ID in alert messages: Chassis number and switch number. For example: chassis0:switch2

Power Supply
  Rule: Alert when a power supply changes state.
  Resource ID in alert messages: Chassis number and power supply number. For example: chassis0:power2

Temperature
  Rule: Alert when the inlet temperature for the system exceeds 35°C.
  Resource ID in alert messages: database:temperature

Test
  Rule: Alert when Test Alert is requested by the SMC user.
  Resource ID in alert messages: database:test
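
The resource IDs listed above are colon-separated, and the hardware IDs carry a trailing number on some segments (for example, chassis0:blade10). The following sketch shows one way a receiving script might split them into names and numbers. The function parse_resource_id is hypothetical, and the parsing rule is inferred only from the examples in this section, not from a published format.

```python
import re

# Matches a lowercase name followed by a trailing number, e.g. "chassis0" or "blade10".
_NUMBERED = re.compile(r"^([a-z_]+)(\d+)$")


def parse_resource_id(resource_id: str) -> list:
    """Split a resource ID on ':' and separate trailing numbers from names.

    Segments that do not fit the name-plus-number shape (such as "state"
    or the drive ID "nvme4n1") are returned whole, with None as the number.
    """
    parts = []
    for segment in resource_id.split(":"):
        match = _NUMBERED.match(segment)
        if match:
            parts.append((match.group(1), int(match.group(2))))
        else:
            parts.append((segment, None))
    return parts


print(parse_resource_id("chassis0:blade10"))
print(parse_resource_id("manager1:drive:nvme4n1"))
print(parse_resource_id("database:state"))
```

Drive IDs such as nvme4n1 are deliberately kept intact, because they contain digits in the middle of the name rather than a single trailing number.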

Test Alerts

Test alerts are system-defined, but you can trigger them in two different ways:
  • After you create a new endpoint, you can send a test alert to that specific endpoint, whether the endpoint is enabled or disabled. Click Test Alert on the summary screen for the endpoint.
  • You can send a test alert to all enabled endpoints (Configure > Alerting > Test Alert). Disabled endpoints do not receive the alert.
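
The difference between the two ways of triggering a test alert can be summarized in a few lines of code. This is an illustrative model only: the Endpoint class, the endpoint names, and the recipients_for_test_alert function are invented for this sketch and do not correspond to an appliance API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical stand-in for a configured alert endpoint."""

    name: str
    enabled: bool


def recipients_for_test_alert(endpoints: list, target: Optional[str] = None) -> list:
    """Return the names of the endpoints that would receive a test alert.

    With a target, the alert goes to that endpoint even if it is disabled.
    Without a target, the alert goes to every enabled endpoint.
    """
    if target is not None:
        return [e.name for e in endpoints if e.name == target]
    return [e.name for e in endpoints if e.enabled]


endpoints = [Endpoint("email-ops", True), Endpoint("slack-dev", False)]
print(recipients_for_test_alert(endpoints))
print(recipients_for_test_alert(endpoints, target="slack-dev"))
```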

User-Defined Alerts

The following user-defined alerts are supported. By default, they are all disabled. You can enable all of them or any subset.

You cannot create new alert types.

The alerts with numeric thresholds have default values for Major and Critical severity alerts. You can define additional thresholds for Informational and Minor severity alerts.
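
As a rough illustration of how numeric thresholds map to severities, the sketch below uses the default values listed for the alerts in this section (Major above 85, Critical above 95). The function severity_for is hypothetical, and the rule that the most severe matching level wins is an assumption of this example, not documented behavior.

```python
from typing import Optional

# Severity levels in ascending order of urgency.
SEVERITY_ORDER = ["Informational", "Minor", "Major", "Critical"]

# Default thresholds from this section: Major above 85, Critical above 95.
DEFAULT_THRESHOLDS = {"Major": 85, "Critical": 95}


def severity_for(value: float, thresholds: dict = DEFAULT_THRESHOLDS) -> Optional[str]:
    """Return the most severe level whose threshold the value exceeds, or None.

    Assumption: when several thresholds are exceeded, the highest severity wins.
    """
    for level in reversed(SEVERITY_ORDER):
        limit = thresholds.get(level)
        if limit is not None and value > limit:
            return level
    return None


# Adding user-defined Informational and Minor thresholds (values are examples).
custom = {"Informational": 50, "Minor": 70, **DEFAULT_THRESHOLDS}

print(severity_for(90))          # exceeds 85 but not 95
print(severity_for(85))          # not strictly greater than 85
print(severity_for(72, custom))  # exceeds the example Minor threshold
```

The comparison is strictly "greater than", matching the wording of the default thresholds, so a value of exactly 85 does not raise a Major alert in this sketch.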

Compute Blade Disk Used
  Rule: Alert when compute blade disk usage exceeds the specified percentage. One alert is triggered per cluster, when any one drive exceeds the threshold.
  Default thresholds:
    • Major severity when value is greater than 85
    • Critical severity when value is greater than 95
  Resource ID in alert messages: Chassis number, blade number, drive number, then usage. For example: chassis0:blade9:drive3:usage

Compute Blade Disk Wear
  Rule: Alert when compute blade disk wear exceeds the specified percentage.
  Default thresholds:
    • Major severity when value is greater than 85
    • Critical severity when value is greater than 95
  Resource ID in alert messages: Chassis number, blade number, drive number, then wear. For example: chassis0:blade9:drive3:wear

Database Connections Used
  Rule: Alert when the percentage of database connections in use exceeds the specified value.
  Default thresholds:
    • Major severity when value is greater than 85
    • Critical severity when value is greater than 95
  Resource ID in alert messages: database:connections

Manager Node Disk Wear
  Rule: Alert when manager node disk wear exceeds the specified percentage.
  Default thresholds:
    • Major severity when value is greater than 85
    • Critical severity when value is greater than 95
  Resource ID in alert messages: Manager node number, drive name, then wear. For example: manager2:mgmt2-/dev/nvme2n1:wear

Network Status (External)
  Rule: Alert when the external network status changes.
  Resource ID in alert messages: manager#:external_bond

WLM Rule
  Rule: Alert when a WLM rule is triggered with the action Log ERROR or Log WARN. See Rule Actions. WLM alerts are based on workload management rules rather than alerting rules. The message for a WLM alert contains, in parentheses, the ID of the query that triggered the WLM rule.
  Resource ID in alert messages: For example: database:wlm:SELECT * rule
  where SELECT * rule is the name of the WLM rule that was triggered and in turn triggered the alert.
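
Because a WLM rule name can contain spaces and other characters, a script that consumes WLM alerts may prefer to strip the database:wlm: prefix rather than split the resource ID on every colon. The sketch below shows that approach, along with one way to pull a parenthesized query ID out of a message. Both function names are hypothetical, the sample message text is invented for illustration, and taking the last parenthesized group as the query ID is an assumption.

```python
import re
from typing import Optional

WLM_PREFIX = "database:wlm:"


def wlm_rule_name(resource_id: str) -> Optional[str]:
    """Return the WLM rule name from a resource ID, or None if it is not a WLM ID."""
    if resource_id.startswith(WLM_PREFIX):
        # Keep everything after the prefix, so names with spaces or colons survive.
        return resource_id[len(WLM_PREFIX):]
    return None


def query_id_from_message(message: str) -> Optional[str]:
    """Return the text inside the last pair of parentheses in the message, if any."""
    matches = re.findall(r"\(([^()]+)\)", message)
    return matches[-1] if matches else None


print(wlm_rule_name("database:wlm:SELECT * rule"))
print(wlm_rule_name("database:state"))
# Invented message text; the real message wording is not shown in this section.
print(query_id_from_message("WLM rule triggered (12345)"))
```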

Query Alerts

To see active and logged query alerts, go to Manage > Query Alerts. WLM alerts appear under Query Alerts by default; if they are enabled in the Configure Alerting screen, WLM alerts also appear under Cluster Alerts.