MatrixUI Health Monitoring

Health monitoring is a core operations and maintenance feature provided by YMatrix. It proactively monitors the operational status of the database cluster through predefined health checks, detects potential issues in advance, and sends timely notifications to ensure system stability. This feature is primarily designed for database administrators and developers, offering multidimensional health checks such as cluster availability monitoring, query/transaction timeout detection, and lock-wait timeout alerts.

The key benefits of health monitoring include:

  • Proactive alerting: Identifies risks before failures occur, replacing reactive troubleshooting.
  • Automated notifications: Sends real-time alerts via email or other channels.
  • Historical tracing: Logs all triggered events for post-incident analysis and root cause identification.
  • Flexible configuration: Allows users to customize check parameters and monitoring policies based on business requirements.

Accessing the Page

Navigation Path

  1. Log in to the MatrixUI management interface.
    In your browser, enter the IP address of the machine running MatrixUI (by default, the Master node) followed by the port number to open the web UI:
    http://<IP>:8240
  2. In the left navigation pane, click Health Monitoring.
  3. The system displays the Check Configuration tab by default.
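
Before opening the browser, it can help to confirm that the web UI port is reachable from your workstation. The following is a minimal sketch, assuming only the port 8240 shown in the URL above; the function names are hypothetical and are not part of MatrixUI.

```python
import socket

DEFAULT_UI_PORT = 8240  # port taken from the URL in the navigation steps


def build_ui_url(host: str, port: int = DEFAULT_UI_PORT) -> str:
    """Return the web UI address for the given host."""
    return f"http://{host}:{port}"


def ui_port_is_open(host: str, port: int = DEFAULT_UI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, `ui_port_is_open("<IP>")` with your Master node address returns False when a firewall blocks the port or the service is not running. A True result only shows that something is listening on the port, not that login will succeed.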

Page Layout

The Health Monitoring page consists of the following sections:

  • Check List Area: Displays all predefined health checks.
  • Configuration Action Area: Provides buttons to enable/disable or edit checks.
  • Event History Area: Records all triggered alert events.

Managing Predefined Checks

Check Categories

Check Category         | Specific Check Item                                         | Monitoring Dimension | Default Status
-----------------------+-------------------------------------------------------------+----------------------+---------------
Disk Monitoring        | Disk space will be exhausted within 7 days                  | Disk Space           | Enabled
Disk Monitoring        | Disk space below 20%                                        | Disk Space           | Enabled
Disk Monitoring        | Abnormal disk growth detected within 1 day                  | Disk Space           | Enabled
Disk Monitoring        | Disk full                                                   | Disk Space           | Enabled
Disk Monitoring        | Database set to read-only when disk usage exceeds threshold | Disk Space           | Disabled
Cluster Monitoring     | Cluster unavailable                                         | Cluster Status       | Enabled
Transaction Monitoring | Long-running uncommitted transaction                        | Transaction Status   | Enabled
Lock Monitoring        | Lock wait timeout                                           | Lock Status          | Enabled
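
For scripts that audit a deployment against these defaults, the list of predefined checks can be captured as plain data. This is only an illustrative sketch: the `PredefinedCheck` class and its field names are hypothetical and do not reflect how MatrixUI stores its checks internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredefinedCheck:
    category: str
    item: str
    dimension: str
    enabled_by_default: bool


# Values copied from the predefined check list in this section.
PREDEFINED_CHECKS = [
    PredefinedCheck("Disk Monitoring", "Disk space will be exhausted within 7 days", "Disk Space", True),
    PredefinedCheck("Disk Monitoring", "Disk space below 20%", "Disk Space", True),
    PredefinedCheck("Disk Monitoring", "Abnormal disk growth detected within 1 day", "Disk Space", True),
    PredefinedCheck("Disk Monitoring", "Disk full", "Disk Space", True),
    PredefinedCheck("Disk Monitoring", "Database set to read-only when disk usage exceeds threshold", "Disk Space", False),
    PredefinedCheck("Cluster Monitoring", "Cluster unavailable", "Cluster Status", True),
    PredefinedCheck("Transaction Monitoring", "Long-running uncommitted transaction", "Transaction Status", True),
    PredefinedCheck("Lock Monitoring", "Lock wait timeout", "Lock Status", True),
]


def disabled_by_default(checks):
    """Return the names of checks that must be enabled manually."""
    return [check.item for check in checks if not check.enabled_by_default]


print(disabled_by_default(PREDEFINED_CHECKS))
```

Only the read-only rule appears in the result, which matches the Default Status column.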

Configuring Checks

Enabling or Disabling a Check

  1. Locate the target check in the check list.
  2. Click the toggle switch on the right side of the check item (green = enabled, gray = disabled).
  3. The system applies the status change immediately.

Image 1

Editing Check Parameters

  1. Find the target check in the list.
  2. Click the Edit button on the right.
  3. In the configuration dialog, adjust the following parameters:
    • Alert Level: Sets the severity of the event.
    • Parameter Configuration: Defines the condition for triggering the check (e.g., disk space threshold).
    • Check Interval: Sets how often the check runs automatically (e.g., every 1 hour).
  4. Click Save to apply the changes immediately.

Image 2

Check Interval Settings

  • Intervals are specified in hours (e.g., "1h" means once per hour).
  • Some checks have no interval setting. These checks are not scheduled; they run whenever the data source they depend on is updated.
  • After modifying and saving the interval, the system cancels the previous scheduled task, runs the check immediately, and then starts the new interval cycle.
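
The rescheduling behavior described above (cancel the pending run, run once immediately, then start the new cycle) can be illustrated with a small simulation. This is a sketch under stated assumptions, not MatrixUI's scheduler: the `CheckSchedule` class is hypothetical, time is a plain number of hours, and the interval format follows the "1h" example.

```python
import re


def parse_interval_hours(text: str) -> int:
    """Parse an interval such as '1h' into a whole number of hours."""
    match = re.fullmatch(r"(\d+)h", text.strip())
    if not match or int(match.group(1)) == 0:
        raise ValueError(f"expected a positive interval like '1h', got {text!r}")
    return int(match.group(1))


class CheckSchedule:
    """Tracks when one check should run; times are expressed in hours."""

    def __init__(self, interval: str, now: float = 0.0):
        self.interval_hours = parse_interval_hours(interval)
        self.next_run = now + self.interval_hours
        self.runs = []  # times at which the check ran

    def tick(self, now: float) -> None:
        """Record a run for every scheduled time that has passed."""
        while now >= self.next_run:
            self.runs.append(self.next_run)
            self.next_run += self.interval_hours

    def update_interval(self, interval: str, now: float) -> None:
        """Drop the pending run, run once immediately, then restart the cycle."""
        self.interval_hours = parse_interval_hours(interval)
        self.runs.append(now)                      # immediate run on save
        self.next_run = now + self.interval_hours  # new cycle starts from now


schedule = CheckSchedule("4h", now=0.0)
schedule.tick(now=4.0)                   # regular run at hour 4
schedule.update_interval("1h", now=5.5)  # new interval saved at hour 5.5
schedule.tick(now=7.0)                   # hour 6.5 has passed, hour 7.5 has not
print(schedule.runs)                     # [4.0, 5.5, 6.5]
```

The run originally planned for hour 8 never happens, because saving the new interval replaces it with an immediate run and a fresh one-hour cycle.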

Image 3

Detailed Explanation of Disk Monitoring Checks (New in v6.7.1)

Automatic Read-Only Mode on Low Disk Space

Feature Description: Starting in MatrixUI v6.7.1, a new health rule automatically sets the database to read-only mode when disk usage exceeds a configured threshold, preventing further writes that could exhaust disk space entirely.
Default Status: This rule is disabled by default and must be manually enabled.
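
To see how close a host is to a candidate threshold before enabling the rule, current disk usage can be measured with the Python standard library. This is a rough sketch: the function names are hypothetical, the 90% default mirrors the example value used in this guide, and MatrixUI may compute usage differently (for example, per data directory or with reserved blocks excluded).

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return used space as a percentage of total space for the filesystem at path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def exceeds_threshold(used_percent: float, threshold_percent: float = 90.0) -> bool:
    """Return True when usage is strictly above the threshold."""
    # Assumption: "exceeds" is read as strictly greater than the threshold.
    return used_percent > threshold_percent
```

Running `exceeds_threshold(disk_usage_percent("/path/to/data"))` on each host, with the path replaced by your own data directory, gives a quick indication of whether enabling the rule would switch the database to read-only right away.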

Configuring the Read-Only Rule

  1. Go to the Check Configuration tab in Health Monitoring.
  2. Under the Disk Monitoring category, locate the check item "Database set to read-only when disk usage exceeds threshold."
  3. Click Edit and configure the following:
    • Parameter Configuration: Set the disk usage threshold (e.g., 90%).
  4. Enable the check and save the configuration.

Image 4
Image 5

Alert Notification Configuration

Email Notification Setup

  1. On the Health Monitoring page, locate the Notification Configuration section.
  2. Click Configure Email to open the email settings dialog.
  3. Enter the following details:
    • SMTP server address
    • SMTP port number
    • Sender email address
    • Sender email password
    • Recipient email addresses (multiple allowed)
  4. Click Test Send to verify the configuration.
  5. Click Save to complete setup.
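
If Test Send fails, checking the same SMTP details outside MatrixUI helps separate a mail-server problem from a configuration problem. The sketch below uses Python's standard `smtplib`; the `EmailSettings` class is hypothetical, and the use of STARTTLS is an assumption about your mail server rather than something MatrixUI requires.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class EmailSettings:
    smtp_host: str
    smtp_port: int
    sender: str
    password: str
    recipients: list[str]


def build_test_message(settings: EmailSettings) -> EmailMessage:
    """Build a plain-text test message addressed to every recipient."""
    msg = EmailMessage()
    msg["Subject"] = "Health monitoring test message"
    msg["From"] = settings.sender
    msg["To"] = ", ".join(settings.recipients)
    msg.set_content("This is a test of the alert notification settings.")
    return msg


def send_test_message(settings: EmailSettings, timeout: float = 10.0) -> None:
    """Send the test message; raises an smtplib or OSError exception on failure."""
    msg = build_test_message(settings)
    with smtplib.SMTP(settings.smtp_host, settings.smtp_port, timeout=timeout) as server:
        server.starttls()  # assumption: the server supports STARTTLS
        server.login(settings.sender, settings.password)
        server.send_message(msg)
```

An authentication error raised by `send_test_message` points to the sender address or password, while a connection error points to the server address, port, or network path.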

Alert Notification Format

When a health rule is triggered, the system sends an alert email containing:

  • Event description
  • Trigger time
  • Affected scope
  • Recommended actions
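
Teams that forward alerts to other tools sometimes want a plain-text rendering of the same four fields. The sketch below shows one possible layout; it is an assumption for illustration, since the exact wording and layout of the MatrixUI email are not specified here, and `format_alert_body` is a hypothetical helper.

```python
from datetime import datetime


def format_alert_body(description: str, triggered_at: datetime,
                      scope: str, actions: list[str]) -> str:
    """Render the four alert fields as a plain-text block."""
    lines = [
        f"Event description: {description}",
        f"Trigger time: {triggered_at.isoformat(sep=' ', timespec='seconds')}",
        f"Affected scope: {scope}",
        "Recommended actions:",
    ]
    lines += [f"  - {action}" for action in actions]
    return "\n".join(lines)


print(format_alert_body(
    "Disk space below 20%",
    datetime(2024, 1, 2, 3, 4, 5),
    "example-host:/data",  # placeholder scope
    ["Free disk space", "Review recent data growth"],
))
```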

Image 6

Viewing and Analyzing Event History

Browsing Event History

  1. Switch to the Event History tab in Health Monitoring.
  2. Events are displayed in reverse chronological order by default.
  3. Filter events by:
    • Event type
    • Time range
    • Alert level
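
The same filtering logic can be applied to events once they are available outside the UI, for example after copying them into a script for a weekly review. This sketch assumes a hypothetical `Event` record and illustrative level names ("warning", "critical"); MatrixUI's actual field names and level values may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    event_id: int
    event_type: str
    level: str
    triggered_at: datetime


def filter_events(events: list[Event],
                  event_type: Optional[str] = None,
                  level: Optional[str] = None,
                  start: Optional[datetime] = None,
                  end: Optional[datetime] = None) -> list[Event]:
    """Apply the optional filters, then sort newest first."""
    selected = [
        e for e in events
        if (event_type is None or e.event_type == event_type)
        and (level is None or e.level == level)
        and (start is None or e.triggered_at >= start)
        and (end is None or e.triggered_at <= end)
    ]
    return sorted(selected, key=lambda e: e.triggered_at, reverse=True)
```

Calling `filter_events(events)` with no filters reproduces the default reverse chronological view.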

Viewing Event Details

  1. Locate the target event in the list.
  2. Click the Details button on the right.
  3. The details dialog shows:
    • Event ID
    • Trigger time
    • Event description
    • Affected objects
    • Resolution status

Common Issues and Solutions

  1. How do I recover after the disk read-only rule is triggered?
    After the rule activates, restore normal operation in one of the following ways:
    a. Freeing disk space: Delete unnecessary log files, temporary files, etc.
    b. Temporarily disabling the rule: Click Disable Rule and Restore in the alert banner to bypass the rule temporarily.

    Image 7

  2. How do I test whether health monitoring works correctly?
    Validate the feature in a test environment by:
    a. Simulating low disk space: Create large files to fill disk space up to the threshold.
    b. Simulating long transactions: Execute a transaction and leave it uncommitted for an extended period.
    c. Verifying alerts: Confirm that corresponding alert emails are received.
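
When simulating low disk space, it is useful to know how much filler data is needed to reach the configured threshold. The following sketch does only that arithmetic; `bytes_to_reach_threshold` is a hypothetical helper, and the numbers in the example are made up. Filling a disk should be done on a test system only.

```python
import math


def bytes_to_reach_threshold(total_bytes: int, used_bytes: int,
                             threshold_percent: float) -> int:
    """Return how many more bytes must be written for usage to reach the threshold."""
    target_used = math.ceil(total_bytes * threshold_percent / 100)
    # Already at or above the threshold: nothing more to write.
    return max(0, target_used - used_bytes)


# Example: a 1000-byte "disk" with 700 bytes used and a 90% threshold.
print(bytes_to_reach_threshold(1000, 700, 90))  # 200
```

On a real host, `shutil.disk_usage(path)` supplies the `total` and `used` values. Write slightly more than the computed amount if the check fires only when usage goes above the threshold.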

Best Practices

Check Configuration Recommendations

  • Disk Monitoring: Enable at least “Disk space below 20%” and “Disk space will be exhausted within 7 days.”
  • Transaction Monitoring: Set the long-transaction threshold based on business needs; a value of 30 minutes or less is recommended.
  • Lock Monitoring: Set the lock-wait timeout based on concurrency levels; a value of 5 minutes or less is recommended.
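
A short script can flag configured thresholds that exceed these recommendations during a periodic review. This is an illustrative sketch: the keys `long_transaction` and `lock_wait` are hypothetical names, and the values would need to be copied by hand from the Check Configuration tab.

```python
# Recommended upper limits from this section, in minutes.
RECOMMENDED_MAX_MINUTES = {"long_transaction": 30, "lock_wait": 5}


def review_thresholds(configured: dict[str, float]) -> list[str]:
    """Return one warning per threshold that is missing or above the recommendation."""
    warnings = []
    for name, limit in RECOMMENDED_MAX_MINUTES.items():
        value = configured.get(name)
        if value is None:
            warnings.append(f"{name}: no threshold configured")
        elif value > limit:
            warnings.append(f"{name}: {value} min exceeds the recommended {limit} min")
    return warnings


print(review_thresholds({"long_transaction": 20, "lock_wait": 10}))
```

An empty list means every configured threshold is within the recommended range.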

Notification Configuration Recommendations

  • Recipients: Configure at least two recipient emails to avoid missed alerts.
  • Notification Frequency: For frequently triggered alerts, set a cooldown interval (e.g., once per hour).
  • Testing: Always perform a test send after initial configuration to ensure delivery.
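
The cooldown idea (send at most one notification per alert within a fixed window) can be sketched as follows. This illustrates the concept only; the `AlertCooldown` class is hypothetical, and where MatrixUI exposes a cooldown setting is not covered here.

```python
class AlertCooldown:
    """Suppresses repeat notifications for the same alert within a cooldown window."""

    def __init__(self, cooldown_seconds: float = 3600.0):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}  # alert key -> time the last notification was sent

    def should_send(self, alert_key: str, now: float) -> bool:
        """Return True and record the send time if the alert is outside its cooldown."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[alert_key] = now
        return True


cooldown = AlertCooldown(cooldown_seconds=3600)
print(cooldown.should_send("disk_below_20", now=0))     # True: first occurrence
print(cooldown.should_send("disk_below_20", now=1800))  # False: within one hour
print(cooldown.should_send("disk_below_20", now=3600))  # True: cooldown elapsed
```

Keying the cooldown by alert means a frequently firing disk alert does not suppress an unrelated lock or cluster alert.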

Routine Maintenance Recommendations

  • Regular Review: Check the event history weekly to identify and address underlying issues.
  • Parameter Tuning: Periodically adjust check parameters based on business growth and system load changes.