Health Monitoring

This document introduces the cluster health monitoring feature of the graphical interface.

When supporting daily business workloads, a MatrixDB database runs a large number of SQL statements and may run into problems such as hardware or network failures and lock waits caused by concurrent transactions. If these problems are not handled in time, responses slow down or queries fail outright, which affects the efficiency of business operations. The health monitoring feature of the graphical interface helps you detect abnormal behavior in the database cluster sooner.

Health monitoring periodically queries the relevant database system tables for each check item to verify that queries are running as the business expects. When a check finds that the expected state is not met, a notification is sent immediately. Notifications can be viewed in the graphical interface. If checking the page regularly is inconvenient, you can also choose to receive alerts by email.
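
As a rough illustration of this kind of periodic check, the sketch below evaluates session rows against a single threshold: a transaction that has been idle in transaction for more than 1 hour, one of the items listed in section 2.2. The row fields mirror columns of the PostgreSQL pg_stat_activity view (pid, state, state_change). How YMatrix implements its checks internally is not described here, so the function name, the constant, and the row shape are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical constant mirroring the "idle in transaction for more than 1 hour" item.
IDLE_IN_TXN_LIMIT = timedelta(hours=1)


def find_idle_in_transaction(rows, now, limit=IDLE_IN_TXN_LIMIT):
    """Return the pids of sessions that have been idle in transaction longer than `limit`.

    Each row is a dict with the keys `pid`, `state`, and `state_change`,
    shaped like a row of pg_stat_activity (an assumption for this sketch).
    """
    offenders = []
    for row in rows:
        if row["state"] != "idle in transaction":
            continue
        # state_change is when the session entered its current state.
        if now - row["state_change"] > limit:
            offenders.append(row["pid"])
    return offenders


# Simulated rows; a real check would read them from the database.
now = datetime(2024, 1, 1, 12, 0, 0)
sample_rows = [
    {"pid": 101, "state": "active", "state_change": now - timedelta(hours=3)},
    {"pid": 102, "state": "idle in transaction", "state_change": now - timedelta(minutes=90)},
    {"pid": 103, "state": "idle in transaction", "state_change": now - timedelta(minutes=5)},
]
print(find_idle_in_transaction(sample_rows, now))  # prints [102]
```

Only session 102 is reported: session 101 is active, and session 103 has been idle in transaction for less than the limit.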

1 Preparation for use

In a browser, enter the IP address of the machine where MatrixGate is located (the Master's IP by default) and the port number to log in to the graphical interface.

http://<IP>:8240

2 Health Monitoring

After successfully logging in, go to the “Health Monitoring” - “Check Item Configuration” page.

2.1 Email Configuration

You can choose whether to configure your email address as needed. If you complete the email configuration, you will receive email notifications.

  1. Graphical Interface Domain Name
    To facilitate timely access to detailed alert information, we will include a link in the email to redirect to the graphical interface. If the email recipient cannot access the default domain name, this field must be modified.

  2. SMTP Server Address
    The SMTP server address consists of a host name or IP address and a port number. Example: smtp.example.com:465.

Common Third-Party Email Servers

  • Alibaba Cloud Mail (Aliyun)
    Personal Edition: First enable the SMTP service; refer to the documentation. For the SMTP service address and port number, refer to the documentation.
    Enterprise Edition: The email administrator must enable the SMTP service; refer to the documentation. For the SMTP service address and port number, refer to the documentation.
  • Google Mail
    First enable the IMAP or POP service; refer to the documentation.
  • NetEase Mail
    Personal Edition: First enable the SMTP service; refer to the [documentation](https://help.mail.163.com/faqDetail.do?code=d7a5dc8471cd0c0e8b4b8f4f8e49998b374173cfe9171305fa1ce630d7f67ac2cda80145a1742516).
    Enterprise Edition: The SMTP service is enabled by default. To verify the service status, refer to the documentation. For the SMTP service address and port number, refer to the documentation.
  • QQ Mail
    Personal Edition: First enable the SMTP service; refer to the documentation. For the SMTP service address and port number, refer to the documentation.
    Enterprise Edition: For the steps to enable the SMTP service, refer to the documentation. For the SMTP service address and port number, refer to the [documentation](https://work.weixin.qq.com/help?person_id=0&doc_id=431&helpType=exmail).

Notes!
If the email service is set up by the enterprise itself, consult the email administrator or email service provider.

  1. Username
    The account used for authentication on the SMTP server. This field is optional and only required when the SMTP server requires a username for authentication. Example: [email protected].

  2. Password
    The password for the SMTP username. This field is optional and only required when the SMTP server requires both a username and password for authentication.

Common Third-Party Email Servers

  • Alibaba Cloud Mail
    Use the email login password, that is, the password of the email account entered as the username.
  • Google Mail
    Use the email login password, that is, the password of the email account entered as the username.
  • NetEase Mail
    Personal Edition: An authorization code must be used as the password; refer to the documentation.
    Enterprise Edition: By default, the password is the email login password. If the administrator has enabled the client authorization code feature, consult the administrator on how to obtain the authorization code.
  • QQ Mail
    Personal Edition: An authorization code must be used as the password; refer to the documentation.
    Enterprise Edition: By default, the password is the email login password. If the administrator has enabled secure login, an authorization code is required; refer to the documentation.

Notes!
If the email service is set up by the enterprise itself, consult the email administrator or email service provider.

  1. Sender
    If you use a third-party email service, this field must match the “Username” field; if you use a self-hosted email service, enter the sender's email address.

  2. Recipient
    Enter the recipient's email address; multiple addresses can be entered.
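
Before entering these values in the graphical interface, it can help to confirm that the SMTP settings work on their own. The following is a minimal sketch using Python's standard smtplib, not a description of how YMatrix sends mail. It assumes the server accepts implicit TLS connections, which is the usual convention for port 465; a server that expects STARTTLS would need smtplib.SMTP plus starttls() instead. The function names and all addresses are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def split_smtp_address(address):
    """Split a 'host:port' string such as 'smtp.example.com:465' into (host, port)."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def build_test_message(sender, recipients):
    """Build a short plain-text message addressed to every recipient."""
    msg = EmailMessage()
    msg["Subject"] = "SMTP settings test"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(address, sender, recipients, username=None, password=None):
    """Send one test message; raises an smtplib or socket error on failure."""
    host, port = split_smtp_address(address)
    # Assumption: implicit TLS, as is conventional for port 465.
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        # Username and password are optional, matching the form fields above.
        if username:
            server.login(username, password or "")
        server.send_message(build_test_message(sender, recipients))


# Example call with placeholder values (not executed here):
# send_test_message("smtp.example.com:465", "alerts@example.com",
#                   ["dba@example.com"], username="alerts@example.com",
#                   password="authorization-code-or-password")
```

If the call returns without raising an exception, the same server address, username, password, sender, and recipients should be suitable for the form.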

2.2 Check Items

The list shows the check items currently provided by YMatrix, all of which are enabled by default. You can enable or disable them as needed.

If the default parameters of a check item do not meet your business needs, you can modify them.

| No. | Check item | Description |
| --- | --- | --- |
| 1 | Cluster not available | Cluster availability is verified by periodically executing the query SELECT * FROM gp_dist_random('gp_id');. If the query fails three times in a row, the cluster has probably crashed. Possible causes include simultaneous failure of a primary Segment and its mirror Segment, network failure, power failure, and hardware failure. |
| 2 | Segment failed | A primary Segment failure skews resource usage toward the server hosting the corresponding mirror Segment: the processing load on that machine increases and queries slow down. In severe cases, the memory of the skewed node may be exhausted, making the cluster unavailable. A mirror Segment failure reduces the cluster's high availability: if the corresponding primary Segment then fails, the cluster becomes unavailable. |
| 3 | Query/transaction takes more than 12 hours | A query or transaction that runs too long may occupy large amounts of memory, CPU, and other server resources, slowing database responses and possibly triggering OOM (out of memory) errors. It may also delay the VACUUM process. |
| 4 | Transaction is in the idle in transaction state for more than 1 hour | When a transaction stays in the idle in transaction state for a long time, most queries on the tables involved in the transaction are blocked. It also prevents the VACUUM process from reclaiming records, causing table bloat. |
| 5 | A single query/transaction blocks more than 5 other queries for more than 15 minutes | A query or transaction that blocks many other queries for a long time can easily cause other statements to block one another, reducing service response efficiency. |
| 6 | A query requesting an Exclusive or AccessExclusive lock has been blocked for more than 15 minutes | Queries requesting table-level Exclusive or AccessExclusive locks that stay blocked for a long time may cause blocked queries to pile up, reducing service response efficiency. |
| 7 | A query/transaction holds an Exclusive or AccessExclusive lock and runs for more than 2 hours | A query or transaction that holds a table-level Exclusive or AccessExclusive lock for a long time blocks all queries involving the locked tables, reducing service response efficiency. |
| 8 | A transaction holds an Exclusive or AccessExclusive lock and is in the idle in transaction state for more than 15 minutes | If a transaction holds an Exclusive or AccessExclusive lock and stays in the idle in transaction state for 15 minutes, most queries on the tables involved in the transaction are blocked, reducing service response efficiency. |
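
The "three times in a row" rule for the cluster availability item can be sketched as a small failure-streak counter. This is an illustration of the rule as described above, not YMatrix's actual implementation; the class name and its interface are hypothetical. In a real probe, each result would come from running SELECT * FROM gp_dist_random('gp_id'); through a database driver, which is left out here so the example stays self-contained.

```python
class ConsecutiveFailureTracker:
    """Track probe results and flag when failures occur `threshold` times in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, succeeded):
        """Record one probe result; return True once the failure streak reaches the threshold."""
        # Any success resets the streak, so isolated failures never raise an alert.
        self.failures = 0 if succeeded else self.failures + 1
        return self.failures >= self.threshold


# Simulated probe outcomes: True means the query succeeded, False means it failed.
tracker = ConsecutiveFailureTracker(threshold=3)
outcomes = [False, False, True, False, False, False]
alerts = [tracker.record(ok) for ok in outcomes]
print(alerts)  # prints [False, False, False, False, False, True]
```

The two early failures do not trigger an alert because the successful third probe resets the streak; only the final run of three failures does.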

2.3 Email Notifications

If you have completed the email configuration, you will receive an email whenever an event meets the failure conditions of a check item.

2.4 Event History

Whether or not you have configured an email address, the “Event History” section lists the events that occurred in the cluster and met the failure conditions of a check item.