Self-service inspection

This document describes the self-service inspection feature of the YMatrix graphical interface.

If the monitoring and alerting feature is the cluster's "emergency clinic", then the inspection feature is its "regular physical examination". Regular inspections help you understand the overall state of the cluster, discover problems that affect its smooth operation early, choose the best time for operation and maintenance tasks such as vacuum, avoid failures, and reduce the operation and maintenance workload.

The self-service inspection feature of the YMatrix graphical interface supports:

  1. Creating a self-service inspection plan by selecting the inspection items to run
  2. Generating a detailed inspection report, including the number of abnormal items, the abnormal items that need attention, an analysis of the inspection results, suggested follow-up operations, and inspection logs

1 Preparation

First, log in to the graphical interface. Enter the IP address and port number of the Master in your browser:

http://<IP>:8240
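
Before opening the browser, it can help to confirm that the address is well-formed and that the port is reachable from your machine. The following is a minimal sketch in Python and is not part of YMatrix itself. The function names `build_ui_url` and `is_port_open` are hypothetical, and only the port 8240 comes from the text above.

```python
import socket

UI_PORT = 8240  # port of the graphical interface, as given above


def build_ui_url(master_ip: str, port: int = UI_PORT) -> str:
    """Return the URL to enter in the browser for the given Master IP."""
    return f"http://{master_ip}:{port}"


def is_port_open(host: str, port: int = UI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `build_ui_url("192.0.2.10")` (a placeholder address) yields the string to paste into the browser, and `is_port_open("192.0.2.10")` reports whether a TCP connection to the port could be opened. A successful TCP connection only shows that something is listening on the port; it does not confirm that the login page itself is healthy.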

2 Self-service inspection

Self-service inspection page.

Self-service inspection plan.

The complete checklist is as follows:

| Check Category | Check Item | Level |
|---|---|---|
| Cluster basic information | Check whether all instances are reachable | High |
| | Check cluster status | High |
| | Check for passwords with less than 30 days of validity remaining | Medium |
| | License validity check | Medium |
| | Check the health of the connection count | Low |
| | Check cluster version | Low |
| Database running status | Check MARS2/CV health | High |
| | Check if there is data in the Default partition | High |
| | Top 10 databases by age | High |
| | Check the 20 largest business tables | Medium |
| | View the 20 largest system tables | Medium |
| | Check the 20 longest-running SQL statements | Medium |
| | View tables with data skew of more than 10,000 rows | Medium |
| | Check index consistency between Master and Segment | Medium |
| | Check for core files on each instance | Medium |
| | Check the running status of the automatic partitioning policy | Medium |
| | Check HEAP/MARS2 tables with a bloat rate exceeding 20% | Medium |
| | View the 10 largest Schemas | Low |
| | Check the 20 indexes with the lowest usage | Low |
| | Check the 20 indexes with the lowest hit rate in the index cache | Low |
| | Check system tables with excessive indexes | Low |
| | Check the 20 largest indexes | Low |
| | Check the number of subpartitions for each partitioned table | Low |
| | Check for duplicate indexes | Low |
| | View the size of each database | Low |
| | Check PL/Python parameters | Low |
| | Check the database log size of each instance | Low |
| | Check database parameters | Low |
| Server running status | Check the process running status in the last 7 days | High |
| | Check network bandwidth usage in the last 7 days | High |
| | Check disk usage | High |
| | Check disk I/O usage in the last 7 days | High |
| | Check CPU usage in the last 7 days | High |
| | Check Commit memory in the last 7 days | High |
| | Check the system load in the last 7 days | Low |
| | Check I/O bandwidth usage in the last 7 days | Low |
| | Check operating system parameters | Low |
| mxgate running status | Check the mxgate log for error information | Low |
| | Check the number of database connections occupied by mxgate | Low |

Inspection item levels are assigned as follows:

| Level | Description |
|---|---|
| High | An abnormal result for these check items affects cluster stability |
| Medium | An abnormal result for these check items affects some services in the cluster |
| Low | An abnormal result for these check items has no direct impact on the existing cluster, but can have a growing impact over the long term |
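
When reading a report, one reasonable approach is to work through abnormal items in level order. The sketch below assumes the abnormal items have been copied out of a report by hand as (check item, level) pairs. This data shape and the names `LEVEL_ORDER` and `triage` are hypothetical and do not reflect any YMatrix export format.

```python
# Lower number = handle first; the order follows the level descriptions above.
LEVEL_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def triage(abnormal_items):
    """Sort (check_item, level) pairs so High comes first, then Medium, then Low.

    sorted() is stable, so items with the same level keep their input order.
    An unknown level raises KeyError instead of being silently misplaced.
    """
    return sorted(abnormal_items, key=lambda item: LEVEL_ORDER[item[1]])


# Hypothetical abnormal items, named after entries in the checklist.
abnormal = [
    ("Check database parameters", "Low"),
    ("Check disk usage", "High"),
    ("Check for duplicate indexes", "Low"),
    ("Check the largest 20 business tables", "Medium"),
]

for check_item, level in triage(abnormal):
    print(f"[{level}] {check_item}")
```

Sorting by an explicit mapping rather than by the level name avoids the alphabetical order (High, Low, Medium), which would put Low items ahead of Medium ones.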

Note!
Please refer to the inspection report for detailed instructions.

Run the inspection.

Complete the inspection.

Review the report and follow its result descriptions when carrying out subsequent maintenance operations.