Performance Tuning

This document outlines how to identify performance bottlenecks and describes typical performance tuning scenarios both inside and outside the YMatrix database.

1 How to Identify Performance Bottlenecks?

When a business SQL query becomes slow or you want to improve the performance of a SQL query, first confirm the general direction of the issue:

Is the entire server slowing down?

  • Determine this by analyzing server resource usage.

Or is only a specific SQL query slow?

  • Use the Query Monitor feature in the YMatrix graphical interface.
  • Or observe slow queries in real time via the Overview page in Grafana/Prometheus monitoring.

Once a slow SQL query is identified, follow the steps below for investigation and analysis:

  • First, determine whether the performance bottleneck exists outside the database (server resources, business changes, etc.).
  • If not due to external reasons, then analyze the internal database (statistics update, data skew, query plan analysis, etc.).

1.1 Identifying External Database Issues

1.1.1 Server Resource Analysis

Note!
There are various methods to collect information. Here we detail only one or two. If you have other preferred commands, you can use them for collection and analysis.

Server Resources

• CPU Usage
  Collection: via YMatrix GUI - Cluster Management - Metrics View (CPU usage per node), or use the top command for details: user CPU (us), system CPU (sy), idle CPU (id).
  Analysis: when CPU usage is high, determine whether it is user CPU or system CPU. High user CPU means a program is consuming CPU; check the efficiency of its code. High system CPU suggests other server resources (disk I/O, memory, network, etc.) are insufficient.

• Memory & Virtual Memory
  Collection: via YMatrix GUI - Cluster Management - Metrics View (memory usage per node), or use vmstat for details: si (memory swapped in per second), so (memory swapped out per second).
  Analysis: if si and so are consistently non-zero, memory is insufficient and the system is swapping heavily, which degrades performance.

• Disk I/O (Read/Write Rate)
  Collection: via YMatrix GUI - Cluster Management - Metrics View (disk I/O per node), or use iostat -x for details: %util (percentage of time the device spent servicing I/O requests) and, in the avg-cpu summary, %iowait (percentage of CPU time spent waiting for I/O).
  Analysis: disk I/O speed, not capacity, limits performance. If %util is near 100%, the device is saturated with I/O requests. If %iowait is high, disk I/O is the bottleneck; consider upgrading or replacing the disk array.

• Network
  Collection: via YMatrix GUI - Cluster Management - Metrics View (receive/send rate per node), or use sar -n DEV 1 2 for details: rxkB/s (kilobytes received per second), txkB/s (kilobytes transmitted per second).
  Analysis: compare rxkB/s and txkB/s with the total network bandwidth. If either is close to the total bandwidth, a network bottleneck exists.

• Kernel Parameters
  Collection: on Linux systems, check the corresponding kernel parameter files under /proc/sys, e.g., cat /proc/sys/vm/overcommit_memory.
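
The swapping check above can be scripted. The sketch below parses captured vmstat output (the sample text here is illustrative; in practice feed it real output, e.g. `sample=$(vmstat 1 5)`) and flags sustained swapping:

```shell
# Flag sustained swapping from vmstat output.
# The sample below is a captured, illustrative vmstat line set.
sample=' r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0 102400  81920  20480 409600  150  200    10    20  300  400  5  3 90  2  0'

# si is the 7th column, so is the 8th; take the most recent sample line.
last=$(printf '%s\n' "$sample" | tail -n 1)
si=$(printf '%s\n' "$last" | awk '{print $7}')
so=$(printf '%s\n' "$last" | awk '{print $8}')

if [ "$si" -gt 0 ] && [ "$so" -gt 0 ]; then
  echo "swapping detected: si=$si so=$so"
else
  echo "no sustained swapping"
fi
```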

1.1.2 Business Change Confirmation

  • Confirm whether it is a newly launched business.
  • Check the DDL of cluster tables to confirm whether the issue is caused by online DDL. For example, adding an index to a large table consumes significant cluster resources and may interfere with normal access requests.
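
Long-running DDL can also be spotted from the standard pg_stat_activity catalog view (available in Greenplum-based systems such as YMatrix). A sketch that looks for active index builds:

```sql
-- Look for active, long-running DDL such as index builds.
SELECT pid, usename, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE '%CREATE INDEX%';
```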

1.2 How to Investigate Internal Database Issues

Note!
Not all options in the table below are mandatory.

Software Environment

• OS Version
  Collection: uname -a
  Analysis: whether the bottleneck is related to the OS version.

• YMatrix Version
  Collection: SELECT version();
  Analysis: whether the bottleneck is related to the YMatrix version.

Cluster Info

• Cluster Deployment Topology
  Collection: via YMatrix GUI - Cluster Management, or SELECT * FROM gp_segment_configuration;
  Analysis: if a failover has occurred, a single physical node may host more Segments than planned, potentially degrading performance.

Database Info

• Table Structure
  Collection: via YMatrix GUI - Data Tables, or \d+ in psql.
  Analysis: confirm whether severe data skew results from an unreasonable distribution key.

• Related Logs
  Collection: utility logs are in $HOME/gpAdminLogs by default; database logs are in each instance's data directory.
  Analysis: analyze the logs if needed.

• Slow Queries
  Collection: via YMatrix GUI - Query Monitor; check for blocking sessions.
  Analysis: if slow queries exist, locate and analyze them.

• Query Plan
  Collection: use EXPLAIN SELECT ... to view the query plan.
  Analysis: if the plan cost is too high, analyze the path and the root cause.

• Save Environment
  Collection: use YMatrix's minirepro tool.
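
The failover check can be run as a query, assuming the Greenplum-compatible gp_segment_configuration catalog:

```sql
-- Count primary segments per host; after a failover, one host may run
-- more primaries than its peers (content >= 0 excludes the Master).
SELECT hostname, count(*) AS primaries
FROM gp_segment_configuration
WHERE role = 'p' AND content >= 0
GROUP BY hostname
ORDER BY primaries DESC;
```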

2 Typical Internal Database Tuning Scenarios

Internal database tuning refers to tuning individual SQL statements.

2.1 Stale Statistics

Symptoms:
A table has undergone significant data changes (e.g., bulk inserts or deletes) since statistics were last collected. Without a fresh ANALYZE, the optimizer may build a poor query plan from outdated statistics, degrading performance.

Analysis Method:
Use EXPLAIN ANALYZE to check whether the statistics are accurate. If the estimated row counts in the plan differ greatly from the actual counts, the statistics are stale. Re-collect them with:

=# ANALYZE <tablename>;
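
For example (t1 is a hypothetical table), compare the planner's estimate with the actual row count in the EXPLAIN ANALYZE output:

```sql
-- Each plan node prints "rows=<estimate>" and "actual ... rows=<real>".
-- A large gap between the two indicates stale statistics.
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE create_time > now() - interval '1 day';

-- Refresh the statistics, then re-check the plan.
ANALYZE t1;
```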

2.2 Data Skew

Symptoms:
Unreasonable distribution keys chosen during initial data model design concentrate data on a few Segment instances, causing data skew. In YMatrix's distributed architecture this becomes a bottleneck: total query time is determined by the slowest Segment plus the Master's processing time.

Analysis Method:
Use the following command to check data distribution. If Segment data counts differ significantly, skew exists:

=# SELECT gp_segment_id, COUNT(*) FROM <tablename> GROUP BY gp_segment_id;
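
The per-segment counts can also be condensed into a single skew ratio (t1 is a hypothetical table):

```sql
-- Ratio of the fullest segment to the average; 1.0 means a perfectly
-- even distribution, and values well above 1.0 indicate skew.
SELECT max(cnt)::float8 / avg(cnt) AS skew_ratio
FROM (
  SELECT gp_segment_id, count(*) AS cnt
  FROM t1
  GROUP BY gp_segment_id
) AS per_segment;
```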

Solution:
If skew occurs, consider modifying the distribution key and redistributing data evenly:

=# ALTER TABLE <tablename> SET DISTRIBUTED BY (<newcolumn>);
=# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
=# ANALYZE <tablename>;

Attention!
Distribution key selection affects data distribution and query performance. Modifying it after business launch carries performance risks. Choose carefully during design. Refer to YMatrix DDL Best Practices.

2.3 Data Bloat

Symptoms:
Frequent updates/deletes without a proper cleanup cycle cause table bloat.
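
Before cleaning up, bloat can be confirmed. One option, assuming Greenplum's gp_toolkit schema is available in YMatrix, is the bloat diagnostic view:

```sql
-- Tables whose actual page count far exceeds the expected page count.
SELECT bdirelid::regclass AS table_name, bdidiag AS diagnosis
FROM gp_toolkit.gp_bloat_diag;
```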

Solutions:

  1. Drop historical (cold) partition subtables:

    =# DROP TABLE <partition_tablename>;
  2. Set up a reasonable cleanup cycle:

    =# VACUUM <tablename>;

    See Daily Cleanup.

  3. For severely bloated tables, reorganize:

    =# ALTER TABLE <tablename> SET WITH (REORGANIZE=true);
    =# ANALYZE <tablename>;

2.4 Query Plan Analysis

If none of the above causes the SQL performance bottleneck, analyze the query plan and ask:

  • Does a specific operation take too long?

    • Find the most time-consuming operation. For example, if an index scan is slower than expected, the index may be outdated and need rebuilding. Or adjust enable_<operator> parameters to force the optimizer to choose a different plan.
  • Are selective predicates applied early?

    • Apply the most selective filters early to reduce rows passed up the plan tree. If the optimizer misestimates selectivity, collect more stats on relevant columns or reorder the WHERE clause.
  • Is the join order optimal?

    • When joining multiple tables, ensure the optimizer chooses the most selective join order. If not, set join_collapse_limit=1 and use explicit JOIN syntax to force the order. Or collect more stats on join columns.
  • Does the optimizer selectively scan partitioned tables?

    • If using partitioning, ensure only relevant child tables are scanned. Parent table scans should return 0 rows.
  • Are hash aggregates and hash joins used?

    • Hash-based aggregations and joins are usually faster than sort-based alternatives because, when memory allows, they run entirely in memory without disk I/O.
  • Does enabling/disabling vectorization significantly affect performance?

    • The vectorized execution engine improves performance for sequential operations like expressions, filters, and aggregations. For random operations like sorting or hashing, the improvement is limited.
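
For instance, forcing a join order with explicit JOIN syntax (a, b, and c are hypothetical tables) looks like this:

```sql
-- With join_collapse_limit = 1, the planner keeps the JOIN order as written.
SET join_collapse_limit = 1;
EXPLAIN
SELECT *
FROM a
JOIN b ON a.id = b.id
JOIN c ON b.id = c.id;
```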

For more on query plans, see Understanding Query Plans.