MatrixGate Parameters

This document provides detailed parameter information for MatrixGate.

1 Configuration File Parameters

Notes! The parameters in this section are those in the configuration file generated before mxgate is started.

Parameter Name Default Value Description
[database] category
--db-database postgres Name of the YMatrix database that MatrixGate connects to
--db-master-host Local hostname Hostname of the YMatrix Master that MatrixGate connects to
--db-master-port 5432 Port number of the YMatrix Master that MatrixGate connects to
--db-user Current system username Username that MatrixGate uses to connect to YMatrix
Notes! The user must have permission to create external tables. For a non-superuser, grant the permission with the following command:
alter user {username} CREATEEXTTABLE;
--db-password Empty Password of the YMatrix user that MatrixGate connects as
--db-max-conn 10 Maximum number of connections from MatrixGate to YMatrix
[job] category
--allow-dynamic false When --allow-dynamic=true is specified, MatrixGate determines the target table dynamically from the first line of the POST data. Use this option only when the target table name is not yet known when MatrixGate starts. To insert into a known target table, specify the table name explicitly with --target
--delimiter | Specifies the character used to separate columns within each line (row) of the file
--error-handling accurate How to handle rows with format errors:
'accurate': Malformed rows are not written to the database and an error is logged. Other rows in the same batch are not affected.
'legacy': The entire batch fails.
--exclude-columns Empty By default, the number and order of columns supplied during data loading must match the table definition. When the data supplies only some of the columns, use --exclude-columns to list the column names to exclude. The order of the remaining columns must still match the table definition. Note: If --use-auto-increment is enabled to skip auto-increment fields, do not list those fields here; list only the other columns to exclude.
--format text Data format of the source data: text or csv. text is the fastest but does not support line breaks within character-type values. csv is more widely applicable, but character-type columns must be enclosed in double quotes
--null-as Empty string String that represents a null value. The default is an empty string without quotes. If a column has a NOT NULL constraint and the data for that column is null, loading fails with an error. Note: To use \N as the null value, escape the backslash, e.g.: --null-as \\N
--time-format unix-second Timestamp unit: unix-second | unix-ms | unix-nano | raw. By default, MatrixGate treats the first column of each row as a Unix timestamp and automatically converts it to the database time format. If the timestamp is not in the first column, or the data is already in the database format, use raw so that MatrixGate performs no time conversion.
--upsert-key Empty Key name for UPSERT. Can be specified multiple times.
A table that requires UPSERT must have a UNIQUE constraint, and every key of that constraint must be specified with this parameter.
--deduplicate-key Empty Similar to UPSERT, the difference is that only empty values are updated. If the old value is not empty, the new value is discarded.
Mutually exclusive with the --upsert-key parameter; only one can be selected
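The difference between --upsert-key and --deduplicate-key can be illustrated with a small in-memory model. This is only a sketch and not MatrixGate code: the function names merge_upsert and merge_deduplicate, the dict-based rows, and the column names are hypothetical, and None stands in for an empty value. It also assumes that an upsert takes every column of the incoming row, which the description above does not spell out.

```python
def merge_upsert(old, new):
    """Assumed UPSERT behaviour: incoming columns replace the stored ones."""
    merged = dict(old)
    merged.update(new)
    return merged


def merge_deduplicate(old, new):
    """As described above: only empty (None) stored values are filled in."""
    merged = dict(old)
    for column, value in new.items():
        if merged.get(column) is None:
            merged[column] = value
    return merged


# Two rows that share the same key (ts, device); values are illustrative only.
stored = {"ts": 1649675360, "device": "d1", "temp": 20.5, "humidity": None}
incoming = {"ts": 1649675360, "device": "d1", "temp": 21.0, "humidity": 40}

print(merge_upsert(stored, incoming))       # temp is overwritten
print(merge_deduplicate(stored, incoming))  # temp is kept, humidity is filled
```

In this model the old temp value survives under --deduplicate-key because it was not empty, while the empty humidity value is filled in by the new row.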
--use-auto-increment true When the target table contains an auto-increment field, whether to skip assigning a value to that field during data loading and use the system default auto-increment value instead
--target schemaName.tableName Specifies the target table name. schemaName can be omitted and defaults to public. Multiple target tables can be specified in the form "--target table1 --target table2 …". When this parameter is not provided, --allow-dynamic can be specified instead to allow dynamic adaptation of table names
--dml-template The file path of the mapping template that maps JSON fields to tuple columns
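As an illustration of how the default --format text, --delimiter, --null-as and --time-format settings in this category fit together, the following sketch renders one row on the client side. It is a minimal example under stated assumptions, not part of MatrixGate: the helper names and the sample columns are hypothetical, and it assumes that values contain neither the delimiter nor line breaks.

```python
from datetime import datetime, timezone

DELIMITER = "|"  # default of --delimiter
NULL_AS = ""     # default of --null-as (empty string)


def to_unix_seconds(moment):
    """Convert an aware datetime to whole Unix seconds (--time-format unix-second)."""
    return int(moment.timestamp())


def format_row(moment, values, delimiter=DELIMITER, null_as=NULL_AS):
    """Render one row: timestamp first, then the other columns in table order."""
    fields = [str(to_unix_seconds(moment))]
    for value in values:
        fields.append(null_as if value is None else str(value))
    return delimiter.join(fields)


# Hypothetical row: a device name, a reading, and one null column.
row_time = datetime(2022, 4, 11, 11, 9, 20, tzinfo=timezone.utc)
line = format_row(row_time, ["device-1", 23.5, None])
print(line)
```

The timestamp is placed in the first column because that is where MatrixGate expects it unless --time-format raw is used.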
[misc] category
--log-archive-hours 72 Log files in the log directory that have not changed for this number of hours are automatically compressed
--log-compress true Global switch to enable automatic log compression
--log-dir /home/mxadmin/gpAdminLogs Log directory
--log-max-archive-files 0 Maximum number of compressed log files to retain. Once this number is exceeded, the oldest log files will be deleted. 0 means no deletion
--log-remove-after-days 0 Number of days after which compressed log files are automatically deleted. 0 means no deletion
--log-rotate-size-mb 100 When the current log file exceeds this size (in MB), logging switches to a new file and the old file is compressed immediately
-v / --verbose Print detailed verbose logs
-V / --debug Print detailed debug logs (including verbose logs and debug logs)
--pprof-port Port for accessing pprof information; 0 indicates disabled
--no-cleanup Retain temporary mode even when exiting normally
--grpc-port Port for accessing gRPC information; 0 indicates disabled
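The two retention settings in this category, --log-max-archive-files and --log-remove-after-days, can be modelled as follows. This is a hedged sketch rather than MatrixGate's implementation: the function name, the file names, and the (name, age in days) representation are hypothetical, and it assumes that a compressed file is removed as soon as either rule applies and that a file counts as expired only when it is older than the configured number of days.

```python
def archives_to_delete(archives, max_archive_files=0, remove_after_days=0):
    """Return the names of compressed logs that the retention rules would remove.

    archives: list of (name, age_in_days) tuples.
    A value of 0 disables the corresponding rule, as in the documented defaults.
    """
    doomed = set()
    if remove_after_days > 0:
        doomed.update(name for name, age in archives if age > remove_after_days)
    if max_archive_files > 0:
        newest_first = sorted(archives, key=lambda item: item[1])
        doomed.update(name for name, _ in newest_first[max_archive_files:])
    return sorted(doomed)


# Hypothetical compressed log files with their ages in days.
archives = [("a.gz", 1), ("b.gz", 5), ("c.gz", 12)]

print(archives_to_delete(archives))                       # defaults: nothing removed
print(archives_to_delete(archives, max_archive_files=2))  # oldest file removed
print(archives_to_delete(archives, remove_after_days=3))  # files older than 3 days
```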
[metrics] category
--metrics-enable true Enable metrics
--metrics-sample-interval 15 (changed to 15 as of v5.3.2; previously 3) Metrics sampling interval in seconds. A value greater than 0 enables metrics collection, which reduces performance
[source] category
--source http MatrixGate data source, supports http / stdin / kafka / transfer / grpc
[source] category [HTTP] Notes: This mode is the default data source connection mode in the configuration file
--http-port 8086 Port of the MatrixGate HTTP interface used for submitting data
--max-body-bytes 4194304 Maximum size of each HTTP request body, in bytes
--max-concurrency 40000 Maximum number of concurrent HTTP connections
--request-timeout 0 Request timeout in milliseconds. When set to a value greater than 0, a request that waits longer than this time fails with a timeout (408)
--disable-keep-alive false When set to true, MatrixGate forces the connection to be closed after each HTTP request
--http-debug false Output additional HTTP source diagnostic information
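Because --max-body-bytes caps the size of each HTTP request body, a client that submits many rows may need to split them across several requests. The sketch below shows one way to do that on the client side. It is an illustration only: the function name is hypothetical, it handles data rows only (any first line that a request may need, such as a table name, is out of scope), and it does not send anything over the network.

```python
MAX_BODY_BYTES = 4194304  # default of --max-body-bytes


def split_into_bodies(lines, max_body_bytes=MAX_BODY_BYTES):
    """Group newline-terminated rows into bodies no larger than max_body_bytes."""
    bodies, current, size = [], [], 0
    for line in lines:
        encoded = (line + "\n").encode("utf-8")
        if len(encoded) > max_body_bytes:
            raise ValueError("a single row exceeds the body size limit")
        # Start a new body when adding this row would exceed the limit.
        if current and size + len(encoded) > max_body_bytes:
            bodies.append(b"".join(current))
            current, size = [], 0
        current.append(encoded)
        size += len(encoded)
    if current:
        bodies.append(b"".join(current))
    return bodies


# A tiny limit is used here only to make the split visible.
bodies = split_into_bodies(["a|1", "b|2", "c|3"], max_body_bytes=8)
print(bodies)
```

Rows are never split across bodies, so each request carries complete lines.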
[source] category [Transfer] Notes: Migration mode is not the default mode. To use it, configure the parameters in this section manually.
--src-host IP address of the source database Master
--src-port Port number of the source database Master
--src-user Username to connect to the source database (recommended to use Superuser)
--src-password Password for connecting to the source database
--src-schema The schema name of the source table
--src-table The table name of the source table
--src-sql The SQL used for data filtering during migration
--compress Compression method for transferring data from the source database Segments to this database:
"" (empty string): No compression; data is transmitted as plaintext.
gzip: Use gzip compression. The gzip Linux command must be installed on the Segment hosts of the source database.
lz4: Use lz4 compression. The lz4 Linux command must be installed on the Segment hosts of the source database.
Recommended order: lz4 > gzip > no compression
--port-base 9129 Port that is occupied during transmission
--local-ip IP address of the local machine; it must be one that the source database can connect to
[transform] category
--transform plain Convert the format or type of the data to be written. Supports plain / json / nil / tsbs / hanagdbc
[writer] category
--interval 100 milliseconds Time interval between MatrixGate batch data loads
--writer stream MatrixGate writes data to YMatrix via Writer. Supports stream / nil
--stream-prepared 10 Number of slot processes invoked simultaneously in a single job
--stream-host mdw The hostname of the Master in YMatrix connected to MatrixGate. This is for systems with multiple network interfaces
--use-gzip auto Whether to compress the data that MatrixGate sends to Segments. Accepts auto / yes / no:
auto: Prefer the zstd compression algorithm.
yes: Use the gzip compression algorithm.
no: Disable compression during transmission. This saves a small amount of CPU but significantly increases the amount of data sent over the network. The default value auto is recommended unless the database is deployed on a single machine and mxgate runs on that same host.
--max-seg-conn 128 Number of segments started when the external table pulls data from MatrixGate. Increasing this value uses more network connection resources.
--timing false When set to true, MatrixGate adds timing information to the log entry for each INSERT.
--insert-timeout 600000 Timeout for MatrixGate INSERT statements, in milliseconds. A value greater than 0 causes the statement to time out after waiting for that time.
-I / --instrumentation disable Enable detection of slot(s). Supports disable / single / all:
disable: Turn this feature off.
single: Enable detection for slot[0] only.
all: Enable detection for all slots.

--bytes-limit Batch data loading size limit. Keeps data ingestion uniform when the input stream to MatrixGate is uneven. This feature is disabled by default; to enable it, set the size manually to a value in the range 0~INT_MAX
--auto-tune false When set to true, MatrixGate can adjust the number of slots for write tasks.
--abort-by-pause-timeout 10000 Once a job has been paused for the time specified here, the data accumulated in memory and waiting to be written to the database is automatically discarded. Unit: milliseconds (ms). Valid range: 0~INT_MAX; recommended range: 1000~10000; 0 disables this feature. Set this value well below --request-timeout so that --request-timeout remains the effective timeout while the job is not paused. Once the job is paused, mxgate compares the two timeouts and raises a timeout error based on the lower value. This parameter can be configured only when --writer is stream
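The interaction between --request-timeout and --abort-by-pause-timeout can be summarised in a few lines. The function below is a sketch of the rule as described in this section, not mxgate's actual logic: the function name is hypothetical, and it assumes that a value of 0 removes that timeout from the comparison.

```python
def effective_timeout_ms(paused, request_timeout_ms=0, abort_by_pause_timeout_ms=10000):
    """Return the timeout that applies in milliseconds, or None if none is active.

    A value of 0 disables the corresponding timeout (assumption for the comparison).
    """
    candidates = []
    if request_timeout_ms > 0:
        candidates.append(request_timeout_ms)
    # The pause timeout only matters while the job is paused.
    if paused and abort_by_pause_timeout_ms > 0:
        candidates.append(abort_by_pause_timeout_ms)
    return min(candidates) if candidates else None


print(effective_timeout_ms(paused=False, request_timeout_ms=30000, abort_by_pause_timeout_ms=5000))
print(effective_timeout_ms(paused=True, request_timeout_ms=30000, abort_by_pause_timeout_ms=5000))
```

With --abort-by-pause-timeout set well below --request-timeout, the request timeout governs a running job and the shorter pause timeout takes over once the job is paused.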

2 Command Line Parameters

Notes!
The parameters in this section can be run in the command line after mxgate is started.

Parameter name Sub-parameter name Description
run Run MatrixGate from the command line
start Start the MatrixGate background process
stop Terminate the MatrixGate background process
-f / --force Forcefully terminate the MatrixGate process
--grpc-port Terminate the MatrixGate process listening on a specified port by sending a gRPC request
status Print the status of the MatrixGate background process
config Print the full configuration file
log Show the latest 10 log entries
-n / --lines int Specify the number of latest log entries to display; the int parameter is required
version Display the version
help Show usage and parameter list
-C / --config Load a configuration file; currently supported as a sub-parameter for start / run
-p / --pid Process ID of the MatrixGate instance; currently supported as a sub-parameter for stop / log / watch / pause / resume
--job The Job to set parameters for. Since each Job resides in a table, simply specify the corresponding table name in practice; currently supported as a sub-parameter for stop / log / watch / pause / resume
set --stream-prepared-cli Set the number of active write slots in a writing Job
• --stream-prepared-cli defaults to 10
• Example: mxgate set --stream-prepared-cli 3
--job-interval Set the ETL (Extract-Transform-Load) work interval for a writing Job (unit: milliseconds, ms)
• --job-interval defaults to 100
• Example: mxgate set --job-interval 150
--high-water-mark Set the high-water mark (write-load threshold) for the target Job
• --high-water-mark defaults to 0
• Example: mxgate set --high-water-mark 20
--disable-high-water-mark Set whether to disable the high-water mark threshold
• --disable-high-water-mark defaults to false
• Example: mxgate set --disable-high-water-mark=true

--auto-tune Set whether to enable automatic slot-count tuning
• Turn on for all Jobs: mxgate set --auto-tune=true
• Turn off for all Jobs: mxgate set --auto-tune=false
• Turn on for a specific Job: mxgate set --auto-tune=true --job public.t1
get --stream-prepared-get Get the number of write slots in a writing Job
• Example: mxgate get --stream-prepared-get
--stream-status-get Get the status of write slots in a writing Job
• Example: mxgate get --stream-status-get
--job-interval-get Get the work interval of a writing Job
• Example: mxgate get --job-interval-get
--job-list Get information on all writing Jobs
• Example: mxgate get --job-list
--job-state Get the status of all writing Jobs
• Example: mxgate get --job-state
--current-config Get the current configuration of the MatrixGate process
• Example: mxgate get --current-config
--kafka-topics Get all test Kafka Topics
• Example: mxgate get --kafka-topics
--kafka-messages Get all messages from a specific Kafka Topic
• Example: mxgate get --kafka-messages
--source-pressure Get the source-side data write load for the target Job
• Example: mxgate get --source-pressure
--high-water-mark-get Get the high-water mark (write-load threshold) set for the target Job
• Example: mxgate get --high-water-mark-get

--auto-tune-status Get the auto-tune status for write-slot counts in writing Jobs
• Get auto-tune status for all writing Jobs: mxgate get --auto-tune-status
• Get auto-tune status for a specific Job: mxgate get --auto-tune-status --job public.t1
pause -X / --disconnect When pausing mxgate, whether to disconnect slots between Segment and mxgate (this interrupts all slots for all Jobs)
• Example: mxgate pause -X
-S / --sync Without this sub-parameter, mxgate pauses asynchronously; when it is provided, mxgate waits until all writing Jobs have paused before returning
• Example: mxgate pause -S
resume -R / --reload When resuming writes, whether to reload Job metadata
• Example: mxgate resume -R
watch -D / --daemon-addr gRPC service address (host:port) of the MatrixGate process
-i / --info Print the meaning of each column shown by mxgate watch
• Example: mxgate watch --info
-T / --time Specify the duration (in seconds, s) for monitoring metrics
• Default is -1, which means print continuously
• Example: mxgate watch --time 200
-H / --history Show historical metrics
• Example: mxgate watch --history
--watch-latency Continuously print latency-related monitoring metrics
• Example: mxgate watch --watch-latency
--watch-start Start time for viewing historical records; defaults to 24 hours before the current time. Format example: 2022-04-11 11:09:20
• Example: mxgate watch --watch-start
--watch-end End time for viewing historical records; defaults to the current time. Format example: 2022-04-11 11:09:20
• Example: mxgate watch --watch-end
--watch-duration View historical duration
• Example: mxgate watch --watch-duration
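When the commands in this section are driven from a script, it can help to assemble the argument list in one place. The sketch below builds the `mxgate set --auto-tune` invocations shown in the table. The helper name build_set_auto_tune is hypothetical, and the code only constructs and prints the command; it does not run mxgate.

```python
import shlex


def build_set_auto_tune(enabled, job=None):
    """Build the argv list for `mxgate set --auto-tune=...`, optionally for one Job."""
    argv = ["mxgate", "set", "--auto-tune=" + ("true" if enabled else "false")]
    if job is not None:
        # Per the table above, the Job is identified by its table name.
        argv += ["--job", job]
    return argv


command = build_set_auto_tune(True, job="public.t1")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run on a host where mxgate is installed and running.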

Notes!
For an overview of MatrixGate's main features, please refer to MatrixGate Main Features.