YMatrix
Quick Start
Connecting
Benchmarks
Deployment
Data Usage
Manage Clusters
Upgrade
Global Maintenance
Expansion
Monitoring
Security
Best Practice
Technical Principles
Data Type
Storage Engine
Execution Engine
Streaming Engine(Domino)
MARS3 Index
Extension
Advanced Features
Advanced Query
Federal Query
Grafana
Backup and Restore
Disaster Recovery
Guide
Performance Tuning
Troubleshooting
Tools
Configuration Parameters
SQL Reference
File ingestion refers to importing existing text data files into a target table. Text files are typically in CSV format.
Create a CSV file:
$ vi rows.csv
The test CSV file has the following format. The first row contains column names, and subsequent rows contain data. The three columns are of timestamp, integer, and string types respectively:
time,c1,c2
2021-01-01 00:00:00,1,a1
2021-01-01 00:00:00,2,a2
2021-01-01 00:00:00,3,a3
The target database is test, and the schema of the target table dest matches the CSV file format:
=# CREATE TABLE dest(
time timestamp,
c1 int,
c2 text
)USING MARS3
DISTRIBUTED BY(c1)
ORDER BY(time,c1);
Below are several common methods for importing file contents into the target table:
COPY is a built-in SQL command in YMatrix that imports data files located on the master node (Master) into the target table.
First, connect to the target database using psql, then execute the COPY command:
[mxadmin@mdw ~]$ psql test
psql (12)
Type "help" for help.
test=# COPY dest FROM '/home/mxadmin/rows.csv' DELIMITER ',' HEADER;
COPY 3
The DELIMITER parameter specifies the field delimiter. The HEADER option indicates that the header line should be skipped during import.
The COPY method is simple to use, but has the following limitations:
MatrixGate uses YMatrix's high-speed data ingestion tool mxgate to import data.
Compared to COPY, mxgate offers better performance for large volumes of data and allows deployment of data files independent from the Master node.
$ tail -n +2 rows.csv | mxgate --source stdin --db-database test --db-master-host localhost --db-master-port 5432 --db-user mxadmin --time-format raw --target public.dest --parallel 2 --delimiter ','
The above command uses a pipe to redirect the data portion of rows.csv (starting from the second line) into mxgate. mxgate ingests the data from stdin into the dest table in the test database.
Note!
Sincemxgatedoes not require a file header during ingestion, thetail -n +2command is used to output the file content starting from the second line.
For more information about mxgate, refer to the documentation.