YMatrix Architecture

This document describes the following aspects of the YMatrix architecture:

  • Global Architecture

    • Hyper-Converged Architecture
    • Database Architecture
  • Local Architecture

    • High Availability Architecture

Overview


To reduce the complexity of the data ecosystem, YMatrix has designed a simple architecture with hyper-convergence genes, integrating computing, storage, and network resources into a unified system. It is based on a massively parallel processing (MPP) system and complies with the characteristics of a microkernel architecture.

This architecture is flexible and adaptable to multiple scenarios. It is not only friendly to IoT time series scenarios, but also supports traditional analytical data warehouse environments and business intelligence (BI) work.

Advantages


Replacing traditional data technology stacks with hyper-converged architectures may seem like a daunting task. So why do we need to do this?

In practice, adopting a hyper-converged architecture can benefit many enterprises by providing a unified data foundation for complex IT systems, including smart connected vehicles, the industrial Internet, smart manufacturing, smart cities, energy, finance, and pharmaceuticals.

Compared to complex data technology stacks like the Hadoop ecosystem, the YMatrix architecture offers the following advantages:

  • Hyper-convergence

    • Robustness: A complex technology stack typically consists of N separate data processing systems. If the probability that any one component fails is P, the reliability of the whole stack is approximately (1-P)^N, so every additional component measurably reduces stability (see the worked example after this list). A hyper-converged architecture, being a single system, is naturally the most stable and robust.
    • Cost-effectiveness: Because it is hyper-converged, YMatrix can ingest and manage data within a single system, so data never has to be copied between, or stored redundantly across, multiple distributed systems. Hardware requirements such as disks stay modest, which keeps storage costs low.
    • Timeliness: In a hyper-converged architecture, data does not need to be transferred across multiple systems, resulting in low latency and high timeliness.
    • Simplified Management: The hyper-converged solution makes the entire data ecosystem easier to manage, eliminating the need for expertise in multiple product technologies and programming languages—basic SQL knowledge is sufficient for operation.
  • High availability

    • In the event of a few nodes failing, YMatrix's state data management service can automatically perform node failover without human intervention, making it transparent to the business. This reduces your labor costs and lowers human risk.
  • Rich toolchain ecosystem

    • Compatible with the Postgres/Greenplum ecosystem, covering scenarios such as data migration, data ingestion, performance testing, backup, and recovery.
  • Supports standard SQL

    • Supports the SQL:2016 standard, covering data types, scalar expressions, query expressions, character sets, data allocation rules, set operators, and more.
  • Full support for ACID transactions

    • Ensures data integrity and consistency, avoiding complex error checks and handling at the user level, and reducing your operational burden.
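
To make the robustness argument above concrete, here is the reliability arithmetic with an illustrative per-component failure probability; the value P = 0.01 is an assumption chosen only for this example.

    \[
      R(N) = (1 - P)^{N}
      \qquad\Longrightarrow\qquad
      R(1) = 0.99, \quad
      R(5) = 0.99^{5} \approx 0.951, \quad
      R(10) = 0.99^{10} \approx 0.904
      \qquad (P = 0.01)
    \]

A stack of ten components is therefore roughly 10% less likely to be fully operational at any moment than a single hyper-converged system, which is the intuition behind the robustness claim.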

Hyper-Converged Architecture


Overview

Compared to databases built on other architectures, YMatrix's hyper-convergence lies in integrating multiple data types and data operations, so that a single database can support multiple data types and multiple scenarios with high performance. Internally, the YMatrix architecture has microkernel characteristics: on top of a set of common foundational components, it provides different combinations of storage engines and execution engines tailored to different business scenarios, so that each resulting microkernel can deliver targeted improvements in write, storage, and query performance.

Diagram

The diagram below describes the composition and functions of the hyper-converged architecture within YMatrix:

The following sections provide a detailed overview of the components of the YMatrix hyper-converged architecture.

  • Common Core Components
    These primarily refer to shared resources within the database, such as memory management, network communication protocols, and basic data structures.
  • Storage Engines and Execution Engines
    These refer to the combinations of storage engines and execution engines that can be selected when creating tables in YMatrix for different scenarios; each combination forms a microkernel (see the sketch after this list).
  • Optimizer
    This converts an SQL string into a query plan and generates the optimal plan based on the capabilities provided by the selected underlying storage engine.
  • Logging, Transactions, Concurrency, Lock Management, Snapshots
    These are standard components within the YMatrix kernel that provide generic functionality such as concurrency control, transaction mechanisms, and fault recovery.
  • SQL
    This refers to the standard SQL interface between YMatrix and the client.
  • Authentication, Roles, Auditing, Encryption, Monitoring, Backup, Recovery, High Availability
    These are some other common database features supported by YMatrix.
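
As noted for the storage and execution engines above, the engine combination is chosen per table. The sketch below assumes the PostgreSQL-style USING clause that YMatrix's PostgreSQL lineage suggests; the engine name mars3 and the table definition are illustrative assumptions only, so consult the YMatrix documentation for the engines available in your release.

    -- Minimal sketch: selecting a storage engine when creating a table.
    -- "mars3" is an assumed engine name used purely for illustration.
    CREATE TABLE device_metrics (
        device_id bigint,
        ts        timestamptz,
        value     double precision
    )
    USING mars3                     -- per-table storage engine (microkernel) choice
    DISTRIBUTED BY (device_id);     -- MPP distribution key across Segments

Pairing each storage engine with a matching execution engine is what lets one cluster serve, for example, time-series ingestion and analytical queries side by side.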

Database Architecture


Overview

YMatrix's high-level database architecture is based on the classic MPP (massively parallel processing) database architecture, with some enhancements.

Diagram

The diagram below describes the core components that make up a YMatrix database system and how they work together:

The following sections provide a detailed introduction to the various components of the YMatrix database system and their functions.

  • Master Node
    • Responsible for establishing and managing session connections with clients.
    • Responsible for parsing SQL statements and forming query plans (Query Plan).
    • Distributes query plans to Segments, monitors query execution processes, and collects feedback results to return to clients.
    • The Master does not store business data; it only stores the Data Dictionary, which is the collection of definitions and attributes for all data elements used in the system.
    • In a cluster, only one Master is allowed, and a primary-standby configuration can be adopted, with the standby node referred to as Standby.
  • Data Nodes (Segments)
    • Responsible for storing business data and for the distributed execution of SQL queries.
    • The key to optimal performance with YMatrix is distributing data and workloads evenly across a large number of Segment nodes of identical capability, so that all Segments begin working on a task simultaneously and complete their portions concurrently (see the sketch after this list).
  • Client
    • A generic term for any device, client program, or application capable of accessing the database.
  • MatrixGate
    • MatrixGate, abbreviated as mxgate, is YMatrix's high-speed streaming data write tool. For more information, see mxgate.
  • Network Layer (Interconnect)
    • Refers to the network layer in the database architecture, which handles inter-process communication between Segments and the network infrastructure supporting such communication.
  • State Data Management Service (Cluster Service)
    • Cluster Service ensures database high availability by managing node status information. YMatrix uses an ETCD cluster to implement this service: when a database node fails, the node status data stored in ETCD is used to identify the currently healthy node as the new primary and promote it, ensuring cluster availability.
    For example, if the Master node fails, its Standby node is promoted to Master; if the Standby node fails, it has no impact on the entire cluster. Similarly, if the Primary node fails, its Mirror node is promoted to Primary; if the Mirror node itself fails, it has no impact on the entire cluster. For more information, see Failure Recovery.
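
As mentioned under Data Nodes (Segments), performance depends on distributing data evenly across Segments. The following is a minimal sketch using the Greenplum-style DISTRIBUTED BY clause and gp_segment_id system column that YMatrix's Greenplum compatibility suggests; the table and column names are hypothetical.

    -- Hash-distribute rows across all Segments on device_id.
    CREATE TABLE sensor_readings (
        device_id   bigint,
        ts          timestamptz,
        temperature double precision
    ) DISTRIBUTED BY (device_id);

    -- Check for data skew: row counts per Segment should be roughly equal.
    SELECT gp_segment_id, count(*) AS rows_per_segment
    FROM sensor_readings
    GROUP BY gp_segment_id
    ORDER BY gp_segment_id;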

High Availability Architecture


Overview

YMatrix uses its proprietary ALOHA (Advanced Least Operation High Availability) technology to ensure high availability of the cluster. When a single point of failure occurs in the cluster, the corresponding standby instance will switch roles and replace the failed instance to provide services, thereby ensuring uninterrupted cluster services.

The impact of each type of instance failure, and the time involved, is as follows:

  • Mirror: A Mirror failure does not affect users' queries against data on the corresponding Primary, but the faulty Mirror must be restored manually. (Time: seconds in a network-connected environment.)
  • Standby: A Standby failure does not affect users' queries against cluster data, but the faulty Standby must be restored manually.
  • Primary: When a Primary fails, users cannot query the corresponding data until the system automatically promotes the corresponding Mirror to Primary.
  • Master: When the Master fails, the cluster becomes unavailable and users cannot query data until the system automatically promotes the Standby to Master.

Notes!
For detailed steps on fault recovery, see Fault Recovery.
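
To check which instance currently holds each role and whether a failover has occurred, you can query the cluster configuration catalog. This is a minimal sketch assuming YMatrix exposes the Greenplum-style gp_segment_configuration catalog; the column and value names follow Greenplum and may differ in your release.

    -- content = -1 identifies the Master/Standby pair; other values are Segments.
    -- role / preferred_role: 'p' = primary, 'm' = mirror; status: 'u' = up, 'd' = down.
    SELECT content, role, preferred_role, status, hostname, port
    FROM gp_segment_configuration
    ORDER BY content, role;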

Principle

ALOHA services use a highly available architecture based on ETCD clusters. This architecture solves the problem of automatic failover, which essentially means implementing an automatic master node election mechanism and maintaining strong data consistency.

This mechanism includes the following steps:

  • Status data collection: The data cluster reports status data to ETCD from two sources:
    • Regular probing and reporting, whether or not the cluster has a fault
    • Fault-triggered reporting, when a service process detects a cluster failure
  • Status data storage: The status data is stored in the ETCD cluster. ETCD is a mature distributed storage solution built on the Raft protocol. As the state storage layer, it provides reliable and unique cluster status information for process management and control in the event of a cluster failure.

Notes!
ETCD clusters perform more disk operations when persisting state, so ideally ETCD should be deployed on several dedicated physical machines, which fully guarantees its performance. In practice, there may not be enough physical machines for an independent ETCD deployment; in that case, an odd number of ETCD instances are deployed across some or all of the physical machines in the data cluster to store the status data of nodes and instances.

  • Process control: Based on the distributed, consistent storage provided by ETCD, the postmaster process of the service in the API layer obtains the status of each instance in real time. In this way, when a failure occurs, the roles of the affected primary and standby instances are rescheduled immediately, minimizing the impact of the failure.

The schematic diagram is as follows:

You can see:

  1. Clients such as management tools or MatrixUI send requests to the API layer;
  2. The API layer issues requests to the data cluster, and the data cluster provides state data for the API layer;
  3. The API layer reports cluster status data to the ETCD cluster, and can also read cluster status and service data from the ETCD cluster.

Appendix

ETCD

ETCD is a distributed key-value store used to save and retrieve data in a distributed system. It uses the Raft consensus algorithm to ensure data consistency and reliability, and it is designed to be highly available with strong failure recovery capabilities. ETCD provides a simple RESTful API, allowing applications to easily access and manipulate the key-value pairs it stores.

Important concepts:

  • Leader: The manager of the ETCD cluster; it is elected and unique.
  • Follower: Synchronizes the logs received from the Leader; this is the default role when an ETCD instance starts.
  • Candidate: A role that can initiate a Leader election.