Data Mirroring is a process that synchronizes the contents of data-handling systems, such as databases, storage, message queues, and caches, in real time, keeping two or more systems in the same state. It is commonly used to prevent data loss from system failures or to consolidate data for analysis. In mKC, the data mirroring service lets customers implement data redundancy easily, without setting up additional infrastructure.
Why is Data Mirroring Necessary?
Data mirroring is beneficial in the following scenarios:
1. Disaster Recovery: The Kafka cluster in use may experience data loss or system failures. In such cases, data mirroring allows quick recovery by using a backup cluster. If one cluster encounters an issue, it can immediately be replaced with the mirrored cluster, minimizing service interruption.
2. Data Analysis: By replicating data from clusters located in distant geographic locations to a central cluster, it is possible to integrate various data sources for analysis. This enables comprehensive, real-time analysis of business data, facilitating valuable business insights.
Key Concepts of Data Mirroring
Mirroring Job
This is the unit of work in data mirroring. Information such as the Kafka Connect cluster where the job is executed, the source and target clusters, and job components are managed as a single unit.
What is the Relationship Between Data Mirroring and Kafka Connect?
The data mirroring service is built on the Kafka Connect framework and is composed of several MirrorMaker connectors, so an understanding of Kafka Connect is essential for using the service. Data replication from the source cluster to the target cluster is carried out by Kafka Connect workers, with dedicated connectors such as MirrorSourceConnector and MirrorCheckpointConnector serving as the main components of data mirroring.
For more detailed information on Kafka Connect, please refer to the Kafka Connect documentation.
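As an illustration of how a mirroring job runs on Kafka Connect, a minimal MirrorSourceConnector configuration might look like the sketch below. The connector name, cluster aliases, bootstrap addresses, and topic pattern are hypothetical placeholders, and the exact set of supported properties depends on your Kafka version and on what the mKC service exposes:

```json
{
  "name": "mirror-source-example",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "primary",
    "target.cluster.alias": "backup",
    "source.cluster.bootstrap.servers": "primary-kafka:9092",
    "target.cluster.bootstrap.servers": "backup-kafka:9092",
    "topics": "orders.*",
    "replication.factor": "3"
  }
}
```

In plain Kafka Connect this kind of payload would be submitted to the Connect REST API; in mKC the equivalent settings are managed through the mirroring job itself.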
Source and Target Cluster
- Source Cluster: The original Kafka cluster that provides data in a data mirroring job.
- Target Cluster: The destination Kafka cluster that receives the replicated data in a data mirroring job.
In a data mirroring job, data (topics) is read from the source cluster and replicated to the target cluster.
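With MirrorMaker 2's default replication policy, a replicated topic on the target cluster is prefixed with the source cluster's alias, which keeps the copy distinguishable from locally produced topics. A tiny sketch of that naming rule (the alias and topic names here are illustrative):

```python
def remote_topic_name(source_alias: str, topic: str) -> str:
    """Mimics MirrorMaker 2's DefaultReplicationPolicy naming:
    a replicated topic is prefixed with '<source alias>.'."""
    return f"{source_alias}.{topic}"

# A topic 'orders' mirrored from a cluster aliased 'primary'
# appears on the target cluster as 'primary.orders'.
print(remote_topic_name("primary", "orders"))
```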
Job Components
These refer to the connectors that comprise a data mirroring job. The types of components used in data mirroring include:
- Data Replication (MirrorSourceConnector): Reads data from the source cluster and replicates it to the target cluster, combining the roles of a sink connector (consuming from the source) and a source connector (producing to the target).
- Consumer Offset Synchronization (MirrorCheckpointConnector): Synchronizes consumer group offsets between the source and target clusters, so that consumers can resume processing from the correct position after a failover.
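For the offset-synchronization component, a hedged configuration sketch is shown below. As with the earlier example, the connector name, aliases, addresses, and group pattern are placeholders, and property availability varies by Kafka version:

```json
{
  "name": "mirror-checkpoint-example",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
    "source.cluster.alias": "primary",
    "target.cluster.alias": "backup",
    "source.cluster.bootstrap.servers": "primary-kafka:9092",
    "target.cluster.bootstrap.servers": "backup-kafka:9092",
    "groups": "order-processor.*",
    "emit.checkpoints.interval.seconds": "60"
  }
}
```

The `groups` pattern selects which consumer groups have their offsets checkpointed, and the checkpoint interval controls how often translated offsets are emitted to the target cluster.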