What is Kafka Connect?
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka® and other data systems. It enables real-time movement of large datasets either into Kafka or out of Kafka into other systems. Kafka Connect makes it easy to configure data pipelines using Connectors.
mKC provides monitoring for Connect Clusters and a convenient interface for managing connectors, enabling seamless connections between various data sources and target systems.
Key Concepts of Kafka Connect
Connectors
A Connector is one of the most important components in the Kafka Connect ecosystem, enabling easy integration and transfer of data between data sources, sinks, and Kafka brokers.
There are two main types of Connectors:
- Source Connector
A connector that retrieves data from external systems and writes it to Kafka topics. It can read from various sources such as databases, file systems, and HTTP endpoints. For example, using a JDBC Source Connector, you can fetch data from a relational database and write it into Kafka topics (see the configuration sketch after this list).
- Sink Connector
A connector that retrieves data from Kafka topics and sends it out to external systems such as databases, file systems, and search indexes. For example, using a JDBC Sink Connector, you can insert data from Kafka topics into a relational database.
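As a minimal sketch of how a connector is registered, the snippet below posts a JDBC Source Connector definition to the Kafka Connect REST API. The endpoint address, connector name, database URL, table, and topic prefix are illustrative assumptions; the config keys follow the Confluent JDBC connector.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Connect REST endpoint

# A minimal JDBC Source Connector definition. The connection details,
# table, and topic prefix below are made-up examples.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
        "connection.user": "connect",
        "connection.password": "secret",
        "mode": "incrementing",            # only fetch newly added rows
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "jdbc-",           # rows land in the topic "jdbc-orders"
    },
}

# POST /connectors registers a new connector with the cluster.
req = urllib.request.Request(
    f"{CONNECT_URL}/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```

A Sink Connector is registered the same way; only the connector class and its class-specific config keys change.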
Tasks
A Task is the individual unit of data transfer. Each connector can consist of one or more Tasks, which can run in parallel, allowing data transfers to be processed efficiently. The number of Tasks can be adjusted in each connector's settings, which lets you tune the system's performance and scalability by controlling the parallelism of data transfer.
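For instance, parallelism can be raised by updating tasks.max through the REST API. The sketch below reuses the hypothetical connector from the previous example; note that PUT replaces the entire configuration, so all keys are repeated, and a connector may spawn fewer tasks than tasks.max allows (the JDBC source, for example, cannot use more Tasks than there are tables).

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Connect REST endpoint
NAME = "orders-jdbc-source"            # hypothetical connector from above

# PUT /connectors/{name}/config replaces the connector's configuration;
# raising tasks.max asks Connect to run up to 4 Tasks in parallel.
config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "4",
    "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
    "connection.user": "connect",
    "connection.password": "secret",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "jdbc-",
}
req = urllib.request.Request(
    f"{CONNECT_URL}/connectors/{NAME}/config",
    data=json.dumps(config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
urllib.request.urlopen(req)

# GET /connectors/{name}/tasks lists the Tasks the connector actually spawned.
with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{NAME}/tasks") as resp:
    for task in json.loads(resp.read()):
        print(task["id"])  # e.g. {"connector": "orders-jdbc-source", "task": 0}
```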
Workers
A Worker is a node that provides the execution environment for Kafka Connect and is responsible for actually running Connectors and Tasks. Workers can run independently or be clustered to operate in a distributed environment.
- Single Node Mode
In this mode, Connect is run with a single worker. It is suitable for simple tests or small-scale tasks.
- Distributed Mode
In this mode, multiple workers form a cluster to process connectors and tasks in a distributed manner. It is suitable for large-scale data transfer tasks and supports failover and load balancing among workers (see the sketch after this list).
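One way to observe distributed execution is the status endpoint, which reports which worker is currently running each Task. A minimal sketch, again assuming the hypothetical endpoint and connector name from the earlier examples:

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # any worker in the cluster
NAME = "orders-jdbc-source"            # hypothetical connector from above

# GET /connectors/{name}/status reports the connector state and, for each
# Task, the worker_id of the worker executing it. In distributed mode the
# worker_ids differ as Tasks are spread across the cluster.
with urllib.request.urlopen(f"{CONNECT_URL}/connectors/{NAME}/status") as resp:
    status = json.loads(resp.read())

print("connector state:", status["connector"]["state"])
for task in status["tasks"]:
    print(f"task {task['id']}: {task['state']} on {task['worker_id']}")
```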
Key Functions of Workers
- Task Execution
Workers execute the Tasks defined by connectors.
- Task Allocation & Reallocation
In distributed mode, each worker automatically receives Tasks to execute; if a worker fails, its Tasks are reallocated to other workers.
- Configuration Management
Workers manage the configurations of connectors and Tasks. Configurations can be updated and monitored through the REST API (see the sketch after this list).
- Data Transformation & Processing
Workers apply the necessary transformation and processing logic during data transfer.
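As an illustration of managing connectors through the REST API, the sketch below pauses, resumes, and restarts parts of the hypothetical connector used above. The endpoint and connector name remain assumptions.

```python
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Connect REST endpoint
NAME = "orders-jdbc-source"            # hypothetical connector from above

def call(path: str, method: str) -> int:
    """Issue a body-less request against the Connect REST API."""
    req = urllib.request.Request(f"{CONNECT_URL}{path}", method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Pause the connector: Tasks stop moving data but keep their configuration.
call(f"/connectors/{NAME}/pause", "PUT")

# Resume the connector again.
call(f"/connectors/{NAME}/resume", "PUT")

# Restart a single (e.g. failed) Task by its numeric id.
call(f"/connectors/{NAME}/tasks/0/restart", "POST")
```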
DLQ (Dead Letter Queue)
A DLQ (Dead Letter Queue) in Kafka Connect is a special Kafka topic used to store messages that cannot be delivered or that fail processing during data transfer. This prevents data loss when individual records fail and allows them to be analyzed and reprocessed later.
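DLQ routing is enabled per connector through Kafka Connect's built-in errors.* settings, and it applies to sink connectors. A sketch, where the JDBC sink details and names are illustrative:

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed Connect REST endpoint

# A hypothetical JDBC Sink Connector with DLQ routing. The errors.* keys
# are Kafka Connect's built-in error-handling options; everything else
# is a made-up example.
connector = {
    "name": "orders-jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "2",
        "topics": "jdbc-orders",
        "connection.url": "jdbc:postgresql://warehouse.example.com:5432/dw",
        # Keep running past bad records instead of failing the Task...
        "errors.tolerance": "all",
        # ...and route them to a dead letter topic, with headers that
        # record the error context for later analysis.
        "errors.deadletterqueue.topic.name": "dlq-orders-jdbc-sink",
        "errors.deadletterqueue.context.headers.enable": "true",
    },
}

req = urllib.request.Request(
    f"{CONNECT_URL}/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)
```

With errors.tolerance set to all, a record that fails conversion or transformation is written to the DLQ topic instead of stopping the Task, and the context headers describe where and why it failed, which supports reprocessing later.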