
Apache Kafka

Apache Kafka is a distributed event streaming platform capable of handling trillions of events per day, designed for high-throughput, fault-tolerant real-time data pipelines and streaming applications.

Supported Versions and Architectures

  • Versions: Kafka 2.0 to 2.5 (Scala 2.12 builds)
  • Architectures: Single-node or cluster deployments

Supported Data Types

  • Boolean: BOOLEAN
  • Integer: SHORT, INTEGER, LONG
  • Floating Point: FLOAT, DOUBLE
  • Numeric: NUMBER
  • String: CHAR, VARCHAR, STRING, TEXT
  • Binary: BINARY
  • Composite: ARRAY, MAP, OBJECT
  • Date/Time: TIME, DATE, DATETIME, TIMESTAMP
  • UUID: UUID

Data Structure Modes

Kafka supports two data structure modes to meet different business requirements:

Purpose: Handles the full set of DML operations (INSERT, UPDATE, DELETE) using a standardized event format for CDC scenarios.

Use Case: CDC log queues where relational database changes are streamed through Kafka to downstream systems.

Data Format:

{
  "ts": 1727097087513,
  "op": "DML:UPDATE",
  "opTs": 1727097087512,
  "table": "table_name",
  "before": {},
  "after": {}
}
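The following Python sketch shows how a downstream consumer might validate a message in this format and apply it to an in-memory table. It is illustrative only: the helper names (`parse_cdc_event`, `apply_event`, `op_time`), the `id` primary-key column, and the `DML:INSERT`/`DML:DELETE` op values (only `DML:UPDATE` appears above) are assumptions, and `opTs` is assumed to be epoch milliseconds, as the sample value suggests.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = ("ts", "op", "opTs", "table", "before", "after")
# Only "DML:UPDATE" appears in the documented sample; INSERT/DELETE are assumed by analogy.
KNOWN_OPS = {"DML:INSERT", "DML:UPDATE", "DML:DELETE"}


def parse_cdc_event(raw: str) -> dict:
    """Decode one Kafka message value and check that it has the CDC event shape."""
    event = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if event["op"] not in KNOWN_OPS:
        raise ValueError(f"unexpected op: {event['op']!r}")
    return event


def apply_event(table: dict, event: dict, key: str = "id") -> None:
    """Apply an event to an in-memory table keyed by a hypothetical primary-key column."""
    if event["op"] == "DML:DELETE":
        # Deletes identify the row through its before-image.
        table.pop(event["before"][key], None)
    else:
        # Inserts and updates overwrite the row with its after-image.
        row = event["after"]
        table[row[key]] = row


def op_time(event: dict) -> datetime:
    """Convert opTs to a UTC datetime, assuming epoch milliseconds."""
    return datetime.fromtimestamp(event["opTs"] / 1000, tz=timezone.utc)


raw = (
    '{"ts": 1727097087513, "op": "DML:UPDATE", "opTs": 1727097087512, '
    '"table": "table_name", "before": {"id": 1, "name": "old"}, '
    '"after": {"id": 1, "name": "new"}}'
)
table = {1: {"id": 1, "name": "old"}}
event = parse_cdc_event(raw)
apply_event(table, event)
print(table)  # {1: {'id': 1, 'name': 'new'}}
```

Under these assumptions, `op_time(event)` for the sample message falls on 2024-09-23 (UTC).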

Sync Modes

  • Full Only: Reads from the beginning of the topic and stops at the current position
  • Full + Incremental: Reads all historical data, then continues with real-time streaming
  • Incremental Only: Starts from the current position or from a specified timestamp
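One way to picture the three sync modes is as a choice of start and stop offsets for each partition. The Python sketch below models a partition in memory; the `PartitionLog` class, the mode strings, and `read_range` are hypothetical and not part of the connector. The timestamp lookup approximates the semantics of Kafka's `offsetsForTimes` (the earliest offset whose timestamp is at or after the requested time), assuming record timestamps are non-decreasing.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PartitionLog:
    """In-memory stand-in for one partition: record timestamps (ms) in offset order."""
    timestamps: List[int] = field(default_factory=list)
    base_offset: int = 0  # earliest offset still retained

    @property
    def beginning(self) -> int:
        return self.base_offset

    @property
    def end(self) -> int:
        # Log-end offset: the offset the next appended record will receive.
        return self.base_offset + len(self.timestamps)

    def offset_for_time(self, ts: int) -> int:
        # Earliest offset whose timestamp >= ts, assuming non-decreasing timestamps.
        # Kafka's offsetsForTimes returns no offset when none qualifies; here we fall back to end.
        return self.base_offset + bisect_left(self.timestamps, ts)


def read_range(log: PartitionLog, mode: str,
               start_ts: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Return (start_offset, stop_offset); stop is exclusive, None means keep streaming."""
    if mode == "full":
        # Snapshot the end position up front and stop once it is reached.
        return log.beginning, log.end
    if mode == "full+incremental":
        return log.beginning, None
    if mode == "incremental":
        start = log.end if start_ts is None else log.offset_for_time(start_ts)
        return start, None
    raise ValueError(f"unknown mode: {mode!r}")


log = PartitionLog(timestamps=[100, 200, 300, 400], base_offset=10)
print(read_range(log, "full"))                       # (10, 14)
print(read_range(log, "full+incremental"))           # (10, None)
print(read_range(log, "incremental"))                # (14, None)
print(read_range(log, "incremental", start_ts=250))  # (12, None)
```

The design choice here is that "Full Only" fixes its stop offset when the read begins, so records appended afterwards are left for an incremental phase rather than extending the full read indefinitely.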

Limitations

  • Authentication: Only Kafka instances without authentication are currently supported
  • Data Types: Source data types must be compatible with the target system's requirements
  • Delivery Semantics: At-least-once delivery may produce duplicates; make sure writes on the target side are idempotent
  • Consumer Groups: Each consumer thread uses its own consumer group ID
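Because at-least-once delivery can hand the target the same event more than once, the target write should be idempotent. The Python sketch below is one minimal approach, assuming each event carries a primary key (hypothetically `id`) and that `opTs` increases for successive changes to the same row: it keeps the `opTs` of the last applied change per key, skips anything older, and keeps deleted rows as tombstones so a late redelivery cannot resurrect them. `IdempotentSink` is a hypothetical name, not part of the connector.

```python
from typing import Dict, Optional, Tuple


class IdempotentSink:
    """Hypothetical target table that tolerates redelivered CDC events."""

    def __init__(self, key: str = "id") -> None:
        self.key = key
        # primary key -> (opTs of the last applied change, row or None for a tombstone)
        self.rows: Dict[object, Tuple[int, Optional[dict]]] = {}

    def apply(self, event: dict) -> bool:
        """Apply one event; return False if it was skipped as stale."""
        is_delete = event["op"] == "DML:DELETE"
        image = event["before"] if is_delete else event["after"]
        pk = image[self.key]
        last = self.rows.get(pk)
        if last is not None and event["opTs"] < last[0]:
            return False  # older than the state already applied: a late redelivery
        # Equal opTs is treated as a duplicate of the same change and simply re-applied.
        self.rows[pk] = (event["opTs"], None if is_delete else dict(event["after"]))
        return True

    def live_rows(self) -> dict:
        """Rows that currently exist, hiding tombstones."""
        return {pk: row for pk, (_, row) in self.rows.items() if row is not None}


ins = {"op": "DML:INSERT", "opTs": 1, "before": {}, "after": {"id": 7, "v": "a"}}
upd = {"op": "DML:UPDATE", "opTs": 2, "before": {"id": 7, "v": "a"}, "after": {"id": 7, "v": "b"}}

sink = IdempotentSink()
results = [sink.apply(e) for e in (ins, upd, ins, upd)]  # each event delivered twice
print(results)           # [True, True, False, True]
print(sink.live_rows())  # {7: {'id': 7, 'v': 'b'}}
```

Replaying an exact duplicate is harmless because it rewrites the same row; the `opTs` check only matters when an older event is redelivered after a newer one. Two distinct changes to the same row with equal `opTs` would need a finer-grained version number to be ordered correctly.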