Getting Started with Apache Kafka: Introduction

If you’re a Java developer curious about real-time data streaming, event-driven architecture, or simply want to build fast and scalable data pipelines, Apache Kafka is a must-learn tool. In this beginner-friendly post, we’ll explore what Kafka is, how it works, its core components, and why it’s a game-changer in the world of data.

What is Apache Kafka?

Apache Kafka is an open-source distributed event streaming platform designed for high-throughput, low-latency data processing. Originally developed at LinkedIn and later donated to the Apache Software Foundation, Kafka is now used by thousands of companies, including Netflix, Uber, and Airbnb.

Think of Kafka as a high-performance messaging system that lets your applications talk to each other using events (messages) — quickly, reliably, and at scale.

Why Do Developers Use Kafka?

Before Apache Kafka, developers relied on traditional messaging systems (e.g., JMS, RabbitMQ, ActiveMQ). These systems work well for certain use cases, but they often struggle with:

  • Handling high volumes of data
  • Scalability across multiple machines
  • Real-time analytics

Kafka solves these problems with:

1. High Throughput

Kafka can process millions of messages per second on modest hardware. It’s designed for high performance, making it ideal for large-scale, real-time data pipelines and streaming applications.

Example: A social media platform can handle real-time likes, comments, and shares without delays.

2. Distributed Architecture

Kafka is built as a distributed system, which means it can scale horizontally by adding more brokers and partitions. This makes it resilient and able to handle growing loads.

Example: If a startup grows into a global company, Kafka can easily expand to handle increased traffic without changing the core application.

3. Persistent Message Storage

Unlike traditional message queues, Kafka stores data on disk in a commit-log structure, so consumers can re-read messages at any time within the configured retention period.

Example: If a consumer app goes down temporarily, it can resume from where it left off once it restarts.

4. Fault Tolerance

Kafka replicates data across multiple brokers, so if one broker fails, data is still safe and available from other replicas.

Example: In case of hardware failure, a backup broker can continue serving producers and consumers without data loss.

5. Real-Time Stream Processing

Kafka enables real-time processing of events using Kafka Streams or integration with tools like Apache Flink or Apache Spark.

Example: A ride-sharing app can match riders and drivers in real time by processing location updates instantly.
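
If you want a taste of what this looks like in code, here is a minimal Kafka Streams sketch (it needs the separate kafka-streams dependency; the topic names driver-locations and active-driver-locations are hypothetical). It reads raw location updates and forwards only those marked as coming from active rides:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class LocationStreamSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "location-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw location updates and keep only those from active rides.
        KStream<String, String> locations = builder.stream("driver-locations");
        locations.filter((driverId, update) -> update.contains("active"))
                 .to("active-driver-locations");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // In a real app you would also register a shutdown hook that calls streams.close().
    }
}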

Core Concepts of Apache Kafka

Let’s break down Kafka into simple building blocks:

1. Producer

A producer is any application or service that sends (publishes) messages (events) to Kafka topics.

Example:

  • In a ride-hailing app, the GPS tracking service produces location data to a Kafka topic every few seconds.
  • In an e-commerce platform, when a user places an order, the order service acts as a producer and sends the order details (message) to the orders topic.
  • A payment service sends transaction details to Kafka topics.

2. Consumer

A consumer reads (subscribes to) messages from a Kafka topic. Consumers can be grouped into consumer groups to enable load balancing.

Example:

  • In the e-commerce platform, an email service may consume messages from the orders topic to send order confirmation emails.
  • In a banking system, a fraud detection engine might consume transaction data in real time to check for anomalies.

3. Topic

A topic is a category or stream to which records are published. Topics are partitioned, allowing parallel processing and scalability.

Example:

  • Payments, orders, and inventory can be separate topics in an online shopping app.
  • In IoT, each sensor type (temperature, humidity, motion) could have its own topic.
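
Topics don’t have to be created from the command line; the kafka-clients library ships an AdminClient for this. Below is a minimal sketch, assuming a broker at localhost:9092 (the topic name "orders" and the single-broker settings are just illustrative):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" with 3 partitions and replication factor 1 (fine for a dev setup).
            NewTopic orders = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singletonList(orders)).all().get();
            System.out.println("Created topic: orders");
        }
    }
}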

4. Broker

A broker is a Kafka server that stores topic data and handles client requests (from producers and consumers). Kafka clusters usually consist of multiple brokers.

Example:

  • A Kafka cluster with 3 brokers stores and balances data from multiple topics.
  • If one broker goes down, the cluster still works by redirecting traffic to other brokers (thanks to replication).

5. Partition

Each topic is split into partitions, which allow Kafka to scale horizontally. Each partition is an ordered log of messages, and they are distributed across Kafka brokers.

Example:

  • A topic named “user-activity” may have 4 partitions, letting up to four consumers in a group process messages in parallel.
  • For a logs topic collecting logs from microservices, partitions can help balance load and allow independent log processing.
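
A detail worth knowing: the producer picks a partition by hashing the record key, so records that share a key always land in the same partition and keep their order. A minimal sketch, assuming the user-activity topic above (the key user-42 is hypothetical):

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records share the key "user-42", so they hash to the same
            // partition and are consumed in the order they were sent.
            producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked:home"));
            producer.send(new ProducerRecord<>("user-activity", "user-42", "added-to-cart"));
        }
    }
}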

6. Offset

Each message in a partition has a unique, sequential number called an offset, which marks its position in the log. Consumers keep track of offsets to know which message to read next.

Example:

  • If a consumer reads up to offset 150 in a partition, it will resume from offset 151 next time.
  • In case of a failure, a consumer can restart and begin reading from a specific offset, allowing reliable processing.
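
The consumer API exposes offsets directly. In this sketch (the topic orders, partition 0, and offset 151 are illustrative), the consumer attaches to one partition with assign() and jumps to a chosen offset with seek():

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.util.Collections;
import java.util.Properties;

public class SeekToOffsetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Attach to partition 0 of "orders" and resume right after
            // the last processed message (offset 150), i.e. at offset 151.
            TopicPartition partition0 = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(partition0));
            consumer.seek(partition0, 151);
        }
    }
}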

7. ZooKeeper (legacy, pre-KRaft clusters)

ZooKeeper manages Kafka cluster metadata, leader election, and configuration synchronization. Starting with Kafka 2.8, the KRaft (Kafka Raft) mode makes ZooKeeper optional; KRaft became production-ready in Kafka 3.3, and ZooKeeper support was removed entirely in Kafka 4.0.

Example:

  • ZooKeeper helps Kafka choose which broker will be the leader for a partition.
  • It monitors which brokers are alive and manages metadata updates across the cluster.

Apache Kafka Architecture Diagram (Explained Simply)

  • Producers send messages to topics.
  • Topics are stored in partitions across brokers.
  • Consumers read from the topics at their own pace.

Real-World Use Cases of Kafka

Kafka powers real-time systems around the world. Common use cases include:

  1. Real-Time Analytics
    → Log aggregation, metrics processing, and dashboards.
  2. Microservices Communication
    → Services publish and consume events for decoupling.
  3. Event Sourcing
    → Store events (e.g., account created, order shipped) instead of just the current state.
  4. Data Ingestion Pipelines
    → Stream data from apps to data lakes or warehouses.

Code Example: Java Kafka Producer and Consumer (Hello Kafka)

Here’s a basic Java Kafka producer that sends a message to a topic, followed by a consumer that reads it back.

Maven Dependency

Add this to your pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.9.1</version>
    </dependency>
</dependencies>

Java Code: Simple Kafka Producer

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class HelloKafkaProducer {
    public static void main(String[] args) {
        String bootstrapServers = "localhost:9092";
        String topic = "my-first-topic";

        // Producer properties
        Properties props = new Properties();
        props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
            "org.apache.kafka.common.serialization.StringSerializer");

        // Create Kafka producer
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Send a message
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello", "Kafka World!");
        producer.send(record);

        // Flush and close
        producer.flush();
        producer.close();

        System.out.println("Message sent to Kafka successfully!");
    }
}
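
One thing to keep in mind: send() is asynchronous; it queues the record and returns immediately (the flush() call above is what guarantees delivery before the program exits). If you want per-record confirmation, you can pass a callback instead of the bare producer.send(record) call, for example:

producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        // The broker acknowledged the write; metadata tells us where it landed.
        System.out.printf("Delivered to %s-%d at offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    } else {
        exception.printStackTrace();
    }
});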

Java Code: Simple Kafka Consumer

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleKafkaConsumer {

    public static void main(String[] args) {

        // 1. Define consumer properties
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // Kafka broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-java-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // read from the beginning

        // 2. Create Kafka consumer
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // 3. Subscribe to a topic
        consumer.subscribe(Collections.singletonList("my-first-topic"));

        System.out.println("Kafka consumer started...");

        // 4. Poll for messages
        try {
            while (true) {
                ConsumerRecords<String, String> records = 
                        consumer.poll(Duration.ofMillis(1000)); // wait for new messages
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Received Message -> Key: %s, Value: %s, Partition: %d, Offset: %d\n",
                            record.key(), record.value(), record.partition(), record.offset());
                }
            }
        } finally {
            consumer.close();
        }
    }
}
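
By default this consumer auto-commits its offsets in the background (enable.auto.commit defaults to true). If you want to commit only after your own processing succeeds, a common variation is to disable auto-commit in the properties and call commitSync() after each batch (handleRecord below is a hypothetical stand-in for your business logic):

// In step 1, disable background offset commits:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

// In step 4, commit after the batch has been fully processed:
for (ConsumerRecord<String, String> record : records) {
    handleRecord(record); // hypothetical business logic
}
consumer.commitSync(); // marks everything polled so far as processed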

How to Run It

  1. Start ZooKeeper and your Kafka broker (or just the broker, if your cluster runs in KRaft mode).
  2. Create the topic manually using the Kafka CLI (or let it auto-create if enabled); on Linux/macOS use kafka-topics.sh instead of kafka-topics.bat:
    kafka-topics.bat --create --topic my-first-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
  3. Run HelloKafkaProducer. You’ll see:
    Message sent to Kafka successfully!
  4. Run SimpleKafkaConsumer. You’ll see:
    Kafka consumer started...
    Received Message -> Key: hello, Value: Kafka World!, Partition: 0, Offset: 0

Conclusion

By now, you should have a solid foundational understanding of:

  • What Kafka is and why it’s used
  • The core components: topics, producers, consumers, brokers
  • Kafka’s real-world use cases
  • A simple Java producer example

What’s Next?

In the next post, we’ll build on these foundations.

Stay tuned!


🔁 Want to revisit the lessons or explore more?

⬅️ Return to the Apache Kafka Tutorial Home Page

Whether you want to review a specific topic or go through the full tutorial again, everything is structured to help you master Apache Kafka step by step.
