Installing and Configuring Apache Kafka on Ubuntu 20.04

July 8, 2024 · Russell Bates

Apache Kafka: A Powerful Distributed Messaging System

Apache Kafka stands out as a highly efficient distributed message broker, engineered to manage substantial volumes of real-time data. Its architecture allows for impressive scalability and fault tolerance, surpassing the throughput capabilities of alternative message brokers like ActiveMQ and RabbitMQ. While primarily utilized as a publish/subscribe messaging platform, many organizations also leverage Kafka for log aggregation, taking advantage of its persistent storage for published messages.

The publish/subscribe paradigm implemented by Kafka enables one or more producers to broadcast messages without concern for the number of consumers or their processing methods. Subscribed clients receive automatic notifications about updates and newly created messages. This approach offers superior efficiency and scalability compared to systems where clients must periodically poll for new message availability.

In this tutorial, we’ll walk you through the process of installing and configuring Apache Kafka 2.8.2 on an Ubuntu 20.04 server.

Before You Begin

To successfully complete this tutorial, ensure you have:

  1. An Ubuntu 20.04 server with a minimum of 4 GB RAM.
  2. A non-root user with sudo privileges (refer to our Initial Server Setup guide if needed).
  3. OpenJDK 11 installed on your server (follow our tutorial on How To Install Java with APT on Ubuntu 20.04).

Note: On systems with less than 4 GB of RAM, the Kafka service may fail to start or may crash with JVM out-of-memory errors.

Step 1: Creating a Dedicated Kafka User

To enhance security, we’ll create a dedicated user for the Kafka service. This practice minimizes potential damage to your Ubuntu machine in case of a Kafka server compromise.

  1. Log in to your server as your non-root sudo user.
  2. Create a new user named kafka:
   sudo adduser kafka
  3. Follow the prompts to set a password and complete user creation.
  4. Add the kafka user to the sudo group:
   sudo adduser kafka sudo
  5. Switch to the kafka user:
   su -l kafka
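
Before continuing, you can verify that the new account was created and added to the sudo group. This quick check is optional and not part of the procedure itself:

   groups kafka

The output should list sudo among the kafka user's groups.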

Step 2: Downloading and Extracting Kafka Binaries

Now, we’ll download and extract the Kafka binaries into the kafka user’s home directory.

  1. Create a Downloads directory:
   mkdir ~/Downloads
  2. Download Kafka using curl:
   curl "https://downloads.apache.org/kafka/2.8.2/kafka_2.13-2.8.2.tgz" -o ~/Downloads/kafka.tgz
  3. Create and navigate to a kafka directory:
   mkdir ~/kafka && cd ~/kafka
  4. Extract the downloaded archive:
   tar -xvzf ~/Downloads/kafka.tgz --strip-components 1

Note: Apache moves older releases off its primary download site over time. If the curl command in step 2 returns an error, download the same file from https://archive.apache.org/dist/kafka/2.8.2/kafka_2.13-2.8.2.tgz instead.
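
To confirm the extraction worked, you can list the Kafka scripts; this is just a sanity check, not a required step:

   ls ~/kafka/bin

You should see kafka-server-start.sh, kafka-topics.sh, and the other bundled shell scripts.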

Step 3: Configuring the Kafka Server

We’ll now modify Kafka’s configuration to enable topic deletion and specify a custom log directory.

  1. Open the server.properties file:
   nano ~/kafka/config/server.properties
  2. Add the following line at the end of the file:
   delete.topic.enable = true
  3. Locate the log.dirs property and update it:
   log.dirs=/home/kafka/logs
  4. Save and close the file.
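
To double-check both settings without reopening the editor, a quick grep works; this is an optional sanity check:

   grep -E '^(delete.topic.enable|log.dirs)' ~/kafka/config/server.properties

You don't need to create /home/kafka/logs by hand; Kafka creates any missing log.dirs directory on first start.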

Step 4: Creating systemd Unit Files and Starting Kafka

To manage Kafka as a system service, we’ll create systemd unit files for both Zookeeper and Kafka.

  1. Create the Zookeeper unit file:
   sudo nano /etc/systemd/system/zookeeper.service

Add the following content:

   [Unit]
   Requires=network.target remote-fs.target
   After=network.target remote-fs.target

   [Service]
   Type=simple
   User=kafka
   ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
   ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
   Restart=on-abnormal

   [Install]
   WantedBy=multi-user.target
  2. Create the Kafka unit file:
   sudo nano /etc/systemd/system/kafka.service

Add the following content:

   [Unit]
   Requires=zookeeper.service
   After=zookeeper.service

   [Service]
   Type=simple
   User=kafka
   ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
   ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
   Restart=on-abnormal

   [Install]
   WantedBy=multi-user.target
  3. Start the Kafka service:
   sudo systemctl start kafka
  4. Verify the service status:
   sudo systemctl status kafka
  5. Enable Kafka and Zookeeper to start on boot:
   sudo systemctl enable zookeeper
   sudo systemctl enable kafka
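
Because the kafka unit declares Requires=zookeeper.service, starting kafka pulls in Zookeeper automatically, and Restart=on-abnormal tells systemd to restart either service if it exits abnormally. If systemd doesn't pick up the freshly created unit files, reload its configuration, and tail the broker's journal to watch it come up. Both commands below are standard systemd tooling rather than part of the original steps:

   sudo systemctl daemon-reload
   sudo journalctl -u kafka -n 20 --no-pager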

Step 5: Testing the Kafka Installation

Let’s verify our Kafka setup by publishing and consuming a test message.

  1. Create a test topic:
   ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic
  2. Publish a test message:
   echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null
  3. Consume the message:
   ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning
  4. In a new terminal, publish another message:
   echo "Hello World from Sammy at DigitalOcean!" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Step 6: Hardening the Kafka Server

To enhance security, we’ll remove the kafka user’s sudo privileges and lock the account.

  1. Log out of the kafka user and log in as a non-root sudo user.
  2. Remove kafka from the sudo group:
   sudo deluser kafka sudo
  3. Lock the kafka user's password:
   sudo passwd -l kafka
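
You can confirm the lock took effect by querying the account status; the L in the output marks a locked password (an optional check):

   sudo passwd -S kafka

From now on, reach the kafka account from your sudo user with sudo su kafka rather than logging in directly.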

Step 7: Installing KafkaT (Optional)

KafkaT, a command-line tool originally developed at Airbnb, makes it easier to view details about your Kafka cluster and perform certain administrative tasks.

  1. Install Ruby and build-essential:
   sudo apt install ruby ruby-dev build-essential
  2. Install KafkaT:
   sudo CFLAGS=-Wno-error=format-overflow gem install kafkat
  3. Create a KafkaT configuration file:
   nano ~/.kafkatcfg

Add the following content:

   {
     "kafka_path": "~/kafka",
     "log_path": "/home/kafka/logs",
     "zk_path": "localhost:2181"
   }
  4. Test KafkaT:
   kafkat partitions
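
If everything is wired up correctly, kafkat should list the topic partitions, including TutorialTopic (unless you deleted it earlier) and Kafka's internal __consumer_offsets topic. If you'd rather avoid the Ruby dependency, the bundled Kafka scripts give a similar view:

   ~/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181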

Wrap Up

You’ve successfully set up Apache Kafka on your Ubuntu 20.04 server. This powerful distributed messaging system is now ready for integration into your applications. For more advanced usage and configuration options, consult the official Kafka documentation.