Writing Text File Contents to Kafka with Kafka Connect

When working with Kafka you might need to write data from a local file to a Kafka topic. This is actually very easy to do with Kafka Connect. Kafka Connect is a framework that provides scalable and reliable streaming of data to and from Apache Kafka. With Kafka Connect, writing a file’s content to a topic requires only a few simple steps.

Starting Kafka and Zookeeper

The first step is to start the Kafka and Zookeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.
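If you already have a Kafka distribution unpacked, a typical startup sequence looks like the sketch below, using the sample config files shipped with Kafka and assuming $KAFKA_HOME points at your installation:

```shell
# Start Zookeeper using the sample config shipped with Kafka
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &

# In a second terminal (or after Zookeeper is up), start the Kafka broker
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
```

See the quickstart linked above for the full walkthrough.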

Creating a Topic to Write to

Creating a topic from the command line is very easy to do. In this example we create the my-connect-test topic. (Note: on Kafka 2.2 and newer, replace --zookeeper localhost:2181 with --bootstrap-server localhost:9092.)

$KAFKA_HOME/bin/kafka-topics.sh \
  --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic my-connect-test

Creating a Source Config File

Since we are reading the contents of a local file and writing to Kafka, this file is considered our “source”. Therefore we will use the FileStreamSource connector. We must create a configuration file to use with this connector. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-source.properties. Below is an example of our my-file-source.properties file.

#my-file-source.properties config file
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/my-test.txt
topic=my-connect-test

This file indicates that we will use the FileStreamSource connector class, read data from the /tmp/my-test.txt file, and publish records to the my-connect-test Kafka topic. We are also using only a single task to push this data to Kafka, since we are reading/publishing a single file.

Creating a Worker Config File

Processes that execute Kafka Connect connectors and tasks are called workers. Since we are reading data from a single machine and publishing to Kafka, we can use the simpler of the two types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.

# my-standalone.properties worker config file

#bootstrap kafka servers
bootstrap.servers=localhost:9092

# specify input data format
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# Internal converters used for offsets; most users will want the built-in default
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# local file in which the standalone worker stores its offsets
offset.storage.file.filename=/tmp/connect.offsets

The main change compared to the default config is the key.converter and value.converter settings. Since our file contains simple text, we use the StringConverter type for both.

Running Kafka Connect

Now it is time to run Kafka Connect with our worker and source configuration files. As mentioned before we will be running Kafka Connect in standalone mode. Here is an example of doing this with our custom config files:

$KAFKA_HOME/bin/connect-standalone.sh my-standalone.properties my-file-source.properties

Our input file /tmp/my-test.txt will be read by a single process, and its lines will be published to the my-connect-test Kafka topic. Here is a look at the file contents:

this is line 1
this is line 2
this is line 3
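If you want to reproduce this example, the sample file can be created with a single shell command; the path matches the file setting in my-file-source.properties:

```shell
# Create the input file that the FileStreamSource connector will read
printf 'this is line 1\nthis is line 2\nthis is line 3\n' > /tmp/my-test.txt

# Verify the contents
cat /tmp/my-test.txt
```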

Reading from the Kafka Topic

If we read from the Kafka topic that we created earlier, we should see the 3 lines in the source file that were written to Kafka:

#read all from topic
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-connect-test --from-beginning

this is line 1
this is line 2
this is line 3
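The FileStreamSource connector tails the source file, so while the standalone worker is still running, appending a new line should cause a new record to appear in the console consumer without restarting anything:

```shell
# Append a new line; the running connector picks it up and publishes it
echo "this is line 4" >> /tmp/my-test.txt
```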

* More about Kafka Connect can be found at http://docs.confluent.io/2.0.0/connect/intro.html.

