When working with Apache Kafka, you might want to write data from a Kafka topic to a local text file. This is easy to do with Kafka Connect, a framework that provides scalable and reliable streaming of data to and from Apache Kafka. With Kafka Connect, writing a topic’s content to a local text file requires only a few simple steps.
Starting Kafka and Zookeeper
The first step is to start the Kafka and Zookeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.
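If you already have a local Kafka installation, both servers can be started with the scripts that ship with the distribution. Below is a minimal sketch, assuming the default configuration files; run each command in its own terminal.
#start Zookeeper, then the Kafka broker
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties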
Creating a Topic to Write to
Creating a topic from the command line is very easy to do. In this example we create the my-connect-test topic.
$KAFKA_HOME/bin/kafka-topics.sh \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic my-connect-test
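To confirm the topic was created, you can describe it against the same Zookeeper instance:
#verify the topic exists
$KAFKA_HOME/bin/kafka-topics.sh \
--describe \
--zookeeper localhost:2181 \
--topic my-connect-test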
Creating a Sink Config File
Since we are reading from a Kafka topic and writing to a local text file, the file is considered our “sink”. Therefore we will use the FileSink connector. We must create a configuration file to use with this connector. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-sink.properties. Below is an example of our my-file-sink.properties file.
#my-file-sink.properties config file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/my-file-sink.txt
topics=my-connect-test
This file indicates that we will use the FileStreamSink connector class, read data from the my-connect-test Kafka topic, and write records to /tmp/my-file-sink.txt. We are also using only one task to read this data from Kafka.
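Note that FileStreamSink is a short alias for the connector class. If your Kafka version does not resolve the alias, the fully qualified name that ships with Apache Kafka should work as well:
#equivalent setting using the fully qualified class name
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector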
Creating a Worker Config File
Processes that execute Kafka Connect connectors and tasks are called workers. In this example we can use the simpler of the two worker types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.
# my-standalone.properties worker config file
#bootstrap kafka servers
bootstrap.servers=localhost:9092
# specify input data format
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
# The internal converter used for offsets; most users will want to use the built-in default
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
# local file storing offsets and config data
offset.storage.file.filename=/tmp/connect.offsets
The main change in this example compared to the default is the key.converter and value.converter settings. Since our data is simple text, we use the StringConverter types.
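For comparison, if the topic carried schemaless JSON records instead of plain text, the converter settings might look like the following sketch instead:
#example converter settings for schemaless JSON data
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false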
Writing Data to a Kafka Topic
We now need to write some sample data to our Kafka topic. This can easily be done with the kafka-console-producer, which takes data from STDIN and writes it to Kafka.
#using the kafka-console-producer to write to Kafka topic
$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-connect-test
writing line 1
writing line 2
writing line 3
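You can verify that the messages landed in the topic with the console consumer (assuming a broker version that supports --bootstrap-server; older releases use --zookeeper localhost:2181 instead):
#read the topic back from the beginning
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-connect-test --from-beginning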
Running Kafka Connect
Now it is time to run Kafka Connect with our worker and sink configuration files. As mentioned before, we will be running Kafka Connect in standalone mode. Here is how to launch it with our custom configuration files:
$KAFKA_HOME/bin/connect-standalone.sh my-standalone.properties my-file-sink.properties
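While the worker is running, you can also inspect the connector through Kafka Connect's REST interface, assuming the default REST port of 8083:
#list connectors on the running worker
curl http://localhost:8083/connectors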
At this point, all data available in the Kafka topic should be written to our local text file. We can confirm this by reading the file's contents.
#print contents of local sink file
cat /tmp/my-file-sink.txt
writing line 1
writing line 2
writing line 3
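The sink keeps appending new records as long as the worker is running; to watch the file grow, you can leave a tail open in another terminal:
#follow the sink file as new records arrive
tail -f /tmp/my-file-sink.txt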
More about Kafka Connect can be found at http://docs.confluent.io/2.0.0/connect/intro.html.