Producer in Python

Phew! Congrats on making it to this section. We will learn how to create a producer and write to your Kafka cluster.

Getting familiar with the kafka-python client

"Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators)."


Getting familiar with kafka-python producer


The producer is thread safe and sharing a single producer instance across threads will generally be faster than having multiple instances.
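The sharing pattern described above can be sketched as follows. This is only an illustration: the helper name send_from_threads is hypothetical, and the producer argument is assumed to be an already created kafka-python KafkaProducer (anything with send and flush methods works the same way).

```python
import threading


def send_from_threads(producer, topic, payloads_per_thread):
    """Send byte payloads from several threads through ONE shared producer.

    `payloads_per_thread` is a list of lists of bytes, one inner list per thread.
    Returns the total number of payloads handed to the producer.
    """

    def worker(payloads):
        for payload in payloads:
            # Every thread calls send() on the same producer instance.
            producer.send(topic, value=payload)

    threads = [
        threading.Thread(target=worker, args=(payloads,))
        for payloads in payloads_per_thread
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Block until everything queued by the workers has been sent.
    producer.flush()
    return sum(len(payloads) for payloads in payloads_per_thread)
```

Each worker only borrows the producer; creating and closing it stays in one place, which is usually simpler than managing one producer per thread.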

Check out the documentation for creating a Kafka producer.

Create a Kafka Producer

Once you've read enough :) go to the producer.py file and create a producer inside the produce_messages method.
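If you get stuck, here is a minimal sketch of what produce_messages might look like. It assumes a broker reachable at localhost:9092 (an assumption; use the address of your own cluster). The helper producer_settings is hypothetical, while KafkaProducer and its bootstrap_servers parameter come from kafka-python.

```python
def producer_settings(bootstrap_servers="localhost:9092"):
    """Keyword arguments for KafkaProducer. The default address is an assumption."""
    return {"bootstrap_servers": bootstrap_servers}


def produce_messages():
    # Imported inside the function so this file can still be loaded
    # before kafka-python is installed.
    from kafka import KafkaProducer

    # Connects to the cluster; raises NoBrokersAvailable if nothing is listening.
    producer = KafkaProducer(**producer_settings())
    try:
        pass  # sending messages comes in the next step
    finally:
        producer.close()  # always release the producer's network resources
```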


Bonus exercise: modify your producer to include the following parameters on creation:

  • A client id with a mnemonic name of your preference

  • No delivery guarantee (the producer does not wait for an acknowledgment from the broker)

  • Higher number of retries in case of failure
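One possible reading of these three parameters, as a sketch rather than the official solution: client_id, acks and retries are KafkaProducer parameters in kafka-python, while the helper name, the client id value, the retry count and the broker address below are all made up for illustration.

```python
def bonus_producer_settings(bootstrap_servers="localhost:9092"):
    """Keyword arguments for KafkaProducer covering the bonus exercise."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "client_id": "workshop-producer",        # hypothetical mnemonic name
        "acks": 0,                                # do not wait for any broker acknowledgment
        "retries": 5,                             # illustrative value, pick your own
    }
```

You would pass these as KafkaProducer(**bonus_producer_settings()). Note that the kafka-python documentation says retries generally have no effect when acks is 0, because the client does not learn about failures, so the last two settings are worth experimenting with separately.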

Send Messages

Sending messages to Kafka can be as simple as sending strings :) The code for this step sends them to a topic called twitter-handlers.
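A minimal sketch of sending strings, assuming producer is a KafkaProducer created without a value_serializer (in that case kafka-python expects the value as bytes). The helper name send_strings is hypothetical.

```python
def send_strings(producer, topic, messages):
    """Send each string as UTF-8 bytes and return how many were queued."""
    count = 0
    for message in messages:
        # Encode explicitly because no value_serializer is configured.
        producer.send(topic, value=message.encode("utf-8"))
        count += 1
    # send() is asynchronous; flush() blocks until the queued records are sent.
    producer.flush()
    return count
```

With a real producer the call would look like send_strings(producer, "twitter-handlers", ["@apachekafka"]), where the handle is just sample data.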

Serialize your data to Avro

In real life, you will rarely send plain strings to a topic. Companies and projects deal with complex data structures and schemas that services and applications have to process.

We will work with the following schema:

The schema will help us serialize the data into Avro format.

We are going to run through the Avro code quickly before moving on to sending our messages:
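As a rough illustration of what Avro serialization involves, here is a sketch that uses the fastavro package. That choice is an assumption, and the workshop code may use a different Avro library. The schema and field names below are hypothetical and only stand in for the schema of this section.

```python
import io

# Hypothetical schema for illustration only; use the schema given in this section.
EXAMPLE_SCHEMA = {
    "type": "record",
    "name": "Handler",
    "namespace": "example.avro",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "followers", "type": "int"},
    ],
}


def to_avro_bytes(record, schema=EXAMPLE_SCHEMA):
    """Serialize one dict into Avro binary (no schema embedded in the bytes)."""
    # Imported here so the file can be loaded before fastavro is installed.
    from fastavro import parse_schema, schemaless_writer

    buffer = io.BytesIO()
    schemaless_writer(buffer, parse_schema(schema), record)
    return buffer.getvalue()


def from_avro_bytes(raw, schema=EXAMPLE_SCHEMA):
    """Decode bytes produced by to_avro_bytes back into a dict."""
    from fastavro import parse_schema, schemaless_reader

    return schemaless_reader(io.BytesIO(raw), parse_schema(schema))
```

Because the bytes carry no schema, whoever reads the topic needs the same schema to decode them, which is why the schema file is shared between producer and consumer.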

Sending messages with Avro format

The code above is not complete :) We have only created each Avro message. Can you guess what else is needed?

Try sending the messages yourself.

To check that your messages are being sent correctly, use the kafka-console-consumer seen previously, or wait until the next section :-P
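The producer itself can also tell you whether the broker accepted each message. The sketch below is one way to do it, not the only one: send returns a future in kafka-python, and calling get on it waits for the broker's answer or raises an error. The helper name send_and_confirm and its serialize parameter (any function that turns a record into Avro bytes) are hypothetical.

```python
def send_and_confirm(producer, topic, records, serialize, timeout=10):
    """Serialize and send every record, then wait for the broker's answer.

    Returns the metadata object of each delivered record, in sending order.
    """
    futures = [producer.send(topic, value=serialize(record)) for record in records]
    # Push out anything still waiting in the producer's buffer.
    producer.flush()
    # get() blocks up to `timeout` seconds and raises if delivery failed.
    return [future.get(timeout=timeout) for future in futures]
```

With a real KafkaProducer, each returned item exposes the topic, partition and offset where the record landed, which is a quick sanity check before reading the topic back.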


Bonus Exercise:

  • Try running the producer with the other data file in the data folder and send its messages to a different topic.

  • Try to define your own schema and data and play with it :-) Have fun!
