Kafka High-Level Consumer

Kafka's consumer story spans two generations: the legacy high-level consumer, which required ZooKeeper, and the more modern Java API. The old high-level consumer is also called the ZookeeperConsumerConnector; it implements a ZooKeeper-backed consumer that offers offset management, load balancing, and automatic failover. That API supported consumer groups and handled failover, but didn't support many of the more complex usage scenarios. The newer Java consumer will transparently handle the failure of servers in the Kafka cluster, and transparently adapt as partitions of data it fetches migrate within the cluster; this client also interacts with the server to allow groups of consumers to load-balance consumption using consumer groups.

To make multiple consumers consume the same partition, you must either increase the number of partitions of the topic up to the parallelism you want to achieve, or put every single thread into a separate consumer group; the latter is usually not desirable.

As of Kafka 0.8.1, the high-level consumer stores its offsets in ZooKeeper, but Kafka expects to ship its own API for this in a future release. Kafka also provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the offset manager; you can configure this by setting the property offsets.storage to kafka.

Because I'm using Kafka as a "queue of transactions" for my application, I need to make absolutely sure I don't miss or re-read any messages. To stop consuming at a well-defined point, you can either stop once all the messages in the topic are exhausted, or get the max offset at the point when the messages are about to be read and stop once that max offset is reached.

Regarding data, we have two main challenges. We prefer to have several small services, each one with a single and well-defined responsibility, and I had to port some applications and implement new ones that would communicate with each other using this protocol. The ecosystem helps here: for unit testing of Kafka Streams there is a library called Mocked Streams; for C++ there is a C++11 wrapper built on top of librdkafka, a high-performance C client library for the Apache Kafka protocol; for Clojure there are write-ups on writing a Kafka producer and high-level consumer; and kafka-node (developed at SOHU-Co/kafka-node on GitHub) is a Node.js client with ZooKeeper integration for Apache Kafka 0.8.

Kafka is a highly scalable, highly available queuing system built to handle huge message throughput at lightning-fast speeds, which makes it a great choice for large-scale event processing. It can feed third-party big data systems such as Apache Storm, Apache Spark, and Hadoop, and can be monitored using tools like Graphite and Ganglia. In Spark's receiver-based integration, we implement the Receiver by using the Kafka high-level consumer API: the received data is stored in Spark executors, and then jobs launched by Spark Streaming process the data. The Logstash Kafka input likewise uses the high-level consumer API provided by Kafka to read messages from the broker; this input will read events from a Kafka topic.

To understand the behavior described in this post, let me explain how the high-level consumer (ZookeeperConsumerConnector) works in Kafka. The examples use Kafka 0.8.1 and assume it and ZooKeeper are running on localhost. To consume messages, we decided to use the high-level consumer.
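As a concrete starting point, here is a minimal sketch of that high-level consumer, written against the 0.8-era Java API; the topic name, group id, and ZooKeeper address are illustrative, not taken from any particular deployment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class HighLevelConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // the old consumer talks to ZooKeeper, not the brokers
        props.put("group.id", "my-group");
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("auto.commit.interval.ms", "1000");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for one stream; request as many streams as partitions for parallelism.
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("my-topic", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCountMap);

        // Block on the stream's iterator and print each message value.
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}
```

createMessageStreams returns one KafkaStream per requested thread, and the connector silently rebalances partitions across every consumer that joins the same group.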
I am trying to use the high-level consumer for batch reading the messages in a Kafka topic. This post covers Kafka 0.8.2, a problem we faced, and some points and links to make life easier for others trying to do the same; I want to help others avoid that pain if I can.

Apache Kafka is the leading data landing platform, and managed offerings track it closely: CKafka, for instance, has added compatibility with newer client versions as they have been developed.

The Kafka high-level consumer polls messages from Kafka topics using KafkaStream objects. The high-level consumer is provided to abstract most of the details of consuming events from Kafka, and for most purposes a high-level consumer comes in handy, especially when you want to … (Apache Kafka Cookbook). The consumer listens continuously for new incoming events on a specific Kafka topic. One Chinese-language article summarizes the territory: it mainly introduces the semantics of the Kafka high-level consumer, consumer groups, consumer rebalancing, and the low-level consumer, the scenarios each suits, and the future redesign of the high-level consumer, which uses a consumer coordinator to solve problems such as split brain and herd effects.

On the Python side, kafka-python is designed to function much like the official Java client, with a sprinkling of Pythonic interfaces (e.g., consumer iterators). This module provides low-level protocol support for Apache Kafka as well as high-level consumer and producer classes, and it includes libraries for Kafka consumers, producers, partitioners, callbacks, serializers, and deserializers. It is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0). A quickstart is available that can walk you through downloading and starting the services, and bundled code can be used to benchmark throughput for a Kafka cluster; its arguments are:

```
Args:
    groupId     -- (str) kafka consumer group id, default: bench
    concurrency -- (int) number of worker threads to spawn, defaults to number of cpus on current host
    duration    -- (int) how long to run the benchmark for, default: 20s
    topic       -- (str) the kafka topic to consume from, defaults to ...
```

To achieve higher throughput, we recommend using the Producer in asynchronous mode, so that produce() calls will return immediately and the producer may opt to send messages in larger batches. (The Go client can likewise report delivery results on its `Events()` channel, enabled via the `"go.events.channel.enable"` setting.)

For the Logstash Kafka input you must configure topic_id, white_list or black_list; the default input codec is json. I want to have multiple Logstash instances reading from a single Kafka topic.

For Spark there are two approaches to this: the old approach using Receivers and Kafka's high-level API, and a new approach (introduced in Spark 1.3) without using Receivers. This course covers the producer and consumer APIs, data serialization and deserialization techniques, and strategies for testing Kafka.

At the opposite end from the high-level consumer sits the old Simple Consumer API, class kafka.javaapi.consumer.SimpleConsumer. With SimpleConsumer it was obvious that data was read only from one broker.
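The scattered "class kafka..." and "@param request" fragments quoted throughout this post come from the signature listing of this API in the Kafka 0.8 documentation; reassembled (abridged, and doc-style rather than compilable Java), it reads roughly like this:

```
// Old Simple Consumer API, as listed in the Kafka 0.8 docs (abridged)
class kafka.javaapi.consumer.SimpleConsumer {
  /**
   * Fetch a set of messages from a topic.
   *
   * @param request specifies the topic name, topic partition, starting byte offset,
   *                maximum bytes to be fetched.
   * @return a set of fetched messages
   */
  public FetchResponse fetch(kafka.javaapi.FetchRequest request);

  /**
   * Get a list of valid offsets (up to maxSize) before the given time.
   *
   * @param request a kafka.javaapi.OffsetRequest object
   * @return a kafka.javaapi.OffsetResponse object
   */
  public kafka.javaapi.OffsetResponse getOffsetsBefore(OffsetRequest request);
}
```

Leader election and partition assignment are left entirely to the caller here, which is exactly the bookkeeping the high-level consumer automates.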
It will give you a brief understanding of messaging and distributed logs, and important concepts will be defined. I'll cover Kafka in detail, with an introduction to its programmability, and will try to cover almost the full architecture of it. This post picks up from our series on Kafka architecture, which includes Kafka topics architecture, Kafka producer architecture, Kafka consumer architecture, and Kafka ecosystem architecture; see also "Kafka Architecture: Low-Level Design" and "Getting Started with Apache Kafka for the Baffled, Part 1".

Apache Kafka is a pull-based and distributed publish-subscribe messaging system; topics are partitioned and replicated across nodes. If there is only one partition, only one broker processes messages for the topic and appends them to a file. Request batching is supported by the protocol, as is broker-aware request routing. Examples of events include a periodic sensor reading such as the current temperature.

On the client side, Kafka-pixy is written in Go and uses Shopify's Sarama Kafka client library; it handles quite a few implementation details that need to be taken care of and provides a language-agnostic interface to Kafka (note that it does not "mimic" the Kafka API protocol, but rather provides a facility in front of it). kafka-python is a pure Python client for Apache Kafka that supports Python 3. The rkafka package ("Kafka" being an open-source message broker project developed by the Apache Software Foundation) exposes a simple consumer, a high-level consumer, and a producer to R. kafka-node supports Apache Kafka 0.8 and later; although Kafka guarantees ordering within a partition, kafka-node's HighLevelConsumer resembles a sort of firehose, emitting messages as soon as they arrive, regardless of how fast the application is able to process them. And yes, a corrupted message is lost and can't be restored, so it's always a good idea to implement a CRC check before any message gets to Kafka.

For Logstash, basically just make topic the same topic as your producer and you are ready to go; the only required configuration is the topic name, and the default input codec is json. A log message in a Kafka topic should be read by only one of the Logstash instances.

Which consumer should you pick? High-level consumer: "I just want to use Kafka as an extremely fast persistent FIFO buffer and not worry much about details." Low-level consumer: "I want to have custom partition data-consuming logic, e.g., setting the initial offset when restarting the consumer." If the high-level consumer is too restrictive, one solution is using the Kafka SimpleConsumer and adding the missing pieces of leader election and partition assignment yourself.

From a mailing-list thread: "Hello, I'm using the high-level consumer with auto-commit disabled and a single thread per consumer, in order to consume messages in batches."
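A minimal sketch of that batching pattern, assuming the 0.8-era high-level consumer; the topic, group id, and batch size of 100 are illustrative:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class BatchingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "batch-group");
        props.put("auto.commit.enable", "false"); // we commit manually, once per batch

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));

        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        int inBatch = 0;
        while (it.hasNext()) {
            process(it.next().message());
            if (++inBatch == 100) {        // batch size is arbitrary here
                connector.commitOffsets(); // checkpoint after each full batch
                inBatch = 0;
            }
        }
    }

    private static void process(byte[] message) { /* application logic */ }
}
```

Since commitOffsets() checkpoints everything consumed so far, a crash replays at most the current, uncommitted batch.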
In this post I am going to discuss the use of the high-level consumer with Kafka 0.8. Some history first: Apache Kafka, originally developed at LinkedIn, has emerged as one of the key new data technologies, and when Kafka was originally created, it shipped with a Scala producer and consumer client. So in 2013 there was Kafka 0.8, and it included a bunch of new features such as topic replication and log compaction.

For Node.js developers, Kafka is an enterprise-level tool for sending messages across microservices (RunKit notebooks, interactive JavaScript playgrounds connected to a complete Node environment right in your browser, are a handy way to experiment). For my use case, my consumer was a separate Express server which listened to events and stored them in a database.

The Kafka high-level consumer coordinates such that the partitions being consumed in a consumer group are balanced across the group, and any change in metadata triggers a consumer rebalance. Unlike the single-partition case described earlier, if there are as many partitions as brokers, message processing is parallelized and there is up to an m-times (minus overhead) speedup. It is also possible to configure a Kafka consumer to serve a request pattern like the one described above, which is very useful in such situations.

"A Kafka client that consumes records from a Kafka cluster" is how the new consumer describes itself. I can't yet speak to the performance comparison with the ZooKeeper offset storage, but the high-level consumer does support storing offsets in Kafka with 0.8.2. Note that where no ZooKeeper address is exposed, as in some managed services, the high-level consumer API, which requires a ZooKeeper address, is not supported. Some features will only be enabled on newer brokers; the Logstash plugin documentation has a compatibility matrix that shows the Kafka client versions that are compatible with each combination of Logstash and the Kafka input plugin.

In the Spark samples, Consumer is the sample consumer which uses these Kafka receivers to generate DStreams from Kafka and applies an output operation to every message of the RDD.

How does Kafka do all of this? Producers push, with batching, compression, sync (ack) or async (auto-batch) modes, and replication; brokers do sequential writes and guarantee ordering within each partition.

On the command line: it has been a while since I last played with Kafka and I'm a bit rusty, so let's walk through operating Kafka with commands, covering the startup command, creating a topic, listing topics, deleting a topic, producing and consuming data, and other commands (the earlier article on building a Kafka system on Alibaba Cloud covers the setup). Kafka provides the kafka-topics.sh command to create and modify topics; if you run the command without parameters, it prints its usage. In our installation, this command is available in the /usr/local/kafka/bin directory and is already added to our path during the installation.
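For example (host names, partition counts, and the topic name here are illustrative, and the --zookeeper flag matches the 0.8/0.9-era tooling):

```sh
# Create a topic with 8 partitions, replicated twice
kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic my-topic --partitions 8 --replication-factor 2

# List all topics, then describe one
kafka-topics.sh --list --zookeeper localhost:2181
kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic
```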
I'm using Kafka's high-level consumer, so let's look at its internals. The library also forks other threads, such as fetcher-manager threads and a leader thread. Meanwhile, on the broker side, the controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions.

Let's revisit the vocabulary around Kafka's consuming model, to understand what's in play: a consumer consumes the partitions of some topics. The subscription set denotes the desired topics to consume, and this set is provided to the partition assignor (one of the elected group members) for all clients, which then uses the configured partition.assignment.strategy to assign the subscription set's topics' partitions to the consumers. Any previous subscription will be unassigned and unsubscribed first. The high-level consumer's main advantage is simplicity of use and the ability to balance partitions between consumers if multiple instances of the message source are running in parallel. One related trick for time ordering: produce messages by key, with each message carrying a creation timestamp; this makes sure that each partition has its messages ordered by produced time. There is also a long-standing request (KAFKA-966) to allow the high-level consumer to "nak" a message and force Kafka to close the KafkaStream without losing that message.

I am using Kafka 0.8.1 with zkclient 0.x here. Tooling has grown up around all of this: Conduktor is a Kafka desktop client for Mac, Windows, and Linux; Kafka-Tools is another option; and the rkafka R package provides functions that create a high-level consumer and shut down the simple consumer.

Kafka Streams builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, exactly-once processing semantics, and simple yet efficient management of application state; here are some examples to demonstrate how to use them, and a quickstart example will demonstrate how to run a streaming application coded in this library. Kafka is the de facto standard for collecting and then streaming data to different systems: one webinar explores the use cases and architecture for Kafka and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data, and also included is a case study for using Kafka with Spark Streaming.

Writing data to Kafka: as recommended above, use the producer in asynchronous mode for throughput; Gzip and Snappy compression is also supported for message sets.
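A sketch of an old (Scala) producer configuration tuned along those lines; every value below is illustrative rather than a recommendation:

```properties
metadata.broker.list=localhost:9092
producer.type=async          # produce() returns immediately; batches are sent in the background
batch.num.messages=200       # how many messages to accumulate before sending
queue.buffering.max.ms=500   # how long to buffer before flushing a batch
compression.codec=snappy     # gzip and snappy are both supported for message sets
```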
• High-level consumer API: takes care of this for you; stores offsets in ZooKeeper
• Simple consumer API: nothing provided, it's totally up to you
• What does this offset management allow you to do?
• Consumers can deliberately rewind "in time" (up to the point where Kafka prunes), e.g., to replay older messages

Stepping back: Kafka provides a flexible, scalable, and reliable method to communicate streams of event data from one or more producers to one or more consumers, and the number of partitions is the unit of parallelism in Kafka. Let's inspect different Kafka consumer implementations to see which is the most convenient for our use case. The high-level consumer is provided to abstract most of the details of consuming events from Kafka, and in Apex, the Kafka input operator consumes data from the partitions of a Kafka topic for processing.

In PHP land, kafka-php is a simple and high-level consumer and producer client for the Kafka broker (0.x); it is an alternative to the existing Kafka PHP client. The php-rdkafka extension later added a high-level consumer, Rdkafka\KafkaConsumer (librdkafka 0.9), along with rd_kafka_get_err_descs(). In Python, confluent-kafka-python is Confluent's Python client for Apache Kafka and the Confluent Platform.

Not everything is smooth, of course; one user reports: "When I switch the logging level to debug, I can see the following two lines repeating themselves over and over."

This tutorial demonstrates how to process records from a Kafka topic with a Kafka consumer. By setting the same group id, multiple processes indicate that they are all part of the same consumer group. Automatic offset committing: this example demonstrates a simple usage of Kafka's consumer API that relies on automatic offset committing.
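That automatic-offset-committing example, as it appears (lightly abridged) in the 0.9 consumer documentation; the broker address, group id, and topic are placeholders:

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");                // same group.id means same consumer group
        props.put("enable.auto.commit", "true");      // offsets are committed in the background
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n",
                        record.offset(), record.key(), record.value());
        }
    }
}
```

Run two copies of this with the same group.id and the brokers split the topic's partitions between them; that is the whole consumer-group contract.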
So, by using the Kafka high-level consumer API, we implement the Receiver. Kafka consumer offset management works as described earlier: the ZooKeeper-backed connector calls ZkUtils helpers to read and write group offsets. Pony Kafka is at the moment mostly unoptimized, so we have the ability to squeeze out further performance gains and achieve parity with the C client. ChaperoneService consumes each and every message from Kafka, records a timestamp for audit, and produces the auditing messages to a dedicated Kafka topic.

The high-level consumer can, and should, be used in a multi-threaded environment. The number of threads in the thread model (which is also the number of consumers in the group) is tied to the number of partitions of the topic, with a few rules: if you provide more threads than there are partitions, some threads will never receive a message; if you provide fewer threads than partitions, some threads will receive messages from more than one partition.

Kafka is generally used for two broad classes of applications: building real-time streaming data pipelines that reliably get data between systems or applications, and building real-time streaming applications that transform or react to the streams of data. Currently Kafka has both high-level and low-level consumers to choose between for such work.

From the field: "(4 replies) Hi, I am trying to read a Kafka topic using the high-level Kafka consumer API." "Greetings! I've encountered an issue while trying to use the kafka-node module on my production servers: I'm producing 10-15k records per second, and unfortunately the most I've been able to get from my consumer is 1-1.5k; my Node.js consumer code follows." "However, two or three days after the app is started (but not used, because it is a development environment with no load), it crashes with a java.lang.OutOfMemoryError. It is a quick fix and I am relying on GC to get the job done."

Back to Spark: the Receiver uses the high-level consumer API provided by Kafka to read messages from the broker, the received data is stored in Spark's worker/executor memory as well as in the write-ahead log (replicated on HDFS), and then jobs launched by Spark Streaming process the data. Be aware that under the default configuration this approach can lose data under failures, which is why the WAL matters.
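A sketch of the receiver-based wiring, against the Spark Streaming Kafka integration for 0.8-era brokers; the app name, ZooKeeper quorum, group, and topic map are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class ReceiverBasedStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-receiver-demo");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        Map<String, Integer> topics = new HashMap<>();
        topics.put("my-topic", 1); // number of consumer threads for this topic

        // Receiver built on the high-level consumer; stores data in executors.
        JavaPairReceiverInputDStream<String, String> stream =
            KafkaUtils.createStream(jssc, "localhost:2181", "spark-group", topics);

        stream.map(t -> t._2()).print(); // print the message values

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Enabling spark.streaming.receiver.writeAheadLog.enable gives the WAL behavior described above.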
Re: [Camel-Kafka] consumerStreams vs consumersCount. Hi, I have a single topic with 8 partitions and my consumer app has to process all the events as fast as possible. Before diving in, it is important to understand the general architecture of a Kafka deployment: Kafka is an awesome system for collecting, distributing, and hard-copying stream data, and currently Kafka has two different types of consumers (a New Consumer API later unified them). A more complete study of this topic can be found in the "Data Streaming with Kafka & MongoDB" white paper.

Kafka-node is a Node.js client with ZooKeeper integration for Apache Kafka 0.8; by default it will connect to a ZooKeeper running on localhost, and it offers a Consumer and HighLevelConsumer, a Producer and HighLevelProducer, topic offset management, and SSL connections to brokers (Kafka 0.9+). For my use case, my consumer was a separate Express server which listened to events and stored them in a database; the consumer listens all the time for new incoming events on a specific Kafka topic. The example above would produce to Kafka synchronously: the call only returns after we have confirmation that the message made it to the cluster. Another integration along the same lines is a server that subscribes to topic messages from a Kafka broker and streams them as key-value pairs into an IgniteDataStreamer instance, and we can likewise use a Samza job to replicate and aggregate.

For Logstash, the only required configuration is the topic name, and the default input codec is json; if you are still using Logstash 1.4, you need to install the logstash-kafka plugin separately. This input will read events from a Kafka topic.

Spark Streaming + Kafka Integration Guide (Kafka broker version 0.8.2.1 or higher): here we explain how to configure Spark Streaming to receive data from Kafka, and here we use a Receiver to receive the data.

Back to the 8-partition question: although the high-level consumer emits messages as fast as they arrive, to control this issue the TopicConsumer implements an in-memory queue which processes a single batch of messages at a time.
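One way to drive all 8 partitions hard is the classic thread-per-stream layout, following the rules given earlier; a sketch assuming the 0.8 high-level consumer, with names and counts illustrative:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ThreadedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "threaded-group");
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        int threads = 8; // ideally equal to the topic's partition count
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", threads));

        // One worker thread drains each KafkaStream.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final KafkaStream<byte[], byte[]> stream : streams.get("my-topic")) {
            pool.submit(() -> {
                for (MessageAndMetadata<byte[], byte[]> m : stream) {
                    System.out.printf("partition %d: %s%n", m.partition(), new String(m.message()));
                }
            });
        }
    }
}
```

With eight partitions and eight threads, each thread owns one partition; a ninth thread would sit idle, and with fewer threads some would receive messages from several partitions.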
Kafka is generally used for two broad classes of applications: building real-time streaming data pipelines that reliably get data between systems or applications, and building real-time streaming applications that transform or react to the streams of data. It is scalable, durable, and distributed by design, which is why it is currently one of the most popular choices when picking a messaging broker for high-throughput architectures. Kafka Connect is built on top of the Kafka core components, and the Confluent Platform includes the Java consumer shipped with Apache Kafka®. On the Python side, confluent-kafka-python is a high-performance, lightweight wrapper around librdkafka, a finely tuned C client, and its bindings provide a high-level Producer and Consumer with support for the balanced consumer groups of Apache Kafka >= 0.9.

The low-level API is a similar API to the Consumer, with some exceptions: out of the three consumers, Simple Consumer operates at the lowest level, the logic will be a bit more complicated, and you can follow the example linked here. This document also explains Kafka and Spark Streaming: to guarantee correct results for all kinds of streaming computations, stateful and not, the data must be replayed through Kafka in exactly the same order, and the underlying blocks of data in Spark must be regenerated exactly as they would have been had there been no driver failure.

Some operational notes from the trenches. Retention in server.properties is set up as 168 hours (the log.retention.hours default). One report: "I get OOME just after 10-15 minutes; my volume test setup has just one topic, with 10 partitions and a continuous flow of ~500 KB messages, and below is my configuration." The cleaner solution would be to make the FetcherRunnable a Disposable rather than relying on GC to get the job done.

Finally, offsets. The input described above uses Kafka's high-level consumer API to read messages from Kafka (its stream config points at a KafkaConsumerFactory). As covered earlier, Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the offset manager, and you can configure it by setting the property offsets.storage.
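A sketch of the relevant old-consumer settings (Kafka 0.8.2+); dual.commit.enabled is only needed while migrating a group's offsets out of ZooKeeper:

```properties
offsets.storage=kafka     # commit offsets to the offset manager broker instead of ZooKeeper
dual.commit.enabled=true  # during migration, commit to both Kafka and ZooKeeper
```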