What is Apache Kafka?

Apache Kafka is an open-source, distributed stream-processing platform written in Scala and Java. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. Because it is open source, it is free to use and has a large community of users and developers who contribute updates and new features and offer support to new users.

The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems for data import and export via Kafka Connect, and it provides Kafka Streams, a Java stream-processing library.

Kafka is a distributed streaming platform used to publish and subscribe to streams of records. It also provides fault-tolerant storage by replicating topic log partitions across multiple servers. Kafka is designed to let your applications process records as they occur.

Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a “message set” abstraction that naturally groups messages together to reduce the overhead of the network round trip. This leads to larger network packets, larger sequential disk operations, and contiguous memory blocks, which allows Kafka to turn a bursty stream of random message writes into linear writes.
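The batching idea described above can be illustrated with a small sketch. This is not Kafka's wire protocol or client code; the function name batch_messages and the max_batch_bytes limit are hypothetical names chosen for illustration. It only shows how grouping many small messages into a few larger batches reduces the number of sends.

```python
from typing import Iterable, Iterator, List


def batch_messages(messages: Iterable[bytes], max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group messages into batches whose total payload stays within max_batch_bytes.

    Hypothetical helper for illustration only; not part of any Kafka API.
    """
    batch: List[bytes] = []
    size = 0
    for msg in messages:
        # Flush the current batch if adding this message would exceed the limit.
        if batch and size + len(msg) > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(msg)
        size += len(msg)
    # Emit whatever is left once the input is exhausted.
    if batch:
        yield batch


# Ten 2-byte messages: b"m0" ... b"m9".
messages = [b"m%d" % i for i in range(10)]
batches = list(batch_messages(messages, max_batch_bytes=8))

# Each batch would be written or sent as one unit instead of one send per message.
print(len(messages), "messages sent in", len(batches), "batches")
```

With an 8-byte limit, the ten messages fit into three batches, so three larger writes stand in for ten small ones.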

Apache Kafka is based on a commit log, which lets users subscribe to it and publish data to any number of systems or real-time applications. Example applications of Kafka include managing passenger and driver matching at Uber, providing real-time analytics and predictive maintenance for the British Gas smart home, and performing numerous real-time services across all of LinkedIn.
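To make the commit-log idea concrete, here is a minimal in-memory sketch. The CommitLog and Subscriber classes are hypothetical and exist only for this illustration; they are not Kafka APIs. The sketch assumes that each subscriber tracks its own read position, so several applications can consume the same log independently.

```python
class CommitLog:
    """Append-only list of records (toy model, not Kafka's storage layer)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are only ever added at the end; the index is the record's position.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        # Reading never removes data, so any number of readers can share the log.
        return self._records[offset:offset + max_records]


class Subscriber:
    """Reader that remembers how far it has read (hypothetical helper)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = CommitLog()
analytics = Subscriber(log)

for event in ["ride-requested", "driver-matched", "ride-started"]:
    log.append(event)

first_poll = analytics.poll()    # the three events appended so far
log.append("ride-finished")
second_poll = analytics.poll()   # only the new event

billing = Subscriber(log)        # a subscriber that joins later
billing_poll = billing.poll()    # starts from the beginning and sees all four events
print(first_poll, second_poll, billing_poll)
```

Because reading does not consume records, adding another subscriber does not affect the existing ones.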

Kafka stores key-value messages that come from arbitrarily many processes called producers. The data can be partitioned into different “partitions” within different “topics”. Within a partition, messages are strictly ordered by their offsets, and they are indexed and stored together with a timestamp.
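The relationship between topics, partitions, keys, and offsets can be sketched as a toy model. Everything here is an assumption made for illustration: the Topic and Record classes are hypothetical, and the CRC32 hash is only a stand-in for whatever partitioner a real Kafka client uses.

```python
import time
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    offset: int        # position of the record within its partition
    timestamp: float   # time at which the record was stored
    key: bytes
    value: bytes


class Topic:
    """Toy topic made of several append-only partitions (not a Kafka API)."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: CRC32 stands in for the real partitioner's hash function.
        return zlib.crc32(key) % len(self.partitions)

    def send(self, key: bytes, value: bytes) -> Tuple[int, int]:
        p = self.partition_for(key)
        log = self.partitions[p]
        # The next offset is simply the current length of the partition.
        record = Record(len(log), time.time(), key, value)
        log.append(record)
        return p, record.offset


topic = Topic("user-events", num_partitions=3)
p1, o1 = topic.send(b"user-1", b"login")
p2, o2 = topic.send(b"user-1", b"click")

# Records with the same key land in the same partition, in send order.
print("same partition:", p1 == p2, "| offsets:", o1, o2)
```

In this model, ordering is guaranteed only within a partition, which is why records that must stay in order share a key.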

Kafka runs on a cluster of one or more servers called brokers. The partitions of all topics are distributed across the cluster nodes, and each partition is replicated to multiple brokers. This architecture allows Kafka to deliver massive streams of messages in a fault-tolerant fashion and has allowed it to replace some conventional messaging systems, such as those based on Java Message Service (JMS) or the Advanced Message Queuing Protocol (AMQP).
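As a rough illustration of why replication across brokers gives fault tolerance, the sketch below spreads partition replicas over brokers in a simple round-robin pattern. This is a simplified assumption, not Kafka's actual replica assignment algorithm; the function names, broker names, and replication_factor parameter are hypothetical.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[str],
                    replication_factor: int) -> Dict[int, List[str]]:
    """Place each partition's replicas on consecutive brokers (round-robin sketch)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Replica i of partition p goes to the broker i steps after broker p.
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment


def survives_failure(assignment: Dict[int, List[str]], failed_broker: str) -> bool:
    """True if every partition still has at least one replica on a live broker."""
    return all(any(b != failed_broker for b in replicas)
               for replicas in assignment.values())


brokers = ["broker-0", "broker-1", "broker-2"]
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=2)

for partition, replicas in assignment.items():
    print(partition, replicas)

# With two replicas per partition, losing any single broker leaves every partition readable.
print(all(survives_failure(assignment, b) for b in brokers))
```

With a replication factor of 1, the same check fails for any broker that hosts a partition, which shows why replication is what provides the fault tolerance.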
