Spring Boot + Apache Kafka Tutorial - #2 - Apache Kafka Core Concepts or Terminologies


Welcome to the Spring Boot + Apache Kafka Tutorial series. In this lecture, we will take a look at Apache Kafka's important core concepts or terminologies:

- Kafka cluster 

- Kafka broker 

- Kafka producer 

- Kafka consumer 

- Kafka topic 

- Kafka partitions 

- Kafka offsets 

- Kafka consumer group

Lecture #2 - Apache Kafka Core Concepts or Terminologies

Transcript:

Hi. Welcome back. In this lecture, we will take a look at some of the important Kafka concepts or terminologies. In order to use Kafka in a Spring Boot application, we have to understand these concepts first.

So in this lecture, I'm going to go through some of the important Apache Kafka terminologies or core concepts, so that it will be easy for you to understand how to use Kafka in a Spring Boot application.

Let's begin with the Kafka cluster. As we know, Kafka is a distributed system, right? It runs as a cluster. A Kafka cluster consists of a set of brokers. If you look at the diagram over here, a Kafka cluster consists of one or more brokers, and it is recommended that you have at least three brokers in a Kafka cluster in production. All right. So just remember, a Kafka cluster consists of one or more Kafka brokers.

Now, the question is, what is a Kafka broker? A Kafka broker is a Kafka server. It's just a meaningful name given to the Kafka server, and the name makes sense because what Kafka does is act as a message broker between the Producer and the Consumer. The Producer and Consumer don't interact directly. They use the Kafka server as an agent or a broker to exchange messages. For example, if you look at the diagram over here, the Kafka broker sits between the Producer and the Consumer and passes messages from one to the other. All right. So just remember, a Kafka broker is nothing but a Kafka server. It acts as a broker or an agent to exchange messages between the Producer and the Consumer. All right.

Now, the question is, what is a Producer? A Producer is nothing but an application that produces messages and sends them to the Kafka broker. The Producer does not send messages directly to the recipient. It sends messages only to the Kafka server. All right. Just remember, a Producer is an application that produces messages and sends them to the Kafka broker, never directly to Consumers.

Now, the question is, what is a Consumer? A Consumer is basically an application that reads or consumes messages from the Kafka server. If Producers are sending data, it must be sent to someone, right? The Consumers are the recipients. But remember that Producers don't send data to a recipient address. They just send it to the Kafka server first, and anyone who is interested in that data can come forward and take it from the Kafka server. So any application that requests data from the server is a Consumer, and it can ask for data sent by any Producer, provided it has permission to read it. For example, look here: we have a Kafka broker, a Producer, which is an application that produces messages and sends them to that Kafka broker, and a Consumer, which is an application that consumes or reads messages from the Kafka server. All right.

Now the question is how the data is stored in a Kafka broker. We learned that the Producer sends data to the Kafka broker, and the Consumer can ask for data from the Kafka broker, but the question is, which data? We need some identification mechanism to request data from the broker, and that is where the topic comes in.

So what is a topic? A topic is basically a category in a Kafka broker where messages are stored. For example, let's say a Producer is an application that produces messages or any data, and that data is sent to the Kafka broker. The data can be in any format: plain text (a string), Avro, JSON, or a byte array. But when a Consumer consumes that data, there should be an identification mechanism to identify which kind of data it wants. So to categorize the data we store, we use a topic. The topic is nothing but a category in a Kafka broker that categorizes the messages or the data.

If we compare a topic with a database term, the topic is like a table in a database. In a database table we store records in a sequential manner, right? Similarly, in a topic we store messages in a sequential manner. A topic is identified by its name. Each topic has a unique name, so that Consumers can easily consume the data from that particular topic. You can have any number of topics in the Kafka cluster or Kafka broker.

There is one difference, though. In a database table we can query the data, but in the case of a topic you cannot query the messages. The Producer has to send the data to the broker, and the Consumer has to consume the data from the Kafka broker. All right. Just remember, a topic is nothing but a category that categorizes the messages in the Kafka broker. Each topic has a unique name so that Consumers can subscribe to that particular topic, and you can create any number of topics that you want in a Kafka broker.
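
To make the Producer, topic, and Consumer relationship a bit more concrete, here is a minimal sketch in plain Java. It is only a toy in-memory model under stated assumptions: the class `ToyBroker`, its `send` and `read` methods, and the topic names are hypothetical and made up for this illustration. It is not the Kafka client API or the Spring API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for a broker (hypothetical; NOT the Kafka client API).
class ToyBroker {
    // Each unique topic name maps to the messages stored under it, in arrival order.
    private final Map<String, List<String>> topics = new HashMap<>();

    // Producer side: hand a message to the broker under a named topic.
    void send(String topic, String message) {
        topics.computeIfAbsent(topic, name -> new ArrayList<>()).add(message);
    }

    // Consumer side: read what is stored under a topic, oldest message first.
    List<String> read(String topic) {
        List<String> stored = topics.getOrDefault(topic, Collections.emptyList());
        return new ArrayList<>(stored); // copy, so callers cannot alter the stored log
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.send("orders", "order-1 created");
        broker.send("payments", "payment-1 received");
        broker.send("orders", "order-2 created");
        // The consumer names the topic it wants; it never talks to the producer.
        System.out.println(broker.read("orders")); // prints [order-1 created, order-2 created]
    }
}
```

Notice that the sender and the reader only share a topic name. Neither one knows about the other, which is the decoupling described above.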

Next, what are Kafka partitions? Kafka topics are further divided into a number of partitions, each of which contains records in an unchangeable sequence. If we look at the diagram over here, we have topic one. Topic one is divided into a number of partitions: partition one, partition two, and partition three. And again, topic two is also divided into partition one, partition two, and partition three. The idea behind a partition is that when Kafka brokers store messages for a topic, the volume of data can be enormous, and it may not be possible to store it on a single computer. Therefore the topic is partitioned into multiple parts and distributed among multiple computers. As we know, Kafka is a distributed system, so whenever we have a large amount of data, we can divide it into partitions and distribute them among multiple Kafka brokers in a Kafka cluster. Right? So this is what partitions look like. So just remember, a topic is further divided into partitions so that a huge amount of data can be spread across multiple partitions.
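
One common way messages end up spread across partitions is by hashing a message key. The lecture does not go into keys, so treat the following as a rough sketch under assumptions: the class `ToyPartitioner` and the method `partitionFor` are hypothetical names, and `String.hashCode()` is only a stand-in. Kafka's own default partitioner hashes the serialized key bytes with a murmur2 hash, not with `hashCode()`.

```java
// Toy partition chooser (hypothetical; NOT Kafka's real partitioner).
class ToyPartitioner {
    // Map a message key to one of numPartitions partitions.
    // floorMod keeps the result in 0..numPartitions-1 even when the hash is negative.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3; // like the three partitions in the diagram
        for (String key : new String[] {"user-1", "user-2", "user-3", "user-1"}) {
            // The same key always lands on the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

The point of the sketch is only that a topic's messages are divided among a fixed set of partitions, and that the choice is repeatable for a given key.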

Now the question is how messages are stored in a partition of a particular topic and how each message is identified. That is where Offsets come into the picture. An Offset is nothing but a sequential ID given to each message as it arrives at the partition. Once an Offset is assigned, it never changes. The first message gets Offset zero, the next message gets Offset one, and so on. Consider that we have a Kafka broker with a topic, and the topic is divided into multiple partitions. Whenever the Kafka broker stores a message in a partition of the topic, a sequence ID is assigned to that message, and that ID is called the Offset. Okay. For example, you can see here partition zero, partition one, and partition two, and these are the sequence IDs given to the messages in each partition. The Offsets start at 0 and continue with 1, 2, 3, like this. All right, just remember, Offsets are nothing but sequence IDs given to the messages in a partition of a particular topic.
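
The Offset idea can be sketched in a few lines of plain Java. This is a toy model of a single partition, not Kafka's storage code, and the names `ToyPartition`, `append`, and `messageAt` are hypothetical. It only shows that each appended message gets the next sequence number, starting at zero.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition (hypothetical; NOT Kafka's storage implementation).
class ToyPartition {
    // Messages are only ever appended, so a message's position never changes.
    private final List<String> log = new ArrayList<>();

    // Store a message and return the Offset assigned to it.
    long append(String message) {
        log.add(message);
        return log.size() - 1; // first message gets 0, the next gets 1, and so on
    }

    // Look up the message stored at a given Offset.
    String messageAt(long offset) {
        return log.get((int) offset);
    }

    public static void main(String[] args) {
        ToyPartition partition = new ToyPartition();
        System.out.println(partition.append("first"));  // prints 0
        System.out.println(partition.append("second")); // prints 1
        System.out.println(partition.messageAt(0));     // prints first
    }
}
```

Each partition would have its own `ToyPartition`, so Offsets are counted separately per partition, as in the diagram.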

Now, let's take a look at what Consumer groups are. If we look at the diagram over here, we have my topic. It has multiple partitions, partition zero, partition one, and partition two, and a lot of Consumers are consuming data from my topic. As we can see here, the Consumers are grouped: Consumer group A contains these four Consumers, and Consumer group B contains these two Consumers. All right. So just remember, a Consumer group contains one or more Consumers working together to process the messages. All right.
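
As background for "working together": in Kafka, a partition is read by only one Consumer within a given group, so the partitions are shared out among the group's members. The sketch below assumes a simple take-turns scheme. Kafka's real assignment strategies are configurable and more involved, and the names `ToyGroupAssignment` and `assign`, as well as the consumer names, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy partition assignment inside one Consumer group (hypothetical; NOT Kafka's assignor).
class ToyGroupAssignment {
    // Hand out partitions 0..numPartitions-1 to the group's consumers in turn.
    // Assumes the consumers list is non-empty and the names are distinct.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Group A: four consumers, three partitions, so one consumer gets nothing.
        System.out.println(assign(Arrays.asList("a1", "a2", "a3", "a4"), 3));
        // prints {a1=[0], a2=[1], a3=[2], a4=[]}

        // Group B: two consumers, three partitions, so one consumer reads two partitions.
        System.out.println(assign(Arrays.asList("b1", "b2"), 3));
        // prints {b1=[0, 2], b2=[1]}
    }
}
```

Each group is assigned independently, so group A and group B both get to read all three partitions of the topic.
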

So these are the important Apache Kafka terminologies or core concepts. Whenever we create a Spring Boot Apache Kafka project, we are going to use these terminologies a lot. That's why I thought I would cover all these core concepts or terminologies in this lecture. So let's quickly recap them. A Kafka cluster consists of one or more brokers. A broker is nothing but a Kafka server, and it acts as an agent or broker to exchange messages between the Producer and the Consumer. A Producer is an application that produces messages and sends them to the Kafka broker, and a Consumer is an application that reads or consumes messages from the Kafka server. A Kafka topic is nothing but a category. It categorizes the messages in a Kafka broker, we can create any number of topics in a Kafka broker, and each topic has a unique name so that Consumers can subscribe to that particular topic. Kafka topics are divided into a number of partitions that contain records in an unchangeable sequence. Offsets are nothing but sequence IDs given to the messages in a partition of a particular topic. A Consumer group contains one or more Consumers working together to process the messages. So these are the important Apache Kafka terminologies. All right, great.
