Topics
- Offsets have a meaning only within a partition
- Order is guaranteed within a partition
- Data is retained only for 2 weeks
- Data written to a partition cannot be changed
- Data is assigned randomly to different partitions unless key is specified
- No limit on number of partitions per topic
- Topics should have replication factor greater than 1 for durability in case of broker down scenario
- Usually 3 is recommended
Brokers
- Each broker contains certain topic partitions
- After connecting with one broker you will be connected to entire cluster
- Only one broker can be a leader for a partition
- Only the leader can send receive data for partition
- Each partition has one leader and one-or-many InSyncReplicas
Producers
- Producers only need one broker and topic name to be able to publish the data
- Producer can define acknowledgement
- 0 fire and forget
- 1 written to leader
- all written to all replicas
- If key is sent with message like groupid then data for the key is ordered by sending it to the same partition
Consumers
- Each consumer within a group reads from exclusive partitions
- Cannot have more consumers than partitions otherwise some will be idle
- Less is ok but it needs to read data from multiple partitions in parallel
Kafka Gurantees and ordering
- Messages are appended to topic-partition in the order they are sent
- With replication factor of N, consumers and producers can tolerate N-1 broker down
- At most once - if processing goes wrong no chance to reprocess
- At least once - if consumer goes down it gets the message once it is back; most common
- Exactly once - 1.11
Only interaction with ZK is for creating a topic. For all other operations direct connection to broker is sufficient.