Apache Kafka along with RxJS operators

Elie Nehmé
3 min readMar 24, 2022

How to buffer received data while the producer is unavailable?

Photo by Jonas Svidras on Unsplash

Introduction

There are many ways to merge streams with RxJS through different operators: merge, concat, switchMap, … However, finding the right operator is not always a simple task.

In this tutorial, I will explain some merging operators and the issue I faced recently with Kafka and RxJS.

The project I worked on had a processor function — a transformation process between an inbound and an outbound topic. The ultimate purpose of this processor is to consume data from “topic-a”, validate it and publish it to “topic-b”.

Although it sounds a fairly simple use case of processing data, the real challenge is to wait for the producer to be ready without losing any data. Besides, the producer could be paused or disconnected from time to time, and we should handle this case as well.

A complete sample project can be checked out here: https://github.com/elie29/nestjs-kafka-project-github

Setup consumer and producer

With node-rdkafka, creating a consumer is pretty straightforward — the code below is not exhaustive for simplicity:

A simple consumer to show how data is published

We chose RxJS Subject to emit received data.

On the other hand, we have created a producer with node-rdkafka and chosen RxJS BehaviorSubject to notify producer availability as follows:

TakeUntil, RepeatWhen or SwitchMap

Let us start with a first approach of streaming data when the producer is available.

This is can be done using takeUntil and repeatWhen operators:

The example above shows how the consumer will emit data until the producer is disconnected; it is the role of takeUntil. Then, repeatWhen will subscribe to the consumer once the producer is available.

However, with this pattern, all received data will be lost when the producer is paused or goes offline 😡

We can emulate the same result with switchMap:

When the producer is available, we subscribe to the consumer; otherwise, we return an EMPTY source and lose all received data 😫

Another approach with combineLatest

We can try to use combineLatest operator as follows:

Similar to takeUntil and switchMap, we still lose data when the producer is unavailable 😭

A pitfall would be to filter directly on the producer; once it is available, the consumer starts emitting data and no longer takes into account producer disconnections!

WindowToggle and BufferToggle to the rescue

To address the issue of saving data while the producer is disconnected, we need to toggle between streaming and buffering as follows:

Emit when producer is on otherwise buffer data

With this pattern, data is consumed when the producer is resumed and buffered when the producer is paused.

  • windowToggle: A best fit operator to indicate when to emit data and when to ignore them
  • bufferToggle: A suitable operator to indicate whether or not to buffer data

Finally, both streams are merged and flattened in order to publish all values one by one:

Values are buffered when the producer is paused or disconnected

A sample project with NestJS, node-rdkafka and RxJS is available at: https://github.com/elie29/nestjs-kafka-project-github

Don’t hesitate to give me your feedback, or contact me on Twitter @elie_nehme

--

--

Elie Nehmé

Lead Web Application Architect. Passionate about refactoring, clean code, and building scalable solutions with simplicity. https://elie29.hashnode.dev/