Pushpin

Realtime Web Data Streaming with Kafka and Pushpin

Supercharging Kafka:  Enable Realtime Web Streaming by Adding Pushpin

Exposing Kafka messages via a public HTTP streaming API

Matt Butler

Apache Kafka is the new hotness when it comes to adding realtime messaging capabilities to your system. At its core, it is an open source distributed messaging system that uses a publish-subscribe system for building realtime data pipelines. But, more broadly speaking, it is a distributed and horizontally scaleable commit log.

In a Kafka cluster, you will have topics, producers, consumers, and brokers:

  • Topics — A categorization for a group of messages
  • Producers — Push messages into a Kafka topic
  • Consumers — Pulls messages off of a Kafka topic
  • Kafka Broker — A Kafka node
  • Kafka Cluster— A collection of Kafka brokers

Take a deep dive into Kafka here.

Overall, Kafka provides fast, highly scalable and redundant messaging through a publish-subscribe model.

A pub-sub model is a messaging pattern where publishers categorize published messages into topics without knowledge of which subscribers would receive those messages (if any). Likewise, subscribers express interest in one or more topics and only receive messages that are of interest, without knowing anything about the publishers (source).

Kafka Strengths

As a messaging system, Kafka has some transformative strengths that have catalyzed its rising popularity

  1. Realtime Data Pipeline — Can handle realtime messaging throughput with high currency
  2. High-throughput — Ability to support high-velocity and high-volume data (1000’s per second)
  3. Fault-tolerant — Due to its distributed nature, it is relatively resistant to node failure within a cluster
  4. Low Latency — Milliseconds to handle thousands of messages
  5. Scalability — Kafka’s distributed nature allows you to add additional nodes without downtime, facilitating partitioning and replication

Kafka Limits

Due to its intrinsic architecture, Kafka is not optimized to provide API consumers with friendly access to realtime data. As such, many orgs are hesitant to expose their Kafka endpoints publicly.

In other words, it is difficult to expose Kafka across a public API boundary if you want to use traditional protocols (like websockets or HTTP).

To overcome this limit, we can integrate Pushpin into our Kafka ecosystem to handle more traditional protocols and expose our public API in a more accessible and standardized way.

Pushpin + Kafka

Server-sent events (SSE) is a technology where a browser receives automatic updates from a server via HTTP connection (standardized in HTML5 standards). Kafka doesn’t natively support this protocol, so we need to add an additional service to make this happen.

Pushpin’s primary value prop is that it is an open source solution that enables realtime push — a requisite of evented APIs (GitHub Repo). At its core, it is a reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. Structurally, Pushpin communicates with backend web applications using regular, short-lived HTTP requests.

Integrating Pushpin and Kafka provides you with some notable benefits:

  • Resource-Oriented API — Provides a more logical resource-oriented API to consumers that fits in with an existing REST API. In other words, you can expose data over standardized, more-secure protocols.
  • Authentication — Reuses existing authentication tokens and data formats.
  • API Management — Harnesses your existing API management system or load balancers.
  • Web Tier Scaleability — If the number of your web consumers grows substantially, then it may be more economical and performant to scale out your web tier, rather than your Kafka cluster.

In this next example, we will expose Kafka message via HTTP streaming API. 

Building Kafka Server-Sent Events

This example project reads messages from a Kafka service and exposes the data over a streaming API using Server-Sent Events (SSE) protocol over HTTP. It is written using Python & Django, and relies on Pushpin for managing the streaming connections.

How it Works

In this demo, we drop a Pushpin instance on top of our Kafka broker. Pushpin acts as a Kafka consumer, subscribes to all topics, and re-publishes received messages to connected clients. Clients listen to events via Pushpin.

More granularly, we use views.py to set up an SSE endpoint, while relay.py handles the messaging input and output.

  1. First, we need to setup virtualenv and install dependencies:
virtualenv --python=python3 venv
. venv/bin/activate
pip install -r requirements.txt

2. Create a suitable .env with Kafka and Pushpin settings:

KAFKA_CONSUMER_CONFIG={"bootstrap.servers":"localhost:9092","group.id":"mygroup"}
GRIP_URL=http://localhost:5561

3. Run the Django server:

python manage.py runserver

4. Run Pushpin:

pushpin --route="* localhost:8000"

5. Run the relay command:

python manage.py relay

The relay command sets up a Kafka consumer according to KAFKA_CONSUMER_CONFIG, subscribes to all topics, and re-publishes received messages to Pushpin, wrapped in SSE format.

Clients can listen to events by making a request (through Pushpin) to /events/{topic}/:

curl -i http://localhost:7999/events/test/

The output stream might look like this:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: Transfer-Encoding
event: message
data: hello
event: message
data: world

Repo on GitHub

realtime_data fanout pushpin

Pushpin: An Evented API for your DevOps Stack

“Real-time” is becoming an omnipresent force in the modern tech stack. As consumers demand faster and more frequent data transactions, companies are increasingly investing in product infrastructure that accelerates these transactions. Though we’ve seen APIs become an economic and technological imperative, they are typically based on request-response style interactions, which limits their scope and effectiveness in the real-time arena.

Request-Response vs Event-Driven APIs

At its core, request–response is a message exchange pattern in which a requestor sends a request message to a replier system. The replier system receives and processes the request, and if all goes well, it returns a message in response. While this exchange format works well for more structured requests, it limits integrations to those where the expectant system has a clear idea what it wants from the other. These request-response style APIs, therefore, must follow the interaction script from the calling service.

Pushpin Fanout - Request/Response vs Evented APIs

In an event-driven architecture, applications integrate multiple services and products as equals based on event-driven interactions. These interactions are driven by event emitters, event consumers, and event channels, whereby the events, themselves, are typically significant ‘changes in state’ that are produced, published, propagated, detected, or consumed. This architectural pattern supports loose coupling amongst software components and services. The advantage is that an event emitter does not need to know the state of the consumer, who the consumer is, or how the event will be processed (if at all). It is a mechanism of pushing data through a persistent stream.

Evented API Solutions

In the tech ecosystem, there are a number of ways to approach data streaming and evented APIs. Some of the leading SAAS solutions include PubNubPusherKaazing, and Fanout – which each have their own pros/cons and ramp up investment. For the purposes of understanding the fundamentals of event-driven architecture, we’ll explore some open source software called Pushpin.

Pushpin

Pushpin’s primary value prop is that it is an open source solution that enables real-time push — a requisite of evented APIs (GitHub Repo). At it’s core, it is a reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. Structurally, Pushpin communicates with backend web applications using regular, short-lived HTTP requests.

This architecture provides a few core benefits:

  • Backend languages can be written in any language and use any webserver.
  • Data can be pushed via a simple HTTP POST request to Pushpin’s private control API
  • It is invisible to connected clients
  • It manages stateful elements by acting as the responsible party when requiring data from your backend server
  • Horizontally scalable by not requiring communication between Pushpin instances
  • It harnesses a publish-subscribe model for data transmission
  • It can act as both a proxy server and publish-subscribe broker

Integrating Pushpin

From a more systemic perspective, there are a few ways you can integrate Pushpin into your stack. The most basic setup is to put Pushpin in front of a typical web service backend, where the backend publishes data directly to Pushpin. The web service itself might publish data in reaction to incoming requests, or there might be some kind of background process/job that publishes data.

PushPin real-time reverse proxy

Because Pushpin is a proxy server, it works with most API management systems — allowing you to do perform actual API development.  For instance, you can chain proxies together, placing Pushpin in the front so your API management system isn’t subjected to long-lived connections.  More importantly, Pushpin can translate WebSocket protocol to HTTP, allowing the API management system to operate on the translated data.

PushPin real-time reverse proxy with API management system

The Future of Evented APIs

In some follow up articles, I’ll discuss some of the unique features that we can see in evented APIs moving foward.  These include event batching, salience filters, and a standard subscription interface.  If you’re looking to play around with a real-time drop-in API proxy, then I highly recommend experimenting with Pushpin to get started.