realtime api infrastructure

High scalability with Fanout and Fastly

Fanout Cloud is for high scale data push. Fastly is for high scale data pull. Many realtime applications need to work with data that is both pushed and pulled, and thus can benefit from using both of these systems in the same application. Fanout and Fastly can even be connected together!


Using Fanout and Fastly in the same application, independently, is pretty straightforward. For example, at initialization time, past content could be retrieved from Fastly, and Fanout Cloud could provide future pushed updates. What does it mean to connect the two systems together though? Read on to find out.

Proxy chaining

Since Fanout and Fastly both work as reverse proxies, it is possible to have Fanout proxy traffic through Fastly rather than sending it directly to your origin server. This provides some unique benefits:

  1. Cached initial data. Fanout lets you build API endpoints that serve both historical and future content, for example an HTTP streaming connection that returns some initial data before switching into push mode. Fastly can provide that initial data, reducing load on your origin server.
  2. Cached Fanout instructions. Fanout’s behavior (e.g. transport mode, channels to subscribe to, etc.) is determined by instructions provided in origin server responses, usually in the form of special headers such as Grip-Hold and Grip-Channel. Fastly can cache these instructions/headers, again reducing load on your origin server.
  3. High availability. If your origin server goes down, Fastly can serve cached data and instructions to Fanout. This means clients could connect to your API endpoint, receive historical data, and activate a streaming connection, all without needing access to the origin server.

Network flow

Suppose there’s an API endpoint /stream that returns some initial data and then stays open until there is an update to push. With Fanout, this can be implemented by having the origin server respond with instructions:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 29
Grip-Hold: stream
Grip-Channel: updates

{"data": "current value"}

When Fanout Cloud receives this response from the origin server, it converts it into a streaming response to the client:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: Transfer-Encoding

{"data": "current value"}

The request between Fanout Cloud and the origin server is now finished, but the request between the client and Fanout Cloud remains open. Here’s a sequence diagram of the process:


Since the request to the origin server is just a normal short-lived request/response interaction, it can alternatively be served through a caching server such as Fastly. Here’s what the process looks like with Fastly in the mix:


Now, guess what happens when the next client makes a request to the /stream endpoint?


That’s right, the origin server isn’t involved at all! Fastly serves the same response to Fanout Cloud, with those special HTTP headers and initial data, and Fanout Cloud sets up a streaming connection with the client.

Of course, this is only the connection setup. To send updates to connected clients, the data must be published to Fanout Cloud.

We may also need to purge the Fastly cache, if an event that triggers a publish causes the origin server response to change as well. For example, suppose the “value” that the /stream endpoint serves has been changed. The new value could be published to all current connections, but we’d also want any new connections that arrive afterwards to receive this latest value as well, rather than the older cached value. This can be solved by purging from Fastly and publishing to Fanout Cloud at the same time.

Here’s a (long) sequence diagram of a client connecting, receiving an update, and then another client connecting:


At the end of this sequence, the first and second clients have both received the latest data.


One gotcha with purging at the same time as publishing is if your data rate is high it can negate the caching benefit of using Fastly.

The sweet spot is data that is accessed frequently (many new visitors per second), changes infrequently (minutes), and you want changes to be delivered instantly (sub-second). An example could be a live blog. In that case, most requests can be served/handled from cache.

If your data changes multiple times per second (or has the potential to change that fast during peak moments), and you expect frequent access, you really don’t want to be purging your cache multiple times per second. The workaround is to rate-limit your purges. For example, during periods of high throughput, you might purge and publish at a maximum rate of once per second or so. This way the majority of new visitors can be served from cache, and the data will be updated shortly after.

An example

We created a Live Counter Demo to show off this combined Fanout + Fastly architecture. Requests first go to Fanout Cloud, then to Fastly, then to a Django backend server which manages the counter API logic. Whenever a counter is incremented, the Fastly cache is purged and the data is published through Fanout Cloud. The purge and publish process is also rate-limited to maximize caching benefit.

The code for the demo is on GitHub.

Examining Mature APIs (Slack, Stripe, Box)

In our previous blog post, we discussed the disconnect between API pricing plans where you pay monthly for a set number of calls and regular developer use cases. We think competition will drive new pricing models that are more developer friendly – and a potential approach could be charging for calls based on their business value. Examining webhook events available via API from Stripe, Slack, and Box gives us a forward look into how this could work.

What’s a mature API?

Forbes nicely summarizes where they see API development going in this graphic (ignore the “customer-driven platform revolution”) portion:


They make a valid point that APIs become more valuable as the data that flows from them becomes bi-directional – APIs are not only returning data based on calls, but actively pushing out data based on API activity.

This data push generally starts around activity with high business value – so we’re going to examine APIs from Stripe, Slack, and Box to get an idea of what events they make available.

Slack has a separate “Events API”

Slack has chosen to implement a separate Events API for developers who want to build apps that respond to events within Slack. Here’s the full list of event types that they can push in realtime as they happen.

Looking at this list in more detail, it’s focused around key messaging and collaboration activities:

  • Creating and updating channels
  • Uploading, sharing, and commenting on files
  • Messages being posted to various channels

Box uses event triggers

Box uses webhooks with event triggers attached to Box files and folders to monitor events attached to files and folders and notify you when they occur. Here’s their full list of events for files and folders.

As expected for Box, events are focused around file management and collaboration:

  • Uploading, previewing, and downloading files
  • Comment and task assignment creation and updating

Stripe sends a variety of events

Stripe sends a wide variety of events around payments, both keyed to internal and external usage:

  • Account creation and updating
  • Product or plan creation
  • Card charges and updates

What does it mean?

The events that these mature APIs have chosen to make available for realtime push have substantial business value for developers building apps using their functionality. As more APIs begin to offer push of data, they may move to a blended pricing model that charges more for these high-value events. We’re interested to see what happens!

Getting Started with Building Realtime API Infrastructure

How companies are adding realtime capabilities to their products and building realtime APIs

Mirroring the rise of API-driven applications, realtime is becoming an emerging, omnipresent force in modern application development. It is powering instant messaging, live sports feeds, geolocation, big data, and social feeds. But, what is realtime and what does it really mean? What types of software and technology are powering this industry? Let’s dive into it.

What Is Realtime?

For the more technical audience, realtime traditionally describes realtime computing, whereby “hardware and software systems are subject to a realtime constraint, for example from event to system response” (Source). For this article, we’re framing realtime from the perspective of an end-user: the perception that an event or action happens sufficiently quickly to be perceived as nearly instantaneous.

Moreover, realtime could be defined in a more relative temporal sense. It could mean that a change in A synchronizes with a change in B. Or, it could mean that a change in A immediately triggers a change in B. Or… it could mean that A tells B that something changed, yet B does nothing. Or… does it mean that A tells everyone something changed, but doesn’t care who listens?

Let’s dig a bit deeper. Realtime does not necessarily mean that something is updated instantly (in fact, there’s no singular definition of “instantly”). So, let’s not focus on the effect, but rather the mechanism. Realtime is about pushing data as fast as possible — it is automated, synchronous, and bi-directional communication between endpoints at a speed within a few hundred milliseconds. 

  • Synchronous means that both endpoints have access to data at the same time.
  • Bi-directional means that data can be sent in either direction.
  • Endpoints are senders or receivers of data (phone, tablet, server).
  • A few hundred milliseconds is a somewhat arbitrary metric since data cannot be delivered instantly, but it most closely aligns to what humans perceive as realtime (Robert Miller proved this in 1986).

With this definition and its caveats in mind, let’s explore the concept of pushing data.

Data Push

We’ll start by contrasting data push with “request-response.” Request-response is the most fundamental way that computer systems communicate. Computer A sends a request for something from Computer B, and Computer B responds with an answer. In other words, you can open up a browser and type “” The browser sends a request to Reddit’s servers and they respond with the web page.

Request-Response vs Evented APIs

In a data push model, data is pushed to a user’s device rather than pulled (requested) by the user. For example, modern push email allows users to receive email messages without having to check manually. Similarly, we can examine data push in a more continuous sense, whereby data is continuously broadcasted. Anyone who has access to a particular channel or frequency can receive that data and decide what to do with it.

Moreover, there are a few ways that data push/streaming is currently achieved:

HTTP Streaming

HTTP Streaming

HTTP streaming provides a long-lived connection for instant and continuous data push. You get the familiarity of HTTP with the performance of WebSockets. The client sends a request to the server and the server holds the response open for an indefinite length. This connection will stay open until a client closes it or a server side-side event occurs. If there is no new data to push, the application will send a series of keep-alive ticks so the connection doesn’t close.


HTTP Web Sockets

WebSockets provide a long-lived connection for exchanging messages between client and server. Messages may flow in either direction for full-duplex communication. This bi-directional connection is established through a WebSocket handshake. Just like in HTTP Streaming and HTTP Long-Polling, the client sends a regular HTTP request to the server first. If the server agrees to the connection, the HTTP connection is replaced with a WebSocket connection.



Webhooks are a simple way of sending data between servers. No long-lived connections are needed. The sender makes an HTTP request to the receiver when there is data to push. A WebHook registers or “hooks” to a callback URL and will notify you anytime an event has occurred. You register this URL in advance and when an event happens, the server sends a HTTP POST request with an Event Object to the callback URL. This event object contains the new data that will be pushed to the callback URL. You might use a WebHook if you want to receive notifications about certain topics. It could also be used to notify you whenever a user changes or updates their profile.

HTTP Long-Polling

HTTP Long Polling

HTTP long-polling provides a long-lived connection for instant data push. It is the easiest mechanism to consume and also the easiest to make reliable. This technique provides a long-lived connection for instant data push. The server holds the request open until new data or a timeout occurs. Most send a timeout after 30 to 120 seconds, it depends on how the API was setup. After the client receives a response (whether that be from new data or a timeout), the client will send another request and this is repeated continuously.

Is pushing data hard? Yes, it is, especially at scale (ex. pushing updates to millions of phones simultaneously). To meet this demand, an entire realtime industry has emerged, which we’ll define as Realtime Infrastructure as Service (Realtime IaaS).

Realtime Libraries

Here is a compilation of resources that are available for developers to build realtime applications based on specific languages / frameworks:

Realtime Infrastructure as a Service

According to Gartner, “Infrastructure as a service (IaaS) is a standardized, highly automated offering, where compute resources, complemented by storage and networking capabilities are owned and hosted by a service provider and offered to customers on-demand. Customers are able to self-provision this infrastructure, using a Web-based graphical user interface that serves as an IT operations management console for the overall environment. API access to the infrastructure may also be offered as an option.”

We often here PaaS (Platform as a Service) and SaaS (Software as a Service), so how are they different than IaaS?

  • Infrastructure as a Service (IaaS): hardware is provided by an external provider and managed for you.
  • Platform as a Service (PaaS): both hardware and your operating system layer are managed for you.
  • Software as a Service (SaaS): an application layer is provided for the platform and infrastructure (which is managed for you).

To power realtime, applications require a carefully architected system of servers, APIs, load balancers, etc. Instead of building these systems in-house, organizations are finding it more cost-effective and resource-efficient to purchase much of this systemic infrastructure and then drive it in-house. These systems, therefore, are not just IaaS, but typically provide both a platform and software layer to help with management. Foundationally speaking, their core benefit is that they provide realtime infrastructure, whether you host it internally or rely on managed instance

It all comes down to the simple truth that realtime is hard for a number of reasons:

  • Customer Uptime Demand – Customers that depend on realtime updates will immediately notice when your network is not performant.
  • Horizontal Scalability – You must be able to handle volatile and massive loads on your system or risk downtime. This is typically achieved through clever horizontal scalability and systems that are able to manage millions of simultaneous connections.
  • Architectural Complexity – Maintaining a performant realtime system is not only complex, but it requires extensive experience and expertise. This is expensive to buy, especially in today’s high demand engineering market.
  • Contingencies – Inevitably, your system will experience some downtime, whether due to an anticipated load spike or a newly released feature. It is important, therefore, to have multiple uptime contingencies in place to make sure that the system knows what to do, should your primary realtime mechanism fail to perform.
  • Queuing – When you’re sending a lot of data, then you likely need an intermediate queuing mechanism to ensure that your backend processes are not overburdened with increased message loads.

Realtime Application IaaS

Realtime app infrastructure sends data to browsers and clients. It typically uses pub/sub messaging, webhooks, and/or websockets — and is separate from an application or service’s main API. These solutions are best for organizations that are looking for realtime messaging without the need to build their own realtime APIs.

Pub-Subscribe PubSub Pattern for Realtime API

These systems also have more well-built platform/software management tools on top of their infrastructure offerings. For instance, the leading providers have built-in configuration tools like access controls, event delegation, debugging tools, and channel configuration.

Benefits of Realtime App IaaS

  • Speed – typically explicitly designed to deliver data with low latencies to end-user devices, including smartphones, tablets, browsers, and laptops.
  • Multiple SDKs for easier integration.
  • Uses globally distributed realtime data delivery platforms.
  • Multiple protocol adapters.
  • Well-tested in production environments.
  • Keeps internal configuration to a minimum.

Use Cases

While some of the platforms out there function differently, here are some of the most typical use cases:

  • Realtime Chat – In a microservice environment, a realtime API proxy makes it easy to listen for instant updates from other microservices without the need for a centralized message broker. Each microservice gets its own proxy instance, and microservices communicate with each other via your organization’s own API contracts rather than a vendor-specific mechanism.
  • IoT Device Control – Securely monitor, control, provision and stream data between Internet-connected devices.
  • Geotracking / Mapping Realtime Updates – Integrates with other realtime APIs like (Google Maps) to construct rich realtime updates.
  • Multiplayer Game Synchronization – Synchronize communications amongst multiple simultaneous players to keep play fluid.


Here are some realtime application IaaS providers (managed) to check out for further learning: PubNubPusher, and Ably.

Realtime API IaaS for API Development

Realtime API infrastructure specifically allows developers to build realtime data push into their existing APIs. Typically, you would not need to modify your existing API contracts, as the streaming server would serve as a proxy. The proxy design allows these services to fit nicely within an API stack. This means it can inherit other facilities from your REST API, such as authentication, logging, throttling, etc and, consequently, it can be easily combined with an API management system. In the case of WebSocket messages being proxied out as HTTP requests, the messages may be handled statelessly by the backend. Messages from a single connection can even be load balanced across a set of backend instances.

Realtime API Infrastructure as a service IaaS Proxy

All in all, realtime API IaaS is used for API development, specifically geared for organizations that need to build highly-performant realtime APIs like Slack, Instagram, Google, etc. All of these orgs build and manage their infrastructure internally, so the IaaS offering can be thought of as a way to extend these capabilities to organizations that lack the resources and technical expertise to build a realtime API from scratch.

Benefits of Realtime API IaaS

  • Custom build an internal API.
  • Works with existing API management systems.
  • Does not lock you into a particular tech stack.
  • Provides realtime capabilities throughout entire stack.
  • Usually proxy-based, with pub/sub or polling.
  • Add realtime to any API, no matter what backend language or database.
  • Cloud or self-hosted API infrastructure.
  • It can inherit facilities from your REST API, such as authentication, logging, throttling.

Use Cases

While some of the platforms out there function differently, here are some of the most typical use cases:

  • API development – As we’ve discussed, you can build custom realtime APIs on top of your existing API infrastructure.
  • Microservices – In a microservice environment, a realtime API proxy makes it easy to listen for instant updates from other microservices without the need for a centralized message broker. Each microservice gets its own proxy instance, and microservices communicate with each other via your organization’s own API contracts rather than a vendor-specific mechanism.
  • Message queue – If you have a lot of data to push, you may want to introduce an intermediate message queue. This way, backend processes can publish data once to the message queue, and the queue can relay the data via an adapter to one or more proxy instances. The realtime proxy is able to forward subscription information to such adapters, so that messages can be sent only to the proxy instances that have subscribers for a given channel.
  • API management – It’s possible to combine an API management system with a realtime proxy. Most API management systems work as proxy servers as well, which means all you need to do is chain the proxies together. Place the realtime proxy in the front, so that the API management system isn’t subjected to long-lived connections. Also, the realtime proxy can typically translate WebSocket protocol to HTTP, allowing the API management system to operate on the translated data.
  • Large scale CDN performance – Since realtime proxy instances don’t talk to each other, and message delivery can be tiered, this means the realtime proxy instances can be geographically distributed to create a realtime push CDN. Clients can connect to the nearest regional edge server, and events can radiate out from a data source to the edges.


Here are some realtime API IaaS providers (managed/open source) to check out for further learning: Fanout/, and LiveResource.


Realtime is becoming an emerging, omnipresent force in modern application development. It is not only a product differentiator, but is often sufficient for product success. It has accelerated the proliferation of widely-used apps like Google Maps, Lyft, and Slack. Whether you’re looking to build your own API from scratch or build on top of an IaaS platform, realtime capabilities are increasingly becoming a requirement of the modern tech ecosystem.