RSS streaming

Over the last year, we focused a lot on storing data from feeds inside Superfeedr. We started by storing a lot of Google Reader content in our Riak backend. When we introduced our new PubSubHubbub endpoint, we took the opportunity to add features like subscribe and retrieve and, later, params like before and after.

We also introduced a jQuery plugin for Superfeedr, which made it extremely easy to add any RSS feed to any web page.

Streaming RSS

Today, we’re moving forward by adding HTTP streaming support to the RSS stored in Superfeedr. In plain English, this means you can ask Superfeedr something like this:

Please give me the last 5 items from that feed, but keep the connection open and give me any new item that comes in for as long as I’m listening.

Translated into curl, that would look something like this:

curl "http://stream.superfeedr.com/?hub.mode=retrieve&wait=stream&hub.topic=http://push-pub.appspot.com/feed" \
  -udemo:6f74cbf1c5d30fd0c668f2ac0592204c

You’re more than welcome to try that in your shell.

You’ll see that the connection then hangs. You can update the feed by filling this form, and you should see the new entry appear in your shell.

You can also get all this RSS/Atom converted to JSON by adding -H 'Accept: application/json' to the command.
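For readers who would rather consume the stream from code than from a shell, here is a minimal Python sketch of the same request. It only mirrors the curl example above (same endpoint, same query parameters, same demo credentials); the function names `build_stream_request`, `iter_chunks` and `print_stream` are ours for illustration and are not part of any Superfeedr library. It also assumes that entries arrive as newline-separated chunks, which you should check against what you actually see on the wire.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
STREAM_ENDPOINT = "http://stream.superfeedr.com/"


def build_stream_request(topic, login, token, as_json=False):
    """Build the same request as the curl example: retrieve + wait=stream."""
    query = urllib.parse.urlencode(
        {"hub.mode": "retrieve", "wait": "stream", "hub.topic": topic}
    )
    request = urllib.request.Request(STREAM_ENDPOINT + "?" + query)
    # Equivalent of curl's -ulogin:token (HTTP Basic authentication).
    credentials = base64.b64encode(f"{login}:{token}".encode()).decode()
    request.add_header("Authorization", "Basic " + credentials)
    if as_json:
        # Equivalent of curl's -H 'Accept: application/json'.
        request.add_header("Accept", "application/json")
    return request


def iter_chunks(response):
    """Yield non-empty lines as they arrive on an open response.

    Assumption: each chunk sent by the server ends with a newline.
    """
    for raw_line in response:
        line = raw_line.decode("utf-8").strip()
        if line:
            yield line


def print_stream(request):
    """Open the connection and print chunks until the server closes it."""
    with urllib.request.urlopen(request) as response:
        for chunk in iter_chunks(response):
            print(chunk)
```

Calling `print_stream(build_stream_request("http://push-pub.appspot.com/feed", "demo", "6f74cbf1c5d30fd0c668f2ac0592204c", as_json=True))` should block in the same way the curl command does, printing new entries as the feed is updated.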

Fanout

Of course, building and maintaining an infrastructure to handle this kind of traffic and this many concurrent connections is far from trivial. In the same way that we would not write our own database from scratch to store the content we process, it made sense to find an existing infrastructure and rely on its builders’ expertise.

We picked Fanout because they provide a completely transparent approach: we can use our very own CNAMEs, and they proxy calls made to our API.

The first step is to set up a subdomain and point it to Fanout’s servers. Fanout will proxy to our backend any call that it can’t handle itself. If your request to stream.superfeedr.com includes a wait=stream param, Fanout will proxy the request to Superfeedr’s main backend. We will respond with the data to be returned to the client, as well as a GRIP instruction. Fanout will serve the data, but keep the connection open.

Later, when the feed updates, we will notify Fanout and they will just serve the content to any existing connection, in a completely transparent way.
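To make the two halves of that exchange more concrete, here is a rough Python sketch of what a backend speaking GRIP (the Generic Realtime Intermediary Protocol used by Fanout) could send. It is an illustration under assumptions, not a description of Superfeedr's actual code: the `Grip-Hold` and `Grip-Channel` response headers and the `http-stream` publish format come from the GRIP protocol, while the function names and the channel name `feed-123` are hypothetical.

```python
import json


def grip_hold_response(initial_body, channel, content_type="application/atom+xml"):
    """Describe the backend's reply to a proxied wait=stream request.

    The Grip-* headers are instructions for the proxy: deliver the body,
    then hold the connection open and subscribe it to `channel`.
    """
    headers = {
        "Content-Type": content_type,
        "Grip-Hold": "stream",
        "Grip-Channel": channel,  # hypothetical channel naming scheme
    }
    return 200, headers, initial_body


def publish_item(channel, content):
    """Build the JSON payload asking the proxy to push `content`
    to every connection currently held on `channel`."""
    return {
        "items": [
            {"channel": channel, "formats": {"http-stream": {"content": content}}}
        ]
    }


# First request: serve the stored entries and ask the proxy to hold.
status, headers, body = grip_hold_response("<feed>...</feed>\n", "feed-123")

# Later, when the feed updates: payload to send to the proxy's publish API.
payload = json.dumps(publish_item("feed-123", "<entry>...</entry>\n"))
```

The point of the design is that the backend never holds a client connection itself: it answers one short request per subscriber, then makes one publish call per feed update, however many clients are listening.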

Long polling

One of the benefits of using Fanout is that they provide multiple options for building a realtime API. HTTP streaming works extremely well when used from an HTTP client, but browsers are not always great at dealing with streams. In the browser, an option is to use our wait=poll option, combined with the after parameter.

Basically, the first request will look like this:

curl -udemo:6f74cbf1c5d30fd0c668f2ac0592204c "https://stream.superfeedr.com?hub.mode=retrieve&wait=poll&hub.topic=http%3A%2F%2Fpush-pub.appspot.com%2Ffeed"

The response will come immediately with the current content of the feed. From there, you should extract the id element of the latest entry. At the time of writing this post, it is http://push-pub.appspot.com/feed/5637036128075776. We will re-use this element as the value for the after query parameter:

curl -udemo:6f74cbf1c5d30fd0c668f2ac0592204c "https://stream.superfeedr.com?format=json&hub.mode=retrieve&wait=poll&after=http%3A%2F%2Fpush-pub.appspot.com%2Ffeed%2F5637036128075776&hub.topic=http%3A%2F%2Fpush-pub.appspot.com%2Ffeed"

If one or more new entries were added during the small lag between the two queries, they will be served right away. However, in the more likely event that nothing was added, the connection will wait until a new item is added to the feed. This technique guarantees that no item is ever missed, even with a single concurrent HTTP request.
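The loop described above can be sketched in a few lines of Python. This is a sketch under assumptions rather than a reference client: it assumes the JSON body carries an `items` list ordered newest first, each item having an `id` field, and it takes the HTTP call as an injected `fetch` function so the logic can be read (and tested) without a network. The names `poll_url`, `newest_id` and `long_poll` are ours.

```python
import urllib.parse

ENDPOINT = "https://stream.superfeedr.com/"


def poll_url(topic, after=None):
    """Build a wait=poll URL, adding `after` once we know an entry id."""
    params = {"format": "json", "hub.mode": "retrieve", "wait": "poll"}
    if after is not None:
        params["after"] = after
    params["hub.topic"] = topic
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def newest_id(payload, previous=None):
    """Return the id of the newest entry, or `previous` if there is none.

    Assumption: payload["items"] is ordered newest first.
    """
    items = payload.get("items") or []
    return items[0]["id"] if items else previous


def long_poll(fetch, topic, rounds):
    """Issue `rounds` requests, feeding the newest id back as `after`.

    `fetch(url)` must return the decoded JSON body as a dict. Returns the
    ids of every entry seen, in the order they were received.
    """
    after = None
    seen = []
    for _ in range(rounds):
        payload = fetch(poll_url(topic, after))
        seen.extend(item["id"] for item in payload.get("items", []))
        after = newest_id(payload, after)
    return seen
```

In real use, `fetch` would perform an authenticated GET (for instance with `urllib.request`) and decode the JSON response. Because each request carries the id of the last entry received, an entry published between two requests is returned by the next one instead of being lost.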
