PubSubHubbub v0.4

The PubSubHubbub adoption is growing strong. There is not a week where a major publisher or a major subscriber eventually implements it. Yesterday, that was NewsBlur’s turn.

At the same time, the current PubSubHubbub spec (0.3) is a bit rusty and presents a few issues that prevented more platforms to use it. The 2 most significant issues are the following: arbitrary content support, opening the door for private feeds.

We recently worked on the version 0.4 of the spec, which, solves these problems.

Arbitrary Content Support

The current spec is coupled with the Atom and RSS specs very deeply. However, in theory, PubSubHubbub shouldn’t really care about what is being transported for as long as it’s available via HTTP.

Discovery

The current spec states that discovery is performed by adding a <link> element in the “head” part of the feed (the one which contains all the meta data about the feed). Luckily, HTTP also has a “head” part: the HTTP headers.
We proposed that doing discovery for arbitrary content resources is done thru the use of Link Headers.

They look like this:

HTTP/1.1 200 OK
Date: Tue, 03 Apr 2012 08:02:19 GMT
Content-Type: text/html
Content-Length: 11261
Link: <http://pubsubhubbub.superfeedr.com>; rel="hub"
Link: <http://blog.superfeedr.com/my-resource>; rel="self"

Please note, that similar to the 0.3 spec, we need both a self link and a hub link.
A consequence of using HTTP headers is that, instead of doing a full GET request, subscribers may just use the verb HEAD to perform the discovery of a resource.

Diffing

Feeds have a built-in diff mechanism. As they are collections of items, it is quite easy to identify what items have been added to the collection between to GET requests.

Now, using arbitrary formats means that it is not always easy or obvious to diff the resource in a meaningful way.

The 0.4 spec proposes to leave diffing up to the publisher an subscriber to identify what is the right behavior, between propagating the full document (when it was updated), or just propagating the diff, using the mechanism of their choice (text based diff, or content based diff).

Notification

When using feeds, the whole content is included in the feeds, including, as we’ve seen the discovery elements. Since most HTTP document do not have this feature, it means that the notification must include some of the HTTP headers served with the content, including, well, the Link headers.
This is an important difference with the 0.3 spec, and most not be overlooked, as some resources have meaningful information in the HTTP headers.

Signature

The only way to perform a secure notification and avoid “man-in-the-middle” attacks, it is important that the notification from the hub to the subscriber is signed, using an HMAC signature with a shared secret. This behavior is the behavior used in the version 0.3 of the spec.

In the version 0.4, we use the same mechanism, but please note that this will only sign the content, and not the headers. As such, if a subscriber wants to trust the complete notification, the only option is to use an HTTP*S* callback url.

There is another way, which is to provide a specific header which lists the signed headers, but we believe it’s overly complex. Please let us know if you disagree.

Private Resources

This is the other key aspect of this revised spec. Currently, the publisher doesn’t have any visibility on ‘who’ the subscribers are (and if there are any actually).

This is a very strong problem in the context of social networks where someone may want to distribute its content only to a restricted set of people: their friends, or family.

Typically, they want to approve or deny subscriptions, but also may want to later reject a previous approved subscription. They may also want to limit the subscription to a subset of the actual published feed.

Subscription

The current spec only uses the verification of intent to determine if a subscription should be accepted. We now need to use the publisher’s input to accept or deny subscriptions. It is also very likely that the publisher will wait for the actual user’s input before accepting or denying, which means that the subscription process now becomes asynchronous by default.

Publisher Verification

This is a brand new step, but it remains optional and subscribers should completely ignore this part. Using an agreed upon mechanism, the hub will ask the publisher whether the subscription should be accepted, by echo-ing the subscription request. The publisher can then decide to accept or deny the subscription, as well as include a reason for that.

This mechanism can, for example be combined with the use of a From HTTP header in the subscription. Then, using a webfinger discovery mechanism, the publisher can identify the actual user behind the subscription. Other mechanisms could also very well be used, but are beyond the scope of this spec.

Subscription Validation

Once the subscriber’s intent has been confirmed AND if the publisher agrees to the verification, then, the subscription should be validated and the hub will send a request to the subscriber to validate that subscription.

If not, the hub should also send a request to indicate denial, with, if applicable, the reason used by the publisher. The subscriber may then either retry (using additional parameters) or just abandon its subscription.

The subscription validation process can also happen later to deny an existing subscription.

Conclusions

The draft 0.4 clearly still has a points that needs to be clarified and improved, but we believe it solves the 2 current major issues that the current PubSubHubbub protocol has.

It is obvious that if a hub doesn’t respond to the 0.4 spec API calls, a subscriber should fallback to 0.3 for as long as the 0.4 spec hasn’t been widely adopted.

Please, send your feedback.