State of the Realtime Web : the Publishers

The realtime web is almost as hyped a word as geolocation. It actually groups a lot of different realities and doesn’t have the same meaning and implementation from service A to service B. Let’s first focus on publishers.

The first thing we should all acknowledge is that every publisher can and should be part of the realtime web movement. We’d all be in a better position if we stopped assuming that the realtime web is the same as the Twitter ecosystem.

Definitions

These are obviously subject to debate (let me know in the comments if you disagree), but I’m posting them first so you can understand the rest of the post.

User : you, me, them, everybody.

Application : a piece of software on the internet, that can be used by users, but also by other applications.

Content : a piece of data available on the internet. A blog post, a job offer, a product description… etc. Some people call that a resource. This brings us to the fact that most content should have a URL (Uniform Resource Locator).

Publisher : an application that publishes content on the internet. In more words : an application that makes pieces of content available at a given URL.

Pushing : an action by a publisher which consists of actually distributing the published content. Distribution involves sending the content itself to other applications.

Pinging : an action by a publisher which consists of telling some other application that some content is available. It doesn’t send the content, it just says that the content is available for pickup.
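
To make the difference between the last two definitions concrete, here is a minimal sketch in Python. The message shapes and the names `make_ping`, `make_push` and `needs_fetch` are made up for illustration; no real service uses exactly this format.

```python
import json


def make_ping(url):
    """A ping only announces that something is available at `url`."""
    return json.dumps({"url": url})


def make_push(url, content):
    """A push carries the content itself along with its URL."""
    return json.dumps({"url": url, "content": content})


def needs_fetch(message):
    """A receiver has to go back and fetch the URL when it only got a ping."""
    return "content" not in json.loads(message)


ping = make_ping("https://blog.example.com/posts/42")
push = make_push("https://blog.example.com/posts/42", "Hello, realtime web")
```

With a ping, the receiver still has to make a request back to the publisher. With a push, it already has everything it needs.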

The Realtime web Publishers

A small minority (yep, the opposite of a vast majority) of the content is currently being published in realtime. A lot of publishers will just make their content available without caring whether this content is distributed. For a few years, a few publishers were pinging another (smaller) set of services when they updated their content. You might be familiar with services like Ping-O-matic.
It’s not until very recently (maybe 18 months) that some services started to push their content away and distribute it. The main motivation behind this was to stop being polled over and over again for the same content. The first company to do this was Twitter, with its infamous firehose.

Twitter

Twitter opened Pandora’s box. By pushing their content, they allowed a whole new breed of applications which would consume the firehose in realtime and show it to their users. I know several people who were consuming Twitter only through the FriendFeed interface. At the same time, this proved to be a very important monetization scheme for Twitter : search engines like Google have a very hard time crawling the whole web, let alone high frequency publishing services like Twitter, so they paid a lot to get access to this firehose.
Yet, Twitter failed at establishing an open and standard protocol that could be used by other publishers to represent and distribute their data in a similar fashion. Worse, if you want Twitter’s data in realtime, you need to have a specific business relationship, which, in a lot of ways, is the opposite of being open.
Their main motivation is still very present, and their ~~Chirp~~ User Stream is just another iteration which aims at making desktop clients like Seesmic or Tweetdeck much more efficient by pushing them (in a proprietary manner again) all the data for a given user.

Facebook

Facebook jumped on the boat very recently. Their “private” approach has long prevented them from pushing their content, because it would mean that they don’t have any control over the data once it’s out the door. Until a few weeks ago, they were fighting this by forcing anyone who got content from Facebook to delete it after 24 hours. In practice, it meant that any service would have to poll Facebook every 24 hours for all the data they care about. Broken.

This changed a few weeks ago, because Facebook started to push some content (obviously people who hate their boss don’t get it). Facebook is becoming very good at communicating stuff, they hired a bunch of Open Web advocates and they now claim that they are open. Listen to me : Facebook is Open. Facebook is Open, Facebook is Open. Got it? Facebook is Open. Except that it’s not.

Of course, they say they use something similar to PubSubHubbub. Except that, um, the point of PubSubHubbub is to be an open and standard protocol. The standard part is very important. It means that services can interact with one another, since they speak the same language. Currently, this is not true : the PubSubHubbub from Facebook is not the PubSubHubbub defined by the spec, for many good? reasons, but calling it PubSubHubbub has more marketing value, I guess.

The worst thing about all this is that Facebook does push a hell of a lot of data… to whom, you may ask? To Google. You see the pattern : an open platform that pushes content to some people (just not you, because you’re too small), based on business relationships, again.

The rest of the world of realtime publishers

Reading the two previous sections, you may think I’m some kind of socialist and I hate business relationships. Well, I’m French, so that could explain it, but I’d argue that business relationships are also walled gardens. Anytime you have a one-to-one, it’s about the two, not about the others, right? If you married your spouse, it’s not to have ~~sex~~ dinner every night with a different person, is it?

In many ways, the relationships that Twitter and Facebook have with a very small subset of other applications are de facto excluding the rest of the world, and the rest of the future world.

Interestingly enough, both Twitter and Facebook hide behind their APIs. Having an API is great, but it’s just another form of slavery (thanks @brad for the analogy). The consumers of these APIs are at the mercy of the API provider. Twitter decides to shut down HTTP Basic Auth? Sorry little app maker, you’ll have to shut down your app or abide by this new commandment “Thou shalt not ask for user credentials”¹. Facebook changes their TOS? Sorry Zynga, no more invites for you.

If we want to build the realtime web, like we built email or even HTML sites, we need a protocol. We need a way for services to interact safely with one another as peers, not as clients and servers.

Luckily we have this protocol. We even have several (that’s the beauty of it : people fight for the shared wealth!). At Superfeedr, we placed our bets on PubSubHubbub and XMPP PubSub.

Most of the smaller (but much more numerous) publishers have chosen the protocol approach. Whether it’s because they genuinely believe that it’s the right approach, or because they’re actually too small to force their API down other people’s throats is irrelevant. What matters is that now, all the main blogging platforms : WordPress, SixApart, Blogger, Tumblr, Posterous are using the same protocol : PubSubHubbub. Some other smaller social networks, like Cliqset, Status.net and Gowalla, are using it as well. Even a few newspapers!
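
For readers who have never looked at it, here is roughly what speaking PubSubHubbub involves, as a hedged sketch in Python. The field names `hub.mode`, `hub.url`, `hub.topic` and `hub.callback` come from the PubSubHubbub spec; the hub, feed and callback URLs are placeholders, the helper names are mine, and drafts of the spec differ on which extra fields (such as a verification mode) a subscription must carry. The requests are only built here, not sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

FORM = {"Content-Type": "application/x-www-form-urlencoded"}


def build_publish_ping(hub_url, topic_url):
    """Publisher side: ping the hub to say that the feed at `topic_url` changed.

    The hub then fetches the feed and pushes the new entries to subscribers.
    """
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(hub_url, data=body.encode("ascii"), headers=FORM, method="POST")


def build_subscribe_request(hub_url, topic_url, callback_url):
    """Subscriber side: ask the hub to push updates of `topic_url` to `callback_url`."""
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })
    return Request(hub_url, data=body.encode("ascii"), headers=FORM, method="POST")


# Placeholder URLs, for illustration only.
ping = build_publish_ping("https://hub.example.com/", "https://blog.example.com/feed.atom")
sub = build_subscribe_request(
    "https://hub.example.com/",
    "https://blog.example.com/feed.atom",
    "https://reader.example.org/callback",
)
```

Sending either request is then a matter of passing it to `urllib.request.urlopen`. In the vocabulary of the definitions above, the publisher pings the hub, and the hub does the pushing. Any publisher and any subscriber can do this with any hub, with no business relationship required.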

Combined, this is probably bigger than Twitter or Facebook’s data. Soon, it will be much much bigger, because so many publishers don’t push their content yet…

The non realtime publishers

This is the last category of sites, and still the vast majority. When looking at Alexa’s top 50, it’s obvious. Yahoo!, Windows Live, Baidu, Wikipedia, Amazon, eBay, LinkedIn, Flickr, Craigslist, RapidShare…

Where are the e-commerce websites sharing their catalogs in realtime with the price comparison search engines?

Where are the classifieds websites pushing their content in realtime to iPhones?

Where are the sports sites pushing their content in realtime to forums or chat services?

Where are the news outlets pushing data to the feed readers?

One could argue that pushing data means letting third-party applications use it. That’s my point actually : pushing data away is at worst pushing it to services with users, which means that your content will eventually gain eyeballs. At best, nobody cares about your content, so you’re safe.

Of course, we talked about the hundreds of new uses that are yet to be seen, from sync to mobile, from presence to notifications… we haven’t seen anything yet. Please, publishers, let others benefit from your data. Publishing your awesome content without distributing it is pretty much like making the best product in the world but leaving it in the factory.

It’s time for content publishers to make their content dynamic and push it so they can control its distribution.

¹ I’m not advocating for HTTP Basic Auth, right? Just saying that thousands of small apps will shut their doors, just because Twitter decided so.