Thoughts on technology and social web

June 19, 2009

Protocols for the real-time web

Filed under: NextWeb, Social Networking — Ravikant Cherukuri @ 5:46 am

Today Collecta unveiled their real time search engine toady. One of the several players in this fastly evolving space. Others include twitter search, OneRiot, Tweetmeme, Facebook search, rumored google real-time search etc. The interesting thing about collecta is that they are true real time. You will see updates reach you within seconds (of collecta seeing them). This is because they use Jabber’s XMPP protocol to push updates to the client. This is one of many techniques used for real time communication on the web. Some invented as people needed them and others like XMPP that are standardized. What are these techniques and how do they stack up?

As early as 10 years back, we started seeing applications that tried to bring you information as it changes or as it is created on the web. This evolved into RSS/Atom based feeds. This is a polling based pull that simulates push. Your browser or blog reader will periodically poll for changes to your feeds and update you when they change. This evolved into feed aggregation services where all your feeds are aggregated into a single feed that you can poll from the client. This makes it efficient to poll on the client but the serivce that is the owner of the feed still gets the load. Consider this. For the web to be real time, the zillions of objects on web from product listings to blog posts to wikipedia articles have to be able to communicate to users in real time. The feed model just dosent scale to this.

As the web UI evolved we needed web applications to be more responsive and so AJAX was born. AJAX (Asynchronous Javascript And Xml) enabled javascript to make XML based calls to the web server and get data back to the current page without reloading it. This made web apps more responsive but the model is still the same as javascript now used XMLHttpRequest to poll feeds from the server.

Then consider real time collaboration scenarios like instant messages, collaborative document editing etc. These need real time responses as other users are watching the screen to see them. These applications need to be more realtime and constant polling will either overwhelm the servers (for short polling interval) or degrade user experience (with longer polling interval). Long polling / Comet comes to the  rescue here. The browser keeps long running connections open with the server so the server can send events to teh browser as they happen. The basic technique is for teh browser to make a request to the server for which the server does not respond till it has some data to send. Once the browser gets some data from the server, it make another request to the server. Many web apps like gmail, facebook, meebo etc use this technique to bring real time functionality to the web.

These techniques are also used by APIs that bring realtime to web by implementing web wrappers around existing proprietary realtime protocols like Messenger Web Toolkit,  Web AIM, Yahoo messenger SDK etc. The Messenger web toolkit provides a cool feature that allows you to send non IM messages that can be used to build higher level collaboration applications.

The Jabber/XMPP protocol is an extensible protocol that is used for publish-subscribe (mainly in instant messaging). This protocol is finding way to many real time web scenarios like –

  • real time search (Collecta), twitter (, friendfeed,
  • aggregators like PixelPipe and that let you interact with your social networks via XMPP,
  • WordPress firehose where partners like search engines and market intelligence providers who would like to ingest a real-time stream of new posts and comments the second they get published.
  • Twitter firehose where thrird parties can get the realtime stream of twitter data to mine and search.
  • Google wave extends XMPP to build a collaboration system

XMPP also has javascript API for the web like Strophe and xmpp4js. There is a technology similar to comet for XMPP to run on HTTP called BOSH. BOSH takes care of firewall traversal and tunnels XMPP over HTTP. Overall this is a fairly well designed and extensible protocol with a lot of good documentation and several reference implementations. This is becoming the protocol of choice for the real time web.

There is also the WebSockets API in the HTML5 specification. The HTML 5 specification introduces the Web Socket interface, which defines a full-duplex communications channel that operates over a single socket and is exposed via a JavaScript interface in HTML 5 compliant browsers. It tarverses firewalls and proxies and provides bi-directional transport with streaming capability without the long poll overhead. The javascript API is also very simple. COMET can surely take advantage of this and things become more straight forward without hidden iframes and arcane protocols. So can XMPP. Sounds like the holy grail in making the browser a two-way real-time medium.

This space is fast changing and a lot of smart folks are figuring out how to make the web more responsive and realtime. And the protocols keep evolving to acocomodate that.

Update : There is a good article about XMPP progress in 2009 at


Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Blog at

%d bloggers like this: