Thoughts on technology and social web

October 1, 2011

HTML5 in Apps

Filed under: development — Tags: — Ravikant Cherukuri @ 7:50 pm

HTML5 is definitely the top UX technology to come along in the last few years and is set to dominate the space in the coming years. I wrote a post a while back about the promise of HTML5. This post is a look into using HTML5 in apps, with a closer look at PhoneGap and Windows 8 web apps. Thanks to cross-company efforts and a lot of hard work from standards groups, HTML5 is already well adopted. All the major browsers (Chrome/IE/Safari/Opera/Firefox) support HTML5 and are competing to keep current with the standards. It is also interesting to see that pieces of the standard are being adopted even before the standard is finalized, giving a push to the standardization process itself. Google, Apple, Microsoft, Amazon, and Netflix are among the companies betting on the success of HTML5. Microsoft’s move towards HTML5, much to the chagrin of a considerable Silverlight developer base, emphasizes this.

With the success of the iPhone and iPad, we have seen an explosion of the mobile app marketplace and a shift in developer focus to the platform. Though the development environment is relatively primitive on iOS, we saw hundreds of thousands of apps in Apple’s App Store doing amazing things. If you have a successful company, chances are that you have an iOS strategy. The first generation of apps could target close to 100% of the market by just having an iOS version. Finally, after years of Apple’s domination, we are seeing a glimmer of hope for other viable platforms in the mobile and tablet markets: Ice Cream Sandwich, Amazon Fire, Windows 8. As tablets reach an inflection point, there are several strong choices emerging, which means that app developers need to worry about being cross-platform. Instead of targeting the web and iOS, we now need to think about targeting the web/iOS/Android (several flavors, Amazon Fire included)/Windows 8, along with a myriad of phone OSes and of course the desktop. This is where HTML5 steps in as a cross-platform app framework.

PhoneGap is “an HTML5 app platform that allows you to author native applications with web technologies and get access to APIs and app stores”.

If you are interested in mobile app development, you should check out PhoneGap. It is a very easy-to-use toolkit that allows you to leverage the HTML5 capabilities supported on most platforms to “write once and run everywhere”. The thing I like best about PhoneGap is its plugin architecture and the community around it. The plugin architecture lets you expose capabilities traditionally available only to native apps into HTML5 and JavaScript. There are scores of plugins written and shared by people that will get you started on whatever magic your favorite gizmo can do. Given the power of HTML5 and the plugin architecture, there is little that your app cannot do.

Compared to developing in Objective-C or Java or C#, HTML5 + JavaScript is a breeze. As with any technology, you need to know what you are doing to get the best out of PhoneGap. There are several things to keep in mind.

  • PhoneGap is not an excuse to make your app look like a website. Apps still have to look native and preserve the flow of the device they are running on.
  • You need to take care to make sure your app UX is snappy. Browsers are getting better but there are still some places where you need to understand how to work around issues. Thankfully there is a big community that can help.
  • There are always those rare scenarios where you need native code. Recognize them and make a plugin.
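To make the plugin idea concrete, here is a rough sketch of the JavaScript side of a PhoneGap-style plugin call. The native side is mocked so the dispatch pattern runs anywhere; the `Battery` service, `getLevel` action, and the `exec` signature are illustrative stand-ins, not PhoneGap’s actual API.

```javascript
// Hypothetical sketch of how a PhoneGap-style plugin bridges JavaScript to
// native code. In a real app, exec() would marshal the call over the WebView
// bridge to native code; here the native side is a plain object.
const nativeHandlers = {
  // A fake "Battery" plugin standing in for real native code.
  Battery: { getLevel: (args, onSuccess) => onSuccess(87) },
};

function exec(onSuccess, onError, service, action, args) {
  const plugin = nativeHandlers[service];
  if (plugin && plugin[action]) {
    plugin[action](args, onSuccess, onError);
  } else {
    onError(new Error("No handler for " + service + "." + action));
  }
}

// App code calls the plugin like any other JavaScript API.
exec(level => console.log("Battery at " + level + "%"),
     err => console.error(err),
     "Battery", "getLevel", []);
```

The point of the pattern is that once the bridge exists, every new native capability is just another entry in the handler table plus a thin JavaScript wrapper.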

There are several UI frameworks that can give you a UI that looks natively mobile: jQTouch/Sencha/Dhtmlx. Look at their demos to get the picture.

Windows 8 has a different take on application development than iOS/Android and most other platforms. Instead of making HTML5 available to apps by embedding a browser control, Windows 8 makes HTML5 a first-class citizen and provides a platform for building Metro-style apps completely in HTML5, with full access to everything that the OS provides. This is similar to PhoneGap’s approach, only supported from the ground up.

The phone/tablet market is evolving fast and there will be many players with enough market share for developers to take notice. Be it enterprise apps, games, or front ends for web services, taking a cross-platform approach makes more sense than ever. The PhoneGap site has an impressive collection of apps written using it.


March 2, 2010

Pubsubhubbub with .NET and WCF – Part 1

Filed under: Uncategorized — Ravikant Cherukuri @ 12:42 am

Pubsubhubbub (PuSH) has been gaining momentum in the real-time space as a protocol for server-to-server notifications on changes to feeds. The Google Buzz API is based on PuSH. Several Google services have adopted it, and it’s now catching on among other services as well. Superfeedr is my favorite startup in this space. They host hubs for your feeds, handling the complexity of the hub implementation for you. The real-time web is becoming truly real-time now, and companies like Superfeedr could provide the backbone for this new model.

The PuSH protocol itself is simple to implement and is defined in an open spec. Most of the code samples available for it are in Ruby/Python/PHP etc. I started working on a PuSH subscriber in C# with a WCF REST service. WCF provides an easy-to-use framework to build web services in .NET, but there are some WCF quirks to getting this done. In this post I explain how to build a WCF subscriber service.

We need a service that supports the following REST APIs to behave as a PuSH subscriber.

  • Verify
    This will receive a GET request to verify the subscription once we POST a subscribe to the hub.
  • Callback
    This will receive new content posted to the feed.

Note that PuSH is a server-to-server protocol, so the subscribing entity is also a server. There are two pieces of code we need to write for the subscriber: the client code that POSTs a subscribe request to the hub, and the service that responds to the verification and data callbacks. The service is implemented using WCF and the subscriber client library using HttpWebRequest. I used the WcfRestContrib library to encode/decode requests and responses. The WCF service interface looks like this.

public interface IPubSubHubbub
{
    [WebInvoke(Method = Verbs.Get, UriTemplate = "/callback3?hub.mode={mode}&hub.topic={topic}&hub.challenge={challenge}&hub.lease_seconds={leaseSeconds}&hub.verify_token={verifyToken}")]
    Stream Verify(SubscriptionMode mode, Uri topic, string challenge, int leaseSeconds, string verifyToken);

    [WebInvoke(Method = Verbs.Post, UriTemplate = "/callback3")]
    void SubscriberCallback(SyndicationFeedFormatter formatter);
}

When a subscriber subscribes to a hub, the subscriber provides a callback URL to the hub. The hub calls this URL to verify the subscription. Later, when the topic has new content published, the hub will call the same URL with the content (PuSH does a fat ping). This is why both methods above have the same base URI. Verification is a GET request while the data callback is a POST. The subscription mode is defined using a DataContract as below. Notice that the Verify method returns a Stream. This is because Verify needs to return the challenge that the hub sends in “text/plain” format. The new content notification happens as a POST in ATOM format.

[DataContract]
public enum SubscriptionMode
{
    [EnumMember(Value = "subscribe")]
    Subscribe,

    [EnumMember(Value = "unsubscribe")]
    Unsubscribe
}

Below is the WCF server implementation.

[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
[WebDispatchFormatterMimeType(typeof(WcfRestContrib.ServiceModel.Dispatcher.Formatters.FormUrlEncoded), "application/x-www-form-urlencoded")]
[WebDispatchFormatterMimeType(typeof(WcfRestContrib.ServiceModel.Dispatcher.Formatters.PoxDataContract), "application/atom+xml")]
public class Service1 : IPubSubHubbub
{
    public Stream Verify(SubscriptionMode mode, Uri topic, string challenge, int leaseSeconds, string verifyToken)
    {
        // Echo the challenge back as text/plain so the hub accepts the subscription.
        WebOperationContext.Current.OutgoingResponse.ContentType = "text/plain";
        return new MemoryStream(Encoding.UTF8.GetBytes(challenge));
    }

    public void SubscriberCallback(SyndicationFeedFormatter formatter)
    {
        foreach (SyndicationItem item in formatter.Feed.Items)
        {
            // Process each new entry here, e.g. store it or forward it.
        }
    }
}

Things of note here.

  • The "application/x-www-form-urlencoded" encoding is handled by the WcfRestContrib library’s FormUrlEncoded formatter class.

  • The Verify method returns the challenge back as a memory stream and sets the content type to "text/plain" on the WebOperationContext. Without this, the default content type will be "text/xml" and the hub would fail the subscribe.

  • The SubscriberCallback method receives a parsed feed. The SyndicationFeedFormatter should be able to handle both ATOM and RSS feeds, though I tested it only with ATOM.

To establish the subscription, the subscriber needs to make a REST call to the hub. The only initial input the subscriber has about the topic to subscribe to is the URL of the feed. The feed header has a link with rel=“hub” that points to the hub for this feed.

String GetHubFromFeed(String feedUrl)
{
    WebRequest request = WebRequest.Create(feedUrl);
    WebResponse response = request.GetResponse();

    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);

        XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");

        XmlElement hubElement = (XmlElement)doc.SelectSingleNode(
            "//atom:feed/atom:link[@rel='hub']", nsMgr);
        return hubElement.SelectSingleNode("@href", nsMgr).Value;
    }
}

Once we have the hub URL, we need to set up a subscription to the hub for our topic URL. The topic URL is the URL in the atom:link with rel=”self”. (There has been discussion in the PuSH group about being able to subscribe with any URL that points to the feed, and in the latest version of the spec it’s no longer a requirement to subscribe to the “self” URL.)

public String Subscribe(string feedUrl, string hubUrl, string callbackUrl)
{
    StringBuilder contentBuilder = new StringBuilder();
    NameValueCollection collection = new NameValueCollection();
    byte[] responseContent = null;

    collection.Add(hubCallbackParamName, callbackUrl);
    collection.Add(hubMode, "subscribe");
    collection.Add(hubTopic, feedUrl);
    collection.Add(hubVerify, "sync");
    collection.Add(hubVerifyToken, "test_token");

    // Build the form-urlencoded body: key=value pairs joined by '&'.
    for (int i = 0; i < collection.Keys.Count; i++)
    {
        string key = collection.Keys[i];
        contentBuilder.AppendFormat("{0}={1}&", key, Uri.EscapeDataString(collection[key]));
    }

    // Trim the trailing '&'.
    int stringLength = contentBuilder.Length;
    byte[] requestContent = Encoding.UTF8.GetBytes(
        contentBuilder.ToString(0, stringLength > 0 ? stringLength - 1 : 0));

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(hubUrl);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = requestContent.LongLength;

    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(requestContent, 0, requestContent.Length);
    }

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        if (request.HaveResponse)
        {
            responseContent = new byte[response.ContentLength];
            using (Stream responseStream = response.GetResponseStream())
            {
                responseStream.Read(responseContent, 0, responseContent.Length);
            }
        }
    }

    return Encoding.UTF8.GetString(responseContent);
}

This completes a PuSH client implementation. In a later post I will go over the implementation of a hub in WCF.

October 19, 2009

Workflow with activity streams

Filed under: Uncategorized — Tags: — Ravikant Cherukuri @ 11:15 pm

For a while I have been thinking about how social stream technologies (Facebook newsfeed/Twitter timeline) can be used to solve other problems. Facebook and the other 100 social networks out there have familiarized millions of users with the concept of a lifestream. The Activity Streams effort puts a standardized structure on the lifestream. This enables activities to roam networks. There is a lot of energy being spent on how this roaming can be standardized and made to work as seamlessly as the telephone network works today. All these ideas are building a platform for the future web. It’s time to start building the applications for it.

Today we think of our lifestream as the flow of stuff we like, want to share, want to comment on, and so on. These are intended for people in our network to consume, and the data is pretty much unstructured. But life includes much more than what you share with your friends. If the tasks that you (need to) accomplish on a day-to-day basis become part of your activity stream, lifestream applications can help you achieve them. These streams could be private, shared only with applications that you choose. For example,

  • Your calendar could talk to you via your lifestream. Something like Evite could be managed via people’s lifestreams: you post the invite on your stream and make it visible to the people you want to invite, and they can accept the invitation right from your stream.
  • Another example: you drop your car at the dealer and you want him to contact you when it’s done. In a world where lifestreams are ubiquitous, you should be able to expose a private, time-bound stream to him that he could write to (just like you share a telephone number today). He could talk back to you on this stream.
  • You could open your stream to your travel agent on which the agent could push itineraries to you that you could comment on or accept. Once you accept one, the stream could be handed over to the airlines and so on.

The trick is to do this in a way that is simple for users, secure in that it does not expose more data than necessary to third parties, and works in a network-independent manner. OAuth and Activity Streams look like the right technologies. Enabling the devices and services that you use on a day-to-day basis to interface with you through your lifestream to get things done seems like a logical step forward. These kinds of things have happened at a crude level over email, but email still does not have a standardized content format and so does not lend itself well to programmability.

October 6, 2009

Service federation in the cloud – PushButton or Google Wave?

Filed under: Real time web, Social Networking — Tags: , , — Ravikant Cherukuri @ 6:54 pm

The PushButton technologies (Pubsubhubbub (PuSH), Webhooks, and ATOM) provide a lighter-weight and more web-friendly alternative to the heavier stack that Google Wave uses, mainly for server-to-server federation. At a high level this seems true, as PuSH uses HTTP POSTs to communicate versus the persistent connections of XMPP. That makes it more web friendly and easier to implement subscribers and publishers in a language of your choice.

Disclaimer: on one level, Pubsubhubbub and Wave are solving different problems, but at a higher level both implement real-time federation with pub-sub. My attempt is to evaluate Wave as a vehicle for service federation. I actually think Wave is a great idea, but it’s too heavy for the web as we have it today. The web is evolutionary; Wave is too drastic and revolutionary.

What makes Wave compelling?

There are several interesting aspects. The first is the shell (the UX). The presentation is like e-mail but more real-time: it takes a paradigm that everybody understands and extends it. The second is extensibility. With gadgets and robots you can extend the Wave system, and with embedding you can extend the way waves are visualized and consumed. The third is federation. Federation makes it possible for different providers, each owning its own data, to communicate so that their user bases can work with each other. This kind of federation is not new. FriendFeed interoperates with many sites like Flickr using SUP, and many IM networks interoperate too. But Wave federation tries to standardize this, providing XMPP-like universal federation where you don’t have to know or work with every provider to federate with them.

What makes Wave complex?

Wave tries to solve a bunch of problems to make real-time persistent communication possible. The protocol that Wave uses to federate dictates that all federated partners maintain complete copies of objects even though each object is owned by one of them. Operational transformation (OT) dictates that all the operations on a document are stored along with the latest copy of the document. When a user from a federated provider joins a wave, the provider gets a copy of the wave in terms of those operations. All these features provide e-mail-like decentralization of data, and Wave builds real-time collaborative document editing on top of it. Together they make the protocol incredibly complex.

The Wave protocol imposes heavy requirements on what a remote partner has to implement. The protocol data model (which has to implement OT) requires a third party to express state in terms of operations: you have to keep track of not only the current state of the wave but also the history of operations that made it so. This is not a pattern that services normally use. For large-scale service interop, each partner has to maintain a full copy of the wave, which doesn’t sound very feasible at web scale. Imagine an enterprise Wave user joining a large public wave: suddenly the enterprise Wave server needs to handle that barrage of updates.

Could it be simpler?

Activity Streams is an effort by several people involved in social networking to standardize protocols for different networks to interoperate. The great thing about Activity Streams is that it defines the protocol for expressing activities without defining how providers should implement their services. This is how the web is today. We can’t (and should not try to) anticipate how others will use a protocol/service. Just define the interface and leave the rest to the implementers.

PuSH with ATOM is a more web-friendly pub-sub protocol than the XEP-0060 that Wave uses. Simple HTTP POSTs to make and break subscriptions and to receive notifications make the interface very natural. Can a collaborative data-sync algorithm like OT be implemented on top of PuSH, and is it even necessary to achieve federation? Since the data going over PuSH is ATOM, if services could synchronize by exchanging ATOM items, the world would be much simpler.

Without OT, if the federating providers want to maintain copies of data that is actually owned by another provider, there is still the complexity of synchronization. In most cases, though, this level of sync is not required; it matters only when an item of small granularity is being edited simultaneously by multiple people. If the systems can identify conflicts and either overwrite items or prompt users to correct them, that could be acceptable for most cases. The FeedSync protocol that Live Mesh uses works this way: your content is a feed of items, items are synchronized using a simple algorithm, and conflicts are flagged.
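As a toy illustration of this style of item-level sync (not the actual FeedSync algorithm; the id/version fields are my own simplification), each feed item carries an id and an update counter. The higher counter wins, and equal-counter edits with divergent content get flagged as conflicts:

```javascript
// Merge a remote feed into a local one, FeedSync-style (simplified sketch).
// Each item: { id, version, body }. Newer versions win; ties with different
// bodies are concurrent edits and are reported as conflicts.
function mergeFeeds(local, remote) {
  const merged = new Map(local.map(item => [item.id, item]));
  const conflicts = [];
  for (const r of remote) {
    const l = merged.get(r.id);
    if (!l) { merged.set(r.id, r); continue; }  // item is new to us
    if (r.version > l.version) {
      merged.set(r.id, r);                      // remote update wins
    } else if (r.version === l.version && r.body !== l.body) {
      conflicts.push(r.id);                     // concurrent divergent edit
    }
  }
  return { items: [...merged.values()], conflicts };
}
```

The appeal is that the whole algorithm fits in a dozen lines, versus the operation-history machinery OT requires.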

Combining Activity Streams with PuSH and FeedSync can provide a much simpler, more web-like infrastructure for a universal federation model that is easy to build on and participate in. Federation is nothing but mashups with a business model. While the mashup space delights us with innovations every day, service federation moves at a snail’s pace. A simpler model to federate at web scale (in size and diversity) would lead to seamless aggregation of your data from any service (that your friends might be using) into the services that you use.

September 7, 2009

pubsubhubub and rss-cloud : changing the way you read the web

Filed under: Cloud Computing, Real time web, Social Networking — Ravikant Cherukuri @ 11:39 pm

Today WordPress declared support for RSSCloud; Pubsubhubbub was adopted by Blogger/Google Reader a few days back. These technologies are being adopted much faster than I expected. Much like AJAX a few years back, the buzz is building up. Anil Dash’s posts on PushButton talk of the potential of these technologies. The basic idea is very simple: a RESTful pub-sub protocol that provides real-time notifications for web content.

  • Subscribers register their interest in publisher content notifications to hubs.
  • Publishers send content pings to the hub when new content is posted.
  • The hub reacts to the ping by fetching content from the publisher and posting the content to the subscribers.
  • All content is in RSS/Atom format.
  • Communication to and from the hub happens over HTTP using webhooks.
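The hub’s core job in the steps above can be sketched in a few lines (an in-memory toy; fetching the feed and POSTing to subscribers are stubbed out as function parameters, since a real hub would do both over HTTP):

```javascript
// Minimal in-memory sketch of a PuSH hub: topic URL -> subscriber callbacks.
const subscriptions = new Map();

function subscribe(topic, callbackUrl) {
  if (!subscriptions.has(topic)) subscriptions.set(topic, new Set());
  subscriptions.get(topic).add(callbackUrl);
}

function handlePing(topic, fetchFeed, postTo) {
  // The publisher's ping carries no content; the hub pulls the feed itself
  // and then fans it out, so the publisher is pinged-and-done.
  const content = fetchFeed(topic);
  for (const cb of subscriptions.get(topic) || new Set()) {
    postTo(cb, content); // the "fat ping": content is POSTed to the callback
  }
}
```

Notice that the publisher’s load stays constant no matter how many subscribers there are; the fan-out cost lives entirely in the hub.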

This has the potential to change the web as we know it. It could bring Twitter-like real-time notifications to changes in all of the web’s content. As an end user, this is exciting because it could finally bring a (much deserved) death to the browser refresh button, by channeling all the content you are interested in into streams that can be delivered to you in real time. PushButton, together with DiSO/Activity Streams, is great for social network federation. The Activity Streams atom format should work well with PushButton.

Hubs federating and chaining subscriptions with each other provide a distributed and decentralized pub-sub infrastructure. This has some definite advantages over XMPP. XMPP’s XEP-0060 provides similar generic pub-sub functionality, but the REST-based approach is simpler and more web savvy. Webhooks take the honors here with their REST-based callbacks, as opposed to XMPP’s connection-based approach.

Add the bidirectional communication channels on web pages (using long poll today or WebSockets in HTML5) to the mix and we have a means to deliver notifications to end users. When I am on Facebook and I get a mail in my Gmail account, the notification could be delivered to me through Facebook.

Update: a few good posts comparing Pubsubhubbub and RSSCloud –

How to publish and receive blog posts in Real-time
PubSubHubbub vs. rssCloud
PubSubHubbub = rssCloud + ping ?
RSSCloud Vs. PubSubHubbub: Why The Fat Pings Win
There’s a Reason RSSCloud Failed to Catch On
The Reason RSS Cloud Can Work Now
The Web At a New Crossroads

August 13, 2009

Real time roundup [Part 2] — responsive web applications

Filed under: Uncategorized — Ravikant Cherukuri @ 2:22 pm

This is often a hard-to-appreciate aspect of the real-time stack, but only till you experience it. Think about the instant delivery of notifications in Facebook, or the character-by-character real-time transmission in Google Wave. On the face of it, it might look like a minor feature, but from a UX responsiveness standpoint this is big. Once you use these apps, the rest of the world looks sluggish, kind of like how a dial-up internet connection feels today. This is as much of a leap as the transition from refreshing web pages to see updates to letting the RSS reader do it for you.

This pattern can be applied to many web applications. It is the key that will bridge the gap between web applications and desktop applications: the ability to react to user/data events in real time. With HTML5 WebSockets this might become more standardized, but even today, long poll/Comet etc. are pretty scalable and general purpose. Once the common web programming environments like ASP.NET/JSP/PHP fully embrace this concept, there is great potential here.

This could even be in the form of a generic pub-sub provided as a secure cloud service. As a web site developer, I need a pipe to transfer data to and from the browser in real time. But I don’t want to invest in a pub-sub infrastructure which is very different from the HTTP programming model that I am used to and has entirely different scaling characteristics. If this were provided as a service on Microsoft Azure/Amazon EC2/Google App Engine, it would be much easier to integrate into the current programming model. But what would such a reusable model include? A few things off the top of my head:

  1. The long poll connection to the server with a JavaScript component that is embedded in the web pages. This will enable the duplex connectivity.
  2. A pub-sub infrastructure that will provide some session storage and topic-based subscriptions to the sessions. For example, when a user visits a real-time web page, a session will be created for it on the pub-sub backend. Then the web site (instance) will register for topics that it is interested in. The web server could publish these topics to the pub-sub system and the pub-sub system would push them to the browsers.
  3. Topics can be transient entries in the pub sub system that are created as pages subscribe to them.
  4. An API exposed to the web site backend that would push arbitrary topic updates to the pub-sub system.
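The browser half of item 1 can be sketched as a loop that re-issues the held request as soon as one completes. Here `pollOnce` is a stand-in for the actual XMLHttpRequest that the server parks until it has data:

```javascript
// Long-poll client loop (sketch). pollOnce() models a request that the server
// holds open until data is available and then resolves with a batch of events.
async function longPollLoop(pollOnce, onEvent, shouldStop) {
  while (!shouldStop()) {
    const events = await pollOnce(); // blocks until the server has something
    events.forEach(onEvent);
    // Loop immediately re-issues the request, keeping one connection parked
    // at the server at all times, which is what makes delivery near-instant.
  }
}
```

The generic pub-sub service would hide exactly this loop inside the embedded JavaScript component, so the page author only writes the `onEvent` handler.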

This kind of an infrastructure component provided as a piece of code to be reused or as a cloud service would make it easy to build real time responsive web sites.

July 17, 2009

Real time roundup

Filed under: Cloud Computing, NextWeb, Real time web, Social Networking — Ravikant Cherukuri @ 6:50 pm

The real-time web has been a favorite topic of mine for a while now. I work on one of the large IM systems and am very interested in these developments. The real-time web started emerging as the platform for the next web in the last year or two. The main idea is that events happening on the web are brought to you as they happen. Think of the coverage that the Iranian election aftermath got with Twitter. The more obvious scenarios are already a reality, and there are several more subtle but equally impressive usages that many are working on. As with all technologies driven by guy-in-the-garage start-ups, it’s tough to see where this is all leading (if anywhere). The real-time web focuses on several aspects of the web.

  • filtered web streams
  • instant delivery of web posts
  • real time collaboration
  • responsive web applications
  • S2S data filtering

The lack of full-duplex connectivity has been a downside of HTTP. This gave native applications a one-up. HTML5 is out to correct that with WebSockets. Meanwhile, technologies like Comet, BOSH etc. provide near-real-time connectivity over HTTP. So the technology seems to be in place for the real-time web. But why does the real-time web matter? We already have near-real-time, polling-based push technologies like RSS/Atom. In my mind the answer is user experience. Real time gives the most natural experience. Compare a walkie-talkie to a telephone. A walkie-talkie is near real time, with a lot of user-level synchronization; that’s just a quirk of the technology. A telephone conversation is a full-duplex, real-life conversation. It makes a lot of difference.

More links to the real time web :

Introduction to the RealTimeWeb

The Real-Time Web – O’Reilly Broadcast

Is Real-time the Future of the Web?

Building Real Time Web Applications Using HTML 5 Web Sockets

How to Deal with the Real Time Web: Navigating the River

Real time search and filtered streams

For me the coolest part of microblogging is real-time search. The significance of most information fades rapidly over time, so getting short spurts of information as and when it happens is super useful. Twitter’s hash tags are a good example: you follow information rather than people. I find the following ways to track real-time information interesting.

  • Twitter hash tags. Send something like #iwant in the tweet and people can easily track your tweets and might contact you if they are selling what you want. It gives metadata to your tweet and makes it trackable by other users. The #iwant and #ihave tags are a good example. There are third-party sites that consume the Twitter firehose and build a real-time marketplace. Such mining verticals on real-time data are fast emerging. Hugely popular Twitter games like Spymaster are another example.
  • Google’s “e-mail me when a new item is found that matches my search criteria” feature. This is super useful. If you know what you are searching for, you can be the first to find information as it comes live. Google can deliver this to you within a few hours of the information getting posted on the web. This is relatively real time (compared to others) but is still a push. When this becomes truly real time, it will be awesome.
  • Aggregated real-time search. There are several real-time search engines out there that can get you real-time feed search across Twitter/Facebook/Identi.ca etc. in one interface. Some of these have AJAX-powered interfaces and some are truly real time with XMPP (Collecta).

The basic idea behind this is to subscribe to changes to data on the web, in real time, without tying in to specific web sites. Eventually, the bigger search engines like Google/Yahoo/Bing will come up with a tighter integration of real time into all of the web that they index. With that, real-time search will seamlessly integrate with normal web search as we know it.

Instant information delivery

This category of real-time applications aims to deliver data of interest to you as soon as it’s posted. Sample scenarios include:

  • Deliver blog posts and comments that you are interested in instantly. This builds on the current RSS based systems by bringing true real time to content delivery.
    • WordPress now allows you to keep track of other WordPress blogs and of comments on your WordPress blog over XMPP IM with clients like GTalk.
    • Tweet.IM delivers your Twitter feed over IM in real time using XMPP.
    • FriendFeed aggregates feeds from all your friends’ blogs, social streams, and comments, and has a feature that delivers these as FriendFeed finds them, over XMPP IM.
    • Google Wave uses XMPP IM protocol to sync wave content in real time.
    • There are a couple of startups like iNezha that provide real time updates to all the blogs that you are interested in.
  • Enable efficient content aggregation using XMPP. Google’s Pubsubhubbub is a good example. There were also several experiments by companies like Gnip, FriendFeed etc. to use XMPP for this purpose.
  • IM systems are integrated with email systems (from the same vendor) today. You get an email and you are presented with a toast by the IM system in real time.
  • Windows Live Messenger also supports alerts from different third party providers. This is real time events from arbitrary providers with whom you have registered your interest. You can click on the windows live alerts button on my blog and get alerts onto your messenger whenever I update my blog.

[More to come in a later post]

June 19, 2009

Protocols for the real-time web

Filed under: NextWeb, Social Networking — Ravikant Cherukuri @ 5:46 am

Today Collecta unveiled their real-time search engine, one of several players in this fast-evolving space. Others include Twitter search, OneRiot, Tweetmeme, Facebook search, the rumored Google real-time search, etc. The interesting thing about Collecta is that it is truly real time: you will see updates reach you within seconds (of Collecta seeing them). This is because they use Jabber’s XMPP protocol to push updates to the client. This is one of many techniques used for real-time communication on the web, some invented as people needed them and others, like XMPP, that are standardized. What are these techniques and how do they stack up?

As early as 10 years back, we started seeing applications that tried to bring you information as it changed or was created on the web. This evolved into RSS/Atom-based feeds: a polling-based pull that simulates push. Your browser or blog reader periodically polls your feeds for changes and updates you when they change. This evolved into feed aggregation services where all your feeds are aggregated into a single feed that you can poll from the client. This makes polling efficient on the client, but the service that owns the feed still gets the load. Consider this: for the web to be real time, the zillions of objects on the web, from product listings to blog posts to Wikipedia articles, have to be able to communicate to users in real time. The feed model just doesn’t scale to this.

As web UI evolved, we needed web applications to be more responsive, and so AJAX was born. AJAX (Asynchronous JavaScript And XML) enabled JavaScript to make XML-based calls to the web server and get data back to the current page without reloading it. This made web apps more responsive, but the model is still the same: JavaScript now uses XMLHttpRequest to poll feeds from the server.
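The polling pattern can be sketched in a few lines of JavaScript. This is just an illustration: the fetch function stands in for an XMLHttpRequest call, and the names are mine, not from any library.

```javascript
// A minimal polling loop: ask the server for the feed on a timer and
// notify only when something actually changed.
function makePoller(fetchFeed, onChange) {
  let lastSeen = null;
  return function pollOnce() {
    const latest = fetchFeed();      // e.g. GET /feed.xml via XHR
    if (latest !== lastSeen) {       // notify only on a real change
      lastSeen = latest;
      onChange(latest);
    }
  };
}
// In a browser you would schedule it: setInterval(pollOnce, 5000).
// Note the cost: the request goes out every interval whether or not
// anything changed, which is exactly the scaling problem above.
```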

Then consider real-time collaboration scenarios like instant messaging, collaborative document editing etc. These need real-time responses because other users are watching the screen. Constant polling will either overwhelm the servers (with a short polling interval) or degrade the user experience (with a longer one). Long polling / Comet comes to the rescue here. The browser keeps long-running connections open with the server so the server can send events to the browser as they happen. The basic technique is for the browser to make a request to which the server does not respond until it has some data to send. Once the browser gets data from the server, it makes another request. Many web apps like Gmail, Facebook, Meebo etc use this technique to bring real-time functionality to the web.
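The long-poll loop is simple to express: fire a request the server will hold open, deliver whatever comes back, then immediately ask again. A rough sketch, with the actual XHR abstracted behind a `request` callback (an assumption of mine, so the shape of the loop is visible):

```javascript
// Long polling: the server parks each request until it has an event,
// and the client re-requests as soon as a response arrives.
function longPoll(request, onEvent, shouldStop) {
  request(function (data) {
    onEvent(data);                               // deliver the event
    if (!shouldStop()) {
      longPoll(request, onEvent, shouldStop);    // immediately ask again
    }
  });
}
```

Compare this with the timer-driven poller: the latency here is bounded by the network round-trip, not by the polling interval.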

These techniques are also used by APIs that bring real time to the web by implementing web wrappers around existing proprietary real-time protocols like the Messenger Web Toolkit, Web AIM, the Yahoo Messenger SDK etc. The Messenger Web Toolkit provides a cool feature that allows you to send non-IM messages, which can be used to build higher-level collaboration applications.

The Jabber/XMPP protocol is an extensible protocol used for publish-subscribe (mainly in instant messaging). This protocol is finding its way into many real-time web scenarios like –

  • real-time search (Collecta), Twitter, FriendFeed,
  • aggregators like PixelPipe that let you interact with your social networks via XMPP,
  • the WordPress firehose, where partners like search engines and market-intelligence providers can ingest a real-time stream of new posts and comments the second they get published,
  • the Twitter firehose, where third parties can get the real-time stream of Twitter data to mine and search,
  • Google Wave, which extends XMPP to build a collaboration system.

XMPP also has JavaScript APIs for the web like Strophe and xmpp4js. There is a technology similar to Comet that lets XMPP run over HTTP, called BOSH; it takes care of firewall traversal and tunnels XMPP over HTTP. Overall this is a fairly well designed and extensible protocol with a lot of good documentation and several reference implementations. This is becoming the protocol of choice for the real-time web.
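For concreteness, a message stanza in XMPP is just a small piece of XML. The toy builder below is mine: it skips the XML escaping and routing that a real library like Strophe handles for you, but it shows what actually travels on the wire.

```javascript
// Build the XML stanza for a basic XMPP chat message. Illustrative
// only: no escaping, no id attribute, no real library involved.
function buildMessageStanza(from, to, body) {
  return '<message from="' + from + '" to="' + to + '" type="chat">' +
         '<body>' + body + '</body>' +
         '</message>';
}
```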

There is also the WebSockets API in the HTML5 specification. It introduces the Web Socket interface, which defines a full-duplex communications channel that operates over a single socket and is exposed via a JavaScript interface in HTML5-compliant browsers. It traverses firewalls and proxies and provides bi-directional transport with streaming capability, without the long-poll overhead. The JavaScript API is also very simple. Comet can surely take advantage of this, and things become more straightforward without hidden iframes and arcane protocols. So can XMPP. Sounds like the holy grail for making the browser a two-way real-time medium.
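The API really is simple. A sketch of the client side, with the socket object injected so the handler wiring can be seen in isolation; in a page you would pass `new WebSocket('ws://example.com/updates')`, where the URL is made up for this example.

```javascript
// Wire up a WebSocket-style object: subscribe once the connection
// opens, and hand every incoming message to a callback.
function wireSocket(socket, onUpdate) {
  socket.onopen = function () {
    socket.send('subscribe');            // full duplex: we can also send
  };
  socket.onmessage = function (event) {
    onUpdate(event.data);                // pushed by the server, no poll
  };
  return socket;
}
```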

This space is fast-changing and a lot of smart folks are figuring out how to make the web more responsive and real-time. And the protocols keep evolving to accommodate that.

Update: There is a good article about XMPP progress in 2009 at

June 18, 2009

HTML5 – the way ahead

Filed under: NextWeb — Ravikant Cherukuri @ 4:17 am

For many web developers, even though Web 2.0 and AJAX were a giant leap, we were stuck in a place where HTML 4 (and its many interpretations) was restrictive, and there is still a difference between the look and feel of a well designed desktop app and a well designed web app. Adobe Flash, AIR, Silverlight etc attempt to bridge this gap. But what if HTML could do it all? Can JavaScript be powerful enough to do all the things that C#/ActionScript can do? Can HTML5 carry forward the experience and knowledge collectively gained in the last 20-odd years of UX programming? The HTML5 standardization process is going to take a few more years, but browsers are already implementing the draft standard. For developers and companies to invest in HTML5, we need a firm commitment from the browser makers, especially since the last iteration did not go well.

Browser Alert

In the current imperfect world, HTML4 itself did not realize its potential because of the different interpretations that browsers made of the standard. The end result is all the browser-specific quirks that developers are forced to learn. IE (6/7/8), Firefox, Opera, Safari and now Chrome all need to get their act together. The user has choice today, so get it right or people will move to a different browser. This is the biggest hurdle for HTML5 and also a key test for browsers.

Flashy without flash

I started looking at HTML5, like many others, after watching the Google Wave demo. The fluidity and interactivity of the interface is amazing.

A few more examples of what HTML + JavaScript can do. Here is a Visual Studio-like editor in HTML developed by the Firefox team.

Firefox 3.5 implemented the video tag of HTML5. This is a really awesome demo.

OTOY is a company trying to take your console into the cloud. It does all the graphics processing on a server and just renders the picture onto your browser with no plugins or downloads. How cool is that?


What does HTML5 have that makes it powerful? There are several new features in HTML5 (list from Wikipedia). It brings many of today's usage patterns into the language itself.

  • New parsing rules oriented towards flexible parsing and compatibility
  • New elements – section, article, footer, audio, video, progress, nav, meter, time, aside, canvas, datagrid
  • New types of form controls – dates and times, email, url, search
  • New attributes – ping (on a and area), charset (on meta), async (on script)
  • Global attributes (that can be applied to every element) – id, tabindex, hidden
  • Deprecated elements dropped – center, font, strike, frames

The canvas tag brings flexible drawing into the realm of HTML. I have used other JavaScript APIs that simulate a canvas; they work by drawing 1-pixel divs that are absolutely positioned. The canvas tag is far more economical and elegant.
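To see the difference: with canvas, a filled rectangle is a couple of calls on the 2D context instead of thousands of positioned divs. The shape and colors below are arbitrary examples of mine; in a page the context would come from `document.getElementById('c').getContext('2d')`, and it is injected here only so the calls are visible in isolation.

```javascript
// Draw a filled, outlined rectangle on a canvas 2D context.
function drawBadge(ctx) {
  ctx.fillStyle = '#336699';
  ctx.fillRect(10, 10, 100, 50);    // one call paints the whole area
  ctx.strokeRect(10, 10, 100, 50);  // one more outlines it
}
```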

At last there are combo boxes that are native in HTML using the datagrid tag.

The input tag now has many new types – datetime, datetime-local, date, month, week, time, number, range, email, url, search, color. The progress tag will display a progress bar. It was so dumb that there were 100 different implementations of these. Many other new tags provide HTML-native versions of elements that were not natively supported and that resourceful JavaScript junkies had to hand-code (figure, footer, header, meter etc).
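Since browser support for these types is arriving piecemeal, a common detection trick is to set the type and read it back: unsupported types fall back to "text". A sketch of the idea; the document object is injected so it can be exercised without a browser.

```javascript
// Feature-detect a new HTML5 input type. Browsers keep the type if they
// support it and silently revert to "text" if they do not.
function supportsInputType(doc, type) {
  const input = doc.createElement('input');
  input.setAttribute('type', type);
  return input.type === type;
}
```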

The video/audio tag (as seen in the firefox demo above), provides rich video/audio embedding and manipulation capabilities natively in HTML.

The WebSockets API provides built-in support for two-way communication. This is a good idea given the proliferation of real-time web applications using long-poll/Comet etc, and might take us to true real-time duplex connectivity. An interesting comparison of AJAX/Comet to HTML5 can be found here.

The <script> tag has an async attribute that lets the page continue loading while the script loads. This will speed up page loading and improve user experience. With the complexity and size of scripts on the rise, this is very handy.

The browsing context defines the algorithm that browsers use for navigation and the related sessions history traversal. A browsing context is an environment in which Document objects are presented to the user.

Session history and navigation using JavaScript APIs.

Cross-domain communication using Window.postMessage. This is already in most browsers, and what a relief. The earlier cross-domain communication techniques (hacks) relied on changing the hash (the text that follows #) in a URL and a background timer. Eeeks!
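The receiving side of postMessage is a handler plus the origin check you should never skip. A sketch of mine; the trusted origin is just an example value, and the handler is factored out so it can be exercised without a browser.

```javascript
// Build a postMessage handler that only accepts data from one origin.
function makeMessageHandler(trustedOrigin, onData) {
  return function (event) {
    if (event.origin !== trustedOrigin) return;  // drop untrusted senders
    onData(event.data);
  };
}
// In a page: window.addEventListener('message',
//   makeMessageHandler('https://example.com', handleData), false);
```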

Offline web application caches. Google Gears/Windows Live Mesh etc have built this, with no browser support, in a non-standard way. Now HTML5 defines it as a standard. The ononline and onoffline events give JavaScript the ability to track connectivity and will enable AJAX applications to behave accordingly (detailed article).

DOM Storage as a way to store meaningful amounts of client-side data in a persistent and secure manner. John Resig of jQuery fame wrote this article that explains it in detail. Looks very powerful and can be used to enable and optimize many desktop-app-like scenarios.
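DOM Storage holds only strings, so the usual pattern is a thin JSON wrapper around it. A sketch of my own making, with the storage object injected; in a browser you would pass window.localStorage.

```javascript
// Persist a JavaScript value into string-only DOM Storage.
function saveState(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Read it back, falling back to a default when the key is absent.
function loadState(storage, key, fallback) {
  const raw = storage.getItem(key);
  return raw === null ? fallback : JSON.parse(raw);
}
```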

An editing API, in combination with a new global contenteditable attribute and something called an UndoManager, will support undo/redo for content edits. Sounds good for fully in-browser content editors and development environments.

Support for spelling and grammar checking. Some drag-and-drop support. Many link types to give meaning to links. This looks like part of microformat and semantic web support.

Frankly, I thought there would be more to support the semantic web, linked data and such. But maybe there is. This spec is a work in progress (a mess). Somewhere it mentions that the HTML5 spec contains a lot of things that should have their own specs, but for lack of volunteers to own them, all the stuff got dumped into one. Sad for such an important piece of work.

With all these features and many more that I could not get to, HTML5 is aiming to bring the flexibility and performance of desktop applications to the web. I am sure that over time these features will get better defined so that they can be interpreted without ambiguity. This is important to make web applications work cross-browser. The more I read about this stuff, the more I am convinced that it will succeed and take over many of the scenarios that are served by Flash, AIR and Silverlight. There might still be some applications where those stay relevant. But if you are looking for state-of-the-art presentation and interactivity, HTML5 will do it.

Developer’s Woes

jQuery and Prototype are two JavaScript libraries with a lot of promise. Take a look at jQuery Tools and what HTML4 can do. With HTML5 and all the new functionality, JavaScript is now a very complicated (and powerful) language. How do you handle your application's complexity and size as your JavaScript, HTML and CSS grow? There are many solutions for managing HTML, like ASP.NET, JSP, PHP etc. How about your JavaScript? How would you unit-test your JavaScript code? How would you make it reusable?

One option is to go with libraries like Prototype and jQuery that make it very easy to do common tasks; if you buy into their design philosophy, it's a very compelling way to design your code as well. JavaScript is powerful and elegant. If you know how to write good code, that is. It's not very difficult to get the hang of it. The advantage of this approach is that you stay true and native to JavaScript and can take advantage of the fast pace of innovation here. The Prototype library has unit-test support too. Never used it myself though.

The other option is to use the "more evolved" server-side languages like Java or C# to write your code and use a cross-compiler to generate the JavaScript for you. GWT (Java -> JavaScript) and Script# (C# -> JavaScript) are good examples here. There are also other cross-compilers like Pyjamas (Python -> JavaScript) and Objective-J (Objective-C -> JavaScript). The advantage of one of these approaches is that you can use your native language (if it's not already JavaScript), use a development environment built to handle size and complexity (like Visual Studio/Eclipse etc), and take advantage of features like IntelliSense and refactoring tools. You could also leverage existing unit-testing frameworks. Quite a few complex AJAX applications on the web today use this model.

If HTML5 is to replace the other desktop application development technologies, it will need more powerful and integrated tools, so that you don't have to be an expert to develop these apps. HTML WYSIWYG tools have to evolve to support not only the language features but also the common patterns, so that well designed software will be easy to build.

Will update this blog with more exciting HTML5 thingies as I find them.

June 14, 2009

Ubiquitous, rich and real-time – the future of communication

Filed under: Social Networking — Ravikant Cherukuri @ 11:47 pm

Communication models on the internet have been changing at a good pace over the last 20 years. This pace has picked up quite a bit lately and a new model is emerging. Based on federation, content models and the real-time web, the near future promises great innovation in this space. Faster machines, unlimited bandwidth, cloud computing and the faster, more agile mindset of developers will hasten these changes, and the gold rush is on to get there. Social streams, Google Wave, web integration of instant messaging and OpenID/OAuth are all driving this change. Google Wave in particular is ambitious in getting to the next paradigm. Some of what I discuss below is already in the realm of what Google is going to release this fall.


Ranging from BBS, newsgroups, e-mail, instant messaging, SMS and lately social streams (from myriad social networking sites), each of these services provides some niche service and has its entrenched users. More users are using more than one of these services, and the integration between them is minimal and piecemeal. Aggregation services (like FriendFeed) are going to play a big role going forward. Unifying these means of communication so they seamlessly interoperate is a challenge not yet solved, though many companies have worked on it for decades. With the web (HTML / HTTP / JavaScript) being the point where different technologies and platforms merge and gel together, this is more possible than ever. The race to be the front-end where users look to consume all this data is on. Facebook, FriendFeed, Google and Microsoft are all looking to retain their users by bringing all the users' data into their presentation realm. Federation with other services is imperative in this more nuanced version of walled gardens.

Being ubiquitous is being able to roam your identity across several technologies (this is becoming a reality with the widespread acceptance of OpenID/OAuth). But perhaps more subtly, it also means that your conversations roam with you. For example, you should be able to continue a conversation you are having on a blog, on Twitter. You should be able to reply to a comment on your Facebook wall via an instant-messaging conversation window. Taking this one step further, you should be able to take your conversation into the context of any web resource that you are browsing. Being ubiquitous also means that on my cell phone I have just one app that gets me all my communications: one push interface that gets me information from all my channels, one UX for me to look at everything.


What else can make this communication bus more effective? Most information shared and exchanged in the older models (like e-mail) is text with some URLs and images. The tools to express yourself are evolving. I can share much more than text and links in my Facebook feed; I can share richer, application-specific posts with which my friends can interact easily. Still, what I can share and how easily I can do it depends on a lot of factors. Being able to share from desktop applications and web sites to any of my streams at different venues (Hotmail in-box, Facebook news feed, Twitter feed etc) easily and uniformly is the next big step. What I am sharing has to be decoupled from how I am sharing it. That is, the technology that knows about what I am sharing needs to interoperate with the technology that transports the shared information from one place to another, despite the gap between the complexity and richness of today's (and tomorrow's) apps and the simplicity of SMTP. Being able to embed objects into conversations gives context and enhances the value of the communique to the users.

Another aspect of richness of communication is persistence and the conversation object model. Let me explain a bit here. A lot of today's technologies are basic transports: they just get the information that you share from you to your buddy, and most of the semantics of the conversation is lost. Today, in the case of email, you can manually organize your mail into folders and make it easy on yourself. Gmail broke this paradigm by making mail searchable (which works very well for me). Google Wave promises to bring a wiki-like collaboration model to the mix. IMHO, this is a big step. Being able to have the information exchanged in a group conversation automatically available and organized for future reference is big. Today, most technologies model the user and the buddies as entities that you can interact with, but conversation content is not treated that well; it is mostly treated as blobs passed between users. If the semantics of the conversation were understood and included in the object model of the application, a lot of possibilities would open up. The semantic web itself is progressing (with microformats and now common tags) in the direction of empowering users to define semantic concepts within the context of their content and link it to the greater web. The same could be done with conversations too. Tools for the participants to organize the ideas and concepts under discussion would help.


In most of today's communication applications, real time is integrated as an afterthought, like integrating instant messaging into e-mail. You still have two distinct apps here; it's just that you can access them from one UX. This is often overlooked because a few minutes' delay (as in e-mail) doesn't sound that bad for most conversations. Also, we are used to technologies that poll and pull data for us. But real-time protocols like XMPP are the new gold standard for the responsiveness and interactive nature of collaboration. Being able to collaboratively edit a document (floor plans/blueprints/health records) and see each other's changes in real time, reliably, is vital. As important is the ability to preserve this data in a rich conversation repository that can be edited and enhanced later.

There are several products today that provide a complete stack for real-time collaboration. The problem with these is that they are constrained to their domain and are not built for building upon; they are not built as a platform. By far, XMPP has evolved into the most extensible and standards-based real-time transport. It should be treated as TCP was for the network and HTTP was for the web. Application-level protocols are being built on top of XMPP as extensions (Jingle is an XMPP extension for voice, used by Google Talk). Rich and real-time should be built on such a standards-based stack to be able to scale and federate with the myriad technologies and social networks.

Real time takes some effort out of following all the information you are interested in on the web. You can rest assured that if something happens you will get to know, and you will get to know as soon as it happens. You can put the tired F5 key to rest. Real time also takes some effort out of what the services need to do to keep you updated. Polling takes a lot of resources, especially at web scale. Imagine 50 million people each following 100 objects on the web, each wanting to know the moment an object changes. Here is an interesting presentation on how FriendFeed crawled Flickr 3 million times for 45,000 users, only 6K of whom were logged in.
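To put rough numbers on that scenario (the figures are illustrative, not measured):

```javascript
// Back-of-the-envelope load for pure polling: every user polls every
// followed object once per interval, whether or not anything changed.
function pollRequestsPerSecond(users, objectsPerUser, intervalSeconds) {
  return (users * objectsPerUser) / intervalSeconds;
}
// 50 million users each polling 100 objects once a minute works out to
// roughly 83 million requests per second, almost all of them wasted.
// A push model only costs something when an object actually changes.
```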

