Thoughts on technology and social web

October 6, 2009

Service federation in the cloud – PushButton or Google Wave?

Filed under: Real time web, Social Networking — Tags: , , — Ravikant Cherukuri @ 6:54 pm

PushButton technologies – Pubsubhubbub (PuSH), Webhooks and ATOM , provide a lighter weight and web friendly alternative to a more heavy weight alternatives that Google wave uses. Mainly for sever to server federation. On a high level, this seems true as the pshb uses http posts to communicates vs persistent connections of xmpp. Seems more web friendly and easier to implement subscribers and publishers on a language of your choice.

Disclaimer: On one level, pubsubhubbub and wave are solving different problems. But at a higher level both implement real time federation with pub-sub. My attempt is to evaluate wave as a vehicle for service federation. I actually think that wave is a great idea but its too heavy for the web as we have it today. Web is evolutionary wave is too drastic and revolutionary.

What makes wave compelling?

There are several interesting aspects. The first is the shell (the UX). The presentation is like e-mail but is more real time. It takes a paradigm that everybody understands and then extends it. The second is the extensibility. With gadgets and robots you could extend the wave system. With embedding you could extend the way waves are visualized and consumed. The third is federation. Federation makes it possible to have different providers who own their own data to communicate and make it possible for their user bases to work with each other. This kind of federation is not new. FriendFeed inter ops with many sites like flickr using SUP. Many IM networks inter operate too. But wave federation tries to standardize that to provides for XMPP like universal federation where you dont have to know or work with all the providers to federate with them.

What makes Wave complex?

Wave tries to solve a bunch of problems to make real time persistent communication possible. The protocol that wave uses to federate dictates that all federated partners maintain complete copies of objects even though the object is owned by one of them. Operational transformation(OT) dictates that all the operations on the document are stored along with the latest copy of the document. When a user from a federated provider joins a wave, the provider gets a copy of the wave in terms of the operations. All these features provide for an e-mail like decentralization to data. Wave builds on top of this with real time collaborative document editing. All these make the protocol incredibly complex.

The wave protocol imposes heavy requirements of what a remote partner has to implement. The protocol data model (which has to implement OT) would require a third party to implement state in terms of operations on it. That is, you will have to keep track of not only the current state of teh wave but also the history operations that made it so. This is not a pattern that services normally use. For a large scale service inter-op, each partner has to maintain a full copy of the wave. This doesn’t sound very feasible at web scale. Imagine an enterprise wave user joining a large public wave. Now suddenly my enterprise wave server needs to handle this barrage of updates.

Could it be simpler?

Activity streams is an effort by several people involved in social networking to standardizing protocols for different networks to inter-operate. The great thing about activity streams is that it defines the protocol for expressing activities without defining how the providers should implement their services. This is how web is today. We cant (and should not try to) anticipate how other would use a protocol/service. Just define the interface and leave the rest to the rest.

PuSH with ATOM is more web friendly than pub-sub protocol than XEP-0060 that wave uses. Simple HTTP POSTs to make and break subscriptions and receive notifications makes the interface very natural. Can a collaborative data sync algorithm like OT be implemented on top of PuSH or is it even necessary to achieve federation? As the format of the data going over PuSH is ATOM, services could exchange ATOM items to synchronize, the world would be much simpler.

Without OT, if the federating providers want to maintain copes of data that is actually owned by another provider, there is still the complexity of synchronization. In most cases, this level of sync is not required. Its only for when an item of small granularity if being edited simultaneously by multiple people. If the systems can identify conflicts and either overwrite items or prompt users to correct, it could be acceptable for most cases. The FeedSync protocol that Live Mesh uses works this way. Your content is a feed of items and items are synchronized using a simple algorithm and conflicts are flagged.

Combining Activity Streams with PuSH and FeedSync can provide a much simpler, more web like infrastructure for a universal federation model that is easy to build on and participate in. Federation is nothing but mashups with a business model. While the mashup space delights us with innovations everyday, service federation moves at snails pace. A simpler model to federate at web scale (in size and diversity) would lead to seamless aggregation of your data from any service (that your friends might be using) to the services that you use.


September 7, 2009

pubsubhubub and rss-cloud : changing the way you read the web

Filed under: Cloud Computing, Real time web, Social Networking — Ravikant Cherukuri @ 11:39 pm

Today WordPress declared support for RSSCloudPubsubhubub was adopted by blogger/ google reader a few days back. These technologies are being adopted much faster than I expected. Much like AJAX a few of years back, the buzz is building up. Anil Dash’s posts on pushbutton talk of the potential of these technologies. The basic idea is very simple. A RESTful pubsub protocol that provides real time notifications for web content.

  • Subscribers register their interest in publisher content notifications to hubs.
  • Publishers send contents pings to the hub when new content is posted.
  • The hub reacts to the ping by fetching content from the publisher and posting the content to the subscribers.
  • All content is in RSS/Atom format.
  • Communication to and from the hub happens over HTTP using webhooks.

This has the potential to change web as we know it. It could bring Twitter-like real-time notifications to changes on all of web’s content. As an end user, this is exciting because this could finally bring (much deserved) death to the browser refresh button by chanelling all the content you are interested into streams that can be delivered to you in real time. Pushbutton with the DiSO/Activity-Streams, is great for social network federation. Activity stream’s atom format should work well with pushbutton.

Hubs federating and chaining subscriptions with each other provides a distributed and decentralized pub-sub infrastructure. This has some definite advantages over XMPP. XMPP’s XEP-0060 provides a similar generic pub-sub functionality. But still the REST based approach is simpler and more web savvy. Webhooks take the honors here with its REST based callbacks as opposed to XMPP’s connection based approach.

Add the bidirectional communication channels on web pages (using long poll today or web-sockets in HTML 5) to the mix and we have a means to deliver notifications to end users. When I am on facebook and I get a mail in my gmail account, the notification could be delivered to me through facebook.

Update : A few of good posts comparing pubsubhubub and rsscloud –

How to publish and receive blog posts in Real-time
PubSubHubbub vs. rssCloud
PubSubHubbub = rssCloud + ping ?
RSSCloud Vs. PubSubHubbub: Why The Fat Pings Win
There’s a Reason RSSCloud Failed to Catch On
The Reason RSS Cloud Can Work Now
The Web At a New Crossroads

July 17, 2009

Real time roundup

Filed under: Cloud Computing, NextWeb, Real time web, Social Networking — Ravikant Cherukuri @ 6:50 pm

Real time web has been a favorite topic for me for a while now. I work on one of the large IM systems and am very interested in these developments. Real time web started emerging as the platform for the next web in the last year it two. The main idea here is that events happening on the web are brought to you as they happen. Think the coverage that the iranian election aftermath got with twitter. The more obvious scenarios are already a reality and there are several more subtle but equally impressive usages that many are working on. As with all technologies that are driven by guy-in-the-garage start ups, its tough to see where this is all leading (if at all). Real time web focuses on several aspects of the web.

  • filtered web streams
  • instant delivery of web posts
  • real time collaboration
  • responsive web applications
  • S2S data filtering

The lack of full duplex connectivity has been a down side of HTTP. This gave native applications a one-up. HTML5 is out to correct that with Web Sockets. Meanwhile, technologies like COMET, BOSH etc provide a near real time connectivity over HTTP. So, the technology seems to be in place for the real time web. But why does real time web matter? We already have near-real-time, polling based push technologies like RSS/Atom. In my mind the answer is user experience. Real time gives the most natural experience. Compare a walkie-talkie to a telephone. A walkie-talkie is near real time with a lot of user level sync. Its just a quirk of technology. A telephone conversation is a full duplex real life conversation. Makes a lot of difference.

More links to the real time web :

Introduction to the RealTimeWeb

The Real-Time Web – O’Reilly Broadcast

Is Real-time the Future of the Web?

Building Real Time Web Applications Using HTML 5 Web Sockets

How to Deal with the Real Time Web: Navigating the River

Real time search and filtered streams

For me the coolest part of microblogging is real time search. The significance of most information rapidly fades over time. So, getting short spurts of information as and when it happens is super userful. Twitter’s hash tags are a good example. You follow information rather than people. I find the following ways to track real-time information interesting.

  • Twitter hash tags. Send something like #iwant in the tweet and people can easily track your tweets and might contact you if they are selling what you want. Gives meta data  to your tweet and makes it trackable by other users. The #iwant and #ihave tags are a good example. There are third party sites that consume the twitter firehose and build a realtime marketplace .  Such mining verticals on real time data are fast emerging. Many hugely popular twitter games like spymaster are another example.
  • Google’s “e-mail me when a new item is found that matches my search criteria” feature. This is super useful. If you know what you are searching for, you can be the first to find information as it comes live. Google can deliver this to you with in a few hours of the informations getting posted on web. This is realtively real time (compared to others) but is still a push. When this becomes true real time, it will be awesome.
  • Aggregated real time search. There are several real time search engines out there that can get you real time feed search from twitter/facebook/identica etc in one interface. Some of these have AJAX powerd interfaces and some have true real time with XMPP (collecta).

The basic idea behind this is to subscribe to receive changes data on the web without tying in to specific web sites, in real time. Eventually, the bigger search engines like google/yahoo/bing will comeup with a tighter integration of real time into all of the web that they index. With that, real time search will seamlessly integrate with the normal web search as we know it.

Instant information delivery

This category of real time applications aim to deliver your data of interest to you as soon as its posted. Sample scenarios here include

  • Deliver blog posts and comments that you are interested in instantly. This builds on the current RSS based systems by bringing true real time to content delivery.
    • : Word press now allows you to keep track of other word press blogs and comments and comments on your wordpress blog over XMPP IM with clients like GTalk.
    • Tweet.IM delivers your twitter feed over IM in real time using XMPP
    • Friendfeed aggregates feeds form all your friends blogs, social streams and comments. Friendfeed has a feature that delivers these are friend feed finds them over XMPP IM.
    • Google Wave uses XMPP IM protocol to sync wave content in real time.
    • There are a couple of startups like iNezha that provide real time updates to all the blogs that you are interested in.
  • Enable efficient content aggregation uisng XMPP. Google PubSubHubub is a good example. There were also several experiments by companies like Gnip, FriendFeed etc to use XMPP for this purpose.
  • IM systems are integrated with email systems (by the same vendor) today. You get an email and you are presented with a toast my the IM system in real time.
  • Windows Live Messenger also supports alerts from different third party providers. This is real time events from arbitrary providers with whom you have registered your interest. You can click on the windows live alerts button on my blog and get alerts onto your messenger whenever I update my blog.

[More to come in a later post]

Blog at