Jabber/XMPP pubsub

Most people who know about Jabber/XMPP think of it as an instant messaging platform. Instant messaging is indeed its primary use at present, but that may not always be the case.

The Jabber/XMPP network forms an XML-based overlay network. Each message or packet of information carried by this overlay network is an XML stanza. You can think of Jabber servers as XML routers and the clients as end nodes. In fact, the core XML streaming technology is defined in RFC 3920, while the instant messaging portions of the XMPP standards are defined separately in RFC 3921.

One example of a non-IM use of Jabber is defined in JEP-0072: SOAP over XMPP. This document specifies how SOAP, which is normally used with HTTP to form web services, can be carried on top of Jabber/XMPP.

Another interesting non-IM use of Jabber comes from JEP-0060: Publish-Subscribe (aka pubsub). Pubsub is basically an event notification system that runs on top of Jabber/XMPP. In pubsub, a user publishes some XML data to a node on a Jabber server which supports JEP-0060. Other users are then able to “subscribe” to this node. Whenever the node changes, a notification is sent to all subscribed users.
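The publish/subscribe flow described above can be modeled in a few lines. The following is a toy in-memory sketch in PHP, not an XMPP or JEP-0060 implementation: the PubSubService class, its method names, the node name and the JID are all hypothetical, and a real service would exchange XML stanzas over the network rather than call PHP callbacks.

```php
<?php
// Toy in-memory model of publish-subscribe. This is NOT an XMPP implementation;
// the class, method names, node name and JID are hypothetical.
final class PubSubService
{
    /** @var array<string, array<string, callable>> node => (subscriber JID => callback) */
    private array $subscribers = [];

    /** Register $callback to run whenever something is published to $node. */
    public function subscribe(string $node, string $jid, callable $callback): void
    {
        // Keyed by JID, so subscribing twice replaces the earlier callback.
        $this->subscribers[$node][$jid] = $callback;
    }

    /** Notify every subscriber of $node; returns how many were notified. */
    public function publish(string $node, string $payload): int
    {
        $count = 0;
        foreach ($this->subscribers[$node] ?? [] as $callback) {
            $callback($node, $payload);
            $count++;
        }
        return $count;
    }
}

$service = new PubSubService();
$received = [];

// A reader subscribes to the node that represents one book.
$service->subscribe(
    'library/books/example-book',
    'reader@example.org',
    function (string $node, string $payload) use (&$received): void {
        $received[] = "$node: $payload";
    }
);

// The library publishes a change; the subscriber is notified immediately.
$notified = $service->publish('library/books/example-book', '<status>returned</status>');
```

The point of the sketch is that the subscriber never polls: the callback runs only when the publisher changes the node, which is the property the examples below rely on.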

There are lots of interesting things that could be done with pubsub. Offhand, here are a few examples:

  • You want to check out a book from the local library. Unfortunately, someone else already has the book. In order to find out when the book has been returned, you subscribe to the node that represents that book on the library’s pubsub server. Once the book is returned you will know instantly.
  • You plan on purchasing a large, expensive TV in the near future. Rather than manually looking at the websites of several major retailers every few days, you subscribe to the pubsub node at each retailer for the particular TV model you are interested in. If any of the retailers has a sale, you find out instantly.
  • If, like many people, you use an RSS reader to keep up with new posts on your favourite blogs, you know that RSS readers periodically poll all feeds on your list. Often there are no new posts and this polling is a waste of resources. Instead, a pubsub-enabled blog could notify interested readers of a new post. Not only do you find out about the new post sooner, but network resources are also saved.

In all of the above examples, subscribing to the particular pubsub node could be as simple as clicking on a link (JEP-0147: XMPP URI Scheme Query Components).

Also interesting is JEP-0163: Personal Eventing Protocol, which defines a subset of the full pubsub (JEP-0060) specification. This subset can be used for simpler instant messaging related tasks, such as providing current geographic location information (JEP-0080: User Geolocation) or providing contacts with information about the music you are currently listening to (JEP-0118: User Tune).

It will be interesting to see how pubsub will be integrated into other network applications such as RSS readers and Jabber IM clients. It seems likely that pubsub notifications will either be handled by a Jabber client separate from the one used for IM, or the Jabber IM client will at least have to distinguish these events from normal IM traffic.

For a nice overview of pubsub (with pretty pictures) see Jive Software: All About Pubsub.

Blogs, search engines and WordPress

One problem with the blog format is that the same content can show up at several URLs. In the case of my blog, the same post content can show up on the main page, a category URL and an archive URL. This content layout is nice for humans.

Unfortunately, what is convenient for humans is not so good for search engines. There are two aspects of the standard blog format which cause search engine problems. The first is the dynamic nature of some blog URLs. Consider the main page of an active blog. Most only show about ten posts; older posts are removed as newer ones are created. Often this results in a particular URL not containing the content the search engine thinks it does. Personally, I find this incredibly annoying since I often have to search the site using a local search engine after Google has directed me to the main page of a blog. The second problem of the blog format with respect to search engines is that some URLs, like a category URL, contain many posts which are not directly related to a particular search. This results in having to search the page with the browser’s find function after the search engine gets you there.

Both of these problems have been annoying me for some time now, so today I did a little digging. Fortunately there is a solution: the robots meta tag. This tag specifies, on a page-by-page basis, whether the content on the current page should be indexed by the search engine and whether links on the current page should be followed.
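To make the two switches concrete, here is a minimal PHP sketch that builds the tag from them. The helper name robots_meta_tag() is hypothetical and not part of WordPress or any library; only the directive values index, noindex, follow and nofollow come from the robots meta tag convention itself.

```php
<?php
// Hypothetical helper (not a WordPress function): build a robots meta tag
// from the two per-page decisions described above.
function robots_meta_tag(bool $index, bool $follow): string
{
    $content = ($index ? 'index' : 'noindex') . ',' . ($follow ? 'follow' : 'nofollow');
    return '<meta name="robots" content="' . $content . '"/>' . "\n";
}

// A page whose own content should stay out of the index,
// but whose links should still be followed.
$tag = robots_meta_tag(false, true);
```

Keeping the two decisions as separate parameters mirrors the tag itself, where indexing and link following are controlled independently.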

The solution, then, is simple. URLs which contain multiple posts should be marked “noindex,follow” while individual posts should be marked “index,follow”. This should result in the content of each post being in the search engine database only once. I also found a post called A critical SEO Tip for WordPress which describes a way to accomplish this in WordPress. The slightly modified version of this solution which I have added to my WordPress theme’s header.php is below. Unless there are downsides to this approach that I don’t know of, I think every theme author should add something like this to their theme.

<?php
// Index single posts, pages and author pages; on every other view only follow links.
if (is_single() || is_page() || is_author()) {
    echo "<meta name=\"robots\" content=\"index,follow\"/>\n";
} else {
    echo "<meta name=\"robots\" content=\"noindex,follow\"/>\n";
}
?>