A common archive format for web forums and email lists?

Here’s a little wish-list idea for someone with more time than I have to work on it.

Since the idea came from using Usenet, it is probably best to start with a short description of what Usenet is. Usenet is a way for large groups of people to discuss particular subjects. These discussion groups are divided into hierarchies, much as domain names are. For example, the comp hierarchy contains comp.os.linux.advocacy, comp.os.solaris and so on. Whatis.com has a definition of Usenet that may be useful. Anything you can possibly imagine, and more, is discussed on Usenet. In the earlier days of the Internet, Usenet was the primary place for technical discussions. Unfortunately, this has changed as more and more people use email lists and web forums.

Google maintains a huge archive of Usenet posts going back many years. They claim to have over 1 billion messages in their archive, and you can search it at groups.google.com. Whenever I have a technical question, particularly about programming or networking, I start by searching Usenet. The main reason is that all discussions are archived in such a way that you can always see the entire thread and easily move between messages. This is particularly useful when searching for the answer to a question. Finding a post that asks the same question is useless if the replies that may contain a solution cannot be found. Try a search for “aes vs twofish” at groups.google.com. Clicking on any one of the results will let you view the entire discussion thread.

Fast forward to 2005. As the technical abilities of the average Internet user have dropped, discussions have moved from Usenet to mailing lists and web forums. This change is happening because users already understand their email client and web browser and have little desire to find a Usenet client or discover the Usenet features of their email client. The problem with this trend is that finding information is now much harder. Try the “aes vs twofish” search with the Google web search. The first result I get is a message called “AES256 vs Twofish performance (Was: twofish keysize)”. This is an email that was sent to the GnuPG users mailing list. Once you follow the link, Google can no longer help you. You are limited to whatever features the mailing list archive offers. Some mailing list software provides decent search features, but most does not. Web-based forums are usually even more difficult to use. Many are ugly and slow, and they certainly do not present a consistent interface across archives that would make finding information easier.

To bring these discussions back into a form where search engines can do what they do best, we need a common archive format for mailing lists and web forums. Search engines could pull the archives for each list or forum and present a consistent interface, as groups.google.com does for Usenet.
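To make the idea a little more concrete, here is a rough sketch of what one record in such an archive might look like and how a search engine could rebuild a thread from it. Everything here is an assumption for illustration: the field names, the `ArchivedMessage` class, the helper functions and the sample messages are all hypothetical, not part of any existing standard. The only real requirement is that each message carries its own ID and the ID of the message it replies to, much as email and Usenet messages already do.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ArchivedMessage:
    """One message in a hypothetical common archive format."""
    message_id: str           # unique ID for this message
    parent_id: Optional[str]  # ID of the message being replied to, if any
    author: str
    date: str                 # ISO 8601 timestamp
    subject: str
    body: str


def to_archive_json(source: str, messages: list) -> str:
    """Serialize a list or forum archive to JSON (layout is an assumption)."""
    return json.dumps(
        {"source": source, "messages": [asdict(m) for m in messages]},
        indent=2,
    )


def build_threads(messages: list):
    """Group messages into threads using only message_id and parent_id."""
    known = {m.message_id for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        if m.parent_id in known:
            children[m.parent_id].append(m)
        else:
            # No parent, or the parent is missing from the archive.
            roots.append(m)
    return roots, children


def render_thread(root, children, depth: int = 0) -> list:
    """Return the thread as indented lines, one per message."""
    lines = ["  " * depth + f"{root.subject} ({root.author})"]
    for child in children.get(root.message_id, []):
        lines.extend(render_thread(child, children, depth + 1))
    return lines


# Made-up sample data, not real posts.
messages = [
    ArchivedMessage("<1@example.org>", None, "alice",
                    "2005-01-10T09:00:00Z", "aes vs twofish",
                    "Which one should I use?"),
    ArchivedMessage("<2@example.org>", "<1@example.org>", "bob",
                    "2005-01-10T10:30:00Z", "Re: aes vs twofish",
                    "It depends on what you need."),
    ArchivedMessage("<3@example.org>", "<2@example.org>", "alice",
                    "2005-01-10T11:15:00Z", "Re: aes vs twofish",
                    "Thanks, that helps."),
]

roots, children = build_threads(messages)
for line in render_thread(roots[0], children):
    print(line)
```

The point of the sketch is that the format itself could be very small. As long as every list and forum exports the same handful of fields, anyone can present the whole thread the way groups.google.com does.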

So that is the task I am setting out: define a discussion archive standard and convince all web-based forum and email list software providers to support it. The search engines will follow soon after.
