Category Archives: Musings

Things I am thinking about.

Bank websites

It boggles my mind that the banks haven’t figured out how to make their websites more useful than they are. Why are these sites limited to online versions of bank tellers?

Here’s what I want my bank’s site to provide:

  • Allow me to categorize or tag every transaction. I want to mark that Subway is “Eating out”.
  • Learn from previous transactions and automatically suggest classifications for me. If I marked “89328374 Ontario Inc.” as car repairs two months ago, there is a good chance it goes into the same category this month (a rough sketch of this idea follows below).
  • Summarize the totals for each category and show trends. Am I spending more on eating out every month?
  • Allow me to set targets or thresholds for each category and send me notifications if I cross them. If my budget is $300/month for eating out, I want to know when I’m beyond that.
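
To make the auto-classification item concrete, here is a minimal sketch in Python. Everything in it (the data, the exact-match lookup, the function name) is my own invention; a real bank would need fuzzier matching on merchant descriptions than an exact lookup.

    # Suggest a category for a transaction based on how the same
    # merchant was tagged in the past.
    from collections import Counter

    history = [
        ("SUBWAY #1234", "Eating out"),
        ("89328374 Ontario Inc.", "Car repairs"),
        ("SUBWAY #1234", "Eating out"),
    ]

    def suggest_category(description, history):
        """Return the category most often used for this description."""
        past = Counter(cat for desc, cat in history if desc == description)
        if not past:
            return None  # no history yet; ask the user to tag it
        return past.most_common(1)[0][0]

    print(suggest_category("SUBWAY #1234", history))  # -> Eating out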

The banks already have access to all of my account activity and I don’t want to provide a third party, especially one outside of Canada, with my online banking credentials to get this functionality.

Why would a bank build this?

  1. It provides an incentive to move all of your accounts to one bank.
  2. I’d switch banks to get these features so I’m sure others would too.

End-to-end in standards and software

Two things. Both relate to Microsoft, but that is just a coincidence.

The first

Apparently IE8 will allow the HTML author to specify the name and version number of the browser that the page was designed for. For example, the author can add a meta tag that says essentially “IE6”. IE8 will see this tag and switch to rendering the page the way IE6 does. Apparently this came about because IE7 became more standards compliant, thereby ‘breaking’ many pages, especially those on intranets which require the use of IE.

The new browser version tag will allow MS to update the browser engine without breaking old pages. As a result they will be forced to maintain the old broken HTML rendering engine (or at least its behavior) for a very long time. This will consume development resources that could otherwise be put into improving IE. It will also increase the browser’s size and complexity, and undoubtedly its number of bugs.

As for the pages broken by newer, more standards compliant browsers, what is their value? Any information, in a corporate intranet or otherwise, that has value will be updated to retain that value. If no one bothers to update a page, it was probably nearly worthless anyway. Also, most of the HTML pages now in use are generated by a templating system of some kind. It’s not like each and every page will have to be edited by hand.
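
For the curious, the version tag itself is expected to look something like this (the syntax comes from Microsoft’s proposal and could still change before IE8 ships):

    <!-- Ask IE8 to render this page the way IE7 did. -->
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7" />

    <!-- Opt in to IE8's more standards compliant engine. -->
    <meta http-equiv="X-UA-Compatible" content="IE=8" />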

The second

The Linux kernel development process is notorious for improving (breaking) the kernel’s internal driver APIs. This means that a driver written for version 2.6.x might not even compile against 2.6.x+1, let alone be binary compatible. This of course causes all kinds of trouble for companies not willing to open source their drivers. However, the advantages of this process are huge. It is completely normal that during the development process the author will learn a lot about how the particular problem can be solved. By allowing the internal APIs to change, the Linux kernel development model lets the authors apply this newfound knowledge instead of being slowed down by past mistakes.

As I already mentioned, this causes problems for binary-only kernel drivers, but if the product has value the manufacturer will update the driver to work with the new kernel release. If it doesn’t have value, the driver won’t get updated and the kernel doesn’t have to carry around the baggage of supporting the old, inferior design. How does this relate to Microsoft? From Greg Kroah-Hartman:

Now Windows has also rewritten their USB stack at least 3 times, with Vista, it might be 4 times, I haven’t taken a look at it yet. But each time they did a rework, and added new functions and fixed up older ones, they had to keep the old api functions around, as they have taken the stance that they can not break backward compatibility due to their stable API viewpoint. They also don’t have access to the code in all of the different drivers, so they can’t fix them up. So now the Windows core has all 3 sets of API functions in it, as they can’t delete things. That means they maintain the old functions, and have to keep them in memory all the time, and it takes up engineering time to handle all of this extra complexity. That’s their business decision to do this, and that’s fine, but with Linux, we didn’t make that decision, and it helps us remain a lot smaller, more stable, and more secure.

So what was the point?

I don’t know what to make of these two little stories, but the latter has been bothering me for some time. Where does the responsibility for dealing with change belong? The Internet has taught us that we should push as much work as possible to the ends of the network. The alternative is rapidly growing complexity and inflexibility in the core. It seems to me that this applies to both of the situations I outlined here as well.

eBay and voice service

So eBay thinks that voice calls will be free in the future because calls will be subsidized by advertising. This was their justification for paying way too much for Skype. How they came to this conclusion is beyond me. If anything, current trends indicate that consumers will use whatever technology they can to avoid ads.

An obvious example is the success that Google has enjoyed with AdWords. Google’s AdWords advertising system is far less intrusive than the previous favourite Internet advertising mechanism, the graphical banner ad. The fact that there are many pieces of software available whose sole purpose is to block banner ads provides another example.

The growing success of PVRs, which make it quick and easy to time-shift content and skip commercials, also shows this trend. One of the main reasons I hear from people for downloading TV shows instead of watching them on normal TV is that it allows them to skip the commercials.

I can’t help but wonder, and hope, that we are entering an era when the Internet has reduced distribution costs to the point that news and even entertainment content will no longer need to be subsidized by advertising. At present, advertisers have a lot more control over content than most people would like to believe. News outlets may be hesitant to report something that is critical of a major advertising customer. Some TV shows have been canceled not because they lacked an audience but because advertisers decided they didn’t want to buy ads during the show.

Personally, I will be quite happy to continue paying for my voice service if it means I don’t have to listen to an ad before making a call. Maybe someday I will also be able to pay for a TV show with money instead of my free time.

Coping with human error in the router world

The November 2004 issue of ACM Queue contains an article entitled Coping with Human Error in IT Systems by Aaron B. Brown of IBM Research. This article got me thinking about how modern routers cope with human errors.

One of the first nuggets of knowledge comes early in the article.

Human error happens for many reasons, but in the end it almost always comes down to a mismatch between a human operator’s mental model of the IT environment and the environment’s actual state.

This statement is as obvious as it is important. From my experience managing large, complex networks, it couldn’t be more true. Thinking about it, I can trace almost all of my mistakes that have caused service interruptions to an incomplete understanding of the network topology or of the hardware and software involved. This idea also emphasizes how important it is that the people working together to manage a network, or any other complex system, maintain close contact and always communicate changes so that each individual understands the current state of the system.

The article discusses four approaches for coping with human error: error prevention, spatial replication, temporal replication, and temporal replication with re-execution. Error prevention in this context refers to better training of the error-prone humans as well as tools that reduce errors. Spatial replication involves having multiple copies of the data; think RAID here. In temporal replication, the system state is replicated in time. For example, saving the system state every five minutes would provide temporal replication; your daily backups (you do make daily backups, right?) are temporal replication. Temporal replication with re-execution adds the ability to replay the changes that have happened since the last replica was saved, making it possible to recover from a human error without losing the intervening work.
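
Temporal replication with re-execution is the least obvious of the four, so here is a toy sketch of it in Python. The names and data are mine and purely illustrative; a real system would persist its snapshots and operation log.

    # Keep a snapshot plus a log of operations, and recover from a
    # human error by restoring the snapshot and replaying every
    # operation except the erroneous one.
    import copy

    class TemporalStore:
        def __init__(self, state):
            self.snapshot = copy.deepcopy(state)  # the last replica
            self.state = state
            self.log = []  # operations applied since the snapshot

        def apply(self, op):
            self.log.append(op)
            op(self.state)

        def recover(self, bad_index):
            """Drop the erroneous operation and re-execute the rest."""
            self.state = copy.deepcopy(self.snapshot)
            for i, op in enumerate(self.log):
                if i != bad_index:
                    op(self.state)
            del self.log[bad_index]
            return self.state

    store = TemporalStore({"aliases": {}})
    store.apply(lambda s: s["aliases"].update(root="admin@example.com"))
    store.apply(lambda s: s["aliases"].update(root="oops-typo"))  # the error
    print(store.recover(bad_index=1))  # root is admin@example.com again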

While discussing error prevention the author says:

A good example of this error interception can be seen in the way that many e-mail clients can be configured to batch and delay sending outgoing mail for several minutes, providing a recovery window during which an erroneously or rashly sent message can be recalled, discarded, or edited. This and similar buffering-based strategies are particularly effective because they leverage the human ability to self-detect errors: psychologists report that 70 to 86 percent of errors can be detected immediately after they are committed, even if they cannot be anticipated.

This paragraph got me thinking about the differences in the command line interfaces (CLIs) used by the various router vendors.

By far the best known and understood router CLI is Cisco’s IOS CLI. This interface has so much momentum that many vendors basically copy it and proudly state that their CLI is almost identical to IOS. Foundry is one such vendor. The IOS CLI goes completely against the paragraph quoted above: any issued command is executed immediately. On many occasions I have been bitten by this behavior. There is nothing worse than hitting the enter key on a command that changes an interface IP just as you realize that you fat-fingered the address. More complex changes, such as modifications to network routes, also seem to have a way of becoming obvious problems just as you hit enter. An instant-apply CLI also has a nasty way of making two interdependent commands very difficult to execute.

Contrast this with the CLI used on the Alteon application switches. Changes made in this CLI do not take effect immediately. At any point, the ‘diff’ command allows the operator to see all of the pending configuration changes, and the ‘apply’ command makes the pending changes take effect. The article goes on to illuminate the primary problem that I have with delayed-apply CLIs like the Alteon’s:

Error interception can also create confusion by breaking the immediate-feedback loop that people expect in interactive scenarios – imagine the havoc that a two-minute command execution delay would cause for an operator working at a command line to troubleshoot a system outage.

On many occasions I have issued commands to the Alteon switch and waited for them to take effect, having forgotten to run ‘apply’. Whether this is simply because I, too, have become accustomed to the IOS way of doing things, I do not know. On reflection, the ability to use the ‘diff’ command to preview all pending changes has probably prevented some of my errors from causing operational problems.
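
The staged model is easy to sketch. This is not Alteon’s code, just a toy illustration of the diff-then-apply idea in Python:

    # A staged configuration: edits accumulate as pending changes,
    # 'diff' previews them, and nothing touches the live config
    # until 'apply' is called.
    class StagedConfig:
        def __init__(self, live):
            self.live = dict(live)
            self.pending = {}

        def set(self, key, value):
            self.pending[key] = value  # staged, not yet live

        def diff(self):
            return {k: (self.live.get(k), v)
                    for k, v in self.pending.items()
                    if self.live.get(k) != v}

        def apply(self):
            self.live.update(self.pending)
            self.pending.clear()

    cfg = StagedConfig({"vlan10/ip": "10.0.0.1/24"})
    cfg.set("vlan10/ip", "10.0.0.2/24")
    print(cfg.diff())  # {'vlan10/ip': ('10.0.0.1/24', '10.0.0.2/24')}
    cfg.apply()        # only now does the change take effect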

Even if we assume that delayed-apply CLIs do prevent some errors from becoming operational problems, there are network changes that no amount of previewing will catch. These errors are not typos or the occasional brain-dead moment but changes that interact with other systems in unexpected ways. To recover from errors of this type, some form of replication is required. In this respect, the router vendors do not seem to be very advanced. If a command is executed that makes the router unreachable from all other nodes on the network, there are only two options: connect a console cable or power cycle the device. Connecting a console cable isn’t all that hard if you happen to be at the same physical location or if there is some form of out-of-band access to the console port. A good example of console port out-of-band access is the console servers produced by Cyclades. By having one of these units at each location with an attached modem, the administrator can dial in to diagnose and repair the problem remotely. Of course, this requires the existence of a separate network for out-of-band access. With the convergence of the IP and PSTN networks, I wonder where this out-of-band access will come from in the future.

For now, let’s assume there is no out-of-band communication to the device. How can we recover from router configuration mistakes? A common network administration practice is to save the known good configuration to the device’s flash memory and then schedule the device to reboot after some short time interval, usually five minutes or less. At this point changes can be made. If the changes are successful, the scheduled reboot can be canceled. If the changes were not successful, the scheduled reboot will bring the network back to a functional state, at the cost of a temporary loss of service. This method gives network administrators a crude form of temporal replication. It is possible that spatial replication (having multiple links and routers serve each customer) will hide the fact that the router was temporarily out of service. However, spatial replication can often be prohibitively expensive in the network world. Good routers and bridges are not cheap, and neither is burying new fibre.
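
On Cisco IOS, for example, the dance looks something like this (the five-minute window is arbitrary, and the configuration step stands in for whatever risky change is being made):

    router# copy running-config startup-config
    router# reload in 5
    router# configure terminal
    router(config)# ... make the risky changes ...
    router(config)# end
    router# reload cancel

If the change locks you out, the reboot five minutes later restores the saved configuration; if it works, ‘reload cancel’ calls off the safety net.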

So what can the router vendors do to cope with human errors?

One possibility is the addition of a low-level administrative interface that operates at the link layer between routers. Such an interface allows communication with the affected node from an adjacent node as long as the link layer is still operational. I have seen this feature on some business class DSL shelves and modems. Though useful in many situations, a link layer administrative interface does not allow the administrator to recover from changes that negatively affect link layer connectivity. In reality, this is just another form of out-of-band access anyway.

Even when out of band access methods exist, human intervention is still necessary for the system to recover. Something more automatic is required.

One possibility is leaving the CLI world and using some form of administrative client program. This would allow for a more error-tolerant communication channel between the administrator and the router. For example, the router could respond to every command with a ‘command completed’ message sent to the client, and the client would then acknowledge this message. If the router does not receive this acknowledgment within a set amount of time, the change is automatically reversed, restoring connectivity between the administrator and the router. I know of no system that implements this idea, but I wouldn’t be surprised if it has been implemented somewhere.
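
A minimal sketch of how the router side of that protocol might look, with hypothetical names throughout (a real router would run this on its management plane rather than blocking a CLI thread):

    # Ack-or-rollback: execute a command, then wait for the client to
    # confirm it can still reach the router.  No acknowledgment within
    # the timeout means the change probably cut us off, so undo it.
    import threading

    class AckOrRollback:
        def __init__(self, timeout_s=30):
            self.timeout_s = timeout_s
            self._ack = threading.Event()

        def execute(self, do_change, undo_change):
            do_change()
            self._ack.clear()
            if not self._ack.wait(self.timeout_s):
                undo_change()  # client never confirmed; revert
                return False
            return True

        def acknowledge(self):
            """Called when the client's 'command completed' ack arrives."""
            self._ack.set()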

Another option is to have the router take a snapshot of current network traffic and other operational statistics immediately before a command is executed. If these operational statistics change negatively after the configuration change has been applied, the router could automatically undo the change. The biggest problem with this approach is defining exactly what constitutes a negative effect.
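
In sketch form (the naive traffic-drop threshold below is a placeholder for exactly the hard part, deciding what counts as a negative effect):

    # Snapshot-compare-undo: sample traffic before a change, apply it,
    # sample again, and revert if throughput fell off a cliff.
    import time

    def apply_with_stats_guard(read_pps, do_change, undo_change,
                               settle_s=10, max_drop=0.5):
        before = read_pps()           # packets/sec before the change
        do_change()
        time.sleep(settle_s)          # let the network settle
        after = read_pps()
        if before > 0 and after < before * max_drop:
            undo_change()             # traffic dropped by more than half
            return False
        return True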

Both of the above solutions have some merit, but the solution I am most fond of is a feature that simply allows all changes to be reversed after a set amount of time. This is very similar to the scheduled reboot approach discussed earlier. The advantage of automatically reversing the change is that it avoids the negative effects of a complete system restart. Most modern, high-end network equipment can function as both a router and a bridge. This allows the same physical interfaces that carry routed layer three packets to also carry VLANs (layer two bridging). In many situations, configuration changes can affect the ability of the device to route packets without affecting the forwarding of layer two packets. The most common example is making a typo when entering the IP address during a device re-addressing. The scheduled reboot feature allows for recovery in this situation, but it also means that the forwarding of VLAN traffic stops. If it were possible to schedule a short interval after which the router undoes any changes that have been applied, the interruption of layer two forwarding could be avoided.

I expect this feature would be more difficult to implement reliably than it at first appears. Many commands could be issued after the undo has been scheduled, and not all of them can be undone in the opposite order from which they were applied. Some kind of command dependency data may be required to compute a safe set of commands to return the device to its previous state. In situations where no safe undo sequence can be calculated, the router could simply fall back on rebooting. Perhaps this feature has been implemented in a router model that I have not yet had the opportunity to manage.
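
The simplest version is easy to sketch if we ignore the dependency problem and assume reverse-order undo is always safe (which, as I just said, it is not). Hypothetical names again:

    # Timed auto-revert: stage each change with its inverse, and undo
    # everything in reverse order unless the operator confirms in time.
    # A real implementation would need locking and command dependency
    # data; this is only the happy path.
    import threading

    class RevertWindow:
        def __init__(self, timeout_s=300):
            self.undo_stack = []
            self.timer = threading.Timer(timeout_s, self.revert_all)
            self.timer.start()

        def apply(self, do_change, undo_change):
            do_change()
            self.undo_stack.append(undo_change)

        def confirm(self):
            """Operator is still reachable; keep the changes."""
            self.timer.cancel()

        def revert_all(self):
            while self.undo_stack:
                self.undo_stack.pop()()  # undo in reverse order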

Like the rest of the IT world, it appears that router vendors have a long way to go before their products can cope with human errors automatically.

A common archive format for web forums and email lists?

Here’s a little wish-list idea for someone with more time than I have to work on it.

Since the idea came from the use of Usenet, it is probably best to start with a short description of what exactly Usenet is. Usenet is a method for large groups of people to communicate about particular subjects. These discussion groups are divided into hierarchies, similar to how domains are divided. For example, the comp hierarchy contains comp.os.linux.advocacy, comp.os.solaris, etc. Whatis.com has a definition of Usenet that may be useful. Anything you can possibly imagine, and more, is discussed on Usenet. In the earlier days of the Internet, Usenet was the primary place for technical discussions. Unfortunately, this has changed as more and more people use email lists and web forums.

Google maintains a huge archive of Usenet posts going back many years; they claim to have over 1 billion messages in their archive. Using groups.google.com you can search this archive. Anytime I have a technical question, particularly for programming and networking problems, I always start by searching Usenet. The main reason is that all discussions are archived in such a way that you can always see the entire thread and easily move between messages. This is particularly useful when searching for the answer to a question: finding a post that asks the same question is useless if the replies that may contain a solution cannot be found. Try a search for “aes vs twofish” at groups.google.com. Clicking on any one of the results will allow you to view the entire discussion thread.

Fast forward to 2005. As the technical abilities of the average Internet user have dropped, discussions have moved from Usenet to mailing lists and web forums. This change is happening because users already understand their email client and web browser and have little desire to find a Usenet client or discover the Usenet features of their email client. The problem with this trend is that finding information is now much harder. Try the “aes vs twofish” search with the Google web search. The first result I get is a message called “AES256 vs Twofish performance (Was: twofish keysize)”. This is an email that was sent to the GnuPG users mailing list. Once you follow the link, Google can no longer help you. You are limited to whatever features the mailing list archive offers. Some mailing list software provides decent search features, but most does not. Web-based forums are usually even more difficult to use. Many are ugly and slow, and they certainly do not present a consistent interface across archives that would make finding information easier.

In order to bring these discussions back into a form where search engines can do what they do best, we need a standard mailing list and web forum archive format. Search engines could pull the archives for each list or forum and present a consistent interface, just as groups.google.com does.

So that is the task I am setting out: define a discussion archive standard and convince the web forum and email list software providers to support it. The search engines will follow soon after.
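
For flavour, here is a rough sketch of the kind of structure such a format would need to capture. The field names are mine, not a proposal:

    # The minimum a discussion archive format needs: stable message IDs,
    # parent links to reconstruct threads, and the message content itself.
    from dataclasses import dataclass, field
    from typing import Optional, List

    @dataclass
    class ArchivedMessage:
        message_id: str           # globally unique, like email's Message-ID
        parent_id: Optional[str]  # None for a thread-starting post
        author: str
        date: str                 # ISO 8601
        subject: str
        body: str

    @dataclass
    class ArchivedThread:
        forum: str                # list or forum name
        messages: List[ArchivedMessage] = field(default_factory=list)

Given that much, a search engine could rebuild and display whole threads the way groups.google.com does, no matter which forum or list software produced the archive.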

History

I recently completed a summer term history course. The title of the course was Europe 1715 to Present. Wow, I actually thought I had a clue about European history before this course. Was I ever wrong. The amazing mess that was Europe in the 1800s brings the current problems in other areas of the world into perspective. I think it’s a little too easy for people who grew up in ‘modern’ countries to think that our society has always been as stable and sane as it is now (or at least as stable as we think it is). This course showed me that this is certainly not true.

Also, I found the Enlightenment to be particularly interesting. It is amazing to me that ideas we now take for granted, such as that people should be ruled by laws rather than rulers, the equality of all people, and the concept of individual identity, date from only about 225 years ago (mostly 1750 to 1800). These are just a few examples of Enlightenment ideas that are now central to our liberal (in the classical sense) societies.

The French Revolution of 1789 was the first major political event centered around Enlightenment ideas. The backlash against this revolution resulted in the European Congress system, which was designed to put the lid back on and restore Europe to its pre-1789 order. That it took until the end of World War I for the ideas of the Enlightenment to come to the forefront of European politics is very alarming.

I can’t stop myself from seeing similarities between the Enlightenment and the current conversations about intellectual property. There were many entrenched interests who did their best to stop the ideas of the Enlightenment. These ideas were so powerful that even a hundred years of attempts to crush them could not make them go away. The Internet and other digital technologies have fundamentally changed our world. The law hasn’t caught up to this fact yet. Companies and individuals who profit under the old system of scarcity and control are doing their best to make sure the law never does catch up. This sounds a lot like the last major intellectual revolution western society went through. I am pretty confident about how this will end. The question is, do we need another century of innocent people getting jailed or worse before we see the end of the tunnel?

I need to be a little careful here. As my history professor said, despite popular belief, history does not actually repeat itself. That doesn’t mean there aren’t any lessons to be learned, though.

Police fund raising

Yesterday I got a call from the Police Association of Ontario. It was basically a telemarketing call, as they were trying to raise funds. This really bothers me, for a couple of reasons.

I always find it very intimidating when the police associations or the fire department equivalent call looking for money. As a society we definitely owe these people something. They are the reason why we have law and order (not the show) and good emergency response. However, they are also in a position that makes them very intimidating. What happens if I don’t give money to one of these groups? Does my name get put on a little list so that next time I am speeding I get a ticket instead of a warning? Will the fire department be just a little bit slower getting to my house if there is a call? Sure, all of these ideas sound pretty far-fetched. But what if the person calling is not a hired sales droid but an off-duty police officer who lives down the street? Or an off-duty fireman who recognizes the address on the emergency radio as the address of the person who was rude to him on the phone last night?

Perhaps these feelings are more acute for me because I grew up in a small community, where if the local departments did the calling these coincidences could easily come true. These feelings may not be based on any kind of logic, but they do exist, at least for me. Personally, I have a much harder time saying no to the police and fire department telemarketing fund raisers than I do to any other group. In fact, yesterday might have been the first time I actually did say no to them, and that’s because of my second point.

One of the reasons they were asking for donations is to lobby for law changes. The pitch went immediately to sex offenders. “Do you watch the news? Yes. Then you have heard about the recent problems in Brampton.” For those outside of Ontario, he was referring to this. The nice tele-sales guy then proceeded to explain that the laws are too lax and that criminals who are released just re-offend, so the law should be changed to keep them in jail. Another example of the Police Association of Ontario’s politics can be found on their website: the little ‘Club fed’ logo. Obviously, they think the prison system is too easy on criminals. Police officers are given a special role by our society: the role of upholding the law. With this role comes respect and power. By using this position to raise funds and lobby for changes to the legal system, they exert unfair influence on our society. It is not the role of the police to make laws; we have elected officials and the judiciary for that.

I’m sure the Police Association believes what they are doing is correct, and it’s probably not illegal. But whether they realize it or not, they are abusing the power that their role in society gives them.

Why are new laws our first reaction?

For those who are new to the story: last year there was a horrible abduction and murder of a young girl in Toronto named Holly Jones. See CBC — Holly Jones for a timeline and some background information.

On June 16th the trial of Michael Briere, the accused murderer, began. He pleaded guilty. The twist to all of this is that in his plea he explained how he was looking at child pornography on the Internet the same day he abducted and killed Holly. CTV News has an article covering this. As expected, this admission has resulted in calls for new laws to punish people who possess child pornography.

Canada already has laws that cover child pornography. If Briere had been caught with these materials on his computer, or viewing them on the Internet, he would have been punished by the justice system. How would stricter laws have saved Holly when the enforcement of existing laws failed to find and punish Briere?

Especially troubling are the people who believe that ISPs should be filtering all ‘bad’ content. Obviously child pornography is bad, but where is the line drawn? Coming from the technical side of things, I can say such filtering is also pretty much impossible. Good old-fashioned police work is what’s needed. We need police forces capable of working with new technology, not new laws that are unenforceable.

Easy to what?

Is the term “easy to use” in the computer user interface (UI) world overloaded? I am starting to think so. Before I go any further, take note that the Unix shell is a UI. Graphical UIs (GUIs) are generally considered to be the easiest way to use a computer. But are they? Here is the list of command shell steps to change the email address that receives root’s email on a Fedora Core 2 system:

  • su to root.
  • cd /etc.
  • vim aliases.
  • Search for root: (/ is the search key in vim).
  • $ to go to the end of the line.
  • bdw to delete the last word on the line (this is the email address or account name).
  • A to append at the end of the line (this enters insert mode).
  • Type the new email address, then press ESC.
  • :wq to save the file and exit vim.
  • newaliases to update the aliases database.
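
For comparison, a fluent shell user could even collapse the whole edit into one line (the address is a made-up example; run it as root):

    sed -i 's/^root:.*/root: me@example.com/' /etc/aliases && newaliases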

That list seems like a lot of steps, but I timed myself and I can easily accomplish all of them in under 30 seconds. What could be easier? I doubt this task could be accomplished in under 30 seconds with a GUI. If you are not fluent in the Unix shell you are probably getting quite angry at me right now. “But I don’t know those commands,” you say. This is where the term “easy to use” breaks down.

The average computer user is not looking for easy to use; they are looking for easy to discover. The normal computer user does not care if a task takes a little longer than the optimal way. All a normal computer user cares about is the ability to easily re-discover the steps necessary to accomplish the task the next time they need to do it. These users don’t want to learn the skills necessary to optimally control their computer. Instead of talking about computer UIs with the term “easy to use”, I think it’s time we start talking about “easy to do” and “easy to discover”.