Tag Archives: Linux

Linux/Fedora PPPoE problems and solutions

This weekend I’ve been doing some network experimentation on my little DSL connection. I’ve learned a couple of things the hard way so I figured a quick blog post is in order in the hopes that it will save someone else time.

PPP interface errors

Over the last while my Internet connection has been a little slow. I noticed that there were occasionally packet drops but I didn’t take the time to figure out where they were occurring. The testing I was doing this weekend was very sensitive to packet loss so I had to get to the bottom of this.

There were two symptoms. The first was a bunch of log entries like the following.

Apr 19 12:03:21 titan pppoe[26690]: Bad TCP checksum 109c
Apr 19 12:10:35 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:35 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:10:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:24:50 titan pppoe[26690]: Bad TCP checksum 3821
Apr 19 12:31:54 titan pppoe[26690]: Bad TCP checksum 9aeb
Apr 19 12:33:22 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:33:49 titan pppd[26689]: Protocol-Reject for unsupported protocol 0xb00
Apr 19 12:33:57 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x2fe5
Apr 19 12:33:58 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:01 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:02 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:12 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x58e6
Apr 19 12:34:14 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:17 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:27 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:29 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:30 titan pppd[26689]: Protocol-Reject for unsupported protocol 0xb00
Apr 19 12:34:31 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x800
Apr 19 12:34:33 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x0
Apr 19 12:34:36 titan pppd[26689]: Protocol-Reject for unsupported protocol 0x7768

The bad TCP checksum entries hinted at some kind of packet corruption. However, I didn’t know if this was coming from packets being transmitted or received. Since I don’t know the inner workings of PPP as well as I’d like, the Protocol-Reject messages were harder to get a handle on. I grabbed a capture on the Ethernet interface underlying ppp0 so I could look at the PPP messages in Wireshark.

PPP Unknown protocol

Suspect PPP message

My PPPoE client sent a message with the protocol field set to 0x0, a value Wireshark doesn’t recognize.

PPP reject

PPP rejection message

And the remote PPPoE device sends back a message rejecting the transmitted one. It’s even nice enough to return the entire payload, thereby wasting download bandwidth as well. From this packet capture I became fairly confident that the problem was on my end, not the ISP’s. After this I wasted a bunch of time playing with the clamp TCP MSS PPP option, because the data size in the above messages (1412) matched the clamp TCP MSS setting in my PPP interface configuration file.

The second symptom was a large number of receive errors on the ppp0 interface; the underlying Ethernet interface did not show any errors. In contrast to the PPP errors above, the receive errors suggested the problem was in the PPP messages being received by my PPPoE client.

After several unsuccessful theories I finally figured out the problem. The PPPoE implementation on Linux has two modes: synchronous and asynchronous. Synchronous mode uses less CPU but requires a fast computer. Apparently the P3-450 I use as a gateway doesn’t qualify as fast, because as soon as I switched to asynchronous mode all of the errors went away.
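For reference, the change amounts to one line in the PPP interface config. Note this is a sketch: SYNCHRONOUS is my recollection of the variable rp-pppoe’s adsl-setup writes, so verify the name against your own ifcfg file.

```shell
# /etc/sysconfig/network-scripts/ifcfg-ppp0 (excerpt)
# Assumption: the variable is named SYNCHRONOUS, as written by
# rp-pppoe's adsl-setup; "no" selects the asynchronous user-mode client.
SYNCHRONOUS=no
```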

Fixing the problem was good, but it still didn’t make sense to me because I’ve been using this computer as a gateway for years. Then I discovered this Fedora bug. It turns out that Fedora 10 shipped with a version of system-config-network containing a bug that defaulted all PPPoE connections to synchronous mode. The fix has since been pushed out to all Fedora users, but that didn’t help me because my PPP connection configuration had already been generated.

In summary, this was a real pain but I did learn more about PPP than I’ve ever had reason to in the past.

Dropping PPP connections

Some of the experimentation I’ve been doing this weekend required completely congesting the upload channel of my DSL connection. I don’t just mean a bunch of TCP uploads; that alone doesn’t cause any problems. What I was doing was running three copies of the following.

ping -f -s 1450 alpha.coverfire.com
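A rough shell calculation shows why even one of these floods the link; the header sizes are estimates, not measured values.

```shell
# Back-of-the-envelope link capacity check (header sizes are estimates):
# 1450-byte ICMP payload + ICMP (8) + IP (20) + PPPoE/PPP (8) headers
# is roughly 1486 bytes per packet on the wire.
bytes_per_sec=$((768000 / 8))       # 768 kbps upload channel in bytes/s
pkt_size=$((1450 + 8 + 20 + 8))     # ~1486 bytes per packet
echo $((bytes_per_sec / pkt_size))  # ~64 packets/s saturates the channel
```

A flood ping sends packets as fast as replies return (at least 100/s), so three copies offer several times what the channel can carry.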

This generates significantly more traffic than my little 768Kbps upload channel can handle. During these tests I noticed that occasionally the PPPoE connection would die and reconnect. Examples of the log entries associated with these events are below.

Apr 19 20:02:31 titan pppd[15627]: No response to 3 echo-requests
Apr 19 20:02:31 titan pppd[15627]: Serial link appears to be disconnected.

Since I had already been looking at PPP packet captures in Wireshark I recognized the following.

PPP echo

PPP echo

It appears that too much upload traffic causes enough congestion that the PPP echoes fail and the PPP connection is dropped after a timeout. I would have thought pppd would prioritize these messages over upper-layer packets, but apparently it does not. For the purposes of my testing the problem was easy to avoid by modifying the following lines in /etc/sysconfig/network-scripts/ifcfg-INTERFACE. I increased the failure count from 3 to 10.

LCP_FAILURE=10   # drop the link after ten consecutive unanswered echo-requests (was 3)
LCP_INTERVAL=20  # seconds between LCP echo-requests

Magazine titles and operating systems

A little while ago I was standing in front of the computer magazine section at my local Chapters when I noticed something interesting. There were three magazines with “Windows” in the title, three with “Mac” in the title, and four with “Linux” in the title. Of course this is hardly statistically significant in terms of the magazine industry as a whole, but it does show how Linux is becoming much more mainstream.

End-to-end in standards and software

Two things. Both relate to Microsoft but that is just by coincidence.

The first

Apparently IE8 will allow the HTML author to specify the name and version number of the browser that the page was designed for. For example, the author can add a meta tag that says essentially “IE6”. IE8 will see this tag and switch to rendering pages the way IE6 does. Apparently this came about because IE7 became more standards compliant, thereby ‘breaking’ many pages, especially those on intranets which require the use of IE. The new browser version tag will allow MS to update the browser engine without breaking old pages. As a result they will be forced to maintain the old broken HTML rendering engine (or at least its behavior) for a very long time. This will consume development resources that could otherwise be put into improving IE. It will also increase the browser’s size and complexity, and undoubtedly the number of bugs.

As for the pages broken by newer, more standards compliant browsers, what is their value? Any information, in a corporate intranet or otherwise, that has value will be updated to retain it. If no one bothers to update a page, it was probably nearly worthless anyway. Also, most HTML pages now in use are generated by a templating system of some kind; it’s not as though each and every page will have to be edited by hand.

The second

The Linux kernel development process is notorious for improving (breaking) the kernel’s internal driver APIs. This means that a driver written for version 2.6.x might not even compile against 2.6.x+1, let alone be binary compatible. This of course causes all kinds of trouble for companies not willing to open source their drivers. However, the advantages of this process are huge. It is completely normal that during development the author will learn a lot about how the particular problem can be solved. By allowing the internal APIs to change, the Linux kernel development model lets authors apply this newfound knowledge rather than be slowed down by past mistakes. As I already mentioned, this causes problems for binary-only kernel drivers, but if the product has value the manufacturer will update the driver to work with the new kernel release. If it doesn’t have value, the driver won’t get updated and the kernel doesn’t have to carry the baggage of supporting the old, inferior design. How does this relate to Microsoft? From Greg Kroah-Hartman:

Now Windows has also rewritten their USB stack at least 3 times, with Vista, it might be 4 times, I haven’t taken a look at it yet. But each time they did a rework, and added new functions and fixed up older ones, they had to keep the old api functions around, as they have taken the stance that they can not break backward compatibility due to their stable API viewpoint. They also don’t have access to the code in all of the different drivers, so they can’t fix them up. So now the Windows core has all 3 sets of API functions in it, as they can’t delete things. That means they maintain the old functions, and have to keep them in memory all the time, and it takes up engineering time to handle all of this extra complexity. That’s their business decision to do this, and that’s fine, but with Linux, we didn’t make that decision, and it helps us remain a lot smaller, more stable, and more secure.

So what was the point?

I don’t know what to make of these two little stories, but the latter has been bothering me for some time. Where does the responsibility for dealing with change belong? The Internet has taught us that we should push as much work as possible to the ends of the network. The alternative is rapidly growing complexity and inflexibility in the core. It seems to me that this applies to both of the situations I outlined here as well.

scponly, rsync and Fedora

A few years ago I wrote about the backup script that I use to do daily and weekly backups of my computers. Since this script must run unattended, it makes use of a passphrase-less SSH key. The SSH key in question only exists on my main workstation and is used to log in as a user which does not own any other files. While this isn’t a big security problem, it would be nice to limit the privileges of this user. To this end I started using scponly some time ago. Scponly is a restricted shell which limits a logged-in user to executing only a few commands such as scp, sftp and rsync. This small set of available programs greatly reduces the chances that the user will be able to find a local exploit. Scponly is already packaged for Fedora so installing it is simple.

yum install scponly

Setting a user’s shell to scponly is accomplished with the usermod command.

usermod -s /usr/bin/scponly backup

Like any shell, scponly must also be added to /etc/shells. Just add “/usr/bin/scponly” (without the quotes) to the end of this file.
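The append can be scripted idempotently. This is a sketch; the helper function name is mine, and the real invocation needs root.

```shell
# add_shell: register a program as a valid login shell, idempotently.
# Takes the shells file path as an argument so it can be tried on a copy.
add_shell() {
    # Only append if an exactly matching line is not already present.
    grep -qx "$1" "$2" || echo "$1" >> "$2"
}

# Real invocation (as root):
# add_shell /usr/bin/scponly /etc/shells
```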

As I mentioned when describing the backup script, it works great except for large amounts of data such as media collections. Over time my photo collection has grown to over nine thousand images and now consumes more than eighteen gigabytes of disk space. So today I decided to cron up rsync to synchronize my photos to the same location my backups are sent every night. Unlike my backup script, rsync only sends the changes to the remote server, not the entire archive.
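A crontab entry along these lines does the job. The schedule, local path, and destination here are illustrative placeholders, not details from my actual setup.

```shell
# Hypothetical crontab entry: sync the photo collection at 03:00 nightly.
# -a preserves permissions/times, -z compresses, --delete mirrors removals.
0 3 * * * rsync -az --delete /home/dan/photos/ backup@alpha.coverfire.com:photos/
```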

After much debugging I discovered that the most recently released version of scponly does not work with rsync. The thread where this problem was first discussed started in March 2006; more related posts can be found in subsequent months. Fortunately the scponly authors have fixed this bug in their CVS repository, so I built an RPM for the CVS version.

scponly-4.7CVS20071229-1.fc8.x86_64.rpm

scponly-debuginfo-4.7CVS20071229-1.fc8.x86_64.rpm

scponly-4.7CVS20071229-1.fc8.src.rpm

This package successfully upgrades the scponly package provided by Fedora. Hopefully these RPMs are useful to someone.

Downloading source RPMs in Fedora

The main yum executable doesn’t have an option for downloading source RPMs. Fortunately, this task is made easy by yumdownloader which can be found in the yum-utils package.

yum install yum-utils
yumdownloader --source scponly

This will leave a copy of the scponly source RPM in the current directory.

Ontario Linux Fest

This past Saturday I spent the day at the Ontario Linux Fest, which was held at the Toronto Congress Centre. Despite this being the inaugural year for the event, it was very well organized and, I think, well attended. The number I heard was approximately 350 attendees. The most enjoyable aspect of the event was that it had a really nice community feel. Everywhere you looked there were groups of people chatting and having a good time. The only negative thing I can say is that many of the presentations were very high level. Given the broad audience this is not necessarily a bad thing, but personally I was hoping for more technical detail. I really hope the organizers are able to do this again next year because I’ll definitely be there.

I didn’t have a real camera along, so the best I can offer is this picture of Jon ‘maddog’ Hall’s closing presentation taken with my N800.

Picture from the Ontario Linux Fest

Torvalds interview

Q&A: Torvalds on Linux, Microsoft, software’s future

CW: Lots of researchers made millions with new computer technologies, but you preferred to keep developing Linux. Don’t you feel you missed the chance of a lifetime by not creating a proprietary Linux?

Torvalds: No, really. First off, I’m actually perfectly well off. I live in a good-sized house, with a nice yard, with deer occasionally showing up and eating the roses (my wife likes the roses more, I like the deer more, so we don’t really mind). I’ve got three kids, and I know I can pay for their education. What more do I need? . . . So instead, I have a very good life, doing something that I think is really interesting, and something that I think actually matters for people, not just me. And that makes me feel good.

Ottawa, OLS and the war museum

Arrived in Ottawa today for OLS. Managed to get in early enough to make it over to the new (2005?) Canadian War Museum. Unfortunately, there were only two hours left before close. Two hours was not nearly long enough to do the museum justice. Even if you have been to the previous war museum you should go again. The new building is gorgeous and there is a lot more stuff to look at. If you like to read everything in a museum, budget a LOT more than two hours.

For those new to Ottawa, walking to the war museum from OLS will take under 30 minutes.

Photo 20060718-cwm-1.jpg from the Canadian war museum
Photo 20060718-cwm-2.jpg from the Canadian war museum
Photo 20060718-cwm-3.jpg from the Canadian war museum
Photo 20060718-cwm-4.jpg from the Canadian war museum
Photo 20060718-cwm-5.jpg from the Canadian war museum
Photo 20060718-cwm-6.jpg from the Canadian war museum
Photo 20060718-cwm-7.jpg from the Canadian war museum
Photo 20060718-cwm-8.jpg from the Canadian war museum
Photo 20060718-cwm-9.jpg from the Canadian war museum
Photo 20060718-cwm-10.jpg from the Canadian war museum

Red Hat summit videos

Red Hat has posted videos of the keynotes from the Red Hat summit in Nashville. So far, I have only watched two of the three videos. Both were excellent.

Eben Moglen: Discusses the philosophical and political ideas behind free software. He argues that free software is about allowing individual creativity. If you don’t ‘get’ free software you need to watch this speech.

Cory Doctorow: Provides a bit of history on copyright change and how the incumbent industries always try to stop progress. Lots of good DRM discussion as well.

There is no future in which bits will be harder to copy than they are today … Any business model that based on the idea that bits will be harder to copy is doomed. [Cory Doctorow (2006 RedHat summit in Nashville)]

I found both of these speeches to be inspiring. Free software is the start of a wider revolution. As Moglen says in his keynote (paraphrasing), it is an incredible privilege to live through a revolution.