<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">

<channel>
	<title>Dan Siemon &#187; Internet</title>
	<atom:link href="http://www.coverfire.com/archives/tag/internet/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.coverfire.com</link>
	<description>Thoughts and musings</description>
	<lastBuildDate>Sun, 22 Jan 2012 11:20:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.5/ca/</creativeCommons:license>		<item>
		<title>System virtualization moves the edge of the network</title>
		<link>http://www.coverfire.com/archives/2011/11/18/system-virtualization-moves-the-edge-of-the-network/</link>
		<comments>http://www.coverfire.com/archives/2011/11/18/system-virtualization-moves-the-edge-of-the-network/#comments</comments>
		<pubDate>Sat, 19 Nov 2011 01:47:00 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=1091</guid>
		<description><![CDATA[One of the biggest innovations of the Internet was moving the intelligence from the network to the edge devices. Making the end host responsible for data delivery and creating a network architecture that is application agnostic were radical and incredibly &#8230; <a href="http://www.coverfire.com/archives/2011/11/18/system-virtualization-moves-the-edge-of-the-network/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>One of the biggest innovations of the Internet was moving the intelligence from the network to the edge devices. Making the end host responsible for data delivery and creating a network architecture that is application agnostic were radical and incredibly successful ideas. Although much of the architecture made this switch the demarcation between the network owner and its users forced some features such as access control and provisioning to remain in the access routers and switches. Given the relationship of consumers to their service providers this will probably never change in the consumer Internet market but something very interesting is happening within data centres due to virtualization.</p>
<p><a title="Moving to the edge" href="http://queue.acm.org/detail.cfm?id=1836149">Moving to the Edge: An ACM CTO Roundtable on Network Virtualization</a></p>
<p>One of the most interesting ideas in the discussion linked to above is that the advent of system virtualization necessarily moves the point of enforcement, or intelligence, from the network access layer into the host itself. This comes in the form of the networking features of hypervisors. Hypervisors implement switching and routing, but what&#8217;s really interesting is that they are also the best location for functions such as firewalls, because implementing these functions as separate devices greatly limits the flexibility of the virtualized data centre. Imagine migrating a VM anywhere in the data centre and having its firewall rules follow automatically to the new host, versus having to choose amongst the N hosts which are behind the same firewall.</p>
<p>Two groups may be affected greatly by this change: network equipment vendors and IT networking professionals.</p>
<p style="padding-left: 30px;">I do not believe that owners of existing network infrastructure need to worry about the hardware they already have in place. Chances are your existing network infrastructure provides adequate bandwidth. Longer term, networking functions are being pulled into software, and you can probably keep your infrastructure. The reason you buy hardware the next time will be because you need more bandwidth or less latency. It will not be because you need some virtualization function. (Martin Casado)</p>
<p>The above argues that existing network switches and routers are already good enough for this new architecture. That is, networking equipment will become further commoditized which may not be good from the perspective of Cisco and other equipment vendors.</p>
<p>What about switch and router experts?</p>
<p style="padding-left: 30px;">The people who will be left out in the cold are the folks in IT who have built their careers tuning switches. As the edge moves into the server where enforcement is significantly improved, there will be new interfaces that we&#8217;ve not yet seen. It will not be a world of discover, learn, and snoop; it will be a world of know and cause. (Lin Nease)</p>
<p>and</p>
<p style="padding-left: 30px;">There&#8217;s a contention over who&#8217;s providing the network edge inside the server. It&#8217;s clearly going inside the server and is forever gone from a dedicated network device. A server-based architecture will eventually emerge providing network-management edge control that will have an API for edge functionality, as well as an enforcement point. The only question in my mind is what will shake out with NICs, I/O virtualization, virtual bridges, etc. Soft switches are here to stay, and I believe the whole NIC thing is going to be an option in which only a few will partake. The services provided by software are what is of value here, and Moore&#8217;s law has cheapened CPU cycles enough to make it worthwhile to burn switching cycles inside the server.</p>
<p style="padding-left: 30px;">If I&#8217;m a network guy in IT, I better much more intensely learn the concept of port groups, how VMware, Xen, etc. work, and then figure out how to get control of the password and get on the edge. Those folks now have options that they have never had before.</p>
<p style="padding-left: 30px;">The guys managing the servers are not qualified to lead on this because they don&#8217;t understand the concept of a single shared network. They think in terms of bandwidth and VPLS (virtual private LAN service) instead of thinking about the network as one system that everybody shares and is way oversubscribed. (Lin Nease)</p>
<p>Of course networking experts will still be required but this new world may involve spending a lot more time managing servers than at the router/switch CLI.</p>
<p>The simple network continues to win.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2011/11/18/system-virtualization-moves-the-edge-of-the-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making the Linux flow classifier tunnel aware</title>
		<link>http://www.coverfire.com/archives/2011/10/16/making-the-linux-flow-classifier-tunnel-aware/</link>
		<comments>http://www.coverfire.com/archives/2011/10/16/making-the-linux-flow-classifier-tunnel-aware/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 22:28:09 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Latency]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=1025</guid>
		<description><![CDATA[Flow Classifier The Linux kernel has many different tools for managing traffic. One of them is the flow classifier which allows the user to configure which fields of the packet headers should be used to create a hash which is &#8230; <a href="http://www.coverfire.com/archives/2011/10/16/making-the-linux-flow-classifier-tunnel-aware/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>Flow Classifier</h1>
<p>The Linux kernel has many different tools for managing traffic. One of them is the <a title="Flow classifier" href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/cls_flow.c;h=6994214db8f84d652360d6edcf30f9bb7f94af7d;hb=HEAD">flow classifier</a> which allows the user to configure which fields of the packet headers should be used to create a hash which is then used to identify flows and manage them. For example, if the user selects src,dst,proto,proto-src,proto-dst they get a unique value for each flow (within the limits of the hash). Alternatively, using only src as the key will result in all flows being grouped by the source IP address.</p>
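<p>The effect of the key selection can be illustrated with a small sketch. This is not the kernel&#8217;s code: the packet dictionaries, the <code>flow_bucket</code> helper and the use of CRC32 as the hash are assumptions made purely for illustration.</p>

```python
import zlib

def flow_bucket(packet, keys, buckets=1024):
    """Hash the selected header fields into one of `buckets` queues.

    CRC32 is only a stand-in for whatever hash the kernel really uses.
    """
    material = "|".join(str(packet[k]) for k in keys)
    return zlib.crc32(material.encode()) % buckets

# Two hypothetical TCP connections from the same host to the same server.
a = {"src": "10.0.0.2", "dst": "192.0.2.1", "proto": 6,
     "proto-src": 40000, "proto-dst": 22}
b = dict(a)
b["proto-src"] = 40001

five_tuple = ["src", "dst", "proto", "proto-src", "proto-dst"]

# With the full 5-tuple the two connections will usually land in different buckets.
print(flow_bucket(a, five_tuple), flow_bucket(b, five_tuple))

# With only src as the key, both connections always share a bucket.
print(flow_bucket(a, ["src"]) == flow_bucket(b, ["src"]))
```

<p>Using fewer keys makes the grouping coarser: every packet that agrees on the selected fields is treated as part of the same flow.</p>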
<h1>The Problem</h1>
<p>Below is a slightly simplified version of my home network.</p>
<div id="attachment_1029" class="wp-caption alignleft" style="width: 459px"><a href="http://www.coverfire.com/wp-content/uploads/2011/10/home_network_tunnels.png"><img class="size-full wp-image-1029" title="Simplified home network" src="http://www.coverfire.com/wp-content/uploads/2011/10/home_network_tunnels.png" alt="" width="449" height="523" /></a><p class="wp-caption-text">Figure 1: Simplified home network</p></div>
<p>All of the traffic, both IPv4 and IPv6, is tunnelled through a Linux router which lives at my service provider. The reason for this complicated setup is that it gives me control of the traffic in both the upstream and downstream. By shaping the traffic to just below the maximum rate in each direction I am able to avoid <a title="Bufferbloat" href="http://www.bufferbloat.net">Bufferbloat</a> problems and prioritize latency sensitive traffic such as SSH, DNS and Vonage. Especially under load, my QoS scripts make a marked difference in how fast the Internet feels.</p>
<p>The multiple tunnels present a problem for implementing my QoS scheme because from the perspective of the underlying interface there are only two flows on the network: one for the IP-IP tunnel and one for the IP-IPv6 tunnel. A workaround I used for a while was to apply the QoS rules to the IP-IP tunnel interface because that&#8217;s where the bulk of the traffic flows. However, this meant that IPv6 traffic was not properly controlled, and any time I had a significant amount of IPv6 traffic I lost all the advantages of my QoS scheme.</p>
<p>To solve this properly I needed a way to look into the tunnels in order to identify the inner network flows. So I&#8217;ve extended the flow classifier with the keys in the following table. IP-IP, IP-IPv6, IPv6-IP and IPv6-IPv6 tunnels are supported.</p>
<table border="0">
<tbody>
<tr>
<td><strong>Key</strong></td>
<td><strong>Description</strong></td>
</tr>
<tr>
<td>tunnel-src</td>
<td>Extract the source IP from the inner header</td>
</tr>
<tr>
<td>tunnel-dst</td>
<td>Extract the destination IP from the inner header</td>
</tr>
<tr>
<td>tunnel-proto</td>
<td>Extract the protocol from the inner header</td>
</tr>
<tr>
<td>tunnel-proto-src</td>
<td>Extract the transport protocol source port from the inner header</td>
</tr>
<tr>
<td>tunnel-proto-dst</td>
<td>Extract the transport protocol destination port from the inner header</td>
</tr>
</tbody>
</table>
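<p>To make the idea of looking inside the tunnel concrete, here is a minimal user-space sketch of what the tunnel keys extract. It only handles IPv4 inside IPv4, builds its own test packet, and the helper names are hypothetical; the actual patch works on kernel socket buffers and also covers the IPv6 combinations.</p>

```python
import socket
import struct

IPPROTO_IPIP = 4    # IPv4 encapsulated in IPv4
IPPROTO_TCP = 6
IPPROTO_UDP = 17

def ipv4_header(src, dst, proto, payload_len):
    """Build a minimal 20-byte IPv4 header (checksum left at zero)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, 64,
                       proto, 0, socket.inet_aton(src), socket.inet_aton(dst))

def parse_ipv4(data):
    """Return (src, dst, proto, payload) for an IPv4 packet."""
    header_len = (data[0] % 16) * 4     # IHL is the low nibble, in 32-bit words
    proto = data[9]
    src = socket.inet_ntoa(data[12:16])
    dst = socket.inet_ntoa(data[16:20])
    return src, dst, proto, data[header_len:]

def tunnel_keys(packet):
    """Extract the inner 5-tuple from an IP-IP packet, or None if not a tunnel."""
    _, _, proto, payload = parse_ipv4(packet)
    if proto != IPPROTO_IPIP:
        return None
    src, dst, inner_proto, l4 = parse_ipv4(payload)
    sport = dport = 0                   # protocols without ports report zero
    if inner_proto in (IPPROTO_TCP, IPPROTO_UDP):
        sport, dport = struct.unpack("!HH", l4[:4])
    return {"tunnel-src": src, "tunnel-dst": dst, "tunnel-proto": inner_proto,
            "tunnel-proto-src": sport, "tunnel-proto-dst": dport}

# A hypothetical SSH segment (ports only) wrapped in an IP-IP tunnel.
ports = struct.pack("!HH", 40000, 22)
inner = ipv4_header("10.0.0.2", "192.0.2.1", IPPROTO_TCP, len(ports)) + ports
outer = ipv4_header("198.51.100.1", "198.51.100.2", IPPROTO_IPIP, len(inner)) + inner

print(tunnel_keys(outer))
```

<p>The outer header only ever shows the two tunnel endpoints, which is why the underlying interface sees a single flow per tunnel. The inner header is where the per-connection information lives.</p>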
<h1>Results</h1>
<p>In order to validate that this works I started a couple of SCP uploads to max out the upstream bandwidth and then ran <a title="Ping-exp" href="http://www.coverfire.com/projects/ping-exp/">ping-exp</a> to measure the latency. At the start of the test the flow classifier keys were src,dst,proto,proto-src,proto-dst. Approximately half way through I changed the keys to src,dst,proto,proto-src,proto-dst,tunnel-src,tunnel-dst,tunnel-proto,tunnel-proto-src,tunnel-proto-dst. The advantage of keeping the non-tunnel keys is that any traffic created by the router itself is still classified properly. Here is the <a href="http://www.coverfire.com/wp-content/uploads/2011/10/ppp0-simple-drr.txt">tc script</a> I used. You can see the results of this test in Figure 2 below.</p>
<div id="attachment_1069" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/10/before_and_after_tunnel_keys.png"><img class="size-medium wp-image-1069" title="Before and after tunnel keys" src="http://www.coverfire.com/wp-content/uploads/2011/10/before_and_after_tunnel_keys-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 2: Before and after tunnel keys</p></div>
<p>For the first half of the test you can see the high latency. This is due to all the traffic from the SCP upload and ICMP pings being placed into the same queue because from the perspective of the flow classifier there is only one flow. In the second half of the test the addition of the tunnel keys allows the flow classifier to place the ICMP packets into a different queue which is not affected by the SCP upload and therefore has much lower latency. The large amount of packet loss during the key change is because the script I used creates a large number of queues. While these queues are being created packets are dropped.</p>
<p>While my network setup may be a bit unique I think it&#8217;s likely that many home networks will have some form of tunnelling in the near future as tunnels are part of several IPv6 migration strategies. So hopefully this little addition will be useful in many different contexts.</p>
<p>Below are links to the two patches that are required. I&#8217;ll post them to Netdev for review shortly.</p>
<p><a href="http://www.coverfire.com/wp-content/uploads/2011/10/clsflow-tunnel-20111016.patch_.txt">clsflow-tunnel-20111016.patch</a></p>
<p><a href="http://www.coverfire.com/wp-content/uploads/2011/10/iproute-clsflow-tunnel-20111016.patch_.txt">iproute-clsflow-tunnel-20111016.patch</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2011/10/16/making-the-linux-flow-classifier-tunnel-aware/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linux flow classifier proto-dst and TOS</title>
		<link>http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/</link>
		<comments>http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/#comments</comments>
		<pubDate>Sat, 15 Oct 2011 15:08:27 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Latency]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=1032</guid>
		<description><![CDATA[Recently I&#8217;ve been playing around with the Linux flow classifier on my gateway. The flow classifier provides the ability to group network flows by configuring which parts of the packet headers (referred to as keys) are used in a hash &#8230; <a href="http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I&#8217;ve been playing around with the Linux flow classifier on my gateway. The flow classifier provides the ability to group network flows by configuring which parts of the packet headers (referred to as keys) are used in a hash calculation which chooses the output queue.</p>
<p>All of my Internet traffic travels over an IPIP tunnel to another Linux box. I do this so I have control of the QoS in both the upstream and the downstream. A result of this configuration is that from the perspective of the output interface there is only a single network flow.</p>
<p>I configured the flow classifier to use the src,dst,proto,proto-src,proto-dst keys which aims to provide 5-tuple flow fairness. Here&#8217;s the <a href="http://www.coverfire.com/wp-content/uploads/2011/10/ppp0-simple-drr.sh_.txt">simple tc script</a> I used. Due to the IPIP tunnel I expected to see that all traffic would be placed into the same queue. Strangely, the below is what my little <a title="Ping-exp" href="http://www.coverfire.com/projects/ping-exp/">ping-exp</a> utility showed when running at the same time as an SCP upload.</p>
<div id="attachment_1034" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/10/broken-flow-classifier.png"><img class="size-medium wp-image-1034" title="Unexpected flow classifier behavior" src="http://www.coverfire.com/wp-content/uploads/2011/10/broken-flow-classifier-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 1: Unexpected flow classifier behaviour</p></div>
<p>Coincidentally I ran ping-exp configured to send three different streams of ICMP traffic with different IP TOS values. Note that SCP automatically sets the IP TOS to the equivalent of the &#8220;Low&#8221; stream in the test.</p>
<p>Notice that the pings using the high and default TOS values appear to be unaffected by the low priority ping and SCP traffic. This was unexpected because none of the src, dst, proto, proto-src or proto-dst keys should be affected by the TOS value.</p>
<p>After a bit of experimentation I determined that the proto-dst key was the source of the problem. If you spend a bit of time with the <a title="flow_proto_dst" href="https://github.com/torvalds/linux/blob/master/net/sched/cls_flow.c#L156">flow_get_proto_dst()</a> function in <a title="cls_flow.c" href="https://github.com/torvalds/linux/blob/master/net/sched/cls_flow.c">cls_flow.c</a> you&#8217;ll see that if the protocol is ICMP or IPIP, as it is in my test, then the following value is returned:</p>
<pre id="LC195">return addr_fold(skb_dst(skb)) ^ (__force u16)skb-&gt;protocol;</pre>
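<p>skb_dst() returns a pointer to a dst_entry structure. Since Linux maintains a separate dst_entry structure for each (destination, TOS) pair, the pointer, and therefore the hash, differs between packets that share a destination but carry different TOS values. That is the source of the unexpected behaviour.</p>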
<p>I&#8217;m not knowledgeable enough about the Linux network stack to be certain but I don&#8217;t see any value in returning a value for proto-dst which is random with respect to the actual traffic on the wire. At the very least this is not intuitive behaviour.</p>
<p>If you look at <a title="flow_get_proto_src()" href="https://github.com/torvalds/linux/blob/master/net/sched/cls_flow.c#L114">flow_get_proto_src()</a> you&#8217;ll see something similar:</p>
<div>
<pre id="LC153">return addr_fold(skb-&gt;sk);</pre>
<p>In this case a pointer to the local socket structure is used as a fallback. Again, this has no relation to the actual packets on the wire and if the packet does not originate at the local machine then no socket exists which causes this value to be zero anyway.</p>
<p>It seems to me that the most intuitive behaviour would be to have the proto-src and proto-dst keys return zero when they are applied to traffic that doesn&#8217;t have the notion of transport layer ports.</p>
</div>
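<p>The difference between the two behaviours can be modelled in a few lines. This is a user-space illustration only: the cache dictionary, the function names and the use of <code>id()</code> as a stand-in for a kernel pointer are all assumptions, and the real fallback also folds the pointer and XORs it with skb-&gt;protocol.</p>

```python
# IP protocol numbers that carry transport-layer ports:
# TCP, UDP, DCCP, SCTP and UDP-Lite.
PORT_PROTOCOLS = {6, 17, 33, 132, 136}
ICMP = 1

# Stand-in for the routing cache: one entry object per (destination, TOS) pair.
dst_cache = {}

def current_proto_dst(proto, dst, tos, dport=None):
    """Model of the existing fallback: key on the identity of the dst entry."""
    if proto in PORT_PROTOCOLS and dport is not None:
        return dport
    entry = dst_cache.setdefault((dst, tos), object())
    return id(entry)        # plays the role of the dst_entry pointer value

def proposed_proto_dst(proto, dport=None):
    """Suggested behaviour: zero when the protocol has no notion of ports."""
    if proto in PORT_PROTOCOLS and dport is not None:
        return dport
    return 0

# Two pings to the same destination that differ only in TOS.
print(current_proto_dst(ICMP, "192.0.2.1", 0x00) ==
      current_proto_dst(ICMP, "192.0.2.1", 0x08))    # False: flows are split by TOS
print(proposed_proto_dst(ICMP) == proposed_proto_dst(ICMP))   # True
```

<p>With a zero fallback the key contributes nothing to the hash for port-less traffic, so the remaining keys alone decide the queue.</p>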
<p>I&#8217;ll post to <a title="Netdev" href="http://vger.kernel.org/vger-lists.html#netdev">Netdev</a> about this and see what the kernel devs have to say.</p>
<p>Related to this, I have a patch to the flow classifier that adds tunnel awareness which I plan to post to Netdev this weekend as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>pfifo_fast and ECN</title>
		<link>http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/</link>
		<comments>http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/#comments</comments>
		<pubDate>Sun, 13 Mar 2011 21:57:30 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=954</guid>
		<description><![CDATA[Summary The default queuing discipline used on Linux network interfaces deprioritizes ECN enabled flows because it uses a deprecated definition of the IP TOS byte. The problem By default Linux attaches a pfifo_fast queuing discipline (QDisc) to each network interface. &#8230; <a href="http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>Summary</h2>
<p>The default queuing discipline used on Linux network interfaces deprioritizes ECN enabled flows because it uses a deprecated definition of the IP TOS byte.</p>
<h2>The problem</h2>
<p>By default Linux attaches a <a title="pfifo_fast description" href="http://lartc.org/howto/lartc.qdisc.classless.html">pfifo_fast</a> queuing discipline (QDisc) to each network interface. The pfifo_fast QDisc has three internal classes (also known as bands) numbered zero to two which are serviced in priority order. That is, any packets in class zero are sent before servicing class one, any packets in class one are sent before servicing class two. Packets are selected for each class based on the TOS value in the IP header.</p>
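<p>The strict priority servicing described above can be sketched as follows. The class and packet names are hypothetical, and the sketch leaves out everything else a real QDisc does, such as queue length limits.</p>

```python
from collections import deque

class ThreeBandFifo:
    """Toy model of a three-band strict priority queue."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, packet, band):
        self.bands[band].append(packet)

    def dequeue(self):
        # Always drain the lowest-numbered non-empty band first.
        for band in self.bands:
            if band:
                return band.popleft()
        return None

q = ThreeBandFifo()
q.enqueue("bulk-1", 2)
q.enqueue("normal-1", 1)
q.enqueue("interactive-1", 0)
q.enqueue("normal-2", 1)

order = [q.dequeue() for _ in range(4)]
print(order)  # ['interactive-1', 'normal-1', 'normal-2', 'bulk-1']
```

<p>Packets in band 2 are only sent once bands 0 and 1 are both empty, so being placed in the wrong band has a direct cost under load.</p>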
<p>The TOS byte in the IP header has an interesting history having been redefined several times. Pfifo_fast is based on the <a title="RFC 1349" href="http://www.ietf.org/rfc/rfc1349.txt">RFC 1349</a> definition.</p>
<pre>   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|   PRECEDENCE    |       TOS             | MBZ |  <a href="http://tools.ietf.org/html/rfc1349">RFC 1349</a> (July 1992)
+-----+-----+-----+-----+-----+-----+-----+-----+</pre>
<p>Note that in the above definition there is a TOS field within the TOS byte.</p>
<p>Each bit in the TOS field indicates a particular QoS parameter to optimize for.</p>
<table>
<tbody>
<tr>
<td><strong>Value</strong></td>
<td><strong>Meaning</strong></td>
</tr>
<tr>
<td>1000</td>
<td>Minimize delay (md)</td>
</tr>
<tr>
<td>0100</td>
<td>Maximize throughput (mt)</td>
</tr>
<tr>
<td>0010</td>
<td>Maximize reliability (mr)</td>
</tr>
<tr>
<td>0001</td>
<td>Minimize monetary cost (mmc)</td>
</tr>
</tbody>
</table>
<p>Pfifo_fast uses the TOS bits to map packets into the priority classes using the following table. The general idea is to map high priority packets into class 0, normal traffic into class 1, and low priority traffic into class 2.</p>
<table>
<tbody>
<tr>
<td><strong>IP TOS field value</strong></td>
<td><strong>Class</strong></td>
</tr>
<tr>
<td>0000</td>
<td>1</td>
</tr>
<tr>
<td><strong>0001</strong></td>
<td><strong>2</strong></td>
</tr>
<tr>
<td>0010</td>
<td>1</td>
</tr>
<tr>
<td><strong>0011</strong></td>
<td><strong>1</strong></td>
</tr>
<tr>
<td>0100</td>
<td>2</td>
</tr>
<tr>
<td><strong>0101</strong></td>
<td><strong>2</strong></td>
</tr>
<tr>
<td>0110</td>
<td>2</td>
</tr>
<tr>
<td><strong>0111</strong></td>
<td><strong>2</strong></td>
</tr>
<tr>
<td>1000</td>
<td>0</td>
</tr>
<tr>
<td><strong>1001</strong></td>
<td><strong>0</strong></td>
</tr>
<tr>
<td>1010</td>
<td>0</td>
</tr>
<tr>
<td><strong>1011</strong></td>
<td><strong>0</strong></td>
</tr>
<tr>
<td>1100</td>
<td>1</td>
</tr>
<tr>
<td><strong>1101</strong></td>
<td><strong>1</strong></td>
</tr>
<tr>
<td>1110</td>
<td>1</td>
</tr>
<tr>
<td><strong>1111</strong></td>
<td><strong>1</strong></td>
</tr>
</tbody>
</table>
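<p>Expressed as code, the table above is a sixteen-entry lookup indexed by the four-bit TOS field. The list below is copied from the table in this post rather than from the kernel source, and the helper names are made up for the example.</p>

```python
# Class for each 4-bit TOS field value, copied from the table above.
# The list index is the TOS field value (0000 through 1111).
TOS_FIELD_TO_CLASS = [1, 2, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]

def tos_field(tos_byte):
    """Bits 3-6 of the RFC 1349 TOS byte (bit 0 is the most significant bit).

    Dividing by 2 drops the MBZ bit; the remainder modulo 16 drops PRECEDENCE.
    """
    return (tos_byte // 2) % 16

def class_for(tos_byte):
    return TOS_FIELD_TO_CLASS[tos_field(tos_byte)]

print(class_for(0x10))  # minimize delay (TOS field 1000): class 0
print(class_for(0x08))  # maximize throughput (TOS field 0100): class 2
print(class_for(0x00))  # default (TOS field 0000): class 1
```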
<p>This approach looks reasonable except that RFC 1349 has been deprecated by <a title="RFC 2474" href="http://www.rfc-editor.org/rfc/rfc2474.txt">RFC 2474</a> which changes the definition of the TOS byte.</p>
<pre>   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|               DSCP                |    CU     |  <a title="RFC 2474" href="http://www.rfc-editor.org/rfc/rfc2474.txt">RFC 2474</a> (October 1998) and
+-----+-----+-----+-----+-----+-----+-----+-----+    RFC 2780 (March 2000)</pre>
<p>In this more recent definition, the first six bits of the TOS byte are used for the <a title="Diffserv" href="http://en.wikipedia.org/wiki/Differentiated_services">Diffserv</a> codepoint (DSCP) and the last two bits are reserved for use by <a title="ECN" href="http://en.wikipedia.org/wiki/Explicit_Congestion_Notification">explicit congestion notification</a> (ECN). ECN allows routers along a packet&#8217;s path to signal that they are nearing congestion. This information allows the sender to slow the transmit rate without requiring a lost packet as a congestion signal. The meanings of the ECN codepoints are outlined below.</p>
<pre>   6     7
+-----+-----+
|  0     0  |  Non-ECN capable transport
+-----+-----+

   6     7
+-----+-----+
|  1     0  |  ECN capable transport - ECT(0)
+-----+-----+

   6     7
+-----+-----+
|  0     1  |  ECN capable transport - ECT(1)
+-----+-----+

   6     7
+-----+-----+
|  1     1  |  Congestion encountered
+-----+-----+</pre>
<p>[Yes, the middle two codepoints have the same meaning. See <a title="RFC 3168" href="http://tools.ietf.org/html/rfc3168">RFC 3168</a> for more information.]</p>
<p>When ECN is enabled Linux sets the ECN codepoint to ECT(0), or 10, which indicates to routers on the path that ECN is supported.</p>
<p>Since most applications do not modify the TOS/DSCP value, the default of zero is by far the most commonly used. A zero value for the DSCP field combined with ECT(0) results in the IP TOS byte being set to 00000010.</p>
<p>Looking at pfifo_fast&#8217;s TOS field to class mapping table (above), we can see that a TOS byte of 00000010, which corresponds to a TOS field value of 0001, results in ECN enabled packets being placed into the lowest priority class (2). However, packets which do not use ECN, those with TOS byte 00000000, are placed into the normal priority class (1). The result is that ECN enabled packets with the default DSCP value are unduly deprioritized relative to non-ECN enabled packets.</p>
<p>The rest of the mappings in the pfifo_fast table effectively ignore the MMC bit so this problem is only present when the DSCP/TOS field is set to the default value (zero).</p>
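<p>The overlap between the two definitions is easier to see when the same byte is split both ways. The sketch below uses hypothetical helper names and takes the class mapping from the table earlier in this post. It also checks which TOS field values change class when the lowest TOS bit is set.</p>

```python
def rfc1349_fields(tos_byte):
    """Split the byte as PRECEDENCE (3 bits), TOS (4 bits), MBZ (1 bit)."""
    return {"precedence": tos_byte // 32,
            "tos": (tos_byte // 2) % 16,
            "mbz": tos_byte % 2}

def rfc2474_fields(tos_byte):
    """Split the byte as DSCP (6 bits) followed by the two ECN bits."""
    return {"dscp": tos_byte // 4, "ecn": tos_byte % 4}

ecn_default = 0b00000010    # DSCP 0 with the ECN bits set to 10

print(rfc2474_fields(ecn_default))  # {'dscp': 0, 'ecn': 2}
print(rfc1349_fields(ecn_default))  # {'precedence': 0, 'tos': 1, 'mbz': 0}

# Class per TOS field value, copied from the mapping table above.
TOS_FIELD_TO_CLASS = [1, 2, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]

def tos_class(field):
    return TOS_FIELD_TO_CLASS[field]

# Even field values have the lowest TOS bit clear; field + 1 sets it.
affected = [f for f in range(0, 16, 2) if tos_class(f) != tos_class(f + 1)]
print(affected)  # [0]
```

<p>The high ECN bit occupies the same position as the lowest bit of the old TOS field, and the only pair of table entries that disagree on that bit is 0000 and 0001.</p>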
<p>This problem could be fixed by either changing pfifo_fast&#8217;s default priority to class mapping in <a title="sch_generic.c" href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/sch_generic.c;hb=HEAD">sch_generic.c</a> or changing the ip_tos2prio lookup table in <a title="route.c" href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/ipv4/route.c">route.c</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Network latency experiments</title>
		<link>http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/</link>
		<comments>http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/#comments</comments>
		<pubDate>Mon, 21 Feb 2011 21:08:55 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Bufferbloat]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Latency]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=865</guid>
		<description><![CDATA[Recently a series of blog posts by Jim Gettys has started a lot of interesting discussions and research around the Bufferbloat problem. Bufferbloat is the term Gettys&#8217; coined to describe huge packet buffers in network equipment which have been added &#8230; <a href="http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently a <a title="Bufferbloat blog posts" href="http://en.wordpress.com/tag/bufferbloat/">series of blog posts</a> by <a title="Jim Gettys" href="http://en.wikipedia.org/wiki/Jim_Gettys">Jim Gettys</a> has started a lot of interesting discussions and research around the <a title="Bufferbloat" href="http://www.bufferbloat.net/projects/bloat/wiki/Bufferbloat">Bufferbloat</a> problem. Bufferbloat is the term Gettys coined to describe huge packet buffers in network equipment which have been added through ignorance or a misguided attempt to avoid packet loss. These oversized buffers have the effect of greatly increasing latency when the network is under load.</p>
<p>If you&#8217;ve ever tried to use an application which requires low latency, such as VoIP or an SSH terminal, at the same time as a large data transfer and experienced high latency then you have likely experienced Bufferbloat. What I find really interesting about this problem is that it is so ubiquitous that most people think this is how it is supposed to work.</p>
<p>I&#8217;m not going to repeat all of the details of the Bufferbloat problem here (see <a title="bufferbloat.net" href="http://www.bufferbloat.net/">bufferbloat.net</a>) but note that Bufferbloat occurs at many different places in the network. It is present within network interface device drivers, software interfaces, modems and routers.</p>
<p>For many the first instinct of how to respond to Bufferbloat is to add traffic classification, which is often referred to simply as QoS. While this can be a useful tool on top of the real solution, it does not solve the problem. The only way to solve Bufferbloat is a combination of properly sizing the buffers and <a title="Active Queue Management" href="http://en.wikipedia.org/wiki/Active_queue_management">Active Queue Management</a> (AQM).</p>
<p><span style="color: #444444; line-height: 24px;">As it turns out I&#8217;ve been mitigating the effects of Bufferbloat (to great benefit) on my home Internet connection for some time. This has been accomplished through traffic shaping, traffic classification and using sane queue lengths with Linux&#8217;s queuing disciplines. I confess to not understanding, until the recent activity, that interface queues and driver internal queues are also a big part of the latency problem. I&#8217;ve since updated my network configuration to take this into account. </span></p>
<p>In the remainder of this post I will show the effects that a few different queuing configurations have on network latency. The results will be presented using a little utility I developed called <a title="Ping-exp" href="http://www.coverfire.com/projects/ping-exp/">Ping-exp</a>. The name is a bit lame but Ping-exp has made it a lot easier for me to compare the results of different network traffic configurations.</p>
<p><span id="more-865"></span></p>
<h1><span style="color: #444444; line-height: 24px;">My Home Network</span></h1>
<p><span style="color: #444444; line-height: 24px;">The figure below outlines what my home network looks like. It&#8217;s pretty standard except for the addition of the Linux server (Dest2) which lives at my local ISP. By tunneling traffic between Dest2 and my home router I can control the downstream traffic in a way not available to most Internet users.</span></p>
<div id="attachment_890" class="wp-caption alignnone" style="width: 429px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/home_network.png"><img class="size-full wp-image-890" title="Home network diagram" src="http://www.coverfire.com/wp-content/uploads/2011/02/home_network.png" alt="" width="419" height="439" /></a><p class="wp-caption-text">Home network diagram</p></div>
<p><span style="color: #444444; line-height: 24px;">My Internet connection is PPPoE based and operates at the modest speed of ~4Mbps download and ~768Kbps upload.</span></p>
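<p>To get a feel for why buffer size matters so much on a link like this, it helps to work out how long a full queue takes to drain. The calculation below is a rough sketch: the queue depths and the 1500 byte packet size are illustrative assumptions, not measurements from my network, and the upload rate is taken as exactly 768,000 bits per second.</p>

```python
def drain_time_seconds(packets, packet_bytes, link_bits_per_second):
    """Time for a full queue to empty onto the link, ignoring framing overhead."""
    return packets * packet_bytes * 8 / link_bits_per_second

UPLOAD_BPS = 768_000    # ~768Kbps upload, taken as an exact figure

# A hypothetical queue of 1000 full-size packets.
print(drain_time_seconds(1000, 1500, UPLOAD_BPS))  # 15.625

# The same link with a hypothetical queue of only 10 packets.
print(drain_time_seconds(10, 1500, UPLOAD_BPS))    # 0.15625
```

<p>A packet that arrives behind a full queue waits for everything ahead of it, so the drain time is roughly the worst-case delay that queue can add.</p>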
<h1>Experimental Design</h1>
<p>In each of the experiments below <a title="Ping-exp" href="http://www.coverfire.com/projects/ping-exp/">Ping-exp</a> was run on both hosts A and B, pinging both Dest1 and Dest2, five times per second, with a total of 400 pings (80 seconds).</p>
<pre>./ping-exp.py -w test.data -t 'dest1,&lt;IP1&gt;,0' -t 'dest2,&lt;IP2&gt;,0' -i .2 -c 400</pre>
<p>[Replace &lt;IP1&gt; and &lt;IP2&gt; if cutting and pasting the above]</p>
<p>In the results below I only show data from HostA because the results are very similar on HostB. I had originally planned to perform some experiments which involved per-host fairness but that will have to wait for another time.</p>
<p><span style="color: #444444; line-height: 24px;">Each network configuration was tested under the four network load scenarios outlined in the table below. In each case there was a short interval of approximately one minute between starting the load and running Ping-exp. All load was generated from HostA.</span></p>
<table>
<tbody>
<tr>
<td><strong>Load</strong></td>
<td><strong>Description</strong></td>
</tr>
<tr>
<td>Empty</td>
<td>No traffic</td>
</tr>
<tr>
<td>Upload</td>
<td>2 TCP uploads to C</td>
</tr>
<tr>
<td>Download</td>
<td>3 TCP downloads from C</td>
</tr>
<tr>
<td>Both</td>
<td>2 TCP uploads to C and 3 TCP downloads from C</td>
</tr>
</tbody>
</table>
<h1>Experiment 1: Defaults</h1>
<p>In this experiment all device queues and traffic management configurations were unmodified from the Linux defaults.</p>
<h2>Network Load: Empty</h2>
<div id="attachment_870" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-empty.png"><img class="size-medium wp-image-870" title="Host A - defaults - empty" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-empty-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 1: Host A - defaults - empty</p></div>
<p>Figure 1 provides a baseline for the network with no traffic.</p>
<h2>Network Load: Upload</h2>
<div id="attachment_873" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-upload.png"><img class="size-medium wp-image-873" title="Host A - defaults - upload" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-upload-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 2: Host A - defaults - upload</p></div>
<p>Comparing <a title="Figure 1" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-empty.png">figure 1</a> and figure 2 shows that when the upload portion of the link is under load both the latency and jitter greatly increase. However, the maximum observed RTT of ~180ms isn&#8217;t terribly high, which seems to indicate that the upstream portion of the network does not suffer from extreme Bufferbloat.  Notice that the amount of packet loss increased from 0% to ~10%.</p>
<p>Even with the sane maximum latency mentioned above, this amount of packet loss would likely make any interactive service which relies on TCP (such as an SSH terminal) unusable due to the time required to recover from lost packets. I suspect 10% packet loss would also make VoIP conversations unusable but I don&#8217;t have any data to back that up.</p>
<h2>Network Load: Download</h2>
<div id="attachment_874" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-download.png"><img class="size-medium wp-image-874" title="Host A - defaults - download" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-download-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 3: Host A - defaults - download</p></div>
<p>It&#8217;s hard to look at Figure 3 and not immediately be drawn to the interesting latency pattern in the top chart. I suspect that the relatively slow latency increases followed by a sudden latency drop are the result of the network queues filling and then quickly emptying after TCP reduces the send rate when it finally receives a loss event.  There are two other things to note in Figure 3:</p>
<ul>
<li>The overall latency is much worse in the download path.</li>
<li>There is no packet loss experienced by the test flows.</li>
</ul>
<p>I&#8217;m at a loss to explain the latter point.</p>
<h2>Network Load: Both</h2>
<div id="attachment_879" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-both.png"><img class="size-medium wp-image-879" title="Host A - defaults - both" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-both-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 4: Host A - defaults - both</p></div>
<p>Figure 4 shows the latency results when under both upload and download load. Here the average latency increases beyond either the upload or download only case and the &#8216;latency sawtooth&#8217; identified in Figure 3 becomes even more pronounced. The packet loss in Figure 4 is very close to the upload case (Figure 2) which matches well with the download results (Figure 3) where no packet loss was observed.</p>
<h1>Experiment 2: Reduce buffers and shape</h1>
<p>In this experiment the buffers on both ends of the Internet connection were reduced as follows:</p>
<ul>
<li>Gateway
<ul>
<li>The TX queue length (ifconfig txqueuelen) was reduced to 1 packet for the ppp0 interface as well as the underlying eth2 interface.</li>
<li>The queueing discipline on eth2 was changed to a PFIFO with a buffer size of a single packet.</li>
<li>The queueing discipline on the ppp0 interface was changed to a PFIFO with buffer size of 3 packets which is equal to approximately 50ms at the maximum link rate.</li>
</ul>
</li>
<li>Dest2
<ul>
<li>The TX queue length of the eth0 interface was reduced from the default of 1000 to 10.</li>
<li>The queueing discipline on the interface towards the home network was set as a PFIFO with a buffer size of 18 packets which is equal to approximately 50ms at the maximum link rate.</li>
</ul>
</li>
</ul>
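<p>The &#8220;approximately 50ms&#8221; figures above can be checked with a little arithmetic. The sketch below is not part of the original experiments: it assumes full-size 1500 byte packets (the exact PPPoE MTU and framing overhead are ignored) and uses the ~768Kbps and ~4Mbps link rates quoted earlier. The function name is made up for illustration.</p>

```python
MTU_BYTES = 1500  # assumption: full-size packets, PPPoE overhead ignored


def drain_time_ms(packets, link_bps, packet_bytes=MTU_BYTES):
    """Time in milliseconds needed to transmit a full queue at the link rate."""
    return packets * packet_bytes * 8 / link_bps * 1000


upstream_ms = drain_time_ms(3, 768_000)        # 3 packet PFIFO on ppp0 (gateway)
downstream_ms = drain_time_ms(18, 4_000_000)   # 18 packet PFIFO on Dest2

print(f"upstream: {upstream_ms:.1f} ms, downstream: {downstream_ms:.1f} ms")
```

<p>Under these assumptions the two queues drain in roughly 47ms and 54ms, which is consistent with the 50ms target. Smaller packets, such as the pings used for measurement, drain faster, so these numbers are closer to a worst case than an average.</p>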
<p>In addition to the above buffer changes, traffic shapers were also added to each end of the IPIP tunnel and set to a value below the available network throughput. This removes the effects of any buffering in intermediate network elements.</p>
<h2>Network Load: Empty</h2>
<p>As expected performing the test with no network load obtains <a title="Host A - low buffers - empty" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-empty.png">results</a> very similar to <a title="Figure 1" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-empty.png">Figure 1</a>.</p>
<h2>Network Load: Upload</h2>
<div id="attachment_883" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-upload.png"><img class="size-medium wp-image-883" title="Host A - low buffers - upload" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-upload-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 5: Host A - low buffers - upload</p></div>
<p>Comparing Figure 5 with <a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-upload.png">Figure 2</a> shows a much better latency and jitter profile but the amount of packet loss is somewhat higher (~9%-&gt;14%).</p>
<h2>Network Load: Download</h2>
<div id="attachment_884" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-download.png"><img class="size-medium wp-image-884" title="Host A low buffers - download" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-download-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 6: Host A - low buffers - download</p></div>
<p>Figure 6 shows no sign of the latency sawtooth behavior observed in <a title="Figure 3" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-download.png">Figure 3</a>. This appears to confirm the theory that the large buffers were the cause of this phenomenon. Also note that latency and jitter are vastly improved but there is slightly higher packet loss (0%-&gt;2%).</p>
<h2>Network Load: Both</h2>
<div id="attachment_886" class="wp-caption alignright" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-both.png"><img class="size-medium wp-image-886" title="Host A - low buffers - both" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-both-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 7: Host A - low buffers - both</p></div>
<p>Figure 7 shows a much better latency profile when compared with <a title="Figure 4" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-both.png">Figure 4</a> but note that the packet loss has increased to a very high level.</p>
<h1>Experiment #3: Reduce interface buffers and shape plus SFB</h1>
<p><a title="SFB" href="http://www.pps.jussieu.fr/~jch/software/sfb/">Stochastic Fair Blue</a> is an active queue management (AQM) scheme which has been <a title="SFB on Bufferbloat mailing list" href="https://lists.bufferbloat.net/pipermail/bloat/2011-February/000034.html">suggested as a possible solution</a> to the Bufferbloat problem.  Because SFB currently requires a custom kernel module I chose not to install it on Dest2, a server that isn&#8217;t easy for me to get access to should something go wrong. I did however install it on the local gateway to obtain upload results.</p>
<p>The only change between this experiment (#3) and #2 is the replacement of the PFIFO queue on the gateway with a SFB queue.  SFB has the ability to mark IP packets via <a title="ECN" href="http://en.wikipedia.org/wiki/Explicit_Congestion_Notification">ECN</a> when they experience congestion. Both Host A and Dest2 have ECN enabled and ECN marked IP and TCP packets were observed during this test.</p>
<div id="attachment_904" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfb-upload.png"><img class="size-medium wp-image-904" title="Host A - low buffers - sfb - upload" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfb-upload-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 8: Host A - low buffers - sfb - upload</p></div>
<p>Comparing figure 8 against figures <a title="Figure 2" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-upload.png">2</a> and <a title="Figure 5" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-upload.png">5</a> shows that SFB does have a better latency profile vs a simple FIFO but the packet loss is also significantly higher.  I haven&#8217;t done enough experimentation with SFB to really understand how to use it effectively but these results do seem to indicate that it&#8217;s worth spending more time with.</p>
<h1>Experiment #4: Sane buffers plus SFQ</h1>
<p>This experiment maintains the buffer size optimizations from experiments 2 and 3 but adds an <a title="SFQ" href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/sched/sch_sfq.c;hb=HEAD">SFQ</a> queue to each end of the connection. SFQ aims to create per-flow fairness. On the downstream the SFQ queue length (limit parameter) was set to eighteen and on the upstream it was set to three. Both of these values result in approximately 50ms transmission time.</p>
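<p>For readers unfamiliar with the idea behind SFQ, here is a toy sketch of per-flow fairness. It is an illustration only and not the kernel implementation: the class name, the bucket count and the use of CRC32 as the hash are all assumptions, and the real qdisc has details (such as periodically perturbing the hash) that are left out. Flows are hashed into buckets, buckets are served round-robin, and when the total limit is reached a packet is dropped from the longest bucket.</p>

```python
from collections import deque
import zlib


class TinySFQ:
    """Toy stochastic fair queue: hash flows into buckets, serve round-robin."""

    def __init__(self, buckets=8, limit=18):
        self.queues = [deque() for _ in range(buckets)]
        self.limit = limit        # total packets held across all buckets
        self.size = 0
        self.next_bucket = 0

    def bucket_for(self, flow):
        """Map a flow identifier to a bucket index (CRC32 is an assumption)."""
        return zlib.crc32(repr(flow).encode()) % len(self.queues)

    def enqueue(self, flow, packet):
        if self.size == self.limit:
            # Queue full: drop the newest packet of the longest bucket,
            # so the heaviest flow pays for the congestion.
            max(self.queues, key=len).pop()
            self.size -= 1
        self.queues[self.bucket_for(flow)].append(packet)
        self.size += 1

    def dequeue(self):
        """Return the next packet, visiting buckets in round-robin order."""
        for _ in range(len(self.queues)):
            queue = self.queues[self.next_bucket]
            self.next_bucket = (self.next_bucket + 1) % len(self.queues)
            if queue:
                self.size -= 1
                return queue.popleft()
        return None


sfq = TinySFQ()
for n in range(5):
    sfq.enqueue("bulk-upload", f"bulk-{n}")
sfq.enqueue("ping", "ping-0")
print([sfq.dequeue() for _ in range(3)])
```

<p>Unless the two flows happen to hash into the same bucket, the ping packet is sent within the first two dequeues instead of waiting behind all five bulk packets, which is the behaviour that helps latency in the results below.</p>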
<h2>Network Load: Upload</h2>
<div id="attachment_923" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-upload.png"><img class="size-medium wp-image-923" title="Host A - low buffers - sfq - upload" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-upload-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 9: Host A - low buffers - sfq - upload</p></div>
<p>Figure 9&#8217;s direct comparisons are figures <a title="Figure 2" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-upload.png">2</a>, <a title="Figure 5" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-upload.png">5</a> and <a title="Figure 8" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfb-upload.png">8</a>. Only the SFB case (Figure 8) has lower average latency and all other scenarios have far higher packet loss.</p>
<h2>Network Load: Download</h2>
<div id="attachment_924" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-download.png"><img class="size-medium wp-image-924" title="Host A - low buffers - sfq - download" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-download-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 10: Host A - low buffers - sfq - download</p></div>
<p>Figure 10&#8217;s comparisons are Figures <a title="Figure 3" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-download.png">3</a> and <a title="Figure 6" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-download.png">6</a>. Figure 10 shows a better latency profile vs the other scenarios with equivalent or better packet loss.</p>
<h2>Network Load: Both</h2>
<div id="attachment_925" class="wp-caption alignleft" style="width: 610px"><a href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-both.png"><img class="size-medium wp-image-925" title="Host A - low buffers - sfq - both" src="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-sfq-both-600x600.png" alt="" width="600" height="600" /></a><p class="wp-caption-text">Figure 11: Host A - low buffers - sfq - both</p></div>
<p>Comparisons: Figures <a title="Figure 4" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-defaults-both.png">4</a> and <a title="Figure 7" href="http://www.coverfire.com/wp-content/uploads/2011/02/A-lowbuffers-both.png">7</a>.</p>
<p>In this test the bidirectionally loaded link shows a lot of packet loss but latency is better than the other scenarios.</p>
<h1>Throughput</h1>
<p>During these experiments I did not make any effort to compare the overall network throughput. This would be a worthwhile endeavour but anecdotally at least any throughput difference is relatively minor and is a cost worth paying in order to achieve improved latency.</p>
<h1>Summary</h1>
<p>These experiments show the dramatic difference in network latency which can be obtained by modifying the size of packet buffers and adding a bit of traffic classification (SFQ). Among the tested scenarios, SFQ + sane buffer sizes gives the best performance.</p>
<p>When I started this post I had hoped to go through a few more scenarios but that will have to wait for another time. Specifically I wanted to show the results of the somewhat more complicated scheme that I use on a daily basis. This scheme gives much better results than any of the ones presented above. If you are interested and understand how to use <a title="tc" href="http://linux.die.net/man/8/tc">tc</a> you can see the scripts <a title="My QoS scripts" href="http://git.coverfire.com/?p=linux-qos-scripts.git;a=summary">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2011/02/21/network-latency-experiments/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Next is Now</title>
		<link>http://www.coverfire.com/archives/2010/07/20/next-is-now/</link>
		<comments>http://www.coverfire.com/archives/2010/07/20/next-is-now/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 00:39:13 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=805</guid>
		<description><![CDATA[Just stumbled on this video created by Rogers. One of a few good quotes: &#8220;10 years ago it took 72 hours to download Godfather&#8230; &#8211; Today it takes 10 minutes &#8211; It still takes 3 hours to watch&#8221;]]></description>
			<content:encoded><![CDATA[<p>Just stumbled on this video created by Rogers.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/sM-kGGURWCE&amp;hl=en_US&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/sM-kGGURWCE&amp;hl=en_US&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>One of a few good quotes:</p>
<p>&#8220;10 years ago it took 72 hours to download Godfather&#8230; &#8211; Today it takes 10 minutes &#8211; It still takes 3 hours to watch&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2010/07/20/next-is-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some infrastructure links for Canada 3.0</title>
		<link>http://www.coverfire.com/archives/2009/06/07/some-infrastructure-links-for-canada-30/</link>
		<comments>http://www.coverfire.com/archives/2009/06/07/some-infrastructure-links-for-canada-30/#comments</comments>
		<pubDate>Sun, 07 Jun 2009 16:23:19 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Canada 3.0]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=600</guid>
		<description><![CDATA[Tomorrow the Canada 3.0 conference starts. Since I am attending the infrastructure track I thought it might be useful to collect a bunch of links relating to the Internet as infrastructure. http://www.linuxjournal.com/content/why-internet-infrastructure-need-be-fields-study http://hakpaksak.wordpress.com/2008/09/22/the-etymology-of-infrastructure-and-the-infrastructure-of-the-internet/ http://lafayetteprofiber.com/FactCheck/OpenSystems.html http://news.cnet.com/Fixing-our-fraying-Internet-infrastructure/2010-1034_3-6212819.html http://www.interesting-people.org/archives/interesting-people/200904/msg00168.html http://www.interesting-people.org/archives/interesting-people/200904/msg00175.html http://cis471.blogspot.com/2009/04/why-is-connectivty-in-stockholm-so-much.html http://www.linuxjournal.com/xstatic/suitwatch/2006/suitwatch19.html http://publius.cc/2008/05/16/doc-searls-framing-the-net &#8230; <a href="http://www.coverfire.com/archives/2009/06/07/some-infrastructure-links-for-canada-30/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Tomorrow the <a title="Canada 3.0" href="http://canada30.uwaterloo.ca/">Canada 3.0</a> conference starts. Since I am attending the infrastructure track I thought it might be useful to collect a bunch of links relating to the Internet as infrastructure.</p>
<p><a href="http://www.linuxjournal.com/content/why-internet-infrastructure-need-be-fields-study">http://www.linuxjournal.com/content/why-internet-infrastructure-need-be-fields-study</a></p>
<p><a href="http://hakpaksak.wordpress.com/2008/09/22/the-etymology-of-infrastructure-and-the-infrastructure-of-the-internet/">http://hakpaksak.wordpress.com/2008/09/22/the-etymology-of-infrastructure-and-the-infrastructure-of-the-internet/</a></p>
<p><a href="http://lafayetteprofiber.com/FactCheck/OpenSystems.html">http://lafayetteprofiber.com/FactCheck/OpenSystems.html</a></p>
<p><a href="http://news.cnet.com/Fixing-our-fraying-Internet-infrastructure/2010-1034_3-6212819.html">http://news.cnet.com/Fixing-our-fraying-Internet-infrastructure/2010-1034_3-6212819.html</a></p>
<p><a href="http://www.interesting-people.org/archives/interesting-people/200904/msg00168.html">http://www.interesting-people.org/archives/interesting-people/200904/msg00168.html</a></p>
<p><a href="http://www.interesting-people.org/archives/interesting-people/200904/msg00175.html">http://www.interesting-people.org/archives/interesting-people/200904/msg00175.html</a></p>
<p><a href="http://cis471.blogspot.com/2009/04/why-is-connectivty-in-stockholm-so-much.html">http://cis471.blogspot.com/2009/04/why-is-connectivty-in-stockholm-so-much.html</a></p>
<p><a href="http://www.linuxjournal.com/xstatic/suitwatch/2006/suitwatch19.html">http://www.linuxjournal.com/xstatic/suitwatch/2006/suitwatch19.html</a></p>
<p><a href="http://publius.cc/2008/05/16/doc-searls-framing-the-net">http://publius.cc/2008/05/16/doc-searls-framing-the-net</a></p>
<p><a href="http://free-fiber-to-the-home.blogspot.com/">http://free-fiber-to-the-home.blogspot.com/</a></p>
<p><a href="http://communityfiber.org/cringely.html">http://communityfiber.org/cringely.html</a></p>
<p><a href="http://www.linuxjournal.com/article/10033">http://www.linuxjournal.com/article/10033</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2009/06/07/some-infrastructure-links-for-canada-30/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IPv6</title>
		<link>http://www.coverfire.com/archives/2008/06/29/ipv6/</link>
		<comments>http://www.coverfire.com/archives/2008/06/29/ipv6/#comments</comments>
		<pubDate>Sun, 29 Jun 2008 18:45:19 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/?p=323</guid>
		<description><![CDATA[For the first time in almost a month I had a bit of free time for experimentation today so I decided it was time I set up my home network to use IPv6. I&#8217;ve tried to keep up on the development &#8230; <a href="http://www.coverfire.com/archives/2008/06/29/ipv6/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the first time in almost a month I had a bit of free time for experimentation today so I decided it was time I set up my home network to use IPv6. I&#8217;ve tried to keep up on the development and deployment of IPv6 but besides setting up a few internal network nodes with IPv6 addresses I haven&#8217;t played with it much in the past.</p>
<h1>Background</h1>
<p>Before I get into configuration here are a few links to some of the better articles and videos that I discovered today. Some of these are pretty technical.</p>
<p><a href="http://www.potaroo.net/ispcol/2007-07/v4end.html">The End of the (IPv4) World</a></p>
<p>Current projections on the time frame for IPv4 address exhaustion.</p>
<p style="padding-left: 30px;">What this prediction is saying is that some time between late 2009 and late 2011, and most likely in mid-2010, when you ask your local RIR for another allocation of IPv4 addresses your request is going to be denied.</p>
<p><a href="http://www.potaroo.net/ispcol/2008-02/tui.html">IPv6 Transition Tools and Tui</a></p>
<p>This article describes 6to4 and Teredo which are two technologies that aim to ease the transition from IPv4 to IPv6.</p>
<p><a href="http://www.potaroo.net/ispcol/2008-04/ipv6.html">IPv6 Deployment: Just where are we?</a></p>
<p>Current IPv6 usage estimations.</p>
<p><a class="blue" href="http://www.circleid.com/posts/google_ipv6_conference_2008/">Google IPv6 Conference 2008</a></p>
<p>Videos from Google&#8217;s IPv6 conference in May 2008.</p>
<h1>IPv6 content</h1>
<p>Other than gaining some experience with IPv6 there really isn&#8217;t a lot of benefit to using IPv6 yet. However, if you are in any way involved with IP networking it may be time to start learning about IPv6. Current projections have the IPv4 address space being exhausted in mid-2010 (<a href="http://www.potaroo.net/ispcol/2007-07/v4end.html">The End of the (IPv4) World</a>).</p>
<p>At the moment there are very few sites available which are accessible via IPv6 and even fewer are IPv6 only. Google has set up an IPv6 version of their main search site at ipv6.google.com. It is unfortunate that Google does not have an IPv6 (AAAA) record for www.google.com yet but given that providing an AAAA record to some hosts without IPv6 connectivity can cause problems, their choice is not surprising. Hopefully these kinks can be worked out as more people gain experience with IPv6. Rather than trying to remember to type ipv6.google.com all of the time I have locally aliased www.google.com to ipv6.google.com. Everything seems to be working normally so far.</p>
<p>SixXS maintains a list of IPv6 content at <a title="Cool IPv6 Stuff" href="http://www.sixxs.net/misc/coolstuff/">Cool IPv6 Stuff</a>. A couple of highlights from this list are the official Beijing 2008 Olympic website and some IPv6 only BitTorrent trackers.</p>
<p>It has been said that the availability of porn is what really drove the adoption of the Internet. In this spirit <a title="The Great IPv6 Experiment" href="http://www.ipv6experiment.com/">The Great IPv6 Experiment</a> is collecting copyright licenses to a large amount of commercial pornography and regular television shows for distribution only via IPv6. The project is due to launch sometime &#8220;soon&#8221;. What a great idea for an experiment.</p>
<h1>A short tutorial for IPv6 and 6to4 on Fedora 9</h1>
<p>Getting things set up turned out to be pretty easy on Fedora. I expect the same is true of any Linux distribution although the details will differ. Since most ISPs do not have native IPv6 support, special technologies are required to connect to other IPv6 nodes over IPv4. I chose 6to4 to connect to the IPv6 network. 6to4 requires a public IP address so if you are behind NAT look into using Teredo instead. Incidentally, Teredo is supported by Windows Vista.</p>
<p>The first step is to enable IPv6 and 6to4 on the publicly facing network interface. The required configuration file can be found in /etc/sysconfig/network-scripts/ifcfg-XXX where XXX is the name of your publicly facing network interface. Some or all of this may be configurable through system-config-network and other GUI tools but I tend to stick to configuration files. I added the following lines:</p>
<pre>IPV6INIT=yes
IPV6TO4INIT=yes
IPV6_CONTROL_RADVD=yes</pre>
<p>A few new entries were also required in the global network configuration (/etc/sysconfig/network).</p>
<pre>NETWORKING_IPV6=yes
IPV6FORWARDING=yes
IPV6_ROUTER=yes
IPV6_DEFAULTDEV="tun6to4"</pre>
<p>Note that if the computer you are configuring is not going to act as an IPv6 gateway for other hosts on your network you probably don&#8217;t want to add IPV6FORWARDING and IPV6_ROUTER. After editing these files restart the network service.</p>
<pre>/sbin/service network restart</pre>
<p>You should now have a new network interface named tun6to4 with an IPv6 address starting with 2002 assigned to it. 2002 (hexadecimal notation) is the first sixteen bits of the IPv6 address space dedicated to 6to4.</p>
<pre>/sbin/ifconfig tun6to4</pre>
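<p>To make the 2002 prefix a little more concrete, the following Python sketch shows how a 6to4 prefix is derived from a public IPv4 address: 2002 occupies the first sixteen bits and the IPv4 address the next thirty-two. The function name is made up for illustration and 192.0.2.4 is a documentation address standing in for your real public address.</p>

```python
import ipaddress


def sixto4_prefix(ipv4):
    """Return the 2002:V4ADDR::/48 prefix for a public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 0x2002 fills the top 16 bits, the IPv4 address the following 32 bits.
    value = (0x2002 * 2**112) + (v4 * 2**80)
    return ipaddress.IPv6Network((value, 48))


prefix = sixto4_prefix("192.0.2.4")   # placeholder address, use your own
print(prefix)
print(prefix.subnet_of(ipaddress.IPv6Network("2002::/16")))
```

<p>For the placeholder address this prints 2002:c000:204::/48 followed by True, since c000:0204 is simply 192.0.2.4 written in hexadecimal.</p>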
<p>Now try pinging an IPv6 address.</p>
<pre>ping6 ipv6.google.com</pre>
<p>If you can reach ipv6.google.com you have working IPv6 connectivity. If you are not configuring an IPv6 gateway you can ignore everything below this point.</p>
<p>In order for the configured host to act as a gateway for IPv6 traffic it needs to advertise the IPv6 network prefix to the rest of your network. IPv6 doesn&#8217;t require DHCP for automatic address configuration but does require prefix announcement so the local node can figure out its IPv6 address. Prefix advertisements are handled by the <a title="radvd" href="http://www.litech.org/radvd/">radvd</a> daemon. Below is the configuration I used (/etc/radvd.conf). Note the leading zeros in the prefix. This indicates that radvd should create the IPv6 prefix using the special 6to4 format.</p>
<pre>interface eth0
{
        AdvSendAdvert on;
        MinRtrAdvInterval 30;
        MaxRtrAdvInterval 100;

        prefix 0:0:0:0001::/64
        {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr off;
                Base6to4Interface ppp0;
                AdvPreferredLifetime 120;
                AdvValidLifetime 300;
        };
};</pre>
<p>After restarting radvd the other IPv6 capable nodes on your local network should also be automatically assigned an IPv6 address starting with 2002.</p>
<pre>/sbin/service radvd start</pre>
<p>I&#8217;m not sure if this is the way it is supposed to work or not, but eth0 on my gateway never obtains a 2002 IPv6 address automatically (this is the box radvd is running on). As a result, I assigned the IPv6 address manually. Since my external IPv4 address never changes this isn&#8217;t a problem for me but it seems wrong to have to manually change the interface address if the external IPv4 address changes even though radvd will correctly advertise the new IPv6 prefix to the rest of the network automatically.</p>
<p>If you already know how IPv6 addresses are constructed skip this paragraph. In what will likely be the most common deployment model, IPv6 addresses are constructed of two parts: a 64-bit network identifier (prefix) and a 64-bit host identifier. The network identifier is assigned by the ISP. In the case of 6to4 the network prefix is constructed by using your public IPv4 address in combination with the first sixteen bits of the address being set to 2002. The host or node identifier is constructed by extending the 48-bit MAC address to 64-bits.</p>
<p>Determining the IPv6 address to assign to the internal interface (eth0) is a little tricky. First get the network prefix portion of the IPv6 address assigned to the tun6to4 interface. You want everything before the /16. This is the first 64 bits of your IPv6 address. Then look at the link-local IPv6 address which is automatically created on eth0. This address will start with fe80. The last 64 bits of this address are also the last 64 bits of the new address because they are derived from the MAC address of the network interface. Copy everything after &#8220;fe80::&#8221;. Append this to the previously obtained network prefix, separating the values with a colon. You now have the IPv6 address. Append an &#8220;IPV6ADDR=&#8221; line to /etc/sysconfig/network-scripts/ifcfg-eth0 and restart the network service (or the interface only if you like). You should now be able to ping6 between network nodes using the 2002 prefixed IPv6 addresses.</p>
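<p>The manual steps above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: 192.0.2.4 and the MAC address are placeholders, the subnet value 0001 is taken from the radvd prefix shown earlier, the function names are invented, and the host part is built with the standard modified EUI-64 rule (flip the universal/local bit of the first octet and insert ff:fe in the middle of the MAC).</p>

```python
import ipaddress


def eui64_interface_id(mac):
    """Build the 64-bit modified EUI-64 interface identifier from a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                               # flip the universal/local bit
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]   # insert ff:fe in the middle
    return int.from_bytes(eui64, "big")


def sixto4_lan_address(ipv4, subnet, mac):
    """Combine 2002, the IPv4 address, a 16-bit subnet and the interface id."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    prefix = (0x2002 * 2**112) + (v4 * 2**80) + (subnet * 2**64)
    return ipaddress.IPv6Address(prefix + eui64_interface_id(mac))


# Placeholder values: substitute your public IPv4 address and eth0's MAC.
print(sixto4_lan_address("192.0.2.4", 0x0001, "00:16:3e:12:34:56"))
```

<p>With these placeholder values the result is 2002:c000:204:1:216:3eff:fe12:3456. The last four groups match what would appear after &#8220;fe80::&#8221; in the link-local address of an interface with that MAC, which is why copying them by hand works.</p>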
<p>Once you have established connectivity between the nodes try ping6ing ipv6.google.com from the internal network nodes. If the ping fails you will likely have to investigate the iptables and ip6tables rules on both the gateway and the internal nodes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2008/06/29/ipv6/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Amazon EC2 from a network administration perspective</title>
		<link>http://www.coverfire.com/archives/2008/03/26/amazon-ec2-from-a-network-administration-perspective/</link>
		<comments>http://www.coverfire.com/archives/2008/03/26/amazon-ec2-from-a-network-administration-perspective/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 02:17:47 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[Amazon]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/archives/2008/03/26/amazon-ec2-from-a-network-administration-perspective/</guid>
		<description><![CDATA[There has been lots of discussion and buzz around the Amazon Web Services (AWS) lately. I posted a few links about this last week. Most of the articles that I have read on AWS speak of it from a high &#8230; <a href="http://www.coverfire.com/archives/2008/03/26/amazon-ec2-from-a-network-administration-perspective/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There has been lots of discussion and buzz around the <a href="http://www.amazon.com/gp/browse.html?node=3435361" title="Amazon Web Services">Amazon Web Services (AWS)</a> lately. I posted a <a href="http://www.coverfire.com/archives/2008/03/16/could-computing/" title="Cloud computing links">few links</a> about this last week. Most of the articles that I have read on AWS speak of it from a high level. General discussions about how the service allows your web application to increase capacity as required are interesting but I was curious about the interface that these services present to application developers and to the Internet. More specifically, how do the AWS interfaces compare with normal server colocation services?</p>
<p>Amazon AWS is actually a collection of services. The <a href="http://www.amazon.com/b/ref=sc_fe_l_2?ie=UTF8&amp;node=201590011&amp;no=3440661&amp;me=A36L942TSJ2AJA" title="Elastic Compute Cloud">Elastic Compute Cloud</a> (EC2) is the service most commonly discussed. Other interesting services that are part of AWS include the <a href="http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&amp;node=16427261&amp;no=3440661&amp;me=A36L942TSJ2AJA" title="Simple Storage Service">Simple Storage Service</a> (S3), <a href="http://www.amazon.com/b/ref=sc_fe_l_2?ie=UTF8&amp;node=342335011&amp;no=3440661&amp;me=A36L942TSJ2AJA" title="SimpleDB">SimpleDB</a> and the <a href="http://www.amazon.com/Simple-Queue-Service-home-page/b/ref=sc_fe_l_2?ie=UTF8&amp;node=13584001&amp;no=3440661&amp;me=A36L942TSJ2AJA" title="Simple Queue Service">Simple Queue Service</a> (SQS). This article will only discuss EC2 but does not aim to be an EC2 tutorial. Amazon provides a good user guide if you are sufficiently interested.</p>
<p>Everything below comes from an afternoon of experimentation with EC2. Please leave a comment with any corrections or other useful bits of information you might have.</p>
<h1>Signing up</h1>
<p>EC2 operates on a pay-for-what-you-use model. As a result you need a credit card to use EC2 so Amazon can bill you once per month based on your usage. The first step is to sign up for an AWS account. This account will give you access to AWS documentation and other content. After you have an AWS account you can then enroll in EC2. It is at this point that the credit card is required.</p>
<p>All interaction with EC2 occurs over web service APIs. Both REST and SOAP style interfaces are supported. Web service authentication occurs via X.509 certificates or secret values depending on the web service API used. Amazon nicely offers to generate an X.509 certificate and public/private keys for you. Letting Amazon create the keys and the certificate is probably a good idea for most people since it is not an entirely trivial task. However, depending on how paranoid you are you might want to create the keys locally. Amazon says they don&#8217;t store the private keys they generate and I have no reason to doubt them but generating the keys locally reduces the possibility that your private key will be compromised.</p>
<h1>It&#8217;s all about virtual machines</h1>
<p>The fundamental unit in EC2 is a virtual machine. If you have experience with <a href="http://en.wikipedia.org/wiki/Xen" title="Xen">Xen</a> or VMware you can think of EC2 as a giant computer capable of hosting thousands of virtual machines. In fact, the <a href="http://en.wikipedia.org/wiki/Virtualization" title="Virtualization">virtualization</a> technology used by EC2 is Xen. At present only Linux based operating systems are supported but Amazon says that they are working towards supporting additional operating systems in the future. Since Xen already has the capability to host Windows and other operating systems this certainly should be possible.</p>
<p>All virtual machine images in EC2 are stored in Amazon&#8217;s S3 data storage service. Think of S3 as a file system in this context. Each virtual machine image stored in S3 is assigned an Amazon Machine Image (AMI) identifier. It is this identifier that serves as the name of the virtual machine image within EC2.</p>
<p>Virtual machine images within EC2 can be instantiated to become a running instance. Many instances of an image can be running at any one time. Each instance has its own disks, memory, network connection, etc., so it is completely independent from the other instances booted from the same image. Think of the virtual machine image as an operating system installation disk. This is all very similar to VMware and other virtualization technologies.</p>
<p>Amazon and the AWS community provide a large number of AMIs for various Linux distributions. Some are general images while others are configured to immediately run a Ruby on Rails application or fill some other specialized role. Of course it is also possible to create new AMIs either for public or private use. Private images are encrypted such that only EC2 has access to them. Since private images will likely contain proprietary code this is a necessary feature.</p>
<p>For an example of why you might want multiple images consider a three tier web application which consists of a web server tier, application tier and a database tier. By having an AMI for each of these machine types the application author can quickly bring new virtual machines in any tier online without having to make configuration changes after the new instance has booted. EC2 also allows a small amount of data to be passed to new instances. This data can be used like command line arguments. For example the address of a database server could be passed to the new instance.</p>
<h1>Interacting with EC2</h1>
<p>All interaction with EC2 occurs via very extensive web service APIs. Creating and destroying new instances is trivial as is obtaining information on the running instances. There is even a system in place for instances to obtain information about themselves such as their public IP address. Where applicable, such as when starting a new virtual machine instance, these web service calls must be authenticated via an X.509 certificate or a secret value.</p>
<p>Since not everyone will want to write their own EC2 management software Amazon provides a set of command line utilities (written in Java) which wrap the web service APIs. This allows the user to start, stop and manage EC2 instances from the command line.</p>
<p>Creating a new instance is as simple as:</p>
<pre>./ec2-run-instances ami-f937d290 -k amazon</pre>
<p>The &#8216;-k amazon&#8217; specifies the name of the SSH private key to use. I&#8217;ll come back to this in a bit. Starting ten instances of this image can be accomplished by adding &#8216;-n 10&#8217;:</p>
<pre>./ec2-run-instances ami-f937d290 -n 10 -k amazon</pre>
<p>It is also possible to look at a virtual machine&#8217;s console output. Unfortunately, this is read-only. Management activities are not possible via the console. In this case the instance identifier is passed, not the AMI identifier.</p>
<pre>./ec2-get-console-output i-8fad57e6</pre>
<p>Again, all of the management activities happen via web service APIs so you can build whatever management software you require.</p>
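<p>As a small illustration of building your own tooling, here is a sketch that assembles the same ec2-run-instances command line shown above. It only uses the two options that appear in this post (-k and -n). The function name is my own, and actually executing the command (for example with Python&#8217;s subprocess module) is left out because it needs the Amazon tools and credentials to be installed.</p>

```python
def run_instances_command(ami_id, key_name, count=1):
    """Return the argument list for the ec2-run-instances call shown above."""
    command = ["./ec2-run-instances", ami_id]
    if count != 1:
        # -n asks EC2 to start several instances of the same image.
        command += ["-n", str(count)]
    command += ["-k", key_name]
    return command

# Reproduces the ten-instance example from this post.
print(" ".join(run_instances_command("ami-f937d290", "amazon", count=10)))
```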
<h1>What do the VMs look like?</h1>
<h2>Hardware platforms</h2>
<p>At present Amazon offers three different virtual hardware platforms.</p>
<ol>
<li>Small instance: <span class="small">1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform.</span></li>
<li><span class="small">Large instance: </span><span class="small">7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform.</span></li>
<li><span class="small">Extra large instance: </span><span class="small">15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform.</span></li>
</ol>
<h2>Data storage</h2>
<p><span class="small">The storage layout of a small instance running a Fedora 8 image looks like the following:</span></p>
<pre>-bash-3.2# cat /proc/partitions
major minor  #blocks  name

   8     2  156352512 sda2
   8     3     917504 sda3
   8     1    1639424 sda1</pre>
<pre>-bash-3.2# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             1.6G  1.4G  140M  91% /
none                  851M     0  851M   0% /dev/shm
/dev/sda2             147G  188M  140G   1% /mnt</pre>
<p>Output from top:</p>
<pre>Tasks:  49 total,   1 running,  48 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  1.0%sy,  0.0%ni, 96.1%id,  0.6%wa,  0.0%hi,  0.0%si,  1.7%st
Mem:   1740944k total,    88904k used,  1652040k free,     4520k buffers
Swap:   917496k total,        0k used,   917496k free,    33424k cached</pre>
<h3>Data persistence</h3>
<p>The disk partitions attached to each instance are allocated when the reservation is created. While these file systems will survive a reboot they will not survive shutting the instance down. Also note that Amazon makes it clear that internal maintenance may shut down virtual machines. This basically means that you cannot consider the disks attached to the reservations as anything more than temporary storage. It is expected that applications running on the EC2 platform will make use of the S3 data storage service for data persistence. In fact, signing up for the EC2 service automatically gives access to S3.</p>
<h2>Network configuration</h2>
<p>Once instantiated each virtual machine has a single Ethernet interface and is assigned two IP addresses. The IP address assigned to the Ethernet interface is an RFC 1918 (private) address. This address can be used for communication between EC2 instances. The second address is a globally unique IP address. This address is not actually assigned to an interface on the virtual machine. Instead NAT is used to map the external address to the internal address. This allows the instance to be directly addressed from anywhere on the Internet but does limit communication to the protocols supported by Amazon&#8217;s NAT system. At present traffic to and from the virtual machines is limited to the common transport layer protocols (TCP and UDP), making it impossible to use other transport protocols such as SCTP or DCCP.</p>
<p>Both the internal and external IP addresses are assigned to new instances at boot time. EC2 does not support static IP address assignment.</p>
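<p>If you are scripting against a list of instance addresses it can be handy to tell the two kinds apart. A minimal sketch using Python&#8217;s standard ipaddress module follows. The function name and the 10.0.0.5 example are my own, and note that is_private also matches a few other reserved ranges besides RFC 1918.</p>

```python
import ipaddress

def address_role(address):
    """Label an address as the instance's internal or external one."""
    # is_private covers the RFC 1918 ranges (plus some other reserved blocks).
    if ipaddress.ip_address(address).is_private:
        return "internal (private)"
    return "external (NAT mapped)"

print(address_role("10.0.0.5"))       # hypothetical internal address
print(address_role("72.44.48.168"))   # an EC2 public address seen later in this post
```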
<h1>Authentication and Security</h1>
<h2>Firewall</h2>
<p>Amazon implements firewall functionality in the NAT system which handles all public Internet traffic going to and from the EC2 instances. When instantiated each instance can be assigned a group name or use the default group. The group name functions like an access list. Changing the access rules associated with a group is accomplished with the ec2-authorize command. The following example allows SSH, HTTP and HTTPS to a group named &#8216;webserver&#8217;.</p>
<pre>ec2-authorize webserver -P tcp -p 22
ec2-authorize webserver -P tcp -p 80
ec2-authorize webserver -P tcp -p 443</pre>
<h2>Instance authentication</h2>
<p>The authentication method used to connect to an EC2 instance depends on whether or not you build your own images. If you build your own image you can use whatever authentication or management solution you like. Obvious examples include configuring the image with predefined usernames and passwords and using SSH or perhaps Webadmin. Installing SSH keys for each user and disabling password authentication is probably the best choice.</p>
<p>Authentication when using the publicly available images is a little more complicated. Having a default user/password combination or even default user SSH keys would allow other users to easily log in to an instance booted from a publicly available image. To get around this problem Amazon has created a system whereby you can register an SSH key with EC2. During the virtual machine imaging process the public portion of this SSH key is installed as the user key for the root user.</p>
<h1>How is this different from normal server co-location?</h1>
<p>The biggest difference between server colocation and EC2 is the ephemeral nature of the resources in EC2. This is a positive property in that it is trivial to obtain new resources in EC2. On the negative side of things the fact that &#8216;machines&#8217; can disappear and that other resources such as IP address assignments are unpredictable adds new complexities.</p>
<h2>Machine failure</h2>
<p>Amazon states that servers can be shut down during maintenance periods and of course hardware failures will happen. Both of these events will result in virtual machine instances &#8216;failing&#8217;. Since disks, and therefore the data that they contain, disappear when instances die it seems that the complete failure of individual virtual servers is going to be a more common event than one might expect with traditional server colocation. Consider that a massive power failure event in Amazon&#8217;s data center(s) would be equivalent to a traditional colocation facility being destroyed. Not only would you temporarily lose operational capability but each and every server and the data they were processing and storing would be gone.</p>
<p>In reality every large scale web service should plan for large failure events and individual server failure is also expected to happen regularly given enough nodes. Perhaps deployment on EC2 will make these events just enough more likely to force developers to address them rather than implicitly assuming that they will never occur.</p>
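<p>To put a rough number on &#8216;given enough nodes&#8217;, here is a back-of-the-envelope sketch. It assumes node failures are independent and that every node has the same chance of failing in a given period. Both are simplifications, and the 1% figure is purely hypothetical, not something Amazon has published.</p>

```python
def chance_of_any_failure(per_node_probability, nodes):
    """Probability that at least one of `nodes` independent nodes fails."""
    # Complement of "every node survives".
    return 1 - (1 - per_node_probability) ** nodes

# Hypothetical: each node has a 1% chance of failing during some period.
for nodes in (1, 10, 100):
    print(nodes, round(chance_of_any_failure(0.01, nodes), 3))
```

<p>Even with a small per-node probability, the chance of seeing at least one failure climbs quickly as the number of nodes grows.</p>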
<p>If anyone reading this has experience using EC2 I would love to hear about how often you experience virtual machine failure.</p>
<h2>HTTP load balancing</h2>
<p>Another interesting complication comes from the fact that EC2 does not support static IP address assignments. Often large web deployments include a device operating as a load balancer in front of many web servers. This may be a specialized device or another server running something like mod_proxy. Using example.com as an example, a typical deployment would point the DNS A records for www.example.com to the load balancer devices. When colocating a server it is normal to be assigned a block of IP addresses for your devices. This makes it easy to replace a failed load balancer node without requiring DNS changes. However, in the case of EC2 you do not know the IP address of your load balancer node until it has booted. As already discussed, this node can disappear and when its replacement comes back online it will be assigned a different IP address.</p>
<p>This presents a problem because the Internet&#8217;s DNS infrastructure relies on the ability of DNS servers to cache information. The length of time that a particular DNS record is cached is called the time to live (TTL). Within the TTL time a DNS server will simply return the last values it obtained for www.example.com rather than traversing the DNS hierarchy to obtain a new answer. The dynamic nature of IP address assignment inside EC2 does not mix well with long TTL values. Imagine a TTL value of one day for www.example.com and the failure of the load balancer node. The result would be up to a full day where portions of the Internet would be unable to reach www.example.com. Perhaps more inconvenient would be the user seeing another EC2 customer&#8217;s site if the address was reassigned.</p>
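<p>The caching behaviour can be illustrated with a toy resolver cache. This is a deliberately simplified sketch: the class, the fake lookup function and the 192.0.2.x documentation addresses are all made up, and real resolvers are considerably more involved.</p>

```python
class TtlCache:
    """Toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self, lookup):
        self.lookup = lookup   # function returning (address, ttl) for a name
        self.entries = {}      # name -> (address, expiry time in seconds)

    def resolve(self, name, now):
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]   # still within the TTL: no new lookup happens
        address, ttl = self.lookup(name)
        self.entries[name] = (address, now + ttl)
        return address

# Hypothetical zone data with a one day TTL.
zone = {"www.example.com": ("192.0.2.10", 86400)}
cache = TtlCache(lambda name: zone[name])

print(cache.resolve("www.example.com", now=0))
zone["www.example.com"] = ("192.0.2.20", 86400)   # load balancer replaced
print(cache.resolve("www.example.com", now=3600))   # cache still holds the old address
print(cache.resolve("www.example.com", now=86400))  # TTL expired, new address fetched
```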
<p>In order to work around this problem one solution is to use a very low TTL value. This is the approach taken by AideRSS.</p>
<pre>$ dig www.aiderss.com

; &lt;&lt;&gt;&gt; DiG 9.5.0b1 &lt;&lt;&gt;&gt; www.aiderss.com
;; global options:  printcmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 3721
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 5, ADDITIONAL: 5

;; QUESTION SECTION:
;www.aiderss.com.               IN      A

;; ANSWER SECTION:
www.aiderss.com.        3600    IN      CNAME   aiderss.com.
aiderss.com.            60      IN      A       72.44.48.168
. . .</pre>
<pre>$ host 72.44.48.168
168.48.44.72.in-addr.arpa domain name pointer ec2-72-44-48-168.compute-1.amazonaws.com.</pre>
<p><a href="http://www.aiderss.com" title="AideRSS">AideRSS</a> is using a sixty second TTL for the aiderss.com A record. This means that any DNS server caching this record must discard it after at most sixty seconds and look up a fresh value the next time it is asked.</p>
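<p>The host command above is doing a reverse (PTR) lookup, which is how I confirmed the address belongs to EC2. The query name it builds can be reproduced with Python&#8217;s ipaddress module. The EC2 check below is only a heuristic based on the hostnames seen in this post, and both function names are my own.</p>

```python
import ipaddress

def ptr_name(address):
    """Return the reverse DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer

def looks_like_ec2(hostname):
    """Heuristic: the EC2 names seen here start with 'ec2-' and end in 'amazonaws.com'."""
    name = hostname.rstrip(".")
    return name.startswith("ec2-") and name.endswith(".amazonaws.com")

print(ptr_name("72.44.48.168"))
print(looks_like_ec2("ec2-72-44-48-168.compute-1.amazonaws.com."))
```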
<p>Another site hosted on EC2 is <a href="http://www.mogulus.com/" title="Mogulus">Mogulus</a> (just found them when looking for EC2 customers). They take a slightly nicer approach to this problem.</p>
<pre>$ dig www.mogulus.com

; &lt;&lt;&gt;&gt; DiG 9.5.0b1 &lt;&lt;&gt;&gt; www.mogulus.com
;; global options:  printcmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 46281
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;www.mogulus.com.               IN      A

;; ANSWER SECTION:
www.mogulus.com.        7200    IN      A       67.202.12.112
www.mogulus.com.        7200    IN      A       72.44.57.45</pre>
<pre>$ host 67.202.12.112
112.12.202.67.in-addr.arpa domain name pointer ec2-67-202-12-112.z-1.compute-1.amazonaws.com.
$ host 72.44.57.45
45.57.44.72.in-addr.arpa domain name pointer ec2-72-44-57-45.z-1.compute-1.amazonaws.com.</pre>
<p>Rather than a single A record with a very low TTL, Mogulus uses two A records pointing to two different EC2 nodes and a TTL of 7200 seconds (two hours).</p>
<p>Personally, I consider these low TTL values (especially the 60s one) to be mildly anti-social behavior because it forces additional work on DNS servers throughout the Internet to deal with a local problem. Amazon should consider adding the ability to statically provision IP addresses. This would allow the Internet facing EC2 nodes to have consistent addresses and thereby reduce the failover problems. Like everything else in EC2, this could be charged by usage. I&#8217;d be happy to pay a few dollars (5, 10, x?) a month for a single IPv4 address within EC2 that I could assign to a node of my choosing.</p>
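<p>To make the &#8216;additional work&#8217; concrete, here is the arithmetic for how often a single busy caching resolver may have to re-fetch a record in one day. This is an upper bound that assumes the record is requested continuously. The function name is my own.</p>

```python
SECONDS_PER_DAY = 24 * 60 * 60

def refreshes_per_day(ttl_seconds):
    """Upper bound on re-queries per day by one resolver that is asked constantly."""
    return SECONDS_PER_DAY // ttl_seconds

print(refreshes_per_day(60))     # the aiderss.com A record
print(refreshes_per_day(7200))   # the www.mogulus.com A records
```

<p>That works out to 1440 refreshes a day for the sixty second TTL against 12 for the two hour TTL, per caching resolver.</p>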
<h1>Pricing</h1>
<p>Unless you are reading this close to the date it was written it is probably a good idea to visit Amazon for pricing information instead of relying on the data here.</p>
<h2>Instance time</h2>
<p>When I first started investigating EC2 I misinterpreted EC2&#8217;s pricing. I thought that instance usage was charged on a CPU time basis. This would effectively mean that an idle server would cost next to nothing. The correct interpretation is that billing is based on how long the instance is running, not how much CPU it uses. The current EC2 pricing is:</p>
<ul>
<li><span class="small">$0.10/hour &#8211; Small Instance</span></li>
<li><span class="small">$0.40/hour &#8211; Large Instance</span></li>
<li><span class="small">$0.80/hour &#8211; Extra Large Instance</span></li>
</ul>
<p><span class="small">This makes the constant use of a single small instance cost roughly $72/month (24 hours a day for 30 days at $0.10/hour). Pretty reasonable, especially when you consider that you do not have to buy the hardware.</span></p>
<h2>Data transfer</h2>
<p>Using EC2 also incurs data transfer charges.</p>
<ul>
<li>Data transfer into EC2 from the Internet: $0.10/GB.</li>
<li>Data transfer out of EC2 to the Internet: $0.18/GB (gets cheaper if you use &gt; 10TB/month).</li>
</ul>
<p>Data transfer between EC2 nodes and to/from the S3 persistent storage service is free. Note that S3 has its own pricing structure.</p>
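<p>Putting the instance and data transfer prices together, a rough monthly estimate can be sketched as follows. The rates are the ones listed above and will go stale. The function name and the traffic figures are hypothetical, and the cheaper outbound tier above 10TB/month as well as S3 charges are ignored.</p>

```python
from decimal import Decimal

HOURLY_RATE = {
    "small": Decimal("0.10"),
    "large": Decimal("0.40"),
    "extra large": Decimal("0.80"),
}
TRANSFER_IN_PER_GB = Decimal("0.10")
TRANSFER_OUT_PER_GB = Decimal("0.18")  # ignores the cheaper tier above 10TB/month

def monthly_estimate(instance_type, gb_in=0, gb_out=0, days=30):
    """Estimate one instance running around the clock plus Internet transfer."""
    instance_cost = HOURLY_RATE[instance_type] * 24 * days
    transfer_cost = TRANSFER_IN_PER_GB * gb_in + TRANSFER_OUT_PER_GB * gb_out
    return instance_cost + transfer_cost

print(monthly_estimate("small"))
# Hypothetical traffic: 100 GB in and 500 GB out over the month.
print(monthly_estimate("small", gb_in=100, gb_out=500))
```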
<h1>Summary</h1>
<p>In a lot of ways EC2 is similar to server colocation services. At its lowest level EC2 gives you a &#8216;server&#8217; to work with. Given the prices outlined above using EC2 as a colocation replacement may be a good choice depending on your requirements.</p>
<p>What really makes EC2 interesting is its API and dynamic nature. The EC2 API makes it possible for resources such as servers and the hosting environment in general to become a component of your application instead of something which the application is built on. Applications built on EC2 have the ability to automatically add and remove nodes as demands change. Replacing failed nodes can also be automated. Giving applications the ability to respond to their environment is a very intriguing idea. Somehow it makes the application seem more alive.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2008/03/26/amazon-ec2-from-a-network-administration-perspective/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A new way to look at networking</title>
		<link>http://www.coverfire.com/archives/2008/03/25/a-new-way-to-look-at-networking/</link>
		<comments>http://www.coverfire.com/archives/2008/03/25/a-new-way-to-look-at-networking/#comments</comments>
		<pubDate>Wed, 26 Mar 2008 01:43:03 +0000</pubDate>
		<dc:creator>Dan Siemon</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Networking]]></category>

		<guid isPermaLink="false">http://www.coverfire.com/archives/2008/03/25/a-new-way-to-look-at-networking/</guid>
		<description><![CDATA[I finally got around to watching A new way to look at networking yesterday. This is a talk given by Van Jacobson at Google in 2006 (yes, it has been on my todo list for a long time).This is definitely &#8230; <a href="http://www.coverfire.com/archives/2008/03/25/a-new-way-to-look-at-networking/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I finally got around to watching <a href="http://video.google.com/videoplay?docid=-6972678839686672840&amp;q=google+tech+talk+networking&amp;total=124&amp;start=0&amp;num=10&amp;so=0&amp;type=search&amp;plindex=0" title="A new way to look at networking">A new way to look at networking</a> yesterday. This is a talk given by <a href="http://en.wikipedia.org/wiki/Van_Jacobson" title="Van Jacobson">Van Jacobson</a> at Google in 2006 (yes, it has been on my todo list for a long time). This is definitely worth watching if you are interested in networking.</p>
<p>A couple of quick comments (These are not particularly deep or anything. This is mostly for my own reference later.):</p>
<ul>
<li>He says that the current Internet was designed for conversations between end nodes but we&#8217;re using it for information dissemination.
<ul>
<li>Me: This distinction relies on the data being disseminated to each user being identical. However, in the vast majority of cases even data that on the surface is identical, such as web site content, is actually unique for each visitor. Any site with advertisements or customizable features is a good example. As a result we are still using the Internet for conversations in most situations.</li>
</ul>
</li>
<li>He outlines the development of networking:
<ul>
<li>The phone network was about connecting wires. Conversations were implicit.</li>
<li>The Internet added metadata (the source and destination) to the data which allowed for a much more resilient network to be created. The Internet is about conversations between end nodes.</li>
<li>He wants to add another layer where content is addressable rather than the source or destination.</li>
</ul>
</li>
<li>He argues for making implicit information explicit so the network can make more intelligent decisions.
<ul>
<li>This is what IP did by adding the source and destination to data.</li>
</ul>
</li>
<li>His idea of identifying the data, not the source or destination, is very interesting. A consequence of this model is that data must be immutable and identifiable, and must carry built-in metadata such as the version and the date. It strikes me how the internal operation of the <a href="http://git.or.cz/" title="Git">Git version control system</a> matches these requirements.
<ul>
<li>At the moment I write this <a href="http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc7feea39bed2951cc29af3ad642f39a99dfe8d3" title="A commit in Linus's kernel tree">cc7feea39bed2951cc29af3ad642f39a99dfe8d3</a> uniquely identifies the current version (content) of Linus&#8217;s kernel development tree.</li>
</ul>
</li>
</ul>
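<p>For the curious, here is a sketch of how Git derives an identifier from content alone. It covers the simplest object type, a file (&#8216;blob&#8217;). Commit identifiers like the one above are computed the same way over the commit&#8217;s own contents, which include the tree, parents, author and date. The function name is my own.</p>

```python
import hashlib

def git_blob_id(content):
    """Return the ID Git assigns to a file (blob) holding these bytes."""
    # Git hashes a short header ("blob", the length, a NUL byte) followed by the data.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The same bytes always produce the same name, no matter where they are stored.
print(git_blob_id(b"hello\n"))
```

<p>This should print the same value as running git hash-object on a file containing just that line.</p>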
]]></content:encoded>
			<wfw:commentRss>http://www.coverfire.com/archives/2008/03/25/a-new-way-to-look-at-networking/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

