This is part 3 in a sequence of posts examining how broadband services actually work. Part 1 looked at ISP concentration ratios and part 2 examined the impact of averaging many subscribers. Future posts will consider how to size backhaul links and how to configure buffers.
Two major reasons people purchase broadband (or sign up for more capacity) are to improve interactivity and to get access to new applications, for example streaming movies. The appeal of new apps is clear, but it's worth looking at congestion and how congestion affects interactivity.
Congestion has two impacts. First you get increased delays, as packets are buffered in queues waiting for free capacity on the next link. Usually this adds tens or hundreds of milliseconds to round trip times. Then, if the average traffic exceeds link capacity, you get packet loss. If you do have congestion, it's these packet losses that cause TCP traffic flows to back off, thus avoiding congestion collapse (and even less throughput).
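To get a feel for where "tens or hundreds of milliseconds" comes from, here is a minimal sketch of the arithmetic. It assumes a single first-in-first-out queue draining at the link rate; the function name, the 64 KB backlog, and the link speeds are hypothetical values chosen for illustration, not measurements.

```python
def queueing_delay_ms(queued_bytes: int, link_bps: float) -> float:
    """Time (ms) a newly arrived packet waits behind queued_bytes
    on a link that drains at link_bps bits per second."""
    return queued_bytes * 8 / link_bps * 1000.0


# Hypothetical backlog of 64 KB sitting in the queue ahead of your packet.
backlog_bytes = 64 * 1024
for mbps in (1, 5, 20):
    delay = queueing_delay_ms(backlog_bytes, mbps * 1e6)
    print(f"{mbps:>2} Mbit/s link: about {delay:.0f} ms of added delay")
```

The same backlog costs far more delay on a slow link than on a fast one, which is why a full buffer on a congested link is so visible to users.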
Here is a graph of packet loss on a service that most of us would consider unacceptable. These measurements were taken by Tom Dunigan on an early cable modem service (February 2001) that was substantially upgraded about a year later.
The next graph (also from Tom Dunigan) shows how this congestion impacts interactivity. The graph shows the echo delay for typing into a remote program using telnet in character-at-a-time mode. One test (red line) was run in the early morning and one test (green line) was run during the evening when packet loss was peaking. Each test measured the echo response time for 100 successive individual characters. Notice the typical echo delay goes up from ~130 ms in the morning to roughly 200-300 ms in the evening, but with occasional delays of more than one second.
Relatively few programs operate character-by-character, so the extra 100 ms or so might not matter. On the other hand, a one second delay is noticeable!
A more common activity is viewing content on websites. Here, the dominant time in any interaction is the time spent waiting for web content to download to your browser. Since web content is downloaded using TCP (and more generally, TCP is the dominant protocol in use today), it's worth looking at the impact of packet loss on TCP throughput.
The TCP protocol includes a congestion avoidance algorithm which is triggered by packet loss. When a packet is lost, the TCP sender slows down. As a result, the data rate for a single TCP flow looks like this (thanks to Guido Appenzeller):
Of course this assumes the TCP flow lasts long enough to saturate the bottleneck link and that this is the only flow in the network!
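The sawtooth shape can be reproduced with a toy model. This is only a sketch under strong assumptions: it models congestion avoidance alone (add one packet per round trip, halve on loss), ignores slow start, timeouts, and fast recovery, and assumes a loss happens whenever the window exceeds a fixed capacity. The function name and the capacity parameter are made up for illustration.

```python
def aimd_window_trace(capacity_pkts: int, rtts: int, start: int = 1) -> list:
    """Return the congestion window (in packets) at each round trip.

    Additive increase: +1 packet per RTT while no loss occurs.
    Multiplicative decrease: halve the window after a loss.
    A loss is assumed whenever the window exceeds capacity_pkts
    (a stand-in for the pipe plus the bottleneck buffer).
    """
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity_pkts:
            cwnd = max(1, cwnd // 2)  # back off after the loss
        else:
            cwnd += 1                 # probe for more bandwidth
    return trace


# Print a crude text plot of the sawtooth for a capacity of 16 packets.
for rtt, window in enumerate(aimd_window_trace(capacity_pkts=16, rtts=40)):
    print(f"{rtt:>2} {'#' * window}")
```

Even in this toy version, the sender spends much of its time below the capacity of the link, which is the point of the picture above.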
What happens in real networks is complex, but RFC 3155 has an approximate formula for how a single flow is affected by packet loss in a real network. Even better, Bill Gibson at Niwot Networks has produced a nice graphic based on that formula for his paper on TCP limitations on file transfer performance. It looks like this:
This is dramatic! The different lines reflect different end-to-end round trip times (RTT) - times that will vary depending on the site you are connecting with. Also, they represent the maximum throughput you could achieve with long duration TCP flows and no other bottlenecks. What's notable is the logarithmic throughput scale on the left and the fact that, at 1% packet loss (0.01 on the bottom scale), potential throughput drops by a factor of 100!
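If you want to try some numbers yourself, here is a small sketch using the widely cited Mathis et al. approximation, rate ≈ C × MSS / (RTT × √p) with C ≈ 1.22. I am assuming this is the formula behind the graphic; the function name and the example MSS, RTT, and loss values are hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float,
                          c: float = 1.22) -> float:
    """Approximate upper bound on steady-state TCP throughput (bits/s).

    Uses the Mathis et al. approximation: C * MSS / (RTT * sqrt(p)).
    Valid only as a rough guide for long-lived flows with light, random loss.
    """
    if not 0 < loss <= 1:
        raise ValueError("loss must be a probability in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


# Assumed values: 1460-byte segments, three illustrative round trip times.
for rtt_ms in (20, 100, 300):
    for loss in (0.0001, 0.001, 0.01):
        mbps = mathis_throughput_bps(1460, rtt_ms / 1000, loss) / 1e6
        print(f"RTT {rtt_ms:>3} ms, loss {loss:.2%}: ~{mbps:.2f} Mbit/s")
```

Under this approximation the bound scales with 1/RTT and with 1/√p, so a hundredfold increase in loss cuts the achievable rate by a factor of ten, and a longer path makes the same loss rate hurt proportionally more.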
Again, there are many caveats. A real network has a mix of short- and long-lived flows. The local operating system may not be optimized to take full advantage of broadband speeds (although MS Windows actually got better at this with Vista). Nonetheless, even 1% or 2% packet loss is correlated with poor user experience.
In short, to actually obtain the instantaneous throughput you thought you were purchasing, you don't want packet loss in upstream portions of the network.
I'll discuss what this actually implies for routers and backhaul links in a subsequent post.
Nice post! One thing to add is that increased latency often also affects throughput, as many connections have a limited maximum TCP window size. If the TCP window size is limited, the throughput scales with 1/RTT. Thus doubling latency (as shown in the example above) will reduce throughput by half. ISPs sometimes make use of this by using shapers that intentionally delay ACKs of TCP flows to limit flow throughput without increasing the loss rate.
I don't know how common it is today for flows to be limited by their maximum window size. Five years ago it was the case for essentially all residential TCP connections; today I would still expect it to be common.
Posted by: Guido Appenzeller | December 22, 2009 at 12:49 PM
This is really a very informative post. I think we should be thankful to you because you have surely done a good job by letting us know this. I liked the idea that few programs operate character-by-character, so the extra 100 ms or so might not matter. On the other hand, a one second delay is noticeable!
Posted by: Juliet Waugh | January 17, 2010 at 11:58 PM