The problem with speedtests

Imagine this scenario. Outside your house, the most awesome superhighway has been built. It has a speed limit of 120 miles per hour. You calculate that at those speeds you can get to and from work 20 minutes earlier. Life is good. Monday morning comes, you hop in your 600-horsepower Nissan GT-R, put on some new leather driving gloves, and crank up some good driving music. You pull onto the dedicated on-ramp from your house and are quickly cruising at 120 miles an hour. You make it into work before almost anyone else. Life is good.

Near the end of the week, you notice more and more of your neighbors and co-workers using this new highway. Things are still fast, but you can't get up to speed like you could earlier in the week. As you ponder why, you notice you are coming up on the off-ramp to your work. Traffic is backed up. Everyone is trying to get to the same place. As you wait in line to get off the superhighway, you notice folks passing you by, going on down the road at high rates of speed. You surmise your off-ramp must be congested because it is getting used more now.

Speedtest servers work the same way. A speedtest server is a destination on the information super-highway. Man, there is an oldie term. To understand how these servers work, we need a quick understanding of how the Internet works. The Internet is basically a bunch of virtual cities connected together. Your local ISP delivers a signal to you via wireless, fiber, or some other medium. When it leaves your house, it travels to the ISP's equipment, gets aggregated with your neighbors' traffic, and is sent over faster lines to larger cities. It's just like a road system. You may get access via a gravel road, which turns into a two-lane blacktop, which then may turn into a four-lane highway, and finally a super-highway. The roads you take depend on where you are going. Your ISP may not have much control over how the traffic flows once it leaves their network.

Bottlenecks can happen anywhere: fiber-optic cuts, oversold capacity, routing issues, and plain old unexpected usage. Why are these important? All of these can affect your results and can be totally out of your control and your ISP's. They can also be totally your ISP's fault.

They can also be your fault, just like your car can be. An underpowered router can struggle to keep up with your connection. Much like a moped on the above super-highway can't keep up with a 600-horsepower car, your router might not be able to keep up either. Other things can cause issues as well, such as computer viruses and low-performing components.

Just about any network can become a speedtest.net node, or a node for one of the other speedtest sites. These networks have to meet minimum requirements, but there is no indicator of how utilized these servers are. A network could put one up, and it could be 100 percent utilized when you run a test. This doesn't mean your ISP is slow, just that the off-ramp to that server is slow.

The final thing we want to talk about is the utilization of your Internet pipe from your ISP. This is something most don't take into consideration. Let's go back to our on-ramp analogy. Your ISP is selling you a connection to the information super-highway. Say they are selling you a 10 meg download connection. If you have a device in your house streaming an HD Netflix stream, which is typically 5 megs or so, that means you only have 5 megs available for a capacity test while that HD stream is happening. A speedtest only tests your currently available capacity. Many folks think a speedtest somehow stops all the traffic on your network, runs the test, and then restarts the traffic. It doesn't work that way. Your available capacity is only tested at that point in time. The same is true for any point between you and the speedtest server. Remember our earlier analogy about slowing down when you got to work because so many people were trying to get there: they exceeded the capacity of that destination. However, that does not mean your connection itself is slow; people were zooming past you on their way to less congested destinations.

This is why results to a single server should be taken with a grain of salt. They are a useful tool, but not an absolute. The speedtest server is just a destination. That destination can have bottlenecks that others don't. Even after this long article, there are many other factors we didn't touch on, like peering, the access technology used, and speed limits, which can also affect your Internet speed to various destinations.

“Glue addresses” in networking

Imagine this scenario. You have bought an IP or DIA circuit from someone who is going to provide your network with bandwidth. Typically, this company will make the connection, IP-wise, over a /30 or even a /29 of IP space. I have called this the "glue address" for many years. This is the IP address that binds (hence the glue reference) you to the other provider's network. They can route IP blocks to you over that glue address, or you can establish BGP across it, but it is the static address which binds the two networks together.
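
To make that concrete, here is a minimal Cisco-style sketch. The /30 glue block (192.0.2.0/30), the interface, and the ASNs are hypothetical, purely for illustration; the provider sits on .1 and we sit on .2.

! Hypothetical glue block 192.0.2.0/30: provider side .1, our side .2
interface GigabitEthernet0/0
 description Uplink to provider (glue address)
 ip address 192.0.2.2 255.255.255.252
!
! Static option: default-route back toward the provider's glue address,
! while they statically route our blocks to 192.0.2.2 on their side
ip route 0.0.0.0 0.0.0.0 192.0.2.1
!
! Or establish BGP across the same glue address instead
router bgp 64512
 neighbor 192.0.2.1 remote-as 64513

Either way, the glue address itself stays static; what changes is whether routes are carried statically or via BGP across it.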

Some network folks call this a peering address. This isn't wrong, but it can imply you are doing BGP peering across the address. You aren't always doing BGP across the glue address.

#routinglight #packetsdownrange

Remote Peering

Martin J. Levy from Cloudflare did a presentation about remote peering possibly being a bad thing. In this presentation, he brings up several valid points.

https://www.globalpeeringforum.org/pastEvents/gpf14/presentations/Wed_2_MartinLevy_remote_peering_is_bad_for.pdf

Some thoughts of my own.

Yes, remote peering is happening. One thing touched upon is layer 3 vs. layer 2 traffic. We at MidWest-IX only allow remote peering at the layer 2 level, unless it is groups like routeviews.org or other non-customer traffic situations.

Many providers are overselling their backbone and transit links. This oversubscription means access to content networks in places that do not have an exchange, or even places that do have the content locally, can suffer through no fault of the ISP or the content provider. We have situations with content folks like Netflix, who do not join for-profit IXes at the moment, keeping the content further away from customers. These customers are reaching Netflix through the same transit connections many other providers are. This can result in congested ports and poor quality for the customer. The ISP is left trying to find creative ways to offload that traffic. An Internet Exchange is ideal for these companies, especially because cross-connect charges within data centers are on the rise.

When we first turned up MidWest-IX, now known as FD-IX, in Indianapolis, we used a layer 2 connection to Chicago to bring some of the most-needed peers down to our members. This connection allowed us to kick-start our IX. We had one member who, after peering with their top talkers, actually saw an increase in bandwidth. The data told the member that their upstream providers had a bottleneck issue. They had suspected this for a while, but this confirmed it. Either the upstream provider had a congested link, or their peering ports were getting full.

As content makes its way closer, remote peering becomes less and less of an issue. There are many rural broadband companies just now getting layer 2 transport back to carrier hotels. These links may stretch a hundred miles or more to reach the data center. The rural broadband provider will probably never have a carrier hotel close to them. As they grow, they might be able to afford to host caching boxes, but the additional cost and the pipe size needed to fill the caches are also determining factors. For many, the tradeoff of hosting and filling multiple cache boxes outweighs the latency of a layer 2 circuit back to a carrier hotel.

I think remote peering is necessary to bypass full links, which gives the ISP more control over their bandwidth. In today's race to cut corners to improve the bottom line, having more control over your own network is a good thing. By doing a layer 2 remote peer you might actually cut down on your latency, even if your upstream ISP is peered or has cache boxes.

How I learned to love BGP communities and so can you

This content is for Patreon subscribers of the j2 blog. Please consider becoming a Patreon subscriber for as little as $1 a month. This helps to provide higher quality content, more podcasts, and other goodies on this blog.

The problem with peering from a logistics standpoint

Many ISPs run into this problem as part of their growing pains. This scenario usually starts happening around their third or fourth peer.

Scenario: an ISP grows beyond the single connection it has. This can be 10 meg, 100 meg, a gig, or whatever. They start out looking for redundancy. The ISP brings in a second provider, usually at around the same bandwidth level. This way the network has two roughly equal paths out.

A unique problem usually develops as the network grows to the point of maxing out the capacity of both of these connections. The ISP has to make a decision: do they increase the capacity to just one provider? Most don't have the budget to increase capacity to both providers at once. If you increase one, you are favoring one provider over the other until the budget allows you to increase capacity on both. You are essentially in a state where you have to favor one provider in order to keep up with demand. And if you fail over to the smaller pipe, things could be just as bad as being down.

This is where many ISPs learn the hard way that BGP is not load balancing. But what about padding, communities, local-pref, and all that jazz? We will get to that. In the meantime, our ISP may have the opportunity to get to an Internet Exchange (IX) and offload things like streaming traffic. Traffic returns to a little more balance because you essentially have a third provider with the IX connection. But the growing pains don't stop there.

As ISPs, especially WISPs, have more and more resources to put toward cutting down latency, they start seeking out better-peered networks. The next growing pain that becomes apparent is that networks with lots of high-end peers tend to charge more money. To buy bandwidth from these types of providers, the ISP usually has to do it in smaller quantities. Buying this way introduces the problem of a mismatched pipe size again, with a twist. The twist is that the more (and better) peers a network has, the more of your traffic is going to want to travel to that provider. So the more expensive peer, which you are probably buying less capacity from, now wants to handle more of your traffic.

So, the network geeks will bring up things like padding, communities, local-pref, and all the tricks BGP has. But at the end of the day, BGP is not load balancing. You can *influence* traffic, but BGP does not let you say "I want 100 megs of traffic here, and 500 megs there." Keep in mind BGP deals with reachability of IP blocks, not the traffic itself.
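
As a hedged illustration of "influence, not balance," here is a Cisco-style sketch that prepends our own ASN toward provider B so inbound traffic tends to prefer the path through provider A. The neighbor address and ASNs are hypothetical.

! Make our routes look longer via provider B so inbound traffic favors provider A
route-map PREPEND-TO-B permit 10
 set as-path prepend 64512 64512
!
router bgp 64512
 neighbor 203.0.113.1 remote-as 64496
 neighbor 203.0.113.1 route-map PREPEND-TO-B out

Even with the prepend, remote networks are free to ignore AS-path length in their own policy, which is exactly why this is influence rather than control.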

So, how does the ISP solve this? Knowing about your upstream peers is the first thing. BGP looking glasses, peering reports such as those from Hurricane Electric, and general news help keep you on top of things. Things such as new peering points, acquisitions, and new data centers can influence an ISP's traffic. If your equipment supports tools such as NetFlow or sFlow, you can begin to build a picture of your traffic and what ASNs it is going to. This is your first major step: get tools that show what ASNs the traffic is going to. You can then take this data and look at how your own peers are connected with those ASNs. You will start to see things like "provider A is poorly peered with ASN 2906."
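
If your edge router happens to be a Cisco box that supports traditional NetFlow, a minimal sketch looks something like the following; the collector address is hypothetical, and the per-ASN breakdowns are built on the collector by combining the flow records with BGP data.

! Export flow records to a hypothetical collector at 192.0.2.50
ip flow-export version 9
ip flow-export destination 192.0.2.50 2055
ip flow-export source Loopback0
!
interface GigabitEthernet0/1
 ip flow ingress
 ip flow egress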

Once you know who your peers are and have a good feel for their peering, you can influence your traffic. If you know you don't want to send traffic destined for ASN 2906 in or out via provider A, you can then start to implement AS padding and all the tricks we mentioned before. But you need the bigger picture before you can do that.
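
One common trick, sketched here in Cisco style, is to lower local-preference on routes learned from provider A whose AS path contains 2906, so outbound traffic for that destination prefers your other provider. The neighbor address and provider ASN are hypothetical.

! Match any route whose AS path contains 2906
ip as-path access-list 10 permit _2906_
!
route-map PROVIDER-A-IN permit 10
 match as-path 10
 set local-preference 80
route-map PROVIDER-A-IN permit 20
!
router bgp 64512
 neighbor 198.51.100.1 remote-as 64500
 neighbor 198.51.100.1 route-map PROVIDER-A-IN in

Since the default local-preference is 100, the value of 80 makes provider A less preferred only for those destinations; the empty second route-map entry lets everything else through untouched.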

One last note. Peering is dynamic.  You have to keep on top of the ecosystem as a whole.

BGP communities and traffic engineering

BGP communities can be a powerful, but almost mystical, thing. If you aren't familiar with communities, start with the Wikipedia article on BGP communities. For the purpose of part one of this article, we will talk about communities and how they can be utilized for traffic coming into your network. Part two will talk about applying what you have classified to your peers.

So let’s jump into it.  Let’s start with XYZ ISP. They have the following BGP peers:

-Peer one is Typhoon Electric.  XYZ ISP buys an internet connection from Typhoon.
-Peer two is Basement3. XYZ ISP also buys an internet connection from Basement3.
-Peer three is Mauler Automotive. XYZ ISP sells internet to Mauler Automotive.
-Peer four is HopOffACloud web hosting.  XYZ ISP and HopOffACloud are in the same data center and have determined they exchange enough traffic between their ASNs to justify a dedicated connection between them.
-Peer five is the local Internet exchange (IX) in the data center.

So now that we know who our peers are, we need to assign some communities and decide who goes in which community. The thing to keep in mind here is that communities are something you come up with yourself. There are common numbers people use for communities, but there is no rule on how you have to number yours. Before we proceed, we will also need to know what our own ASN is. For XYZ, we will say they were assigned AS64512. Those of you who are familiar with BGP will notice this is a private ASN; I just used it to lessen any confusion. If you are following along at home, replace 64512 with your own ASN.

So we will create four communities:

64512:100 = transit
64512:200 = peers
64512:300 = customers
64512:400 = my routes

Where did we create these? For now, on paper.

So let’s break down each of these and how they apply to XYZ’s network. If you need some help with the terminology, see this previous post.

64512:100 – Transit
Transit will apply to Typhoon Electric and Basement3.  These are companies you are buying internet transit from.

64512:200 – Peers
Peers apply to HopOffACloud and the IX. These are folks you are just exchanging your own and your customer’s routes with.

64512:300 – Customers
This applies to Mauler Automotive.  This is a customer buying Internet from you. They transit your network to get to the Internet.

64512:400 – My routes
This applies to your own prefixes.  These are routes within your own network or this particular ASN.

Our next step is to take the incoming prefixes and classify them into one of these communities. Once we have them classified, we can do stuff with them.

If we wanted to classify the Typhoon Electric traffic we would do the following in Mikrotik land:

/routing filter
add action=passthrough chain=TYPHOON-IN prefix=0.0.0.0/0 prefix-length=0-32 set-bgp-communities=64512:100 comment="Tag incoming prefixes with :100"

This would go at the top of your filter chain for the Typhoon Electric peer.  This simply applies 64512:100 to the prefixes learned from Typhoon.

In Cisco Land our configuration would look like this:

route-map Typhoon-in permit 20  
match ip address 102  
set community 64512:100

The above Cisco configuration creates a route-map, matches a pre-existing access list numbered 102, and applies community 64512:100 to the prefixes it matches.
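
For the route-map to take effect, it also has to be applied inbound on the BGP session. A hedged sketch, with a hypothetical neighbor address and ASN standing in for the Typhoon Electric session, might look like this:

router bgp 64512
 neighbor 198.51.100.5 remote-as 64501
 neighbor 198.51.100.5 route-map Typhoon-in in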

For Juniper, you can define the community under policy-options with the following command:

set community Typhoon-in members 64512:100

Similar to the others, you then reference this named community in an import policy applied to the incoming peer.

So what have we done so far? We have taken the prefixes received from Typhoon Electric and applied community 64512:100 to them. This simply puts a classifier on all routes from that peer. We could modify the above examples to classify prefixes from our other peers based upon which community we want them tagged with.
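
For instance, a Cisco-style sketch for tagging routes learned from the local IX route server as peer routes (64512:200) might look like the following; the neighbor address and route-server ASN are hypothetical.

! Tag prefixes learned from the IX route server with the peers community
route-map IX-in permit 10
 set community 64512:200
!
router bgp 64512
 neighbor 203.0.113.10 remote-as 64510
 neighbor 203.0.113.10 route-map IX-in in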

In our next segment we will learn what we can do with these communities.