bgp

The Mess we call BGP

Ever wonder why BGP seems to be such a complicated protocol to administer? It seems pretty straightforward to set up. Some commands, and you have a BGP session. Easy huh? BGP is one of those things where the more BGP feeds you bring in, the more complex traffic management becomes. Why? Take a look at the following graphic.

https://thyme.apnic.net/BGP/ARIN/#

What you are looking at is a small visualization of some of the AS connections to Hurricane Electric (AS6939) in North America. This is not all of them, just what I could fit on the screen for this article. Some of these are “transit ASes,” which means they sit between Hurricane Electric and another network or networks. This is important to understand because, if they are between you and Hurricane Electric, they can influence how your traffic reaches customers or resources there. The same goes for Hurricane Electric itself. It is a transit AS between companies and resources, and its BGP policies can influence your traffic. This is just one AS. There are thousands and thousands of others.

Now imagine you have 4 upstream providers, each with its own peerings and upstream peers. Each one of them can manipulate routes to the same destination in different ways. Your routers will pick the best path, but that path may have congestion or a host of other influences on your traffic.
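To make “pick the best path” a little more concrete, here is a heavily simplified sketch in Python of how a router might rank four candidate paths to the same destination. It only models two of the real tie-breakers (highest local preference, then shortest AS path). The provider labels, AS numbers, and local-pref values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    upstream: str     # hypothetical provider label
    local_pref: int   # set by our own import policy
    as_path: list     # AS numbers the route passed through


def best_path(paths):
    # Simplified ranking: highest local-pref wins, then shortest AS path.
    # Real BGP has more tie-breakers (origin, MED, eBGP over iBGP, ...).
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


candidates = [
    Path("upstream-a", 100, [64501, 64510, 64520]),
    Path("upstream-b", 100, [64502, 64520]),
    Path("upstream-c", 100, [64503, 64503, 64503, 64520]),  # prepended path
    Path("upstream-d", 100, [64504, 64511, 64512, 64520]),
]

winner = best_path(candidates)
print(winner.upstream, winner.as_path)
```

Nothing in this ranking knows about congestion or loss along the winning path, which is exactly why the “best” path on paper can still be a bad one in practice.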

For myself, as a network engineer, being able to diagnose and troubleshoot path issues is an art, just as it is a science.

ARIN consulting package end-of-the-year special

From now until the end of the year, I am running the following consulting package, including the following ARIN services.

-Helping you set up your organization within ARIN
-Helping you set up your Point of Contact (POC) records
-Getting your own ASN
-Getting an IPv6 allocation
-Generating RPKI and ROA for up to 10 IP blocks (V4 and V6)
-Creating route registry entries for 1 ASN and up to 10 IP blocks
-Creating a PeeringDB entry and linking that to your route registry
-Setting up your IP blocks to point to a reverse DNS server
-Updating your whois information (if needed)
-Signing you up for ShadowServer reports
-Signing you up for monitoring of your blocks (up to 5 for free)
-Tutorial on using Looking Glasses to view your IP blocks and how they relate to other networks

All of this for $1200. This is a savings of over $800 with this promotion. Don’t wait. I only have limited slots available. I can put you on a payment plan (10% fee) or take a 20% deposit to secure the promotion for 60 days.

Optional Add ons
-Hosting a reverse DNS server for your IPs
-IPv6 Deployment plan
-Justification for getting on the waiting list for an IPv4 block
-BGP setup for Team CYMRU

You can e-mail me here for more details.

Learn BGP for $6.99 for My Patreons

This content is for Patreon subscribers of the j2 blog. Please consider becoming a Patreon subscriber for as little as $1 a month. This helps to provide higher quality content, more podcasts, and other goodies on this blog.

A tool to find out if BGP is lying to you

APNIC has a blog article on detecting “BGP lies”.

https://blog.apnic.net/2021/05/24/a-tool-to-detect-bgp-lies/
Do you ever wonder whether you can really trust other networks, such as your provider(s) and peers? More precisely, wouldn’t you like to be able to tell if the traffic you send always flows through the paths received in the Border Gateway Protocol (BGP)? Could it be that, for some prefixes, the forwarding path might differ?
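
As a rough illustration of the idea (not the tool described in the APNIC article), the sketch below compares the AS path learned from BGP with the AS path actually observed in forwarding. It assumes you have already mapped each traceroute hop to an AS number by some other means. All AS numbers here are made up.

```python
def collapse(as_hops):
    """Collapse consecutive duplicates (several router hops inside one AS,
    or AS-path prepending) into a single entry."""
    out = []
    for asn in as_hops:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def first_divergence(bgp_as_path, forwarding_as_path):
    """Return the index where the two AS paths first differ, or None if they match."""
    announced = collapse(bgp_as_path)
    observed = collapse(forwarding_as_path)
    for i, (a, o) in enumerate(zip(announced, observed)):
        if a != o:
            return i
    if len(announced) != len(observed):
        return min(len(announced), len(observed))
    return None


# Hypothetical data: what BGP told us vs. what a traceroute appeared to cross.
bgp_path = [64500, 64510, 64520]
traceroute_asns = [64500, 64500, 64530, 64530, 64520]

print(first_divergence(bgp_path, traceroute_asns))
```

A non-None result only says the two views disagree at that position. Traceroute-to-AS mapping is imprecise, so treat a mismatch as a prompt to investigate rather than proof.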

BGP, a single /24 and two diverse non-connected exit points

I am starting to see the following scenario more and more now that IPv4 space is hard to get, though not impossible.

With ARIN it is still possible to get an IPv4 allocation. Many smaller ISPs qualify for a /24 and can get one if they wait long enough on the ARIN waiting list. A /24 of IPv4 space is the smallest block that 99% of the Internet allows to be advertised on the capital-I Internet. Filter rules are in place that drop smaller blocks (longer prefixes such as a /25) because that is the agreed-upon norm.

So what happens if you are an ISP with a shiny new /24 but two networks that are not connected to each other? Let’s look at our scenario.

The two networks above have no connectivity between them on the internal side. They could be halfway across the world or next door. If they were halfway across the world, it would make sense to try to get another /24. Maybe they are on either side of a big mountain, or one is down in a valley and there is no way to get a decent link between the two networks.

So what is a way you can use this /24 and still be able to assign IP addresses to both sides of the network? One way is to use a tunnel between your two edge routers.

Without the tunnel, traffic could come into Network1, but if the IP is assigned on Network2 it will come back as unreachable. BGP is all about networks finding the shortest path to other networks. You don’t have much control over how networks find your public IP space if you have two providers advertising the same information. Some of the Internet will come in via Network2 and some will come in via Network1.

By running a tunnel between the two, you can now subnet that /24 into two equal /25s and assign one /25 to Network1 and one /25 to Network2, or divide it however you want. You can make the tunnel a GRE, EOIP, or other tunnel type. If I am using Mikrotik I prefer to use EOIP. If it’s another vendor I tend to use GRE.
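
If you want to double-check the subnet math, Python’s standard ipaddress module can do the split for you. This is just a quick sketch. The 192.0.2.0/24 block is a documentation prefix used as a stand-in for your own /24.

```python
import ipaddress

# Stand-in block (documentation prefix); substitute your own /24.
block = ipaddress.ip_network("192.0.2.0/24")

# prefixlen_diff=1 splits the block into two halves, one bit longer each.
halves = list(block.subnets(prefixlen_diff=1))

for name, net in zip(("Network1", "Network2"), halves):
    print(name, net, "addresses:", net.num_addresses)
```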

Once the tunnel is established you can use static routing, OSPF, or your favorite IGP (Interior Gateway Protocol) to “tell” one side about the routes on the other side. Let’s look at a fictional use.

In the above example our fictional ISP has an IPv4 block of 1.2.3.0/24. They have two networks separated by a tall mountain range in the center. It’s too cost prohibitive to run fiber or a wireless backhaul between the two networks so they have two different upstream providers. The ISP is advertising this /24 via BGP to Upstream1 from the Network 1 router. Network 2 router is also advertising the same /24 via BGP to Upstream 2.

We now create a tunnel between the Mikrotiks. As mentioned before this can be EOIP, GRE, etc. We won’t go into the details of the tunnel but let’s assume the ISP is using Mikrotik. We create an EOIP tunnel (tons of tutorials out there) between Network 1 router and Network 2 router. Once this is established we will use 172.16.200.0/30 as our “Glue” on our tunnel interfaces at each side. Network 1 router gets 172.16.200.1/30. Network 2 router gets 172.16.200.2/30.

To keep it simple we have a static route statement on the Network 1 Mikrotik router that looks like this:

/ip route add dst-address=1.2.3.128/25 gateway=172.16.200.2

This statement routes any traffic for 1.2.3.128/25 that comes in via ISP 1 to Network 1 across the tunnel to the Network 2 router. The Network 2 router then sends it to the destination inside that side of the network.

Conversely, we have a similar statement in the Network 2 Mikrotik router:

/ip route add dst-address=1.2.3.0/25 gateway=172.16.200.1

This statement routes any traffic for 1.2.3.0/25 that comes in via ISP 2 to Network 2 across the tunnel to the Network 1 router. The Network 1 router then sends it to the destination inside that side of the network.
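
An easy mistake with static routes like these is typing a destination that has host bits set (for example 1.2.3.200/25 instead of a real /25 boundary). Here is a small sketch that builds the two route statements and refuses a destination that is not a proper network address. The static_route helper is hypothetical, not a Mikrotik tool. It only formats the same command shown in this article.

```python
import ipaddress


def static_route(dst, gateway):
    # strict=True raises ValueError if dst has host bits set,
    # e.g. 1.2.3.200/25 is not a valid network address.
    net = ipaddress.ip_network(dst, strict=True)
    gw = ipaddress.ip_address(gateway)
    return f"/ip route add dst-address={net} gateway={gw}"


# Network 1 router: send the upper /25 across the tunnel to Network 2.
print(static_route("1.2.3.128/25", "172.16.200.2"))
# Network 2 router: send the lower /25 across the tunnel to Network 1.
print(static_route("1.2.3.0/25", "172.16.200.1"))

try:
    static_route("1.2.3.200/25", "172.16.200.2")
except ValueError as err:
    print("rejected:", err)
```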

It’s as simple as that. You can apply this to any other vendor such as Cisco, Juniper, pfSense, etc. You also do not have to split the network into even /25s like I did. You can choose to have most of the IPs available on one side and route a /29 or something to the other side.
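
The uneven split works because routers prefer the most specific matching route. The sketch below models that with a toy routing table for the Network 1 router: the whole /24 is treated as local, and a single /29 is sent across the tunnel. The choice of 1.2.3.248/29 and the next-hop labels are assumptions made for illustration.

```python
import ipaddress

# Toy routing table for the Network 1 router (hypothetical entries).
routes = {
    ipaddress.ip_network("1.2.3.0/24"): "local",
    ipaddress.ip_network("1.2.3.248/29"): "tunnel via 172.16.200.2",
}


def lookup(addr):
    """Longest-prefix match: the most specific covering route wins."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup("1.2.3.10"))    # covered only by the /24
print(lookup("1.2.3.250"))   # covered by both, the /29 is more specific
```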

The major drawback of this scenario is that you will take a speed hit: if traffic comes in one side and has to route across the tunnel, it has to go back out to the public Internet and over to the other ISP.

#packetsdownrange

Mikrotik RouterOS and CPU usage

There always is a lot of talk about Mikrotik RouterOS CPU usage. I wanted to take a few minutes and go over a real-world example and explain some of the ins and outs when discussing Mikrotik CPU usage.

Let’s talk about the router in question. This is a CCR1016-12S-1S+, a Tile-based router with 16 cores at 1.2GHz per core and 2GB of RAM. It is currently pulling in 1,764,849 IPv4 routes. There are two transit provider BGP feeds, multiple direct peers, and an Internet Exchange peering with dual route servers. The router handles a little over 3 gigs of routed traffic at peak times. Most of the traffic is on VLANs coming from a Cisco switch to the SFP+ port.

One of the first things people turn on is the overall CPU usage within WinBox. I like to think of this as an overall view of the CPUs on this router. Keep in mind there are 16.

The next thing to investigate when it comes to CPU is to open up System > Resources. Once there, click on CPU.

Mikrotik System > Resources

It will then bring up a screen that looks like the following.

Oh My we have 100% CPU! Must replace this router ASAP! Calm down, remember you have 16 cores. So, why is this CPU at 100% and what ramifications does this have?

Remember earlier when we talked about BGP? In Mikrotik, BGP is not a multi-core aware process. This means BGP is limited to just one core to do its work. Since there are always routes being withdrawn and re-added to the routing table, it is a busy process. Lots of math calculations going on. The key thing is this is expected behavior on a router running multiple BGP peers such as this one. This is not a bad thing, but not ideal. Throwing more cores at BGP is not the answer. Optimizing the process, as has been done in V7, is the way to go.

If we expand the CPU window we will notice other processes are multi-core aware and/or are spreading their load among different cores.

As you can see, we are in pretty good shape. We have a few CPUs above 50% utilization, but only a few. I will keep reminding you of the fact we have 16 of them.
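
To put numbers on the “remember you have 16 cores” point, here is a small sketch that summarizes a set of per-core readings. The load values are invented for illustration and are not taken from the router in this article.

```python
def summarize(per_core):
    """Return (overall average %, index of hottest core, cores above 50%)."""
    overall = sum(per_core) / len(per_core)
    hottest = max(range(len(per_core)), key=lambda i: per_core[i])
    busy = [i for i, load in enumerate(per_core) if load > 50]
    return overall, hottest, busy


# Hypothetical per-core load percentages for a 16-core router,
# with one core pinned by a single-threaded process such as BGP.
per_core = [100, 62, 55, 12, 8, 9, 7, 15, 10, 6, 11, 9, 8, 14, 7, 5]

overall, hottest, busy = summarize(per_core)
print(f"overall: {overall:.1f}%  hottest core: {hottest}  cores above 50%: {busy}")
```

One pinned core barely moves the overall average, so a single 100% reading says more about how a process is threaded than about whether the router is out of capacity.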

Closing notes:
Diagnosing CPU issues can get a little complicated because routers like the 3011 have the majority of their ports sharing a single bus to the CPU. https://wiki.mikrotik.com/images/f/f3/Switch_chip_block_diagram.png. As you can tell in the diagram, there are 5 ports which share 1 Gig to the CPU. The fact that an actual switch chip with hardware offloading is in the middle helps, but the bus is still oversold. This is one reason consolidating routers to an actual switch will make a difference.
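
The “oversold” part is simple arithmetic. A quick sketch, assuming each of the 5 ports is a 1 Gig port (the 1 Gig link to the CPU is from the diagram above):

```python
def oversubscription(ports, port_gbps, uplink_gbps):
    """Ratio of total port capacity to the shared link toward the CPU."""
    return (ports * port_gbps) / uplink_gbps


# Assumption: 5 ports at 1 Gbps each sharing a 1 Gbps path to the CPU.
ratio = oversubscription(ports=5, port_gbps=1, uplink_gbps=1)
print(f"{ratio:.0f}:1")
```

The ratio only matters for traffic that actually has to reach the CPU. Anything the switch chip forwards in hardware never crosses that shared link.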

Janis Megis from Mikrotik gave a presentation at MUM which, although a little older now, still sheds a lot of light on how Mikrotik CPU works. https://mum.mikrotik.com/presentations/US10/Megis.pdf There is some pretty interesting stuff starting on page 14.

With Mikrotik switching to ARM processors we will see huge differences with them and RouterOS 7. We will see fewer cores, but better utilization of those cores. The new 2004 with all SFP and two 25 gig ports only has 4 CPU cores.

So the next time you look at a router, take a few moments to see how utilized the entire CPU architecture is instead of just one CPU.

#packetsdownrange #mikrotik

Hurricane Electric Route Filtering Algorithm

The following is from http://routing.he.net/algorithm.html . This outlines the criteria HE.NET uses for filtering routes from peers and customers.

This is the route filtering algorithm for customers and peers that have explicit filtering:

1. Attempt to find an as-set to use for this network.
1.1 Inspect the aut-num for this ASN to see if we can extract from their IRR policy for what they would announce to Hurricane by finding export or mp-export to AS6939, ANY, or AS-ANY.
1.2 Also see if they set what looks like a valid IRR as-set name in peeringdb.

2. Collect the received routes for all BGP sessions with this ASN. This details both accepted and filtered routes.

3. For each route, perform the following rejection tests:
3.1 Reject default routes 0.0.0.0/0 and ::/0.
3.2 Reject paths using BGP AS_SET notation (i.e. {1} or {1 2}, etc). See draft-ietf-idr-deprecate-as-set-confed-set.
3.3 Reject prefix lengths less than minimum and greater than maximum. For IPv4 this is 8 and 24. For IPv6 this is 16 and 48.
3.4 Reject bogons (RFC1918, documentation prefix, etc).
3.5 Reject exchange prefixes for all exchanges Hurricane Electric is connected to.
3.6 Reject routes that have RPKI status INVALID_ASN or INVALID_LENGTH based on the origin AS and prefix.

4. For each route, perform the following acceptance tests:
4.1 If the origin is the neighbor AS, accept routes that have RPKI status VALID based on the origin AS and prefix.
4.2 If the prefix is an announced downstream route that is a subnet of an accepted originated prefix that was accepted due to either RPKI or an RIR handle match, accept the prefix.
4.3 If RIR handles match for the prefix and the peer AS, accept the prefix.
4.4 If this prefix exactly matches a prefix allowed by the IRR policy of this peer, accept the prefix.
4.5 If the first AS in the path matches the peer and path is two hops long and the origin AS is in the expanded as-set for the peer AS and either the RPKI status is VALID or there is an RIR handle match for the origin AS and the prefix, accept the prefix.

5. Reject all prefixes not explicitly accepted
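
To get a feel for how the rejection tests in step 3 behave, here is a partial sketch in Python. It is not Hurricane Electric’s code. It covers only tests 3.1 through 3.4, and the bogon list is a short illustrative subset. Tests 3.5 and 3.6 and the acceptance tests in step 4 need external data (exchange prefixes, RPKI, IRR, RIR handles) and are left out. Representing an AS_SET as a Python set inside the path is an assumption of this sketch.

```python
import ipaddress

# Partial, illustrative bogon list; a real filter would carry many more entries.
BOGONS = [ipaddress.ip_network(p) for p in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",       # RFC 1918
    "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24",   # IPv4 documentation
    "2001:db8::/32",                                        # IPv6 documentation
)]

# Minimum and maximum prefix lengths from test 3.3.
LENGTH_LIMITS = {4: (8, 24), 6: (16, 48)}


def rejection_reason(prefix, as_path=()):
    """Return why a route would be rejected by tests 3.1-3.4, or None if it passes them."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == 0:
        return "default route"                       # 3.1
    if any(isinstance(hop, (set, frozenset)) for hop in as_path):
        return "AS_SET in path"                      # 3.2
    low, high = LENGTH_LIMITS[net.version]
    if not low <= net.prefixlen <= high:
        return "prefix length out of range"          # 3.3
    if any(net.version == b.version and net.subnet_of(b) for b in BOGONS):
        return "bogon"                               # 3.4
    return None


for candidate in ("0.0.0.0/0", "1.2.3.0/24", "1.2.3.0/25", "10.1.0.0/16", "2001:db8:1::/48"):
    print(candidate, "->", rejection_reason(candidate))
```

Passing these tests does not mean a route is accepted. Under step 5, anything not explicitly accepted by step 4 is still rejected.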

Don’t try this at home kids. Automated BGP Optimization

https://radar.qrator.net/blog/as10990-routing-optimization-tale
Conclusion? Do not try to optimize the routes with automated software – BGP is a distance-vector routing protocol that has proved, throughout the years, its ability to handle the traffic. Software, wanting to “optimize” the system involving thousands of members would never be smart enough to compute all the possible outcomes of such manipulation.

Network troubleshooting tools

Recently, there was a thread on the NANOG list asking what some favorite network troubleshooting tools were. I have taken many of these tools and created the following list.

http://ping.pe/
Simple ping, port, and dig commands

https://mtr.sh/
BGP Looking glass

https://perfops.net/mtr-from-world
Traceroute from various hosts on the net

http://www.traceroute6.net/
IPv6 tools (ping, traceroute, etc.)

https://dnsviz.net/
Various DNS tools

http://irrexplorer.nlnog.net/
Routing Registry object explorer

https://mxtoolbox.com/
DNS and Mail tools