Testing a fiber cross-connect at a data center. #packetsdownrange

Writings from Justin Wilson

Over the years I have been able to narrow down the most common reasons a service provider goes down or has an outage. This is, by no means, an exhaustive list. Let’s jump in.
Layer 1 outages
Physical layer outages are the easiest to diagnose, and the physical layer is where you should always start. If you have had any kind of formal training, you have run across the OSI model; the physical layer is Layer 1, at the bottom of it. Fiber cuts, equipment failure, and power are all physical layer issues. I have seen too many engineers spend time looking at configs when they should first check whether the port is up or the device is on.
DNS related
DNS is what makes the transition from the human world to the machine world (cue Matrix movie music). Without DNS we would not be able to translate www.j2sw.com into an IP address that web servers and routers understand. When you ping a hostname, you are also checking DNS resolution: the first line of output shows the IP address the name resolved to, as in:
PING j2sw.com (199.168.131.29): 56 data bytes
64 bytes from 199.168.131.29: icmp_seq=0 ttl=52 time=33.243 ms
64 bytes from 199.168.131.29: icmp_seq=1 ttl=52 time=32.445 ms
--- j2sw.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 32.445/32.844/33.243/0.399 ms
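The same check can be scripted. This is a minimal Python sketch (assuming Python 3.9+), not a replacement for tools like dig: it uses the standard library's `socket.getaddrinfo`, which asks the operating system's resolver, so local hosts files and caching apply too. `resolve` and `diagnose` are hypothetical helper names. A lookup failure points at DNS; a successful lookup means you should look elsewhere.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses hostname resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name did not resolve: a DNS problem, not a routing problem.
        return []
    addrs: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in addrs:  # getaddrinfo repeats addresses per socket type
            addrs.append(ip)
    return addrs

def diagnose(hostname: str) -> str:
    """Separate 'name does not resolve' from 'name resolves, look further'."""
    addrs = resolve(hostname)
    if not addrs:
        return f"{hostname}: DNS lookup failed, check your resolvers first"
    return f"{hostname}: resolves to {', '.join(addrs)}, so look beyond DNS"
```

For example, `diagnose("j2sw.com")` should report the same address ping printed, provided DNS is healthy and the record has not changed since.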
Software bugs
Software bugs are usually reproducible; the challenge is figuring out how to reproduce them. Sometimes a memory leak only shows up on a certain day. Sometimes five different criteria have to be met for the bug to happen.
Version mismatches
When two or more routers talk to each other, they work best when they are on the same software version. A later version may fix an earlier bug, and code may change enough between versions that certain calls and processes behave slightly differently. This can cause incompatibilities between software versions.
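One quick way to spot a mismatch is to group devices by the version they report. This is a minimal Python sketch (assuming Python 3.9+) that starts from a dictionary you have already collected; how you gather each router's version depends on your platform. The router names and version strings are made up for illustration.

```python
from collections import defaultdict

def find_version_mismatches(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Group routers by software version.

    Returns {} when every router runs the same version; otherwise maps
    each version to the routers running it, so the odd ones out stand out.
    """
    by_version: dict[str, list[str]] = defaultdict(list)
    for router, version in sorted(inventory.items()):
        by_version[version].append(router)
    return dict(by_version) if len(by_version) > 1 else {}

# Hypothetical inventory: names and versions are illustrative only.
peers = {"core1": "7.1", "core2": "7.1", "edge1": "6.9"}
print(find_version_mismatches(peers))
# {'7.1': ['core1', 'core2'], '6.9': ['edge1']}
```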
Human mistakes
“Fat fingering” is what we typically call this: a 3 was typed instead of a 2. This is why good version control and backups you can diff against each other are a good thing. Things such as cables getting bumped because they were not secured properly are also an issue.
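Diffable backups pay off because a diff shows a fat-fingered change immediately. A minimal sketch using Python's standard `difflib` module; the config snippets are hypothetical and not any particular vendor's syntax, and `config_diff` and `changed_lines` are illustrative helper names.

```python
import difflib

def config_diff(old: str, new: str, old_name: str = "backup", new_name: str = "running") -> list[str]:
    """Unified diff between two config snapshots; [] means nothing changed."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm="",
    ))

def changed_lines(diff: list[str]) -> list[str]:
    """Keep only removed/added lines; the first two lines are the file headers."""
    return [line for line in diff[2:] if line.startswith(("-", "+"))]

# Hypothetical snapshots: a 2 was fat-fingered into a 3.
backup = "interface ether1\nip address 10.0.2.1/24\nmtu 1500\n"
running = "interface ether1\nip address 10.0.3.1/24\nmtu 1500\n"

for line in changed_lines(config_diff(backup, running)):
    print(line)
# -ip address 10.0.2.1/24
# +ip address 10.0.3.1/24
```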
What can we do to mitigate these issues?
1. Have good documentation. Know what is plugged in where, what it looks like, and as much other detail as possible. You want your documentation to stand on its own: a person should be able to pick it up and follow it without calling someone.
2. Proactive monitoring. Knowing about problems before customers call is a huge deal. Being able to identify trends over time is also a good way to troubleshoot issues, and monitoring systems let you narrow down the problem right away.
3. When it comes to networking, know the OSI model: start from the bottom and work your way up.
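Working bottom-up can be sketched as an ordered list of checks that stops at the first failure. This is a minimal Python illustration of the ordering, not a diagnostic tool: the checks are hypothetical stand-ins (lambdas returning True or False) that you would replace with real tests for link state, reachability, name resolution, and so on.

```python
from typing import Callable, List, Optional, Tuple

# (layer name, check) pairs, ordered from the bottom of the OSI model up.
# Each check is a placeholder that returns True when that layer looks healthy.
Check = Tuple[str, Callable[[], bool]]

def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks bottom-up and stop at the first failure.

    Returns the failing layer's name, or None if every check passed.
    Stopping early matters: if the link is down, nothing above it can
    work, so there is no point reading configs yet.
    """
    for layer, check in checks:
        if not check():
            return layer
    return None

# Hypothetical stand-ins for real tests.
checks = [
    ("Layer 1: link up / device powered", lambda: True),
    ("Layer 3: gateway answers ping", lambda: False),
    ("Layer 7: name resolves", lambda: True),
]
print(first_failing_layer(checks))  # Layer 3: gateway answers ping
```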
Books can be, and have been, written about troubleshooting. These are just a few of the common things I have seen.
Between Oct. 12 and Nov. 14, 1930, the eight-story, 11,000-ton Indiana Bell building was shifted 52 feet south along Meridian St. and rotated 90 degrees to face New York St. Workmen used a concrete mat cushioned by Oregon fir timbers, 75-ton hydraulic jacks, and rollers; as the mass moved off one roller, workers placed another ahead of it. Every six strokes of the jacks would shift the building three-eighths of an inch, moving it 15 inches per hour.
https://www.indystar.com/story/news/history/retroindy/2014/01/07/indiana-bell/4354705/
Over the years I have worked in hundreds of data centers. The number is probably closer to 1,000 than 100. Each data center has its own nuances and policies. These vary by company, and can even vary between data centers owned by the same company. It can be quite the tangled mess to keep track of. As I was sitting here unable to sleep, I wanted to come up with a list of some things data centers should do to make tenant life easier. Some of these you can do yourself, as I do for many sites. Some of these overlap, but they are important enough to mention more than once. So let’s get into it.
Don’t assume your customers know what is going on
Some of us don’t visit these sites very often. Many times it’s late at night when no one is around except security. Much of this list revolves around things we don’t know that may be common knowledge for those who are at the data center every day.
Pre-visit items
Do I need to open a ticket to access my equipment? At some facilities I have to; at others I don’t.
Have a published facilities contact and helpline
I ran into this situation today. I was visiting a site and had to go through a turnstile. As it turned out, I was on the wrong floor, but I did not realize that until someone came to help. My cell phone was at 5%, and I ended up having to call the business development guy to get ahold of someone. Those sales guys know everyone!

Where to Park
This may be obvious if you work at the building every day, but to the occasional visitor it can be confusing and potentially costly. Data centers in downtown areas really need up-to-date parking information on their websites. Facilities near sporting arenas are the worst; it seems you need a decoder wheel to figure out when and where you can park before and after games. Some cities are better at marking this than others. I once had to walk 10 blocks lugging a switch because I could not park near the data center due to a football game.
What doors are open at what times
If your building has multiple entrances, keep an updated list of the hours for each door. There’s nothing like unloading 10 switches onto a cart and finding out you are on the wrong side of the building. Can I bring carts and equipment in only through a certain door or loading dock?
What to do when you enter the building

This is another thing that is obvious to those who use the building on a regular basis. Do you check in with the security desk first? Do you need to sign in if you have a badge? To illustrate some of the issues, I will walk you through getting to my equipment at two data centers in different cities, owned by different companies.
Data Center 1
Upon entering, I pass security but do not have to check in with them. I badge through the gates using the angled sensor facing me, not the top sensor. Both accept badges, but my badge only works on one. After riding the elevator to the floor my equipment is on, I badge in through a door. Next up is the man-trap (sorry, “people trap” sounds dumb). I am presented with a reader with a PIN pad, the same model as at many of the other data centers I visit. However, at this particular one I just swipe my card and do not enter a PIN. The light on the reader flashes rapidly between red and green. I have to remember this is not an error, but normal for this data center. I nervously open the door, hoping I don’t set off any alarms. Once inside the man-trap, I face another similar keypad that lets me into the data center room. Swiping my badge results in the same rapidly flashing red and green, and then I am in the data center and can proceed to my rack.
Data Center 2
To get into the building, I swipe my card at the same model of PIN pad I mentioned at the first data center. I have to remember that at this data center I need a PIN. I enter my PIN, the reader beeps, and the light turns green. It’s important to note that it does not flash like the reader at the other data center, even though it’s the same model. I can hear the door mechanism unlock. Once inside, it is a similar procedure to get through each door until I reach my equipment. On the way out, I just swipe my badge and do not have to enter the PIN. The PIN is only for going through a door on the way in; the system knows you have been through that door and is just looking for your card swipe to let you out. At this data center, you do have to remember to swipe your card before you exit through a door. The door will open without a card swipe, but it will set off alarms. This can be easy to forget if you are distracted.
Other data centers make you check in with security and sign in before using your badge to go onward into the depths of the facility. At others, the building security personnel have little to do with the data center.
Also, train the building security staff to understand that many of us don’t visit the facility on a regular basis. I have had many a security guard get ticked off at my questions about procedures.
Maps to my equipment
If each data center provided a map of the floor where my equipment is, I would probably love them forever, or at least send them a Christmas card. If I could mark that map with an “X marks the spot,” it would help me remember where my stuff is. If the data center provided this as part of a welcome package, they might get upgraded to cookies for Christmas.
The things you don’t need to know until you need to know
Do your badges expire after so many days, weeks, or months of inactivity? At some facilities my badge expires after 30 days of inactivity, so just about every visit involves getting my badge reactivated. It seems the procedure for that changes each time.
If I have equipment shipped to the facility, can I only get it out of storage at certain times?
Are there crash carts (keyboard, mouse, monitor) on site? What is the procedure for getting access to them?
Go through what a customer would have to do at 3am on a Tuesday night
If you are a data center, especially one that says customers have access to their equipment 24/7/365, go through some mental exercises as if you were a customer. It’s okay if certain things can only be done during business hours; knowing that ahead of time can solve a ton of issues. Here are just some things a customer may go through:
- It’s 3am and my server crashed. Is there a crash cart? How do I get access to it?
- Where do I put my cardboard and such when I am done? Do I have to carry it out with me, or is there a specific place I can put it to be recycled or trashed?
- Is there Wi-Fi? If you want me to fill out a ticket to get stuff done, but I have no internet because I am deep inside the structure, that doesn’t help. Going out into the hall, or even outside, is not productive.
Knowing the capabilities of the facility during different hours can really help. If a NOC technician needs to escort me to the meet-me room and can only do that during normal business hours, that is very handy to know. I am fine with that if I know ahead of time and can plan. Just don’t say you are 24/7/365 if I can’t add a new device to my meet-me room rack at 7am on a Sunday because no one is available.
E-mail lists are your friend
A few of my data centers keep a pretty active e-mail list about what is going on at the facility: things like break room closures, parking restrictions due to construction, etc.
In other words, make it easy for your customer to follow your rules and procedures.
Tomas Kirnak has a great presentation on load balancing and Mikrotik. The PDF is available for direct download here: https://mum.mikrotik.com/presentations/US12/tomas.pdf
If you are looking for a way to run a bandwidth test from a PC through to a Mikrotik, here is your solution.
It supports RouterOS version 6.43 and newer. Advice from Mikrotik:
Please remember that Bandwidth Test uses a lot of resources. If you want to test real throughput of a router, you should run Bandwidth Test through the tested router not from or to it. To do this you need at least 3 devices connected in chain: the Bandwidth Test server, the router being tested and the Bandwidth Test client.
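MikroTik’s point is about where the endpoints sit: the test client and server generate and absorb the traffic, and the router under test only forwards it. The sketch below is not MikroTik’s Bandwidth Test protocol or the tool mentioned above; it is a minimal TCP throughput test in Python, assuming you run `receive_all` on one host and `send_bytes` on another, with the router being tested in between. `localhost_demo` runs both on one machine only to show the pieces fit, and all function names are hypothetical.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice

def receive_all(listener: socket.socket, result: dict) -> None:
    """Server side: accept one client and count bytes until it disconnects."""
    conn, _addr = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the client closed the connection
                break
            total += len(data)
    result["bytes"] = total

def send_bytes(host: str, port: int, total_bytes: int) -> float:
    """Client side: send total_bytes and return an approximate rate in Mbit/s.

    Timing is measured on the sender, so it includes kernel buffering;
    treat the result as a rough figure, not a precise measurement.
    """
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1_000_000

def localhost_demo(total_bytes: int = 10_000_000) -> tuple:
    """Run both ends on one machine; a real test uses two hosts and a router between them."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    result: dict = {}
    server = threading.Thread(target=receive_all, args=(listener, result))
    server.start()
    mbps = send_bytes("127.0.0.1", listener.getsockname()[1], total_bytes)
    server.join()
    listener.close()
    return result["bytes"], mbps
```

Running it on localhost only measures the machine itself, which is exactly MikroTik’s warning in miniature: a number taken from or to a single box tells you about that box, not about the path through a router.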
DNS Shotgun is capable of simulating real client behaviour by replaying captured traffic over selected protocol(s). The timing of original queries as well as their content is kept intact.
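The core idea of timing-preserving replay can be shown in a few lines. This is not DNS Shotgun’s code or its input format; it is a minimal Python sketch that assumes a hypothetical capture already reduced to (seconds, query) pairs, with `send` standing in for whatever actually transmits a query. A real tool would read a packet capture and send real DNS messages over the chosen protocol.

```python
import time
from typing import Callable, Iterable, Tuple

def replay(captured: Iterable[Tuple[float, str]],
           send: Callable[[str], None],
           speed: float = 1.0) -> None:
    """Replay (timestamp, query) pairs, keeping the original gaps between them.

    speed > 1 compresses the timeline (2.0 = twice as fast); the query
    content is passed through unchanged.
    """
    items = sorted(captured)  # order by original capture timestamp
    if not items:
        return
    first_ts = items[0][0]
    start = time.monotonic()
    for ts, query in items:
        # When this query is due, relative to the start of the replay.
        due = (ts - first_ts) / speed
        delay = due - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        send(query)

# Hypothetical capture: three queries 50 ms apart.
capture = [(0.00, "example.com A"), (0.05, "example.com AAAA"), (0.10, "example.net A")]
replay(capture, print)
```

Scheduling against a fixed start time, rather than sleeping for each gap in turn, keeps small per-query delays from accumulating over a long replay.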
From Christian Koch of Foundations:
I am excited to reveal that my quarterly interconnection update has transformed into the Interconnection Quarterly, a hand-tailored, independent briefing on the interconnection industry. Right now, my plans are to publish the Interconnection Quarterly shortly after the last public companies report earnings, as I’ve done with the previous updates. This may change in the future, but for now, this is the plan.
In this inaugural issue, you’ll find the latest financial and business metrics for select data center operators and interconnection platforms, as well as insights into key developments and newsworthy events that occurred within the fourth quarter of 2020.
We’re at an important juncture for interconnection, and while it still may be seen by some as just a basic service that a data center or colocation provider must offer, the truth is, that interconnection is much more important.
From cross-connects to cloud networks, the constant here is in the connection. How that connection is established and what you can do with it is what’s changing as we adapt to a world powered by software in the cloud.
Came across this gem in a newsletter.
https://github.com/kuchin/awesome-cto
A curated and opinionated list of resources for Chief Technology Officers and VP R&D, with the emphasis on startups and hyper-growth companies