Management networks and the Internet Service Provider (ISP)

Most ISPs don’t start with a management network. Instead, they add it later as the network expands. They begin with a router, an OLT, some core switches, and a few servers. Everything runs smoothly, until it suddenly doesn’t. Then someone asks a tough question, usually during an outage: If the main network goes down, how do we log in to anything?

What a Management Network Is

A management network is a network kept separate from the access (customer) network and used only for managing infrastructure. It carries SSH, HTTPS, SNMP, API calls, syslog, TACACS+, RADIUS, and monitoring traffic. It never carries customer data.

In an ISP, that includes your core and edge routers, aggregation switches, OLTs, CMTS, DNS and DHCP servers, monitoring boxes, and even power systems. If it has an IP and you care about it during an outage, it belongs on the management plane.

The rule is simple. Your ability to manage the network cannot depend on the network that is currently broken. If your management traffic rides the same forwarding path as your customers, you built a circular dependency. That design will fail at the worst time.

In-Band vs Out-of-Band

In-band management means your management traffic runs in a VLAN or VRF on the main network. This is common, but it’s also risky. If the core fails or a bad BGP change breaks connectivity, you lose both visibility and control.

Out-of-band management uses separate physical interfaces, its own switch fabric, its own IP space, and its own routing. It doesn’t rely on transit, customer VLANs, or the global routing table. If you use MPLS, you can put it in a dedicated L3VPN. If not, keep it simple and isolated. Out-of-band can utilize LTE routers, VPNs, or even Starlink.
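One way to reason about in-band versus out-of-band is to list the devices each management path crosses and compare that list with the production network. The sketch below does only that comparison. Every device name in it is hypothetical, and a real check would pull paths from your own inventory or topology data.

```python
def shared_fate(mgmt_path, production_devices):
    """Return the production devices that a management path also depends on.

    An empty result means the path has no hop in common with production,
    which is the property out-of-band management is meant to have.
    """
    return sorted(set(mgmt_path) & set(production_devices))


# Hypothetical device names, for illustration only.
PRODUCTION = {"core-rtr-1", "agg-sw-3", "edge-rtr-1"}
IN_BAND_PATH = ["noc-sw", "core-rtr-1", "agg-sw-3"]
OOB_PATH = ["noc-sw", "oob-sw-1", "lte-gw"]

if __name__ == "__main__":
    print("in-band shares fate with:", shared_fate(IN_BAND_PATH, PRODUCTION))
    print("out-of-band shares fate with:", shared_fate(OOB_PATH, PRODUCTION))
```

Any device that shows up in the result is a box whose failure takes away both customer traffic and your ability to log in.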

Addressing and Routing

Use RFC 1918 private addresses (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16) for management. Keep your addressing plan logical. A couple of tricks help: use loopbacks for device management, and /31s on point-to-point links to conserve address space.
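As a rough illustration of those addressing rules, here is a minimal Python sketch using the standard `ipaddress` module. It checks whether an address falls inside the three RFC 1918 blocks and returns the two usable endpoints of a /31 point-to-point link. The function names and the sample prefixes are assumptions made for this example, not part of any particular tool.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address sits inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


def p2p_endpoints(prefix):
    """Return both addresses of an IPv4 /31; each end of the link gets one."""
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen != 31:
        raise ValueError("expected an IPv4 /31 prefix")
    first = net.network_address
    return str(first), str(first + 1)


if __name__ == "__main__":
    # Sample values are hypothetical.
    print(is_rfc1918("172.16.4.1"))         # inside 172.16.0.0/12
    print(is_rfc1918("172.32.0.1"))         # outside the 172.16.0.0/12 block
    print(p2p_endpoints("10.255.1.0/31"))   # both ends of one link
```

Note that only 172.16.0.0 through 172.31.255.255 is private, so a check against the explicit blocks is safer than matching on the first octet.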

For small networks, static routing can work well. In bigger networks, OSPF can help you scale. Putting management in its own VRF isolates it further.

Security Model

At a minimum, you should do the following:

• Restrict access to a defined jump box or VPN with MFA
• Bind SSH, API, and web services to the management VRF or interface only
• Enforce TACACS+ or RADIUS with logging

If someone gains access to your management network, they can control your routers.
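The first two items in that list can be expressed as simple checks. The sketch below is an assumption-heavy illustration: the management prefix, the jump host address, and both function names are made up for this example. It flags any service bound to a wildcard or non-management address, and it accepts sessions only from a defined jump host.

```python
import ipaddress

# Hypothetical values; substitute your own management block and jump hosts.
MGMT_NET = ipaddress.ip_network("10.255.0.0/16")
JUMP_HOSTS = {ipaddress.ip_address("10.255.0.10")}


def listener_is_mgmt_only(bind_address):
    """Return True only if a service is bound to a management address.

    A wildcard bind (0.0.0.0 or ::) listens on every interface,
    so it fails the check.
    """
    ip = ipaddress.ip_address(bind_address)
    if ip.is_unspecified:
        return False
    return ip in MGMT_NET


def source_allowed(source):
    """Return True if a session originates from a defined jump host."""
    return ipaddress.ip_address(source) in JUMP_HOSTS


if __name__ == "__main__":
    for bind in ("0.0.0.0", "10.255.3.1", "203.0.113.5"):
        print(bind, "management-only:", listener_is_mgmt_only(bind))
    print("jump host allowed:", source_allowed("10.255.0.10"))
```

On real gear the same intent is enforced with device configuration and ACLs. A script like this is only useful for auditing what an inventory export or a config dump says is configured.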

Make Your NOC Staff Happy

A proper management network changes day-to-day operations. Monitoring stays up during a transit outage. You can still reach a router out-of-band. Firmware upgrades do not compete with customer traffic. Automation tools talk to a stable address space that does not change when you redesign a VLAN.

It also helps with discipline. You decide what counts as infrastructure, document your addressing, and consider failure domains. This maturity helps you grow.

Common Failure Patterns

These are failure patterns I see regularly:

• Reusing customer VLANs for management or exposing the management network to the public Internet
• Forgetting that console servers, PDUs, and access gear also need isolation
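Some of these patterns can be caught with a quick audit of the address plan. The following sketch assumes IPv4 and uses made-up prefixes and a hypothetical function name. It reports management prefixes that fall outside RFC 1918 space (a hint that they may be publicly reachable) and prefixes that overlap customer space.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def audit_mgmt_prefixes(mgmt_prefixes, customer_prefixes):
    """Return (prefix, issue) tuples for management prefixes that look wrong.

    Assumes every prefix is IPv4. An address outside RFC 1918 is not
    automatically exposed, but it deserves a closer look.
    """
    findings = []
    customers = [ipaddress.ip_network(p) for p in customer_prefixes]
    for raw in mgmt_prefixes:
        net = ipaddress.ip_network(raw)
        if not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
            findings.append((raw, "not RFC 1918"))
        for cust in customers:
            if net.overlaps(cust):
                findings.append((raw, "overlaps customer " + str(cust)))
    return findings


if __name__ == "__main__":
    # Hypothetical address plan.
    mgmt = ["10.255.0.0/24", "203.0.113.0/28", "10.20.0.0/24"]
    customer = ["10.20.0.0/16"]
    for prefix, issue in audit_mgmt_prefixes(mgmt, customer):
        print(prefix, "->", issue)
```

Remember to feed it the prefixes used by console servers, PDUs, and access gear as well, since those are the ones most often left out of the plan.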

j2networks family of sites
https://j2sw.com
https://startawisp.info
https://indycolo.net
#packetsdownrange #routethelight
