The third focus of this Essentials Series is on the need for effective configuration management, a common feature across many Network Management Solutions (NMSs) but one that sometimes gets missed. In this instance, what do I mean by configuration management? I mean the unified storage and uniform distribution of configurations to each of the devices on your network.
There is a certain brilliance in the way that most network devices can be, and are, configured. Using little more than text files, a smart administrator can set up their interfaces, ACLs, and essentially every other setting within these devices. Their use of text files means that one device's configuration can very easily be replicated on another device through a file copy. Their editing is also trivial, accomplished with a simple text editor or SSH application. As an example, the following code snippet shows the simplicity of a Cisco device's initial configuration:
no service password-encryption
!
hostname Router
!
enable secret 5 $2m$FJdHx53V$t7rQJop3jjbXIB7n3
!
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex auto
 speed auto
!
interface FastEthernet0/1
 no ip address
 duplex auto
 speed auto
 shutdown
!
interface Vlan1
 no ip address
 shutdown
Yet there's a certain level of pain that comes with this simplicity. That pain grows as the number of devices and their individual configurations increases. Managing the configuration of just a few devices means that you're responsible for just a few text files and their individual settings. But as your network grows in size and complexity, the number of elements under your management grows geometrically. At some point, no one person can safely handle the sheer volume of text files and settings that a production network requires.
It is in just this situation that the configuration management elements of an effective NMS grow extremely valuable to the IT organization. An effective NMS will include database storage of configurations, versioning and version control of individual config files, analysis tools for comparing those files, and the ability to rapidly deploy changes to devices all across the network. In much the same way that most people program their favorite phone numbers into their cellular phones, managing a network through an NMS ensures you don't accidentally call the wrong person, forget a phone number, or misconfigure a device in a way that brings down the LAN.
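To make the versioning and comparison idea concrete, here is a minimal sketch in Python of how a stored config might be compared against what is currently running on a device. It uses only the standard library's difflib module. The function name diff_configs and the sample configs are illustrative assumptions, not the behavior of any particular NMS product.

```python
import difflib


def diff_configs(old: str, new: str, name: str = "device") -> list[str]:
    """Return unified-diff lines between two versions of a config file."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"{name} (stored)",
            tofile=f"{name} (running)",
            lineterm="",
        )
    )


# Hypothetical stored version of one interface stanza.
stored = """interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex auto
 speed auto
"""

# Hypothetical running version after someone hand-edited the device.
running = stored.replace(" duplex auto", " duplex half").replace(
    " speed auto", " speed 100"
)

for line in diff_configs(stored, running, "Router"):
    print(line)
```

Lines prefixed with a minus sign exist only in the stored version, and lines prefixed with a plus sign exist only in the running version. An empty result means the two versions match.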
This workflow wraps around the traditional actions associated with changing a device config and adds a lot of value to the process. Consider a situation I experienced a number of years ago in the network of a major governmental defense contractor. There, a network condition began occurring where some servers intermittently lost their connection with the network. When those servers could talk to the network, their connection speeds were dramatically lower than expected. Network bandwidth rates were so slow that network applications began to suffer, users began calling into the Help desk, and fellow administrators started contacting loved ones to report they'd be spending the night.
In this situation, the entire staff of systems administrators was tasked with resolving the problem. As the problem affected a large percentage of servers on the network, every eye was needed on the problem.
After a full day of troubleshooting by the entire staff, the problem was eventually tracked to an incorrect configuration on a particular switch in the data center. That configuration mismatched the duplex settings between the switch and its connected servers, with one side inexplicably reset to 100/Half duplex while the other remained at Auto/Auto. As a result, the two sides found themselves repeatedly renegotiating their communication channel, with the resulting loss in service and performance.
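A check for this kind of setting is simple to automate once configs are stored as text. The following Python sketch scans an IOS-style config and reports every interface whose duplex or speed is hard-coded rather than left at auto. The function name find_forced_link_settings and the sample config are assumptions made for illustration. A hard-coded value is not necessarily wrong, since that depends on how the far end of the link is set, so treat the output as a list of places to look rather than a list of errors.

```python
def find_forced_link_settings(config: str) -> dict[str, dict[str, str]]:
    """Map interface name -> hard-coded (non-auto) duplex/speed settings."""
    findings: dict[str, dict[str, str]] = {}
    current = None  # interface stanza we are currently inside, if any
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
        elif line == "!" or not line:
            # A "!" separator or blank line ends the current stanza.
            current = None
        elif current and line.split()[0] in ("duplex", "speed"):
            key, _, value = line.partition(" ")
            if value.strip() != "auto":
                findings.setdefault(current, {})[key] = value.strip()
    return findings


# Hypothetical config: the first interface has been forced to 100/Half.
sample = """\
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex half
 speed 100
!
interface FastEthernet0/1
 no ip address
 duplex auto
 speed auto
 shutdown
"""

print(find_forced_link_settings(sample))
```

Run against every stored config on a schedule, a check like this would point at the affected switch port without anyone having to know where to look.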
In the end, a half-dozen systems administrators lost a full day of productive work as a result of a very simple misconfiguration. This misconfiguration was set into place by a well-meaning network engineer, who manually made a small change to a config file and accidentally introduced the error. Because the engineer completed the change using a traditional SSH connection directly to the device, the change wasn't logged in any change management system. No one knew about the change, and so no one was looking in that location for the problem. Conversely, had the engineer made the change using an NMS's change control engine, the error would likely have been caught before it was released into production.
Another story that is relatively common with network engineers involves an enterprise client of mine and their massively distributed network. This client was a single business unit of a much larger corporate network, responsible for the network traffic for many thousands of people across dozens of sites. As you can imagine, the level of networking equipment required to support the infrastructure was large and exceedingly complex.
This client and I were working on a widespread network slowdown situation. This situation was not necessarily that the network had gotten slow, or for some reason stopped operating at its expected level. In this environment, the network was slow, had been slow, and its users had grown to accept its slowness as baseline. The network engineer and I recognized that its baseline performance simply did not make sense based on the kinds of equipment in the infrastructure and the bandwidth rates between sites. In this environment, even the intra‐LAN traffic itself was slow beyond comprehension.
After a substantial amount of time peering through reports and looking through device statistics, we realized that a small but important misconfiguration had been propagated into the config files of each and every device on the network and across every site. The specific misconfiguration is less important than the realization that the scope of the fix was far greater than our group of individuals could take on. With literally thousands of devices spanning dozens of sites, the steps needed to locate each device, log in, make and confirm the change, and move on to the next device were anticipated to take between 5 and 10 minutes per device. Multiplying that number across every device meant that the solution could take literally months of constant manual effort to resolve.
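The arithmetic behind that estimate is worth spelling out. The sketch below assumes a fleet of 3,000 devices and a single engineer working 8-hour days. Both numbers are illustrative assumptions, since the exact device count is not given here, while the 5 and 10 minutes per device come from the estimate above.

```python
def manual_effort_days(
    devices: int, minutes_per_device: float, hours_per_day: float = 8.0
) -> float:
    """Working days one person needs to touch every device by hand."""
    total_hours = devices * minutes_per_device / 60
    return total_hours / hours_per_day


DEVICES = 3000  # assumed fleet size, for illustration only

best_case = manual_effort_days(DEVICES, 5)    # 31.25 working days
worst_case = manual_effort_days(DEVICES, 10)  # 62.5 working days

print(f"{best_case} to {worst_case} working days for one engineer")
```

Under those assumptions, the fix consumes roughly six to twelve and a half working weeks of one person's uninterrupted time, before allowing for travel between sites, failed logins, or any other work.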
Adding to the complexity of the resolution was the nature of the fix itself. Due to the specific change required, a rapid fix was necessary to preserve network connections between sites. Although the fix was trivial, the network engineers were baffled as to how to implement it.
The solution arrived with the implementation of an NMS not unlike those discussed in this Essentials Series. By adding the NMS to the environment and instructing it to automatically discover and map the network infrastructure (see Figure 1), the organization was able to very quickly bring each individual device under centralized management. The NMS's bulk change feature then enabled the team to quickly implement and distribute the change across the infrastructure. The result was a massive improvement in performance across the business unit, and a promotion for the engineer.
Figure 1: An NMS' automated discovery and mapping features can quickly bring a large network infrastructure under management.
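The idea behind a bulk change can be sketched in a few lines of Python. Given an inventory of configs held as text, the function below finds every device that contains an offending line and produces the corrected config for each one, leaving the push to the devices as a separate step. The function name plan_bulk_change, the device names, and the ntp server lines are all stand-ins chosen for illustration. They are not the actual misconfiguration from this story, and a real NMS would add scheduling, rollback, and confirmation on top of this.

```python
def plan_bulk_change(
    configs: dict[str, str], old_line: str, new_line: str
) -> dict[str, str]:
    """Return corrected configs for only those devices containing old_line."""
    plan: dict[str, str] = {}
    for device, text in configs.items():
        lines = text.splitlines()
        if any(line.strip() == old_line for line in lines):
            # Replace only exact matches, preserving any leading indentation.
            updated = [
                line.replace(old_line, new_line)
                if line.strip() == old_line
                else line
                for line in lines
            ]
            plan[device] = "\n".join(updated) + "\n"
    return plan


# Hypothetical inventory: two devices carry the bad line, one does not.
inventory = {
    "site-a-sw1": "hostname site-a-sw1\nntp server 10.0.0.1\n",
    "site-b-sw1": "hostname site-b-sw1\nntp server 10.0.0.1\n",
    "site-c-sw1": "hostname site-c-sw1\nntp server 10.0.0.2\n",
}

plan = plan_bulk_change(inventory, "ntp server 10.0.0.1", "ntp server 10.0.0.2")
for device in sorted(plan):
    print(f"{device}: change planned")
```

Separating the plan from the push means the full set of affected devices can be reviewed before anything on the network is touched.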
As stated earlier, the goal of this Essentials Series has been to illustrate why effective monitoring and management are necessary for a healthy network. That need holds irrespective of the size of your network. Whether you're a small business with a few devices or a large enterprise with many thousands, not having this vision prevents you from actually understanding what's going on inside your network.
As this article has shown, not having these tools also inhibits you from cohesively managing the configuration of your network devices when you need those tools the most. When looking for an NMS, look for one that is scoped to the needs of your environment, with the features and integrations you require for complete situational awareness across the IT landscape.