Network outages are disruptive and expensive. An IvyTalk Connect Application Note from 2012 presents the following eye-opening statistics about outages:
- IEEE says that 49% of network outages are caused by human error.
- Gartner reports the typical corporation has 87 hours of downtime each year, at a cost of $42,000 per hour.
- IT Process Institute says that the average resolution time for a network outage is around 200 minutes, and that 80% of unplanned outages are caused by changes made by network administrators or developers.
- The Enterprise Management Association reports that 60% of negative impacts on network availability and performance are due to misconfigurations.
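To put those figures in perspective, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted above. Applying Gartner's hourly cost to the IT Process Institute's average resolution time is an assumption made for illustration, since the two figures come from different sources.

```python
# Figures quoted in the list above; combining them is an illustrative assumption.
DOWNTIME_HOURS_PER_YEAR = 87      # Gartner: typical annual downtime
COST_PER_HOUR_USD = 42_000        # Gartner: cost per hour of downtime
AVG_RESOLUTION_MINUTES = 200      # IT Process Institute: average time to resolve


def annual_downtime_cost(hours: float, cost_per_hour: float) -> float:
    """Total yearly cost of downtime at a flat hourly rate."""
    return hours * cost_per_hour


def outage_cost(minutes: float, cost_per_hour: float) -> float:
    """Cost of a single outage lasting the given number of minutes."""
    return minutes * cost_per_hour / 60


annual = annual_downtime_cost(DOWNTIME_HOURS_PER_YEAR, COST_PER_HOUR_USD)
single = outage_cost(AVG_RESOLUTION_MINUTES, COST_PER_HOUR_USD)

print(f"Annual downtime cost: ${annual:,.0f}")        # $3,654,000
print(f"Cost of one average outage: ${single:,.0f}")  # $140,000
```

At these rates, every minute shaved off recovery is worth roughly $700.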
However you look at it, human error has a major effect on network availability and information system security. Organizations do what they can to minimize these problems, but wherever there are humans, there will be human error. Organizations must be able to react swiftly after these incidents to minimize downtime and the reputational damage that can follow.
Prevention is Ideal
Obviously, the least costly error is the one that isn’t made in the first place. With system errors, network security monitoring platforms and software profile utilities can help identify the source of errors so they can be addressed. But identifying human error can be painfully difficult. Summary journals that log what each system user does, and searchable activity logs with information like which applications were run and which text was edited, can help chase down a human error, but the process can still be harrowing, particularly with the knowledge that the organization is losing money with every additional minute of downtime. Careful hiring and training can prevent some human error, but no preventative solution is perfect.
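As a minimal sketch of how a searchable activity log can narrow the hunt, the snippet below lists everything that was done in the hour before an outage began, most recent first. The `ActivityEntry` fields, the user names, and the sample entries are all hypothetical. Real logging tools use their own formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ActivityEntry:
    """One line of a hypothetical user activity log."""
    when: datetime
    user: str
    application: str
    detail: str


def changes_before(entries, outage_start, window=timedelta(hours=1)):
    """Return entries logged within `window` before the outage, newest first."""
    earliest = outage_start - window
    hits = [e for e in entries if earliest <= e.when <= outage_start]
    return sorted(hits, key=lambda e: e.when, reverse=True)


# Made-up sample data for illustration only.
log = [
    ActivityEntry(datetime(2024, 3, 1, 9, 5), "alice", "editor", "edited dns.conf"),
    ActivityEntry(datetime(2024, 3, 1, 13, 40), "bob", "fw-console", "changed rule 12"),
    ActivityEntry(datetime(2024, 3, 1, 13, 55), "carol", "lb-admin", "removed pool member"),
]

outage_start = datetime(2024, 3, 1, 14, 0)
suspects = changes_before(log, outage_start)
for entry in suspects:
    print(entry.when, entry.user, entry.application, entry.detail)
```

Only the two afternoon changes fall inside the window, so the morning edit can be ruled out quickly.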
An Incident Response Protocol is Essential
Having an incident response plan is critical. When a major error is followed by downtime, a plan prevents chaos and keeps everyone focused on solving the problem. Here is a brief summary of what your incident response protocol should include.
- A defined incident response team that includes IT and security personnel as well as someone from PR and the legal department. They will be briefed on the incident, their involvement, and what is expected from them.
- Hard copies of phone numbers for all team members. If the network is down, email won’t work, and people must be able to communicate by phone.
- A protocol for data capture and note taking. Data logs, disk images, and handwritten notes may all have a role in documenting the incident.
- A plan for reporting criminal activity, should the downtime be due to intentional damage by a disgruntled employee, for example.
Getting the System Running Again
Every IT worker has wished there were a “Ctrl-Z” for life, a way to go back in time to before those disastrous keystrokes were made. A serious human error can require reconfiguring dozens or hundreds of servers, load balancers, or firewalls, a process that is complex and time-consuming. But when the configuration of every piece of hardware in your system is regularly and automatically backed up, you have the closest approximation to going back in time, and it won’t involve manually reconfiguring everything.
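The idea of going back in time can be sketched in a few lines: keep timestamped snapshots of each device's configuration, then look up the last one taken before the mistake. This is a simplified in-memory illustration, not how any particular product works. The `ConfigHistory` class, the device name, and the configuration strings are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    device: str
    taken_at: datetime
    config: str
    digest: str  # SHA-256 of the config text, used to skip unchanged backups


@dataclass
class ConfigHistory:
    """Hypothetical in-memory store of configuration snapshots per device."""
    snapshots: dict[str, list[Snapshot]] = field(default_factory=dict)

    def backup(self, device: str, config: str, taken_at: datetime) -> Snapshot:
        digest = hashlib.sha256(config.encode("utf-8")).hexdigest()
        history = self.snapshots.setdefault(device, [])
        if history and history[-1].digest == digest:
            return history[-1]  # nothing changed since the last backup
        snap = Snapshot(device, taken_at, config, digest)
        history.append(snap)
        return snap

    def last_before(self, device: str, moment: datetime) -> Snapshot | None:
        """Most recent snapshot taken strictly before `moment`, if any."""
        older = [s for s in self.snapshots.get(device, []) if s.taken_at < moment]
        return max(older, key=lambda s: s.taken_at, default=None)


history = ConfigHistory()
night_1 = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
night_2 = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
history.backup("fw-edge-01", "permit tcp any host 10.0.0.5 eq 443", night_1)
history.backup("fw-edge-01", "deny ip any any", night_2)  # captures the bad change

bad_change_at = datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc)
known_good = history.last_before("fw-edge-01", bad_change_at)
print(known_good.config)
```

Here the lookup returns the snapshot from the first night, which is the configuration to restore. A real system would also have to push that configuration back to the device, which is the hard part that this sketch leaves out.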
In many organizations, configurations for routers, firewalls, load balancers, content filters, and custom devices are rarely or never backed up. However, when an organization uses BackBox, all security and network device backups are done through a single application dashboard. Because administrators can view the entire infrastructure at once, they no longer need to track each device or management system individually. Moreover, BackBox is fully automated, so backups are done on schedule. No agents are required, and no network configuration changes are necessary for BackBox to perform full configuration backups.
Imagine if next time a system administrator makes an error, you could reconfigure everything through a single interface, going “back in time” to a configuration that works, and thus minimizing downtime. BackBox can’t prevent human error altogether, but it can make errors much easier to recover from.
BackBox can be run entirely from the cloud, or it can be installed at customer sites for local backup while still being managed from a single management server installed in the cloud. Configuration information can be stored on remote BackBox agents as well as the central BackBox management server, for redundancy that obviates the need for access to all internal devices.
Human error is the main cause of network downtime, and your company must be able to recover quickly from errors to minimize customer backlash and revenue loss. Computer security involves much more than just data backup and access controls. It should also include automated configuration backup, so that recovery from errors, human or otherwise, is as swift and effective as possible.
The post The Main Cause of Network Outages and How to Prevent It appeared first on BACKBOX BLOG.