There comes a point when your network has grown to the extent that uptime monitoring and reachability checks simply cannot be performed by humans alone. In large networks there is a lot to keep track of: applications, bandwidth usage, security monitoring and evaluation, service providers and user behaviour, to name but a few. The aim of reachability and uptime monitoring is to ensure that the network and its applications are available when users need them, and that corrective action is taken as quickly as possible when problems occur.
Networks can be as fickle as they are complex. It takes only the smallest human or computing error to throw things into disarray and disrupt business continuity. For some businesses, tolerance for downtime simply doesn’t exist. Take banks, for example. With transactions occurring every second of the day, 10 minutes of system downtime is an eternity for a financial institution and can result in losses totalling millions.
The ability to see into your environment is crucial for network managers, because it facilitates the building of stable and predictable networks. The following are key things to keep in mind about uptime monitoring.
Don’t rely on pings
Just because a device responds to a ping test doesn’t mean the application or service associated with it is up and running. Pinging a server or other device may indicate that it is up, but that tells only half the story. It is equally important to monitor the service or application running on the device to ensure that the business can use the resource fully. What you need is a monitoring tool that checks both hardware uptime and the availability of the associated service. Take an email server as an example. Once you’ve confirmed that the server itself is connected to the network, you need to ensure that the mail services for incoming and outgoing email, as well as other functions such as the address book, contacts and resource scheduling, are also running. This constitutes uptime monitoring in a holistic sense.
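To make the difference between "the host answers" and "the service answers" concrete, here is a minimal Python sketch. It assumes that a successful TCP connection to a service port is a reasonable first signal that the service is listening; a real check would go further and speak the protocol itself. The function names, the status labels and the ports mentioned afterwards are illustrative assumptions, not part of any particular monitoring product.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def summarise(host_reachable: bool, services: dict[str, bool]) -> str:
    """Combine host reachability with per-service results into one status string."""
    if not host_reachable:
        return "down"
    failed = sorted(name for name, ok in services.items() if not ok)
    if failed:
        # The host responds, but at least one service on it does not.
        return "degraded: " + ", ".join(failed)
    return "up"
```

For a mail server you might call `tcp_port_open` for the SMTP and IMAP ports (commonly 25 and 143) and pass the results to `summarise` together with the outcome of the ping test. A server that answers pings but refuses SMTP connections would then be reported as degraded rather than up.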
Use alerting smartly
Your uptime monitoring solution should actively monitor events on your network and leave a traceable path for investigating any areas of concern. It should also function as an alerting system that brings events and errors to the right people’s attention promptly. Some people overlook the importance of proactive alerting after implementing an uptime monitoring regimen. This is a big oversight. Alerting is crucial to kicking off the problem escalation process and bringing network problems to the attention of the relevant people or sub-divisions within IT operations.
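One way to picture an escalation process is as a list of steps, each of which applies once an alert has gone unacknowledged for a certain time. The Python sketch below illustrates that idea only. The roles, the timings and the names `EscalationStep`, `POLICY` and `recipients` are hypothetical and would need to match your own organisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes unacknowledged before this step applies
    notify: str         # hypothetical role or team to notify


# Hypothetical policy: the tiers and timings are illustrative assumptions.
POLICY = [
    EscalationStep(0, "on-call engineer"),
    EscalationStep(15, "network team lead"),
    EscalationStep(45, "IT operations manager"),
]


def recipients(minutes_unacknowledged: int, policy=POLICY) -> list[str]:
    """Return everyone who should have been notified by now, in escalation order."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [step.notify for step in ordered
            if minutes_unacknowledged >= step.after_minutes]
```

With this policy, an alert that has been open for 20 minutes without acknowledgement has reached the on-call engineer and the network team lead, but not yet the IT operations manager.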
Cloud and virtualisation challenges
Virtualised environments bring a unique set of challenges, with intrinsic layers of abstraction that can complicate the troubleshooting process. Monitoring virtual environments differs from monitoring physical systems because operating systems are virtualised and resources are allocated differently. You have to monitor shared pools of CPU, memory, network, storage and other resources in a way that is quite different from physical hardware. Contention for shared resources in virtualised systems makes it harder to determine the source of application and performance issues. Your uptime monitoring solution should include the tools needed to monitor virtualised environments according to their unique characteristics.
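As a rough illustration of what monitoring a shared pool can mean, the Python sketch below compares the capacity allocated to virtual machines with the physical capacity behind each pool. It assumes that an allocation ratio above a chosen threshold is worth investigating. That is a simplification: overcommitment alone does not prove contention, and the function names and the default threshold are hypothetical.

```python
def overcommit_ratio(allocated: float, physical: float) -> float:
    """Return allocated capacity divided by physical capacity for one pool."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


def contended_pools(pools: dict[str, tuple[float, float]],
                    threshold: float = 1.0) -> list[str]:
    """Return the names of pools whose allocation ratio exceeds the threshold.

    `pools` maps a pool name to (allocated, physical) in the same unit,
    for example vCPUs versus physical cores, or GB of memory.
    """
    return sorted(name for name, (allocated, physical) in pools.items()
                  if overcommit_ratio(allocated, physical) > threshold)
```

For a host with 64 vCPUs allocated across 32 physical cores and 96 GB of memory allocated out of 128 GB, only the CPU pool would be flagged, which narrows down where to look first when applications slow down.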
Get the complete insight with IRIS’ uptime monitoring solutions
IRIS is a robust, lightweight network monitoring system that measures network performance and is aimed at large environments, including ISPs and virtualised and cloud infrastructure. Capable of monitoring and alerting on performance thresholds on almost any network-attached device, IRIS gives IT Operations Managers the inside scoop on what is happening in complex environments. Its ability to measure almost any metric on the network simplifies troubleshooting, problem analysis and the design of complex network topologies.