Latency and packet loss metrics – the unsung heroes of network reporting?

Monitoring the modern network environment can be an arduous task, to say the least. With the amount of information that traverses the average network in a single day, not to mention the variety of data types that come with today’s media-rich internet, managing capacity and uptime is more important than ever. It’s easy to find yourself up to your neck in network reporting data and pointless metrics that don’t give you a clearer idea of how your network is performing, especially when you’re being inundated by notifications from your Network Management Software (NMS). Even though your NMS is capable of providing very detailed information down to the deepest layers of your network, there’s usually a much easier way to determine the health of your environment at a glance.

In this article, we’ll discuss two of network reporting’s unsung heroes – namely, latency and packet loss metrics. But before you scoff at the thought of using such elementary metrics for serious network management, let’s take a look at what makes latency and packet loss two of the most important metrics at your disposal.

No matter how broad your bandwidth, latency is key to maintaining speed and service levels

Nearly twenty years ago, Stuart Cheshire wrote a paper on latency as a critical component in ensuring acceptable levels of network service. Even though some aspects of his analysis have dated, the specifics of the network environments he references are largely incidental to his argument – he opens the article as follows: “Years ago David Cheriton at Stanford taught me something that seemed very obvious at the time — that if you have a network link with low bandwidth then it’s an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.” Even though networks today can achieve transfer speeds around 100 times faster than they could when Cheshire wrote his paper, his fundamental point is still very pertinent to the state of networking in 2015. Many network managers deal with network service issues the same way they always have: by throwing more bandwidth at the problem. Essentially, all that’s changed in the past twenty years is that bandwidth has become cheaper to source. But because the amount of data we use on a daily basis is only increasing, we’re constantly playing catch-up with the amount of bandwidth needed versus the amount we’re able to provide. In other words, insufficient bandwidth is rarely the cause of poor network service in your environment, and believing that it is means building a performance-based environment on an unstable foundation.

Latency is the best indicator of the overall health of your network environment

When something goes wrong on your network, what is the first step you take? Chances are that you’ll ping a server – usually Google – to assess the extent of the error. Depending on the result, you may take a number of actions: if your ping is lower than about 30ms, you might ignore it altogether; if it’s noticeably higher than usual, you’ll probably investigate a bit further. It’s also important to be cognisant of the fact that high latency doesn’t happen without reason. It often happens that, after being alerted to high latency on the network, an engineer investigates and finds nothing out of the ordinary – and crucially, many engineers conclude that their network reporting software must be imagining things. Then, some kind of major network error will typically arise within the next few weeks and the source of the increased latency will become clear. Simply put, if your latency is taking a hit without an identifiable cause, you haven’t looked in the right places yet. So, using latency analyses as your network’s “pulse-checker” means you’re able to pick up on irregularities before they evolve into full-blown network performance issues.
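The pulse-checker idea can be sketched in a few lines of Python. The snippet below times a TCP handshake as a rough latency probe (ICMP ping usually requires elevated privileges) and flags a reading that sits well above a recorded baseline; the host, port, and three-standard-deviation threshold are illustrative assumptions, not prescriptions.

```python
import socket
import statistics
import time

def tcp_latency_ms(host, port, timeout=2.0):
    """Time a TCP handshake to host:port as a rough latency probe.
    Not identical to ICMP ping, but close enough for trend-watching."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; we only wanted the handshake time
    return (time.perf_counter() - start) * 1000.0

def is_anomalous(baseline_samples, current_ms, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations
    above the mean of the baseline samples."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    return current_ms > mean + threshold * stdev
```

In practice you would collect baseline samples at regular intervals and alert when `is_anomalous` trips – the point being that the alert fires on the latency trend, before users report a visible failure.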

Latency, TCP windowing and packet loss: the not-so-holy trinity

High latency doesn’t only cause inconvenience in the form of stuttering on Voice over IP (VoIP) calls and long buffering times for streaming media – high latency can wreak havoc on all your links if your ping is high enough to interfere with Transmission Control Protocol (TCP) transfers. Simply put, TCP is a protocol that ensures data packets sent between user and server arrive in the correct order and as a complete whole. Rather than acknowledging every packet individually, the two ends agree on a window – the amount of data that may be in flight before an acknowledgement is required. Because the sender must wait a full round trip for each acknowledgement before sending the next window, high latency leaves the sender sitting idle between windows and ultimately impedes transfer times, no matter how much bandwidth the link has.
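This relationship can be put as a back-of-envelope formula: a single TCP stream can move at most one window of data per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below computes that ceiling; it deliberately ignores slow start, loss recovery, and window scaling, so treat the numbers as an upper bound rather than a prediction.

```python
def max_tcp_throughput_bps(window_bytes, rtt_ms):
    """Theoretical single-stream TCP throughput ceiling in bits/s:
    at most one window of data can be in flight per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0)

# A classic 64 KiB window at 20 ms RTT caps out around 26 Mbit/s;
# stretch the RTT to 200 ms and the same window manages only ~2.6 Mbit/s,
# regardless of how fat the pipe is.
lan_bps = max_tcp_throughput_bps(65536, 20)
wan_bps = max_tcp_throughput_bps(65536, 200)
```

This is exactly Cheshire’s point in miniature: adding bandwidth changes neither number, while shaving the round-trip time raises the ceiling directly.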

Be aware of the factors that cause latency and packet loss

Even though latency can provide a good indication of the overall health of your network environment, it won’t tell you in and of itself what the cause of the error is – that’s where network reporting software and your engineers come in. High latency or packet loss could be caused by any number of factors: insufficient processing power on your routers or other network devices, congestion on your links, interface errors, badly implemented Quality of Service (QoS) classes, or errors on the physical medium such as faulty cables or poor-quality copper lines. Although using the best network management software and configuring network reporting protocols correctly should keep you in the loop with any errors that arise on your network, monitoring your latency and packet loss diligently only makes your network more robust, stable and reliable.
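If you want to fold both metrics into your own monitoring scripts, the humble ping summary already carries them. The sketch below pulls packet loss and average round-trip time out of the summary block that Linux ping prints; the output format is an assumption (BSD and Windows ping format their summaries differently), so adjust the patterns for your platform.

```python
import re

def parse_ping_summary(output):
    """Extract (packet_loss_percent, avg_rtt_ms) from a Linux-style
    ping summary, e.g. '... 20% packet loss ...' and
    'rtt min/avg/max/mdev = 10.1/12.4/15.0/1.8 ms'.
    Returns None for a field that isn't found."""
    loss = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    rtt = re.search(r"= [\d.]+/([\d.]+)/", output)
    return (float(loss.group(1)) if loss else None,
            float(rtt.group(1)) if rtt else None)
```

Run periodically against a handful of reference hosts and logged over time, even this crude parser gives you the two trend lines this article argues for – without waiting on your NMS to raise an alarm.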

IRIS has over twenty years’ experience in the Information and Communications Technology industries. To find out more about our services, please don’t hesitate to contact us or download a free copy of our Network Manager’s Guide to a Highly Available Network.
