MRTG never lies

As you may know, or may have guessed from these posts, I work in the technology field. More specifically, I handle network and server operations for my company. Because of that, we monitor lots and lots of things on our systems, anything from network traffic to equipment temperature to performance counters. This gives us an idea of what is going on when we are not watching the systems. Some of this is manual (stuff gets logged, but we need to look at it to understand what is going on) and some is automated (something stops working and we get paged about it).

That being said, late this morning I took a look at our network monitoring pages for the weekend. I was surprised to see huge utilization between two of our sites. More specifically, a database server at one site was sending data to two servers at another site. We use MRTG to watch bandwidth. It is a free tool that runs on Linux. It is great. I for one am just getting into Linux, but stuff like MRTG makes me want to learn more.

So I brought the problem to everyone’s attention. Now here is the problem: half the people dealing with our database servers said nothing was going on at the time we saw the problem. The other half agreed something was wrong but were not sure what. Turns out, oh yeah, we run some stuff late at night, and an anomaly caused it to send out 600 meg files instead of 20 meg files, or something like that. So some of these same fun guys were saying that nothing was going on while our T-1 pipe was flooded for 4 hours with traffic, because some servers were trying to transfer well over a gig of data.
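For a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch in Python. It assumes the nominal T-1 line rate of 1.544 Mbit/s, a link used by nothing else, and no protocol overhead, so real transfers would be somewhat slower. The function names are mine, made up for illustration, and the file sizes are the approximate ones mentioned above.

```python
# Back-of-the-envelope T-1 math. Assumptions: nominal line rate,
# no protocol overhead, no competing traffic on the link.
T1_BITS_PER_SECOND = 1_544_000  # nominal T-1 line rate


def transfer_hours(megabytes, link_bps=T1_BITS_PER_SECOND):
    """Hours needed to push `megabytes` (10^6 bytes each) over a saturated link."""
    bits = megabytes * 1_000_000 * 8
    return bits / link_bps / 3600


def megabytes_in(hours, link_bps=T1_BITS_PER_SECOND):
    """Megabytes a saturated link can carry in `hours`."""
    return link_bps * hours * 3600 / 8 / 1_000_000


if __name__ == "__main__":
    for size_mb in (20, 600):
        minutes = transfer_hours(size_mb) * 60
        print(f"{size_mb} MB file: about {minutes:.1f} minutes")
    print(f"4 hours saturated: about {megabytes_in(4):.0f} MB")
```

Under those assumptions a 20 meg file clears the link in a couple of minutes, a 600 meg file takes most of an hour, and four hours of a saturated T-1 works out to somewhere under 3 gigs. That lines up with "well over a gig" on the wire, which is exactly what the graphs were telling us.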

The moral of this story is that tools like MRTG don’t lie. We knew (well, Gus, Keith and I knew) from the first second we saw this what was causing the problem. We just didn’t know why it was happening. If it weren’t for some Sherlockian tactics from the man “Gus”, we would still be shooting in the dark. Never doubt the tools you use to figure out problems. If you do doubt their ability, it is time to test them or get new tools. MRTG has not failed us yet, but we still have skeptics claiming “everything is ok”. Wake up and smell the coffee. Or wake up and drink some coffee to stay awake.

This whole problem boils down to a Sherlock Holmes type of mystery. That is how I think of it. You have a problem, and you have to logically deduce what it is and why it happens. Don’t ignore the obvious. Coming up with a complicated explanation for why something is going on is stupid when a simple one fits the same facts. There is a term for that, which I cannot spell, so I won’t mention it here. I sometimes think that is all I really do when it comes to fixing problems: I try to use deductive reasoning. Probably why I like Sherlock Holmes so much.

This post may seem like a rant, but it really is not. The point is to pay attention to the obvious and think things through. Some people forget to do that, or just never think.

In other work news, we are still working to get a deal for voice service in our new facility. Dealing with telecom people is a nightmare. I thought dealing with routers was hard. And I remember when I thought programming a switch was a daunting task. Boy, was I naive.

I spent the entire day dealing with either telecom quotes or the bandwidth issues mentioned above. What is crazy is that I know we have tons more monitoring to deploy and watch. Right now we are finding tons of stuff. I wonder what we are letting slip through the cracks? Maybe nothing, but you never know.

I am leaving work at around 7pm tonight, so I won’t get home till around 8pm. I need to make dinner and relax a bit. Today was a very stressful day.
