Nagios

Rob has gotten a lot done with Nagios. Hopefully by the end of the week he will have basic system monitoring (up and down) plus disk space checking on all the gear at our data center. Once that is done we can roll out to more locations as well as add functionality. He is doing great with it so far. I know Danny wanted to work on the project, but he is busy with other issues. I also know Danny would have done a great job on it, but when he works on stuff like this it seems to take him forever to finish. Rob is able to churn out work a lot faster. For a time-sensitive project like this, I am happy with the quick turnaround.

Hopefully this program works out better than our old WhatsUp system. I can’t wait until the end of the week.

Performance Issues Resolved

After about four passes at the configuration of our network (switches, firewalls, load balancers, etc.), Danny found an abnormality that we decided to correct to see what happened. On our core switch, the port our Pix plugged into was set to 100 Mb full duplex, but on the Pix (configured years before our core) it was set to auto. Turns out the Pix cannot auto-negotiate with a port that is hard-set to 100 Mb full duplex, so it falls back to half duplex, yet it does not report an error. You don’t even get lost packets, but you do get some collisions. Well, “some” is an understatement.

Later in the day we set the port on the Pix to 100 Mb full duplex, and within a few hours our metrics were back to normal. This little change took us weeks to find. This is not the first time I have been burned by a duplex mismatch. Knowing that, we had even taken steps to prevent this, or so we thought.

It is frustrating that such a little issue, one that doesn’t even show up as errors, can cause so many problems.

Rough Days At Work

I have had a few rough weeks at work. Hopefully a fix that went into place today will make things better. I know I am getting worn down, and I think others around me are too. We just don’t talk about it. There is not much I can really talk about on this blog; I have written more extensively on my work blog (password protected for my protection).

Because of the extra work, there is not much else to report on the social front.

Site Issues & Network Disasters

I took yesterday off to catch up on some sleep. I ended up stopping by the office in the late afternoon to go backpack shopping with Jayson. Our website performance issues (or alleged issues) continued. All of our research boils down to this: we don’t think there is anything wrong, or shall I say anything new wrong. We know our application needs to be improved; that is why the development team has spent over a year designing and building a next-generation application. I think certain business people are missing their numbers and blaming it on a site problem. They got one bad metric and are stomping all over it. It has been a long week.

On another note, I was just sitting down to figure out what I wanted to eat for dinner last night when I got a call from Jayson: an internal website was offline and someone had called about it. Turns out several servers were down, all plugged into the same switch module. I ended up having to go to our data center and meet Jayson to fix the issue. It was as simple as re-seating the module, and it started working again. To be safe, we moved all critical systems off that module onto another one. I didn’t get home until after 1 AM today. So for a day off, I worked almost a full day. Nice!!!

Bandwidth

I met with our voice/data provider today, along with our CFO, to discuss renewing our contract with them. One account rep and his boss flew out from down south, and our local rep showed up too. They offered us a nice discount off what we are currently paying since our volume is up, but our CFO thinks we can get a better deal. Let the games begin. This is the part of my job I don’t like. It is necessary, but I don’t have to like it. Other topics we discussed were open issues we are having, as well as some new network options we have been thinking about. It was a productive but long discussion.

The other issue that came up today was the ever-growing space problem. We had to plan on moving two people to make room for a new hire starting next week.

Electrical Work

Today was busy with electrical upgrades in our computer room. We needed to add several circuits to support filling up the remaining rack space we have. Now that the holidays are over, we were able to get the work done. Next week will see a flurry of activity to clean up the room and to mount and power the gear that has been waiting for a home.

Wiring Day

Today Jayson and I spent most of the day rewiring a cabinet in the office. Try as we might, these cabinets get super messy with cables everywhere after a while. We rewired this same cabinet over a year ago, but it looked horrible again. The issue is that we keep putting in and taking out different kinds of gear, and the cables get really messy.

To keep things cleaner, we moved the PDUs, rewired all the electrical, moved some servers to different racks, and took out the thick analog KVM cables, replacing them with IP KVMs that run over regular CAT5. The result is a much cleaner rack. Having everything neat and tied down helps too. Next we need to get the electrical guys to come in and add some more circuits.

We still have to clean up the mess of cables and old servers we pulled from the cabinet, but we can do that during the week.

The work was messy, and we came in on a Saturday, but the results were worth it!

Wednesday At Work

Wednesday was not that eventful (other than my Best Buy excursion). I got some task work done, updated people on what to do, ran to the colo for two hours, and that is about it. Nothing major to report. It was almost quiet (ALMOST; I don’t want the powers that be to rain down problems because I said it was quiet) with Danny upstate, Kai out, and Jayson at the data center.

A Rare Trip

Today I made a trip with Jayson up to the data center. I don’t get up there that much. I usually let others go, but every so often I have a reason to (or just want to) get up there. Today was both. I had some specific tasks I needed to do, and I also wanted to see how the additions we have made look. Things look awesome! Jayson has done a fantastic job of making that place nice. Almost everything is so neat.

I brought up some gear and also did some inventories, which took a few hours. Then I spent the rest of the afternoon working on firewall rules and distributed file system security permissions. I had a stupid cross-domain security problem with a file share. I have yet to resolve it, but I found a workaround that will be fine for the next few months.

Jay and I decided to try leaving the overhead lights off in the office; we think it is easier to read the monitors without them. I installed a small desk light for some background light. We shall try it out for a few days and see how it goes. I like that we have decorated and made the office feel lived in. My office has more posters on the wall than my apartment does, but that is not something I should be proud of!

Word of warning to myself: do not eat ice cream after a big lunch. I have been feeling the pain ever since I had Häagen-Dazs. It was good, but afterward it was bad! More later.

HP DL320 SATA Servers

I have been working with a few HP ProLiant DL320 SATA servers. The price was right on them, and they have decent specs. The issue I had with the last round of Supermicro SATA boxes wasn’t the Supermicro boxes themselves, but the SATA RAID cards that went into them. They failed much more often than their SCSI counterparts, and the array controllers could not rebuild an array without crashing the machine. We tried several brands, and they were a crap shoot; for new deployments of our older-chassis SATA servers we are using 3ware cards, which seem the best of all I have seen. These HP boxes seem to rebuild fine in our tests. Time will tell if the drives hold up, but so far I think HP finally got it right with a low-end, non-SCSI RAID system.