One of the last messages on this blog was that Sea-Munchkin was merging with Impertinent. That was the case for a while, as I was too busy to maintain two websites. Too busy to blog on one. That’s still the case, but events are conspiring to make it necessary to split the sites and blog again.
The first thing going on is that we are planning another run in the oceans on the Sea Munchkin, and this time are determined to recruit a few more crew to help out along the way and enjoy the trip. The other reason is that with the sea-munchkin site being mostly about the boat and those who sail on her, this site is becoming a place for the thoughts and rants associated with the everyday and the unusual. Having a new baby in the house makes for a dull life, but that dullness does not include thoughts or the future – and so I suspect you will very soon have much more to read about.
One topic for today is an emerging need to do best-practice reviews of Linux-based environments.
If you do a search on best practices for Linux, you will likely find a hundred links to security hardening. After that, a handful of hardware guides from various vendors, and the odd commercial software guide for the likes of Oracle and SAP et al. There just aren't a lot of guidelines to help you evaluate whether the sysadmins are smoking crack, or wasting computer cycles on bad config. Red Hat has a decent offering which includes a checklist, but of course that's part of their secret sauce, and not google-able.
My thoughts on how to evaluate systems fall into three categories – Availability, Manageability, and Performance. This sounds a lot like RAS (Reliability, Availability, and Serviceability) and it is similar – but I think performance, which is more than just scalability, deserves a category of its own, and I think Reliability can be folded nicely into Availability.
Availability – Can the system keep working at its task? What will stop it and how can we handle that as a risk?
Manageability – Can the system communicate its status completely to the operators, and when change needs to happen can the operators enact those changes in an auditable, easy, and controlled fashion?
Performance – Is the system performing at its best? Will it continue to? Are things efficient, or is crappy application code being buried under tons of CPU and IO capacity, hiding an efficiency hog until things get hectic?
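To make the three buckets a little more concrete, here is a rough sketch of how review findings could be tallied per category. Everything in it is an assumption for illustration: the `Finding` class, the `summarize` helper, and the sample checks are made up, not taken from any vendor checklist.

```python
from collections import Counter
from dataclasses import dataclass

# The three review categories described above.
CATEGORIES = ("Availability", "Manageability", "Performance")


@dataclass(frozen=True)
class Finding:
    """One reviewed item (hypothetical structure, for illustration only)."""
    category: str  # must be one of CATEGORIES
    check: str     # free-text description of what was reviewed
    passed: bool

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def summarize(findings):
    """Return {category: (passed, total)} for every category, even empty ones."""
    passed = Counter(f.category for f in findings if f.passed)
    total = Counter(f.category for f in findings)
    return {c: (passed[c], total[c]) for c in CATEGORIES}


# Sample findings; the checks themselves are invented examples.
findings = [
    Finding("Availability", "root filesystem on redundant storage", True),
    Finding("Availability", "restore from backup has been tested", False),
    Finding("Manageability", "config changes tracked in version control", True),
    Finding("Performance", "no sustained swap activity under normal load", True),
]

for category, (ok, n) in summarize(findings).items():
    print(f"{category}: {ok}/{n} checks passed")
```

A per-category tally like this is only a starting point. It shows at a glance which of the three areas a review has actually covered and where the failures cluster.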
In part II, I will look at these and how to evaluate them against a typical Linux installation running a database and one or more layered applications.