Sunday, September 27, 2009


10 Rules That Every System Administrator Should Follow

This article was written by Ramesh Natarajan. At The Geek Stuff blog, he shares his knowledge and experience of Linux and other geeky stuff. He has more than 15 years of experience in the IT industry and has done intensive work in Linux system administration, database administration, hardware, and storage.

The following 10 items are guidelines more than rules, which I have learned over years of intensive work on IT infrastructure. These guidelines are mostly common sense and can help anybody who administers an IT system, including Linux/Windows administrators, network administrators, and DBAs.

1. Keep it simple.
In a technology environment, keeping things simple takes a lot more effort and maturity than letting them become complex. As an administrator, when you implement a particular piece of functionality or solve a problem, there are always several options available. It is best to learn all the available options, including the complex ones, so that you understand how they work. When it comes to implementation, however, keep the solution as simple as possible. The option you choose should have the following characteristics:
  • Easy to maintain in the long run
  • Does not add overhead to the system
  • Solves the primary business/technical problem
Whenever you face the dilemma of choosing between a bleeding-edge technology and a proven technology that has been around for a while, always go with the proven technology for production implementations.

Everything should be made as simple as possible, but not simpler. - Albert Einstein

2. Back up regularly
Are both your personal laptop and your servers at work getting backed up regularly? If not, stop everything you are doing and implement a backup solution on those systems immediately. Seriously! Start planning for your backup right now. Everybody knows that backing up data on a regular basis is critical, but only those who have been burned a few times by not having a backup really understand the importance of a reliable backup solution. Don't learn the importance of backups after losing your critical data.

It is only a matter of time before you find yourself in a situation where a system has crashed, data has been deleted accidentally, or a laptop with critical data has been lost. Spend quality time implementing a reliable backup solution for both your personal laptop and your servers at work.

3. Test your backup regularly
I could have combined this with rule #2, but I strongly believe that testing the backup deserves special attention. On several occasions I have seen administrators who thought they had a valid backup, only to find out during a disaster that they couldn't restore from it. A backup that is not tested on an ongoing basis is no better than having no backup at all. Simply having faith that the backup will work is not good enough. You should have a process to test your production backup every month. You'll sleep peacefully at night just by implementing rules #2 and #3.
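
As one way to make the monthly test concrete, here is a minimal sketch in Python. It assumes the backup has already been restored into a scratch directory by whatever backup tool you use; the function names `file_digest` and `verify_restore` are hypothetical and not part of any backup product. The sketch compares the SHA-256 checksum of each source file with its restored copy and reports anything missing or different.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> List[str]:
    """Return a list of problems found in the restored copy (empty if none)."""
    source = Path(source_dir)
    restored = Path(restored_dir)
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        restored_file = restored / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(src_file) != file_digest(restored_file):
            problems.append(f"differs: {relative}")
    return problems
```

An empty list means every source file was restored with identical content. On a live system, files that changed after the backup was taken will show up as "differs", so in practice you would compare against data you know is unchanged, or against checksums recorded at backup time.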

4. Proactive Monitoring
Are you always working in firefighting mode? Are your users calling you to report that a system is down or having problems? Experienced administrators know that they should spend the majority of their time implementing solutions that avoid problems, instead of fixing problems after they happen. Make sure to implement a strong monitoring solution that will alert you about a problem before it happens. You should never have to solve the same issue more than once. The following two points will help you achieve proactive monitoring.

Sit down and identify all the equipment, services, and applications that need to be monitored throughout the enterprise. Define acceptable warning and critical levels for those systems. Define who should be notified, how often, and by what method. Once you have identified all of this, spend time implementing a monitoring system.

Despite proactive monitoring, there will be times when you are putting out a fire. Once you have put out the fire, the first question you should ask yourself is: how could I have prevented this issue from happening? Once you have the answer, make sure to implement appropriate monitoring to prevent this particular incident from happening again.
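
The idea of warning and critical levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Threshold` and `classify` are hypothetical, the 80 and 90 percent levels are made-up examples, and a real monitoring system would also handle notification and scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical levels for one monitored metric (hypothetical)."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Return 'OK', 'WARNING', or 'CRITICAL' for a metric where higher is worse."""
    if value >= threshold.critical:
        return "CRITICAL"
    if value >= threshold.warning:
        return "WARNING"
    return "OK"


# Illustrative levels only; choose values that suit your own systems.
disk_usage_percent = Threshold(warning=80.0, critical=90.0)

print(classify(72.5, disk_usage_percent))  # prints "OK"
print(classify(85.0, disk_usage_percent))  # prints "WARNING"
print(classify(96.0, disk_usage_percent))  # prints "CRITICAL"
```

The warning level is what buys you time: it fires while the system is still working, which is what lets you act before users notice a problem.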

5. Document Everything
You should document everything that you do on the system. This is not a pleasant topic for administrators, as most of us hate writing documentation. Experienced administrators know that documenting the environment and their work is key to their success and growth. I'm not talking about spending several hours creating a huge document with fancy formatting.

Anytime you implement a solution or fix a problem, just jot down the high-level steps you performed in a text file. You can simply copy and paste the commands you executed along with a one-line description of each. This in itself is a huge step towards documentation for most administrators who are not used to documenting their work.
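
If even copy and paste feels like too much friction, a tiny helper can do the jotting for you. The following is a minimal sketch, not a recommended tool: the function name `log_step`, the file name, and the entry layout are all assumptions made for illustration. It appends a timestamped one-line description and the command to a plain text file.

```python
from datetime import datetime
from pathlib import Path


def log_step(notes_file: str, description: str, command: str) -> str:
    """Append one timestamped description/command pair to a plain text file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = f"[{timestamp}] {description}\n    $ {command}\n"
    # Mode "a" creates the file if needed and never overwrites earlier notes.
    with Path(notes_file).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry
```

For example, `log_step("admin-notes.txt", "Check free disk space on all filesystems", "df -h")` adds one entry to `admin-notes.txt` (a hypothetical file name) in the current directory.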

The following are some of the primary reasons for documenting every technical activity you perform as an administrator:
  • Don't learn the same topic twice. When you implement something new, you spend a lot of time learning the technology and understanding the steps needed to implement it in your specific environment. During this process, write down all the steps, and refer to them the next time you want to perform the same task on a different server.
  • There will be situations when you want to delegate tasks to others, for example when you are going on vacation, or when you want to hand a routine task to a junior administrator who is eager to learn. If you have made a habit of documenting everything consistently, you can simply pass those text files to the other administrator.
  • Sharing your knowledge with others is one of the most efficient ways to grow your own knowledge. So document everything and share it with others.
  • Don't waste the valuable RAM in your brain by remembering everything. Instead, offload some of the items from your brain's RAM to a simple text file, and use your brain's RAM to explore new technology.

6. Plan and Execute it well.
When you are implementing a solution, have a clear plan for what you will do next and when. You should be the project manager for your own tasks and projects. That is, analyze all the potential risks involved in implementing a solution, and make sure to allow sufficient time to test it. Come up with a clear test plan and get your users involved in the testing process. On your next assignment, try the following and see the benefits for yourself. It forces you to think about all the possible scenarios even before you start the project.
  • Write down the objective of your project. That is, what is the problem you are trying to solve? What are your success criteria for this project or task?
  • List all the tasks required to complete this particular activity and assign appropriate dates to them.
  • Even when nobody is asking you to complete a project by a certain date, hold yourself responsible by setting a completion date for your project or task.
When you actually finish the implementation by the projected date, give yourself a pat on the back and enjoy your accomplishment. Planning and executing projects well on a consistent basis can become a huge motivating factor for administrators to start taking on bigger and more complex technology projects.

7. Use Command Line more than GUI
Use the command line as much as possible. Whether you are configuring a VLAN on a switch or setting up LDAP/NIS authentication on a Linux server, use the command line instead of the GUI. The following are the advantages of using the command line:
  • You can do things very quickly on the command line.
  • A GUI hides what is happening behind the scenes, which keeps you from understanding and learning it.
  • Repetitive tasks can be automated easily using the command line.
  • Your brain will have fun and thank you for it.

8. Automate repetitive tasks
If you perform a task more than once, you should find a way to automate it. It may be very tempting to do repetitive tasks manually, as you can complete them quickly and know the exact steps to perform. But avoid this temptation and spend some extra effort automating the task, which will free your mind from thinking about routine work. Once you've automated the tasks, you can use your time effectively to learn other new, fun stuff.
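
As an illustration, take a typical chore such as clearing out old files from a directory. The sketch below is a hypothetical example, not a prescribed tool: the function names `find_old_files` and `clean_up`, the age limit, and the choice of task are all assumptions. It defaults to a dry run so that nothing is deleted until you have checked what would be removed.

```python
import time
from pathlib import Path
from typing import List


def find_old_files(directory: str, max_age_days: float) -> List[Path]:
    """Return files under directory last modified more than max_age_days ago."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def clean_up(directory: str, max_age_days: float, dry_run: bool = True) -> List[Path]:
    """Delete old files, or only report them when dry_run is True."""
    old_files = find_old_files(directory, max_age_days)
    for path in old_files:
        if not dry_run:
            path.unlink()
    return old_files
```

Calling `clean_up("/path/to/dir", 30)` only lists candidates; passing `dry_run=False` performs the deletion. Once you trust a script like this, a scheduler such as cron can run it for you, and the task no longer depends on anyone remembering to do it.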

9. Support your users and developers
Administrators are technically very sophisticated and sometimes get frustrated with end users who don't understand technology. But keep in mind that you have your job mainly because they don't understand technology and need your expertise. When a user reports an issue that has nothing to do with the system and is really the result of user error, be nice to the person and explain in non-technical terms why this is not a system issue.

Sometimes developers may deploy something on the server that causes undesirable results. Don't get mad at them or blame the developer for the problem. Instead, help the developer identify the root cause by providing sufficient data from the system to narrow down the problem.

10. Keep learning and have fun.
If you have mastered doing all of the above 9 items effectively, you'll have more free time on your hands. Keep learning all the time. Anytime someone reports an issue, be curious and treat it as an opportunity to learn something new. Once in a while, step away from your computer and spend quality time with your family. Above all, have fun and enjoy your system administration work.

Live as if you were to die tomorrow. Learn as if you were to live forever. - Mahatma Gandhi


About bench3 -

Haja Peer Mohamed H, software engineer by profession, author, founder and CEO of "bench3". You can connect with me on Twitter, Facebook, and Google+.
