I’ve spent over a decade managing various production systems. After spending so much time with systems that are mostly not working as they should, one develops a certain outlook on life, like the deep belief that the only way to keep civilization functioning is by taking backups and testing them.
Recently I had a few discussions with developers, and it turned out that ideas I consider trivial can seem deeply insightful to someone with different experience. Call it idea arbitrage: goods that are common in one environment are valuable in another. Good hummus costs around $2 in Tel Aviv, while in Palo Alto I’d happily pay many times that price if anyone would sell it!
I’m tired of spending time chasing pseudo-bugs that are actually configuration problems. Things keep changing on the QA, staging and production servers, and I have to keep figuring out why my code no longer works.
Figure out what your program needs in order to run properly. Memory, permissions, settings in files, etc. Then write a script that verifies all that.
This has 3 benefits:
- Sysadmins can check that the environment is correct before installing the software and before calling you for help.
- If something goes wrong, you can ask the sysadmin to run your script and send you the results – assisting in faster resolution.
- If the script doesn’t catch an issue and you have to spend hours debugging a broken configuration, you can later modify the script to catch the new issue and you’ll never have to spend those hours again.
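To make this concrete, here is a minimal sketch of such a verification script in Python. Everything specific in it is a made-up placeholder rather than anything a real application requires: the `/var/lib/myapp` directory, the `/etc/myapp/app.ini` file, the `[logging] level` setting and the 2 GB memory threshold are all assumptions for illustration. Each check returns a pass/fail flag plus a human-readable detail, so the output is something a sysadmin can paste straight into a ticket.

```python
import configparser
import os


def check_memory(min_bytes):
    """Compare total physical RAM to a minimum, where sysconf exposes it."""
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms; skip rather than fail.
        return True, "memory check skipped (not supported on this platform)"
    return total >= min_bytes, f"{total} bytes of RAM, need {min_bytes}"


def check_writable(path):
    """Verify that a directory exists and the current user can write to it."""
    ok = os.path.isdir(path) and os.access(path, os.W_OK)
    return ok, f"{path} is {'writable' if ok else 'missing or not writable'}"


def check_setting(config_file, section, key, expected):
    """Verify that an INI-style config file holds the expected value."""
    parser = configparser.ConfigParser()
    parser.read(config_file)  # a missing file is silently skipped
    actual = parser.get(section, key, fallback=None)
    detail = f"{config_file} [{section}] {key}={actual!r}, expected {expected!r}"
    return actual == expected, detail


def run_checks(checks):
    """Run every check, print a report, and return the number of failures."""
    failures = 0
    for name, check in checks:
        ok, detail = check()
        print(f"{'OK  ' if ok else 'FAIL'} {name}: {detail}")
        if not ok:
            failures += 1
    return failures


def main():
    # Hypothetical requirements; replace with what your program really needs.
    checks = [
        ("memory", lambda: check_memory(2 * 1024**3)),
        ("data dir", lambda: check_writable("/var/lib/myapp")),
        ("log level", lambda: check_setting(
            "/etc/myapp/app.ini", "logging", "level", "INFO")),
    ]
    return 1 if run_checks(checks) else 0
```

To use it as a standalone script, end the file with `import sys` and `sys.exit(main())` so the exit code tells the caller whether the environment passed. When a new configuration problem bites you, adding one more function and one more entry in `checks` is usually all it takes to never debug it by hand again.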
I’m designing a new search system. Should I use Oracle or Voldemort?
Since you are asking me for advice, my guess is that neither of these solutions has a single compelling feature or limitation that makes the decision clear-cut.
Therefore, go with the technology you understand better. Imagine yourself a year in the future, when the system has just returned a wrong result. In which system will you find it easier to solve the problem? Obviously the one you understand better, and the one that has troubleshooting tools you are more comfortable with.
Go with that one.
My users complain about a performance problem. My system is memory bound and I believe that adding more memory will solve the issue. I opened a ticket for our sysadmins to add memory, but they have been ignoring it and nothing has been done. Now I’m being blamed for not solving the performance issue!
Here are several possible solutions, ordered from recommended to highly risky:
- Update the ticket and ask for an ETA.
- Use whatever internal process you have to escalate the ticket or raise its priority.
- Make friends with a sysadmin and ask your friend to check the ticket and help you.
- Complain to your manager.
- Complain to the sysadmin’s manager.
- Go to the sysadmin’s cube and ask him nicely when the ticket will be ready.
- Go to the sysadmin’s cube and yell.
- Ask for a meeting or conference call involving more than 2 managers.