No, not that kind of trigger this time around… :)
Anyone can fall into this scenario: everyone is carefully taking the time to test before applying a couple of patches to address an async I/O issue in Oracle, making sure all testing environments match exactly and all patching is approved by everyone who wants to review it. No one is willing to pull the trigger and apply the patches until they are absolutely sure that every SQL statement, every network tweak, and every application scenario has been looked over again and again.
Suddenly, the database reaches what every DBA dreads: the threshold where standard processing is now in the range of the I/O bug, not once a month but multiple times a week, due to one or two queries that are pivotal to production processing. The good intentions meant to avoid a production outage have now inadvertently caused one. The customer is frantic (as expected), and you, as the DBA, are the one expected to quickly assess the situation and come up with a battle plan.
A competent DBA will most likely take the customer in hand and do the following:
1. Lighten every query/process you can, especially the two with the most impact.
2. Discuss the importance of getting those patches in ASAP with the customer.
3. Explain the significance of the threshold and how they can identify it at the server level.
4. Enlist development’s help with the SQL that can impact and surpass the threshold.
5. Follow through with the plans, complete the work, and hold a “lessons learned” meeting so you can avoid this type of emergency next time.
There is always a fine line between taking the safe course when applying patches and addressing issues, and being too cautious and never pulling the trigger even when you’ve covered all your bases. Understanding the difference between a researched, calculated risk and a foolish risk is very important as a DBA.
There will always be risk whenever you work in a database. The database is connected to everything in an IT environment: the network, the application, the disk array, the CPU, the memory. You name it, our databases are connected to it.
What surprises many is that some of the most neglected and poorly performing databases I’ve encountered have rarely been managed by “risk taker” DBAs; instead, they were managed by DBAs who were overly cautious. These DBAs were so terrified of any negative impact that they did not act at all, and this resulted in environments that could barely stay up and perform for a full day at a time.
Knowing when to “pull the trigger” and give companies technical direction is almost as important as knowing how to back up and recover a database. The latter skill can be taught, and the former can come with experience and confidence, but for some it is simply a mindset the DBA must have before any technical skills are acquired.
Having the ability to take the customer by the hand, offer them confidence that you will do your best to address their issues, show that you are sincerely interested in the well-being of their systems, and then follow through is an incredible skill to possess. This skill will repeatedly bring greater success to this type of DBA than to a more technically skilled counterpart who lacks it. (And how wonderful if you have one of each who work well together and are willing to use their skills to benefit both!)
If you find yourself hesitant to make changes that you know must be made (and this does not apply to DBAs new to an environment; I consider myself pretty fearless, and I can tell you that I’m definitely more hesitant these days in the new databases I’m acclimating to… :)), start making a list of the items you feel need to be performed:
1. Rank each task with a risk rating between 1 and 10 (1 = lowest risk, 10 = highest).
2. Tasks with an overall rating under five should be taken on as soon as possible.
3. Tasks with ratings between five and seven should have their higher risks identified, along with a discussion of what can be done to eliminate or counteract those risks.
4. Any task rated seven or above should be discussed as a full team with the customer to decide whether the value is worth the risk.
5. As these are database changes, not change releases, always keep the two separate. One of the largest mistakes I see in environments is bundling change releases with database maintenance and/or patches.
From there, assign dates to the tasks that can be accomplished and start to complete them. Always monitor the environment for any impact after the change and keep a log of the changes made.
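The triage steps above can be sketched in a few lines of code. This is a minimal illustration, not anything from a real tool; the function name, the task names, and the decision to send a rating of exactly seven to full-team review (the post's ranges overlap at seven) are all my assumptions.

```python
def triage(tasks):
    """Bucket (name, rating) pairs by risk rating, 1 = lowest, 10 = highest."""
    buckets = {"do_asap": [], "mitigate_first": [], "team_review": []}
    for name, rating in tasks:
        if rating < 5:
            # Under five: take on as soon as possible.
            buckets["do_asap"].append(name)
        elif rating < 7:
            # Five to under seven: identify and counteract the risks first.
            buckets["mitigate_first"].append(name)
        else:
            # Seven and above (an assumption for the boundary): full-team
            # discussion with the customer on whether the value is worth it.
            buckets["team_review"].append(name)
    return buckets

# Hypothetical task list for illustration only.
tasks = [
    ("gather fresh optimizer stats", 2),
    ("apply async I/O patch set", 6),
    ("major database version upgrade", 9),
]
print(triage(tasks))
```

From the resulting buckets you can then assign dates, starting with the low-risk items, exactly as described above.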
If you do have an impacting change, do not become too disappointed; use it as a learning experience. If you did your best to cover your bases and still ran into issues, you still did your best. Document how you can better avoid the impact next time, and learn from the experience. Enough said!