I’ve never attended the Northern California Oracle Users Group conferences even though they are organized every quarter. However, I’ve always been jealous of the great agendas they put together. A couple of months ago, Gwen Shapira reminded me once again that the next NoCOUG conference was coming up and asked whether I could come and present. What a chance, I thought, and easy to plan as I have no other conferences in May.
So, at the NoCOUG Spring Conference 2010 in just 10 days, I’ll be giving my two-hour presentation, Demystifying Oracle RAC Workload Management. If it’s your local conference, I hope you can attend and say hello. You might also want to download the whitepaper that I put together a few years ago for the Hotsos Symposium, Oracle RAC Workload Management.
The conference is free for NoCOUG members and only $50 for non-members, but it makes more sense to just join the user group, as its annual fees are unbelievably low. I couldn’t say it better than Iggy Fernandez did:
How much does a NoCOUG membership cost? It doesn’t cost $400, as you might expect to pay for so much educational value. It doesn’t cost $300 and it doesn’t cost $200. It doesn’t even cost $100. Yes, a calendar-year NoCOUG membership only costs $95! Won’t you join today?
NoCOUG also has its own printed publication, and it’s been an honor to be interviewed by Iggy Fernandez for the NoCOUG Journal. Among the topics was Battle Against Any Guess, as a follow-up to the chapter I contributed to the latest book by OakTable Network members, Expert Oracle Practices: Oracle Database Administration from the Oak Table. I quote a small fragment below, but you can read the whole interview in the May issue. “This could be the most important issue of the NoCOUG Journal you will ever read,” as Iggy put it, and NoCOUG has made the online version of the May issue publicly available. The BAAG chapter from the aforementioned book is also reprinted in the journal, along with a detailed book review by Dave Abercrombie.
Battle Against Any Guess
Tell us a story. Tell us two. We love stories!
It’s June 2007, and I still have enough time left in my day to be active on the Oracle-L list. I’m reading the threads and once again there is one thread full of guesswork-based solutions to solve a particular performance problem. Not the first one and not the last. After entering into the discussion, I felt the conversation was the same I’d had time and time again (like a broken record), and this prompted me to create a place on the internet that I can refer to whenever I and others need to point out the fallacies of guesswork solutions. And so, the BAAG Party was born–www.BattleAgainstAnyGuess.com. The name idea came from the BAARF Party (Battle Against Any Raid Five) organized by fellow OakTable Network members James Morle and Mogens Nørgaard.
What’s wrong with making an educated guess? We have limited data, limited knowledge, limited experience, limited tools, and limited time. Can we ever really know?
“Yes we can!” At least, we should strive to know.
I’ll never forget how enlightened I was the moment I saw the slide “Why Guess When You Can Know?” presented by Cary Millsap, another fellow member of the OakTable Network. Most real life problems can be solved with the knowledge that is available in the public domain, using data that is possible to extract by applying the right experience and tools and taking enough time to do the job properly.
It is the purpose of the Battle to promote the importance of knowledge fighting ignorance, selecting the right tools for the job, popularizing the appropriate troubleshooting techniques, gaining experience, and learning to take time to diagnose the issue before applying the solution. One might think that the BAAG motto is a bit extreme but that’s a political decision to emphasize the importance of the goal.
I have elaborated on the concept of the “educated guess” in the first chapter of the book Expert Oracle Practices: Oracle Database Administration from the Oak Table. The chapter is titled “Battle Against Any Guess.” I would like to quote the following from page 11:
Oracle Database is not only a complex product, it’s also proprietary software. Oracle Corporation introduced significant instrumentation and provided lots of new documentation in the last decade, but there are still many blanks about how the product works, especially when it comes to the implementation of new features and of some advanced deployments that hit the boundaries of software and hardware. Whether it’s because Oracle wants to keep some of its software secrets or because documentation and instrumentation are simply lagging, we always face situations that are somewhat unique and require deeper research into the software internals.
When I established the Battle Against Any Guess Party, a number of people argued that guesswork is the cruel reality with Oracle databases because sometimes we do hit the wall of the unknown. The argument is that at such point, there is nothing else left but to employ guesswork. Several times people have thrown out the refined term “educated guess.” However, I would argue that even in these cases, or especially in these cases, we should be applying scientific techniques. Two good techniques are deduction and induction.
When we have general knowledge and apply it to the particular situation, we use deductive reasoning or deductive logic. Deduction is often known as a “top-down” method. It’s easy to use when we have no gaps in our understanding. Deduction is often the path we take when we know a lot about the problem domain and can formulate a hypothesis that we can confirm or deny by observation (problem symptoms).
Inductive reasoning is often considered the opposite of deductive reasoning and represents a bottom-up approach. We start with particular observations, then recognize a pattern, and based on that pattern we form a hypothesis and a new general theory.
While these techniques are quite different, we can find ourselves using both at different stages as verification that our conclusions are correct. The more unknowns we face, the more we favor inductive reasoning when we need to come up with the generic theory while explaining a particular problem. However, when we form the theory via inductive logic, we often want to prove it with additional experiments, and that’s when we enter into a deduction exercise.
When taking a deductive approach first, applying known knowledge and principles, we often uncover some inconsistencies in the results that require us to review existing theories and formulate new hypotheses. This is when research reverts to the inductive reasoning path.
Deduction and induction each have their place; they are both tools in your arsenal. The trick is to use the correct tool at the correct time.
How do we decide which competing methodology to use? Which tool is the best tool for the job? In matters of performance tuning, should we trace, sample, or summarize?
Good questions. Logic and common sense come to mind as the universal methodology for any troubleshooting. If we focus on performance then we should define what it means to improve performance. For me, performance tuning is all about reducing the response time of a business activity. When I think performance, I think response time. This is what Cary Millsap taught me through his book Optimizing Oracle Performance–he shifted my paradigm of performance tuning back then (by the way, you can read more about the paradigm shift concept in my chapter referenced above).
Since we identified that response time is what matters, the next step is to analyze where the time goes–build the response time profile. Adopting a top-down approach, we might find that 2% of the time is spent on the application tier and 98% is spent in the database. Drilling down to the next level of granularity, we could identify two SQL statements that each consume 42% of the response time. Focusing on those two, we drill down further into, say, wait events. We could pinpoint the reason for excessive response time at this stage or we might need to dig even deeper–somewhere where timed information isn’t available. This is where the current battle lies–we could win it by introducing the right instrumentation and tools.
More than a decade ago, Oracle database performance analysts didn’t have the luxury of the wait interface and had to rely on various aggregations and ratios as time proxies. The same happens now on another level–when the granularity of the wait interface is not enough, we have to rely on counters and methods such as call-stack sampling. Again, the same goes when execution exits the database, for example, to do storage I/O. Current I/O systems are not instrumented to provide a clear response time profile.
However, I want to emphasize that the vast majority of mistakes during performance diagnosis happen much earlier, when we have enough knowledge and tools to avoid applying guesswork solutions but often don’t use them.
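To make the idea of a response time profile concrete, here is a minimal sketch in Python. It is only an illustration, not a real tool: the component names, the timings (chosen to match the 2% / 42% / 42% example above), and the helper functions `build_profile` and `drill_down` are all hypothetical. In practice the timings would come from measured data such as an extended SQL trace.

```python
from typing import Dict, List, Tuple

# One profile row: (component name, seconds, percent of total response time)
ProfileRow = Tuple[str, float, float]


def build_profile(timings: Dict[str, float]) -> List[ProfileRow]:
    """Turn raw per-component timings into a profile sorted by time, largest first."""
    total = sum(timings.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    rows = [(name, secs, 100.0 * secs / total) for name, secs in timings.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def drill_down(profile: List[ProfileRow], threshold_pct: float) -> List[str]:
    """Return the components worth investigating next (at or above the threshold)."""
    return [name for name, _, pct in profile if pct >= threshold_pct]


# Hypothetical measurements, in seconds, for one business activity.
timings = {
    "application tier": 2.0,
    "SQL statement A": 42.0,
    "SQL statement B": 42.0,
    "other database work": 14.0,
}

profile = build_profile(timings)
for name, secs, pct in profile:
    print(f"{name:<22}{secs:>8.1f}s{pct:>7.1f}%")

# Focus on whatever dominates the response time before going any deeper.
print("Investigate next:", drill_down(profile, threshold_pct=40.0))
```

The point of the sketch is the ordering: the profile tells you where the time went before you decide what to fix, and the same two steps repeat at each level of granularity (tier, SQL statement, wait event).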
I digressed in my response from the original question on what the best tools are, but, unfortunately, I will have to disappoint–there is no magic-bullet performance tool that will diagnose all problems. The soundest advice I can give is to study the performance methods and tools available, understand how they work, when they should be used, and what their limitations are and why. There are a number of books published, and if you ask me to single out one of the recent ones, I would mention Troubleshooting Oracle Performance by Christian Antognini.
Should we extend the scientific method to Oracle recommendations or should we adhere to the party line: use the cost-based optimizer, don’t use hints, collect statistics every night, upgrade to Oracle 11g, apply the latest patch set, CPU, and PSU, etc.? After all, nobody gets fired for following vendor recommendations. Many years ago, I lost a major political battle about Optimal Flexible Architecture (OFA) and never recovered my credibility there. Once Bitten, Twice Shy is now my motto.
I’ve touched on the issue of best practices in the BAAG chapter:
“Best practices” has become an extremely popular concept in recent years, and the way IT best practices are treated these days is very dangerous. Initially, the concept of best practices came around to save time and effort on a project. Best practices represent a way to reuse results from previous, similar engagements. Current application of best practices has changed radically as the term has come into vogue.
What are now called best practices used to be called “rules of thumb,” or “industry standards,” or “guidelines.” They were valuable in the initial phase of a project to provide a reasonable starting point for analysis and design. Unfortunately, modern best practices are often treated as IT law–if your system doesn’t comply, you are clearly violating that commonly accepted law.
Vendor recommendations are very valuable in the early stages of a project and even later on, as progress is made. In order to apply vendor recommendations correctly, one should understand the reasoning behind such advice, what problems it solves specifically and what else could possibly be affected. If you take an example of collecting statistics every night, then it makes sense for the majority of Oracle databases. There are plenty of exceptions, however, and at Pythian, we often modify the default collection schedule for our customers. Having a sound understanding of what a vendor recommends and why is the key to a successful implementation.
In some cases, it might be difficult to act contrary to generic vendor recommendations, and convincing management to do so is usually even harder. Some basic principles to keep in mind when deciding your course of action are below:
- Vendor recommendations are generic. Consider them as the default configuration of init.ora parameters. Nobody runs with all default parameters.
- Instead of going against vendor recommendations, call it modifying or adapting to a particular environment.
- Find a precedent where a recommendation has failed and why. It’s like being in court–nothing beats a precedent.
- Playing politics is a whole different game. Either you are a player or you stay away.
If for some reason you can’t be at the conference, you can always schedule some time to catch up by emailing events [at] pythian.com. See you soon!