To demonstrate the effectiveness of the new performance capabilities in Oracle Database 10g, we now will evaluate the required steps to solve a performance problem prior to Oracle Database 10g and contrast it with the method in 10g. The same problem scenario results in very different diagnostic efforts. You probably will have guessed by this time, that the diagnosis in 10g, is significantly simpler when compared to the prior releases.
Reducing the diagnostic effort will allow the DBA to spend time fixing problems, which in our opinion is where a DBA’s real expertise will come to play.
Pre-Oracle Database 10g – The Before Image1. In this section we will look into an example of what is involved in diagnosing a performance problem in releases prior to Oracle Database 10g:
2. A DBA receives a call from a user (or an alert) complaining that the system is slow.
3. The DBA examines the server machine and sees that there are plenty of resources available, so obviously the slowdown is not due to the machine being out of horsepower.
4. Next (he or she) looks at the database and sees that many of the sessions are waiting on ‘latch free’ waits.
5. Drilling down into the latches he sees that most of the latch free waits are on ‘library cache’ and ‘shared pool’ latches.
6. From experience and referring to a number of books on the subject, the DBA knows that these latches are often associated with hard parsing issues. As a double check he looks at the rate at which the statistics ‘parse time elapsed’ and ‘parse time cpu’ are increasing. It is also observed that the elapsed time is accumulating much faster than CPU time so the suspicion is confirmed.
7. At this stage the DBA has a number of ways that he can precede, all of which are trying to identify skewed data distribution. One way is to look at the statistics for ‘parse count(hard)’ for all sessions to see if there are one or more sessions responsible for the majority of the hard parses. An alternative is to examine the shared pool to determine if there are many statements with the same SQL plan, but with different SQL text. In our example the DBA will do the latter and finds that there are a small number of plans each of which has many different SQL texts associated with it.
8. Reviewing a few of these SQL statements reveals that the SQL statements contain literal strings in WHERE clauses and so each of the statement must be separately parsed.
9. Having seen cases like this before the DBA can now say that the root cause of the problem is hard parsing caused by not using bind variables, and can move on to fixing the problem.
10. In performing these steps the DBA had to use his expertise to diagnose the cause of the problem and could easily have made the wrong decision at any of the steps resulting in wasted time and effort.
Oracle Database 10g - The After Image
Taking the same example we can see a noticeable difference in Oracle Database 10g:
1. A DBA receives a call from a user complaining that the system is slow.
2. The DBA examines the latest ADDM report (a complete sample is provided in Appendix A below) and the first recommendation reads:
FINDING 3: 31% impact (7798 seconds)
------------------------------------
SQL statements were not shared due to the usage of literals. This resulted in additional hard parses which were consuming significant database time.
RECOMMENDATION 1: Application Analysis, 31% benefit (7798 seconds)
ACTION: Investigate application logic for possible
use of bind variables instead of literals.
Alternatively, you may set the parameter
"cursor_sharing" to "force".
RATIONALE: SQL statements with PLAN_HASH_VALUE
3106087033 were found to be using literals. Look
in V$SQL for examples of such SQL statements.
The DBA immediately knows that over 30% of the time in the database is being spent parsing and has recommendations as to what to do to resolve the situation. Note that the finding also includes a suspect plan hash value to allow the DBAto quickly examine a few sample statements. In addition the DBA has not been adding overhead to the system with his diagnostic process.
This example highlights the major savings in time and effort that result from the automated diagnostic capabilities of Oracle Database 10g.
INTELLIGENT INFRASTRUCTURE
The ability to diagnose performance problems in Oracle systems does not happen by chance. Tuning experts need to understand the way that the database works and the ways they can do to influence it. The automatic diagnostic capabilities of Oracle Database 10g did not happen by chance either. In order to enable this new functionality many changes have been made in the Oracle server particularly in the area of code instrumentation.
Database Statistics
With each new release of the database more performance statistics are added that allow us to diagnose issues within the database. Several of the new statistics introduced in 10g were added specifically to improve the accuracy of the automated diagnosis of performance issues. One advantage of producing a tool inside the server is that if a problem is hard to diagnose we can add more instrumentation to make it easier!
Wait Classes
There are now over 700 different wait events possible in an Oracle Database 10g. The main reason for the increase is that many of the locks and latches have been broken out as separate wait events to allow for more accurate problem diagnosis. To enable easier high-level analysis of the wait events they have been categorized into WAIT CLASSES based on the solution space that normally applies to fixing a problem with the wait event. For example exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue. The most commonly occurring wait classes and a few examples are listed below:
1. Application - locks waits caused by row level locking or explicit lock commands
2. Administration – DBA commands that cause other users to wait like index rebuild
3. Commit – wait for redo log write confirmation after a commit
4. Concurrency – concurrent parsing and buffer cache latch and lock contention
5. Configuration –undersized log buffer space, log file sizes, buffer cache size, shared pool size, ITL allocation, HW enqueue contention, ST enqueue contention 6. User I/O – wait for blocks to be read off disk
7. Network Communications – waits for data to be sent over the network
8. Idle – wait events that signify the session is inactive such as ‘SQL*Net message from client’