Locks, Latches, Mutexes, and CPU Usage


I was alerted to high CPU usage on a client machine. Checking which processes used the most CPU, I found that no single process (or small set of processes) was taking up most of the CPU resources. There were just a lot of sessions in the run queue, each taking a bite of the CPU.

Checking current wait events, I found a lot of sessions waiting on transaction-exclusive locks (enq: TX – row lock contention). The client assumed that the CPU usage was caused by the high number of locks and the sessions waiting for them.

This made me think that there is a misconception about how locks and latches in an Oracle RDBMS instance are actually implemented under the hood – most importantly, how waiting is implemented when the resource needed is locked, latched, or mutexed (I suppose that is not a real word, but it sounds okay). Even if you understand the background of latch spinning and have played with the spin_count parameter, causing CPU spikes, this has changed in recent versions: latches mostly no longer spin while waiting, and in some areas, such as the shared pool, they are being replaced by mutexes.

So, as we know, locks are used to protect “end-user visible resources”, such as tables, rows in tables, transactions, sequences, and materialized views, from concurrent user access, i.e. serializing requests that are incompatible with running at the same time (think of the simple case of updating the same row in two different transactions). Locks are called enqueues because the sessions waiting on them are placed in queues (holders, converters, and waiters) and get access to the resource they need in the order they arrive.

The implementation can be compared to a supermarket where people (the sessions) wait to check out the items they want to buy at the cash registers (the resources being locked), but with a little twist: each cash register handles specific products, so people only queue at the register that matches what they need, and they are served in the order they arrive.

So how is the actual “waiting on lock” implemented? How does session B, waiting for a transaction started by session A to commit, know that the resource is free for use?

When a resource needs to be locked, an entry is linked into a linked list of parties interested in that particular resource. The session that holds the lock has an entry in the holders linked list, while sessions that need to acquire the lock wait in the waiters list (or the converters list, if they already hold the lock in another mode). So how does a waiting session know when it can move up the waiters list or be attached to the holders list? If it just retried over and over, the result would be like spinning, and it would waste a lot of CPU. A better way is to sleep and be notified when there is movement in the queue.
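Conceptually, the structure looks something like this (a sketch of the idea only; the names and layout are invented and not Oracle’s actual structures):

/* enqueue_sketch.c - conceptual picture of an enqueue resource; the
   structure names and fields are invented for illustration */
struct lock_request {
    int sid;                          /* session making the request */
    int mode;                         /* requested lock mode */
    struct lock_request *next;        /* next entry in the same list */
};

struct enqueue_resource {
    struct lock_request *holders;     /* sessions currently holding the lock */
    struct lock_request *converters;  /* holders waiting to change lock mode */
    struct lock_request *waiters;     /* sessions queued, served in arrival order */
};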

To find out how this is implemented, I traced Oracle foreground processes on Oracle RDBMS 11.2.0.3 running on Linux. This is an excerpt of the system calls executed by a session waiting for a lock:

semtimedop(196610, {{35, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable) <3.001000>

semtimedop(196610, {{35, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable) <3.001000>

semtimedop(196610, {{35, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable) <3.001000>

So Oracle uses UNIX semaphores to implement the sleeping and posting around the holders, converters, and waiters linked lists. (On the Oracle kernel level, these structures are protected by the enqueue and enqueue hash latches, so no two processes can modify them at the same time.)

A separate semaphore is used for protecting each resource, like a traffic light at a road junction controlling the flow of locks through the enqueue linked lists. In order for a process to move “up” through the waiters queue, it needs to get hold of the semaphore – the one with ID 196610, semaphore number 35, in this case. It sleeps for 3 seconds at a time while waiting for the semaphore, so it doesn’t waste any CPU cycles. The nice thing is that when the owner process is done, it calls the semctl system call, which sets the semaphore to value 0. The waiters are then notified that the status of the semaphore has changed, so their wait is interrupted:

semctl(196610, 33, IPC_64|SETVAL, 0xbfeada10) = 0 <0.009000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 100000000}) = 0 <0.003000>
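For illustration, here is a minimal C sketch of that timed wait loop (my own code, not Oracle’s; the semaphore set ID 196610 and semaphore number 35 come from the trace above, and the set is assumed to already exist):

/* wait_sketch.c - the timed post/wait loop from the trace, as plain C */
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sem.h>
#include <time.h>

void wait_for_post(int semid, unsigned short semnum)
{
    struct sembuf dec = { semnum, -1, 0 };   /* "take" one unit of the semaphore */
    struct timespec to = { 3, 0 };           /* sleep at most 3 seconds per attempt */

    /* every timeout returns -1/EAGAIN, exactly as in the strace excerpt;
       between attempts the process sleeps and burns no CPU */
    while (semtimedop(semid, &dec, 1, &to) == -1 && errno == EAGAIN)
        ;
}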

Consequently, a large number of sessions waiting on a lock would not impact the server performance noticeably.

Latches and Mutexes

Latches, as we know, are used to protect Oracle’s internal structures against corruption by multiple sessions/processes modifying them at the same time. An example is the cache buffers chains: linked lists hanging off a hash table, used to quickly determine whether a block is in the buffer cache by hashing its address and class. Each linked list (chain) of blocks is protected by a latch, so whether we need to add a block to the list, remove one, or just read through the list, we must get that latch. Obviously, the latch must be held for a very short time, as it is taken very often (for each block being accessed).
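As a rough sketch of the arrangement (illustrative only; the structure names and the hash function are invented), think of a hash table whose buckets each carry their own latch:

/* cbc_sketch.c - conceptual sketch of cache buffers chains */
#include <stdatomic.h>

#define N_BUCKETS 1024

struct buffer {
    int file, block;          /* data block address */
    struct buffer *next;      /* next block on this chain */
};

struct bucket {
    atomic_flag latch;        /* protects this one chain only; init with ATOMIC_FLAG_INIT */
    struct buffer *chain;     /* blocks whose address hashes here */
};

static struct bucket cbc[N_BUCKETS];

/* hash a data block address to a chain; any reader or writer of
   cbc[h].chain must first acquire cbc[h].latch */
static unsigned hash_dba(int file, int block)
{
    return ((unsigned)file * 131u + (unsigned)block) % N_BUCKETS;
}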

The processes waiting for a latch are not queued. They just retry to get the latch, or get posted that the latch is free while sleeping, so there is no guaranteed order in which they get the latch.

We can compare latches to a company of 100 employees in 10 departments, each department having its own secretary answering the phone. When someone from the outside wants to call Scott in marketing, he dials the marketing department’s phone number. If the secretary is already talking on the phone, he gets a busy signal. If the caller is willing to wait and must make the call, he redials immediately, trying to reach Scott. If he is very desperate (and doesn’t have a life), he tries this 20,000 times until he gets tired and takes a short break (goes to sleep). Later, he continues to bother the phone with another 20,000 attempts. In the meantime, if another caller succeeds in getting the free phone line, he gets through to the marketing department first.

The caller represents the process needing the latch, the secretaries’ phones are the latches, and the resource needed is Scott. This method of getting a latch is so-called spinning and was used prior to Oracle 10g. (This is platform-dependent; I am describing the implementation on Linux.) The bad thing about spinning is that it burns CPU – it is an active process. Think of the caller trying to reach Scott in the example: while he desperately redials 20,000 times, he uses all of his “CPU” and is unable to do anything else while dialing.
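A minimal sketch of such a spin-then-sleep loop, mirroring the analogy (the constants and the sleep interval are assumptions; Oracle’s real latch code is considerably more involved):

/* spin_sketch.c - spin-then-sleep latch acquisition, as in the analogy above */
#include <stdatomic.h>
#include <unistd.h>

#define SPIN_COUNT 20000          /* the 20,000 "redials" */

static atomic_flag latch = ATOMIC_FLAG_INIT;

void get_latch(void)
{
    for (;;) {
        for (int i = 0; i < SPIN_COUNT; i++)        /* active spinning: burns CPU */
            if (!atomic_flag_test_and_set(&latch))
                return;                             /* got the latch */
        usleep(10000);                              /* tired: short sleep, then spin again */
    }
}

void free_latch(void)
{
    atomic_flag_clear(&latch);                      /* no posting: waiters must retry */
}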

That’s why in newer versions Oracle implemented “sleeping” latch waits: upon seeing that the latch is in use, the process immediately goes to sleep and is notified by the latch-holding process when the latch becomes free. This means the waiters don’t clog the CPU while waiting; they just sleep. So contention on latches does not impact the whole system and reduce scalability. Here is how it goes:

  1. We will do some testing with the so-called “first spare latch”. We can get its address from v$latch:

    1  select addr, name from v$latch
    2* where name like '%spare%'

    SQL> /

    ADDR     NAME

    ——– —————————————————————-

    200091F8 first spare latch

    2000925C second spare latch

    200092C0 third spare latch

    20009324 fourth spare latch

    20009388 fifth spare latch

  2. Acquire the latch “artificially” by calling the Oracle function kslgetl:

    SQL> oradebug setmypid
    Statement processed.

    SQL> oradebug call kslgetl 0x&laddr 1 2 3

    Enter value for laddr: 200091F8

    Function returned 1

  3. From another session, trace a session that will try to acquire the same latch and will wait:

    SQL> oradebug setmypid
    Statement processed.

    SQL> oradebug call kslgetl 0x&laddr 1 2 3

    Enter value for laddr: 200091F8

    <The process is waiting now>

    The strace output shows that it waits using the semop system call:

    semop(196610, {{33, -1, 0}}, 1

This means that the process will sleep until it is woken by the process that holds the latch, and it will not burn any CPU. When I first discovered that waiting on a latch is not active CPU spinning but just sleeping, it was a bit of a revelation. I felt like someone who had just realized that the Earth is round, not flat!

The latch holder notifies the waiters (sleepers) by setting the semaphore value using:

semctl(196610, 33, IPC_64|SETVAL, 0xbfd5774c) = 0 <0.001000>

At which point the waiter finally (after waiting for 207 seconds here) gets the latch:

semop(196610, {{33, -1, 0}}, 1)         = 0 <207.309000>

To return to the comparison of latches with secretaries answering phones, this is as if the secretaries kept a list of all callers who called while the phone was busy and called them back after finishing each call, so the callers would not waste time (CPU) by obsessively redialing.

All this is good because latch waiting no longer wastes CPU, and time spent waiting on latch free is decreased (processes do not have to finish their sleep or spin_count before realizing the latch is actually free). However, it is even better to decrease the chances of hitting latch contention in the first place, i.e. to minimize the chances of one process waiting for another to finish work on something protected by the latch. This is where mutexes come into the game.

Going back to the office example with the secretaries, mutexes are like giving every employee their own phone and removing the secretaries, who had become a bottleneck. A caller now gets a busy signal only if the person called is actually talking (in our example, decreasing the chance of a busy signal tenfold).

In the Oracle world, this is not implemented for all resources that need to be protected, because the memory usage would become an overhead. (Think of having an additional 100-200 bytes allocated for a mutex for each block in the buffer cache if, for example, mutexes were used to protect block buffers.)

From 10g onwards, I have noticed mutexes being used to protect cursors in the shared pool while they are parsed and executed. In theory, we should rarely see contention on mutexes since, as mentioned above, the chance of asking for a mutex that is already in use should be much smaller – think of massively re-executing the same SQL over and over in many sessions in parallel.

Mutexes in Oracle are not the same thing as mutexes on the OS level. Oracle mutexes are implemented (on Linux) using OS semaphores, with a simple kernel variable used as a placeholder for a flag that marks a resource as busy or free, serializing otherwise parallel-running processes that need to access the protected resource. So latches and mutexes are actually a high-level Oracle instance interface to OS semaphores.
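To illustrate the “simple kernel variable” idea, here is a sketch (an assumption about the general shape, not Oracle’s actual code): getting a mutex boils down to a single atomic compare-and-swap on a memory word, with the semaphore machinery entering the picture only when the word is already taken:

/* mutex_word_sketch.c - illustrative only; the field layout is an assumption */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint mutex_word;          /* 0 = free; nonzero = taken */

bool try_get_mutex(unsigned int my_sid)
{
    unsigned int expected = 0;
    /* a single CAS: mark the word with our session ID in the high bytes;
       on failure the process falls back to the semaphore wait shown below */
    return atomic_compare_exchange_strong(&mutex_word, &expected, my_sid << 16);
}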

I have tested creating artificial mutex contention by “poking” (Andrey Nikolaev’s method) the actual memory location of the mutex, setting a value that marks the mutex as busy.

We first need to find the actual address of the mutex in memory, since it is not in a fixed place. For protecting cursors, there is a separate mutex for each SQL statement with a particular hash value (I have noticed that even child cursors have different mutexes, although they share the same hash value). So we first create contention for a particular SQL statement.

I am using the following SQL, as it doesn’t generate any additional load in terms of logical/physical reads:

SQL> l

1* select 'bla' from dual

SQL> /

'BL

bla

Execution Plan

———————————————————-

Plan hash value: 1388734953

—————————————————————–

| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |

—————————————————————–

|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |

|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |

—————————————————————–

Statistics

———————————————————-

0  recursive calls

0  db block gets

0  consistent gets

0  physical reads

0  redo size

420  bytes sent via SQL*Net to client

419  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

1  rows processed

SQL>

This is a script for creating mutex contention:

[[email protected] scripts]$ cat mutex_load.sh

N=1
until test "$N" -gt "101"
do
  echo $N
  sqlplus /nolog @soft_parse.sql &
  sleep 5
  N=`expr $N + 1`
done

[[email protected] scripts]$ cat soft_parse.sql

connect / as sysdba
begin
  for i in 1..9000000 loop
    execute immediate 'select ''bla'' from dual';
  end loop;
end;
/
exit

So the script will launch about 100 sessions, each continuously re-executing the specified SQL.

Let’s find the SQL hash value, which will be needed to find the mutex address:

select hash_value, sql_text, child_number from v$sql where sql_text like '%bla%'

SQL> /

HASH_VALUE SQL_TEXT                                           CHILD_NUMBER

———- ————————————————– ————

2386981435 begin for i in 1..9000000 loop    execute immediat            0

e 'select ''bla'' from dual'; end loop; end;

957527304 select 'bla' from dual                                        0

957527304 select 'bla' from dual                                        1

2344015831 EXPLAIN PLAN SET STATEMENT_ID='PLUS4294967295' FOR            0

select 'bla' from dual

2344015831 EXPLAIN PLAN SET STATEMENT_ID='PLUS4294967295' FOR            1

select 'bla' from dual

HASH_VALUE SQL_TEXT                                           CHILD_NUMBER

———- ————————————————– ————

3438743966 select hash_value, sql_text, child_number from v$s            0

ql where sql_text like '%bla%'

1671538998 select hash_value, sql_text from v$sql where sql_t            0

ext like '%bla%'

7 rows selected.

We are interested in the SQL with hash value 957527304, and we have two child cursors for that SQL, since I executed it once plainly in SQL*Plus and once through execute immediate in the script above. So the mutex in question will have the identifier 957527304 and will protect both children of this SQL:

select MUTEX_ADDR, MUTEX_IDENTIFIER, MUTEX_TYPE, max(gets), max(sleeps), mutex_value

from x$mutex_sleep_history

group by MUTEX_ADDR, MUTEX_IDENTIFIER, MUTEX_TYPE, mutex_value

order by 4, 5

SQL> /

MUTEX_AD MUTEX_IDENTIFIER MUTEX_TYPE                        MAX(GETS) MAX(SLEEPS) MUTEX_VA

——– —————- ——————————– ———- ———– ——–

37D31DB4       3607215236 Cursor Pin                                1           1 00260000

37D0BDB4        722748295 Cursor Pin                                1           1 00280000

37D18DB4       4063208512 Cursor Pin                                1           1 00280000

37D3FDB4       3873422482 Cursor Pin                                9           2 00220000

37D4ADB4       3165782676 Cursor Pin                               11           1 00220000

36D493C0       3096556448 Cursor Pin                               16           1 00240000

352156CC        957527304 Cursor Pin                         15555114         178 003E0012

352156CC        957527304 Cursor Pin                         15765884          81 00450012

352156CC        957527304 Cursor Pin                         16536474         107 00430016

352156CC        957527304 Cursor Pin                         16776537         116 00340011

352156CC        957527304 Cursor Pin                         17281498          77 004C0017

The memory address of the mutex protecting the cursor for our SQL is 352156CC. Let’s poke it a bit to make the mutex look busy:

SQL> oradebug setmypid

Statement processed.

SQL> oradebug poke 0x352156CC 4 0x004C0017

BEFORE: [352156CC, 352156D0) = 00000000

AFTER:  [352156CC, 352156D0) = 004C0017

And from another session, initiate a wait for that mutex:

1   begin

2      execute immediate 'select ''bla'' from dual';

3*    end;

4  /

<The session is waiting now…>

Let’s see what the session is waiting on:

EVENT                          BLOCKING_SESSION

—————————— —————-

cursor: pin S                                76

The number of the blocking session, 76, is actually encoded in the first two bytes of the value I set in the mutex: 0x4C, which is 76 in decimal. So how is waiting on mutexes implemented?
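Decoding that value is straightforward (the holder SID living in the high-order bytes follows Andrey Nikolaev’s published observations; treat the meaning of the low-order bytes as an assumption here):

/* decode_sketch.c - decode the holder SID out of the mutex value we poked */
#include <stdio.h>

int main(void)
{
    unsigned int mutex_value = 0x004C0017;            /* the value poked above */
    unsigned int holder_sid  = mutex_value >> 16;     /* 0x4C = 76: blocking session */
    unsigned int low_word    = mutex_value & 0xFFFF;  /* remaining bytes of the value */
    printf("holder SID = %u, low word = 0x%04X\n", holder_sid, low_word);
    return 0;
}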

Doing a strace on the waiting process shows it repeatedly executes:

semtimedop(196610, {{75, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{75, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{75, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{75, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{75, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

So again, Oracle uses semaphores with a defined timeout (1 centisecond here). The process tries to get semaphore 75 in the semaphore array with ID 196610, sees it is busy, sleeps for 1 centisecond, times out with an error, and retries the same thing in a loop that ends when the semaphore is free. I then released the mutex by setting its memory location back to a value of 0:

Statement processed.

SQL> oradebug poke 0x352156CC 4 00000000

BEFORE: [352156CC, 352156D0) = 004C0017

AFTER:  [352156CC, 352156D0) = 00000000

This doesn’t notify (post) the waiters that the mutex is free, since I just modified the memory location directly and did not go through the normal system calls, such as semctl. On the next semaphore timeout, the waiter realized that the mutex was free and executed the SQL it had been waiting on:

1  begin

2          execute immediate 'select ''bla'' from dual';

3*       end;

SQL> /

PL/SQL procedure successfully completed.

So all mechanisms for protecting a shared resource in recent Oracle RDBMS releases (I have tested this on Oracle 11.2.0.3 on Linux) seem to use semaphores as the underlying mechanism on the OS level. Schematically, when one of two processes sharing a resource (a SQL cursor in this example) would like to access it:

      1. It tries to get hold of the semaphore responsible for that object using the semop or semtimedop system calls. If the semaphore is already marked as in use, the process goes to sleep: it is woken either when the semaphore is released or, if semtimedop was used, when the specified timeout expires, returning an error code.
      2. If the semaphore is free, its value is updated to mark it as in use, and the system call finishes with a return value of 0.
      3. The process that was looking to get hold of the resource now has full access to it.
      4. When it finishes with the resource, it releases the semaphore with the semctl system call, which notifies the waiting processes (those sleeping in sem(timed)op on that semaphore); they wake up and get a chance to acquire the semaphore and the underlying resource. A two-process demonstration of this protocol is sketched below.
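Here is a small, self-contained C demonstration of that four-step protocol using the same System V calls (my sketch; the semaphore numbers are arbitrary, and Oracle’s exact use of semaphore values may differ):

/* postwait_demo.c - two processes coordinating through a SysV semaphore */
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

union semun { int val; };                     /* required by semctl(SETVAL) */

int main(void)
{
    /* private set of 4 semaphores; on Linux they start at 0 = "busy" */
    int semid = semget(IPC_PRIVATE, 4, IPC_CREAT | 0600);
    unsigned short semnum = 3;

    if (fork() == 0) {                        /* the waiter */
        struct sembuf dec = { semnum, -1, 0 };
        struct timespec to = { 3, 0 };        /* step 1: timed sleep, retry on EAGAIN */
        while (semtimedop(semid, &dec, 1, &to) == -1 && errno == EAGAIN)
            ;
        printf("waiter: posted, resource acquired\n");  /* steps 2-3 */
        _exit(0);
    }

    sleep(1);                                 /* the holder works with the resource */
    union semun arg = { 1 };
    semctl(semid, semnum, SETVAL, arg);       /* step 4: post, wakes the sleeper */
    wait(NULL);
    semctl(semid, 0, IPC_RMID, arg);          /* remove the semaphore set */
    return 0;
}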

The implementation of mutexes in release 11.2.0.3 (and 11.2.0.2.2) added some parameters that allow the mechanics of mutexes to be tuned with respect to how long a mutex waiter sleeps and how it uses CPU cycles while waiting. It also added the possibility of an “exponential back-off” while sleeping. This is explained in the MOS note:

Bug 10411618 – Enhancement to add different “Mutex” wait schemes [ID 10411618.8]

Wait schemes

~~~~~~~~~~~~

The following are the wait schemes added. For each of them we spin for a fixed number of times and then yield the first time we try to get the mutex.

From the second time onwards we do one of the following:

* Yield – Always yield the CPU.

* Sleep – Sleep for a fixed amount of time

* Exponential back off – Each iteration we sleep for a greater amount of time

When a process “yields” the CPU, it is giving up its time slice and is put at the end of the run queue. This implies that on a busy system with a long run queue, processes yielding the CPU will see poor performance. This is why in some releases, such as 11.2.0.2, we would see waits on mutexes (like cursor: pin S) in the top 5 wait events on a busy system.
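A sketch of that always-yield loop (scheme 0), with an atomic flag standing in for the mutex word (illustrative only):

/* yield_sketch.c - always-yield waiting, like _mutex_wait_scheme = 0 */
#include <sched.h>
#include <stdatomic.h>

static atomic_int mutex_busy = 1;   /* stand-in for the mutex being held */

void wait_on_mutex(void)
{
    /* loop until the holder clears the flag, yielding the time slice
       on every iteration: the process requeues at the back of the run queue */
    while (atomic_load(&mutex_busy))
        sched_yield();
}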

These are the hidden parameters that can be used for mutex tuning:

> For 11.2.0.3, 11.2.0.2.2 with patch 12431716 or 11.2.0.2.3 onwards:

* _mutex_spin_count (Integer)

– This sets the number of times to spin before yielding/waiting.

* _mutex_wait_scheme (Integer)

– In 11.2 this controls which wait scheme to use. It can be set to one

of the three wait schemes described above thus:

o _mutex_wait_scheme = 0                        – Always YIELD

o _mutex_wait_scheme = 1 & _mutex_wait_time = t – Always SLEEP for t milli-seconds

o _mutex_wait_scheme = 2 & _mutex_wait_time = t – EXP BACKOFF with maximum sleep

=============
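For scheme 2, here is one plausible shape of the back-off loop, consistent with the traces below (Oracle’s exact schedule is not documented here, so the doubling is an assumption):

/* backoff_sketch.c - exponential back-off waiting, like _mutex_wait_scheme = 2 */
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sem.h>
#include <time.h>

/* wait on semaphore `num` of set `semid`, sleeping longer after each
   timeout, capped at max_cs centiseconds (the _mutex_wait_time value) */
void backoff_wait(int semid, unsigned short num, long max_cs)
{
    struct sembuf dec = { num, -1, 0 };
    long cs = 1;                                       /* start at 1 centisecond */
    for (;;) {
        struct timespec to = { cs / 100, (cs % 100) * 10000000L };
        if (semtimedop(semid, &dec, 1, &to) == 0)
            return;                                    /* posted: mutex acquired */
        if (errno != EAGAIN)
            return;                                    /* unexpected error: give up */
        cs = (cs * 2 > max_cs) ? max_cs : cs * 2;      /* back off, up to the cap */
    }
}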

I have done some testing with different mutex wait schemes and wait times:

SQL> alter system set "_mutex_wait_time"=10 scope=memory;

System altered.

Here are the system calls it made while waiting, with the sleep time increasing up to 10 centiseconds (_mutex_wait_scheme is 2 here):

semtimedop(196610, {{44, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 30000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.031000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 30000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.031000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 70000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.071000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 70000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.071000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 100000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.101000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 100000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.101000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 100000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.101000>

semtimedop(196610, {{44, -1, 0}}, 1, {0, 100000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.101000>

Or up to 1 second of waiting:

SQL> alter system set "_mutex_wait_time"=100;

System altered.

semtimedop(196610, {{35, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 30000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.031000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 70000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.071000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 120000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.121000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 130000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.131000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 220000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.221000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 230000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.231000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 400000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.401000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 410000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.411000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 740000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.741000>

semtimedop(196610, {{35, -1, 0}}, 1, {0, 750000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.751000>

semtimedop(196610, {{35, -1, 0}}, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable) <1.001000>

semtimedop(196610, {{35, -1, 0}}, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable) <1.001000>

semtimedop(196610, {{35, -1, 0}}, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable) <1.001000>

I also tried scheme 0, where the process does not sleep but just yields the CPU (gives up its time slice):

SQL> alter system set "_mutex_wait_scheme"=0;

System altered.

These are its system calls:

sched_yield()                           = 0 <0.000000>

sched_yield()                           = 0 <0.000000>

sched_yield()                           = 0 <0.000000>

sched_yield()                           = 0 <0.000000>

sched_yield()                           = 0 <0.000000>^C

And the strace output (with the summary option):

% time     seconds  usecs/call     calls    errors syscall

—— ———– ———– ——— ——— —————-

57.71    1.729747           4    491535           sched_yield

35.39    1.060833           2    496678           gettimeofday

6.84    0.204961          41      4965           select

0.03    0.001000           7       140           munmap

0.03    0.001000         200         5           pwrite64

0.00    0.000000           0        11           read

0.00    0.000000           0         7           write

0.00    0.000000           0         8           open

0.00    0.000000           0        18           close

0.00    0.000000           0         1           chmod

0.00    0.000000           0         7           lseek

0.00    0.000000           0        28           times

0.00    0.000000           0         1         1 ioctl

0.00    0.000000           0        98           getrusage

0.00    0.000000           0         3           statfs

0.00    0.000000           0         2           fstatfs

0.00    0.000000           0        20           rt_sigaction

0.00    0.000000           0         2           rt_sigprocmask

0.00    0.000000           0         3           pread64

0.00    0.000000           0         5           getrlimit

0.00    0.000000           0         2           mmap2

0.00    0.000000           0        29           stat64

0.00    0.000000           0         8           lstat64

0.00    0.000000           0        10           fcntl64

0.00    0.000000           0         3           futex

0.00    0.000000           0         1           semctl

—— ———– ———– ——— ——— —————-

100.00    2.997541                993590         1 total

===================================================================

References and further reading:

– Latch, mutex and beyond by Andrey Nikolaev: https://andreynikolaev.wordpress.com/

– Tanel Poder’s blog: https://blog.tanelpoder.com/

– The Quadro blog by Alex Fatkulin: https://afatkulin.blogspot.ca/2009/01/longhold-latch-waits-on-linux.html


14 Comments

Kevin Closson
June 14, 2012 2:15 pm

Great post, Gorjan. This is a topic I love and I’ve spent a lot of time in that code dating back to the days when we implemented my patented NUMA-optimized mutex (6480918) in place of redo allocation in the late 90s in the Sequent DYNIX/ptx port of Oracle.

If you’ll allow, I’d like to point a couple of things out. First, and this is really just a nit, a latch is actually a mutex in simple computer science terms. Oracle named it a latch for simplicity in differentiating it from a lock.

A lot has changed since the first latches appeared in Oracle. These days there are shared latches, or what some would simply call reader/writer locks in plain computer science terms. Not all latches can be shared, and not all can be in the post/wait class. Your debugging skills are keen. You can investigate kslcpat() to learn more about how Oracle determines the shareability and waitability of latches. kslges() is of course where sessions go to make a final attempt on the latch before a wait (either a yield loop or a post/wait sleep à la IPC semaphores). That leads me to point out that latch yielding has been done on many ports dating back to the early 90s.

In the current Linux port, the sched_yield() call does not mean the caller loses the CPU. It is a request to the OS to check whether there is a runnable process that might take its place. Since you measured 4us yields, we can be assured that other processes were not piling up to take the CPU from the yielding session.

Additionally, you can set event 10005, level 4 for more information, and perhaps get even more with _ultrafast_latch_statistics.

Again, good post.

Reply
Gorjan Todorovski
June 15, 2012 12:43 am

Thanks Kevin! It means a lot coming from the Architect himself :)

I am very interested in improving database performance by reducing unnecessary contention, so I am keen to go under the hood and investigate how exactly things work. I will try out your suggestions.

Cheers,
Gorjan

Reply

Kevin,
looks like event 10005 has been removed after 9i, and we have to use oradebug dump KSTDUMPCURPROC 1. Google has no clue about kslcpat, so it doesn’t exist ;). Could you shed some light on that call?
Regards
GregG

Reply
Kevin Closson
June 15, 2012 12:27 pm

@GregG: I can’t say any more about kslcpat if there is nothing in the public domain about it :-( I would be going by my memory of the code, and that would be confidential information I am legally bound never to disclose… and Oracle’s lawyers are an eager bunch, as we see from their endless fun in the courts.

If you have a MOS account, you can take the Tanel Poder approach: search through MOS mentions of the function and stitch together the facts found there. I’d offer such a MOS reference, but I do *not* log into MOS so as to keep sanitized.

I regret mentioning that event if it is deprecated. I never used it in my roles with Oracle code privilege. I always used a simple debugger side-by-side with the code.

Reply

Kevin,
totally understood . Admire Your work .
Best wishes.
GregG

Reply
SALMAN BIN MUHAMMAD
June 18, 2012 11:45 pm

Check the top active/inactive sessions with complete information; then it will be easier to evaluate whether the problem comes from CPU contention due to Oracle users or from the O/S level.

TOP 10 MOST ACTIVE SESSIONS
/* top10active.sql
shows the top 10 longest-active user sessions
*/
col osuser format a10 trunc
col LastCallET format 99,999
col sid format 9999
col spid format 999999
col username format a10 trunc
col uprogram format a25 trunc
col machine format a10 trunc
set linesize 132
set verify off
select * from (
select to_char(s.logon_time, 'mm/dd hh:mi:ssAM') loggedon,
s.sid, s.status,
floor(last_call_et/60) "LastCallET",
s.username, s.osuser,
p.spid, s.module || ' – ' || s.program uprogram,
s.machine, s.sql_hash_value
from v$session s, v$process p
where p.addr = s.paddr
and s.type = 'USER'
and module is not null
and s.status = 'ACTIVE'
order by 4 desc)
where rownum <= 10;

=================================
TOP INACTIVE SESSIONS
/* top_inactive.sql
shows the longest-inactive user sessions
*/
col osuser format a10 trunc
col LastCallET format 99,999
col sid format 9999
col spid format 999999
col username format a10 trunc
col uprogram format a25 trunc
col machine format a10 trunc
set linesize 132
set verify off
select * from (
select to_char(s.logon_time, 'mm/dd hh:mi:ssAM') loggedon,
s.sid, s.status,
floor(last_call_et/60) "LastCallET",
s.username, s.osuser,
p.spid, s.module || ' – ' || s.program uprogram,
s.machine, s.sql_hash_value
from v$session s, v$process p
where p.addr = s.paddr
and s.type = 'USER'
and module is not null
and s.status = 'INACTIVE'
order by 4 desc)
where rownum < 300;

Reply
Lazar Nakinov
June 15, 2012 3:55 am

Gorjan,

This is a Masterpiece! Troubleshooting contention in shared pool, library cache or buffer cache is always challenging and fun and requires deep knowledge of Oracle/OS internals.
Don’t stop here, let it all out, keep on blogging!

Best regards

Reply

Great post!Thanks.
another question,why TX lock cause CPU usage?

Reply
Gorjan Todorovski
June 30, 2012 12:51 pm

Actually, the TX locks didn’t cause the high CPU load; they were a consequence of it. The longer waits on TX locks were caused by an overall slowdown of the system due to the higher CPU load from a “bad” SQL being executed by multiple sessions, making transactions take longer to complete. This led to TX locks being held longer than usual.

Reply

Thanks

Reply
André Araujo
August 22, 2012 7:21 pm

Great blog, Gorjan! I had earmarked this for reading a long time ago and I’m glad I came back!

I second Lazar’s request: Keep on blogging, mate!

Reply

Awesome explanation!

Reply
