After logon trigger in user schema and FRA full situation hang the database

Posted in: Oracle, Technical Track

This blog post is about a scenario where a combination of a logon trigger performing DML operations at user login time (writing to a custom audit table), and a Fast recovery area (FRA) full situation brings the database to a halted state where even the sys user can’t create a regular new session.

There are two goals to this post. One is to illustrate the blocking scenario, so that when using logon triggers, care is taken to think about possible side effects and consequences. The other is to present how I researched it, and show some of the techniques I used trying to fix it, possibly avoiding an instance restart. This can hopefully be useful when troubleshooting similar scenarios, as not all readers might be aware about them.

Testcase setup

For this example, I’m using a non-CDB 19c database. Lower Oracle versions can also be used. In a CDB architecture, the issue could also be reproduced in case the logon trigger resides in the root container, even though it’s not recommended to create user objects in there. If the trigger is created in a PDB, the issue can be fixed without a database restart. Also, if the logon trigger performs the INSERT into the audit table only for non-SYS users (for example by having an “IF (USERNAME<>’SYS’) THEN INSERT INTO..” statement), the issue can be avoided.

1) In order to reproduce the problem, let’s artificially limit the FRA size to 400MB:

SQL> show parameter db_recovery_file

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /u02/fast_recovery_area
db_recovery_file_dest_size           big integer 400M

On this test database, the size of the online redo logs is 200MB:

SQL> select GROUP#, BYTES, BLOCKSIZE, STATUS from  v$log;

---------- ---------- ---------- ----------------
         1  209715200        512 INACTIVE
         2  209715200        512 INACTIVE
         3  209715200        512 CURRENT



2) Create a user “u1” with the following privileges:

create user u1 identified by oracle default tablespace users;
grant create session to u1;
grant create table to u1;
grant create trigger to u1;
grant administer database trigger to u1;
alter user u1 quota unlimited on users;

3) Logged in as user “u1”, create a table which the login trigger will INSERT into. This will simulate an audit table the trigger might be actually writing to:

SQL> conn u1/[email protected]/orcl
SQL> show user
USER is "U1"

Side note: Though it’s not strictly relevant to this particular test case, the explicit CAST is used to make sure the columns’ datatypes are NUMBER and VARCHAR2(4000) respectively.

SQL> create table t1 (id, text)
select cast (rownum as number) id,
       cast (rpad('x',4000) as varchar2(4000))
from dual
connect by level <= 100;

4) Logged in as the same “u1” user, create a “LOGON ON DATABASE” trigger which will simulate writing into an audit table at user logon time:

  insert into t1 select * from t1 where rownum <= 100;

The trigger fires for all logons, SYS included.

5) In a separate sqlplus session (doesn’t matter as which user we’re logged in), start generating REDO, so that the FRA fills up. The SID of this session is 34 (we’ll need it later):

SQL> create table t2 as select * from all_source;

Table created.

SQL> begin
 while (true) loop
  insert into t2 select * from t2;
 end loop;

6) Short after that, the FRA fills up, and the archiver process is unable to archive the online redo logs:

ORACLE Instance orcl - Cannot allocate log, archival required
Thread 1 cannot allocate new log, sequence 17
All online logs need archiving
Examine archive trace files for archiving errors
  Current log# 1 seq# 16 mem# 0: /u02/oradata/ORCL/redo01.log
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_tt00_20161.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 209715200 bytes disk space from 419430400 bytes limit

And just to confirm, there’s still room available on the filesystem where the FRA resides:

[[email protected] ~]$ df -kh /u02/fast_recovery_area
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/ol-root   50G   45G  5.3G  90% /
Trying to fix the problem in five different ways

1. In case enough disk storage is available, one of the fastest solutions to alleviate a critical FRA full situation could be to temporarily increase the FRA size. If we have an existing session where we’re logged in as a user with enough privileges to change the FRA size, that can be done. However, new connections to the database, even as sys, are not possible, as the logon trigger is blocking the creation of a new session:

[[email protected] ~]$ sqlplus / as sysdba

SQL*Plus: Release - Production on Mon Aug 22 16:34:19 2022

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

<<-- login is blocked

Side note about logon trigger firing rules

As described in MOS note “Connecting as DBA Does not Fire RAISE_APPLICATION_ERROR in a AFTER LOGON ON DATABASE TRIGGER (Doc ID 226058.1)” and in the documentation, unhandled exceptions in the logon trigger don’t prevent logins to users having the ADMINISTER DATABASE TRIGGER privilege, so sys should be able to login even if the trigger raises an exception. However, our logon trigger is not raising an exception, it’s just blocked on a DML operation.


Before proceeding further, let’s take a note of the PID of the process created for our connection. I found it out by simply running the below ps command before and after starting “sqlplus / as sysdba”, and it’s PID=1767. We’ll use it later on, when inspecting a hanganalyze trace file:

[[email protected] trace]$ ps -ef |grep beq
oracle    1767  1766  0 17:24 ?        00:00:00 oracleorcl (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

[[email protected] trace]$ pstree -lap 1766
sqlplus,1766   as sysdba
  |-oracle_1767_orc,1767 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

2. The same hang occurs if we try to connect with “rman target /” in order to manage the archivedlogs by backing them up or removing them. RMAN even displays the issue with the trigger when executing the ‘U1.TRG’ trigger:

[[email protected] ~]$ rman target /

Recovery Manager: Release - Production on Mon Aug 22 17:04:53 2022

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

^CRMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-04088: error during execution of trigger 'U1.TRG'
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: user requested cancel of current operation
[[email protected] ~]$

3. Manually deleting the archivedlogs from disk won’t help either, as Oracle won’t be aware of this change.
4. As we know that the sqlplus logon is blocked by a trigger, what happens if I try to cancel the SQL call – would I still get the session? Following Tanel Poder’s article, the kill -URG <PID> can be used to cancel the current SQL call.

Let’s try that with a new login attempt, which hangs:

[[email protected] ~]$ sqlplus / as sysdba
<<-- hangs

Verify the parent PID:

[[email protected] trace]$ pstree -lap 12458
sqlplus,12458   as sysdba
  |-oracle_12459_or,12459 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

And let’s try to cancel the SQL call with:

[[email protected] trace]$ kill -URG 12459

No luck – we can’t login, and sqlplus session reports the below error:

[[email protected] ~]$ sqlplus / as sysdba

SQL*Plus: Release - Production on Mon Aug 22 20:02:49 2022

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

ORA-01075: you are currently logged on

Enter user-name:

5. Running alter system set "_system_trig_enabled"=false would prevent the trigger from firing, and thus allow us to login as sys either in sqlplus or rman. Though, assuming we don’t have an established session from where to run the command, we can’t dynamically set the parameter, not even with a preliminary sqlplus connection (discussed further down in the post).

So at this point the DB instance is stuck. Existing sessions can still perform SELECTs, but no DML is possible.


Initial research steps

Following are some techniques to illustrate how to find out what the blocked sessions are waiting for. For illustration purposes I had an already established session before the FRA full situation occurred, so I could run the below SQLs. If we don’t have such a session, I’ll show more advanced techniques later on.

1) Session with SID=34 (the one inserting in an infinite loop) is blocked by the logwriter (LGWR, SID=10) process:

SQL> select sid, program, event, status, state, last_call_et, blocking_session, blocking_session_status from v$session where sid=34;

SID PROGRAM                              EVENT                               STATUS   STATE    LAST_CALL_ET BLOCKING_SESSION BLOCKING_SE
--- ------------------------------------ ----------------------------------- -------- -------- ------------ ---------------- -----------
 34 [email protected] (TNS V1-V3)  log file switch (archiving needed)  ACTIVE   WAITING          1095               10 VALID

1 row selected.


LGWR is in turn waiting on the archiver (ARCn) processes, which can be inferred by the wait event “log file switch (archiving needed)” wait event.

An interesting side effect of the current database state is that some monitoring SQL scripts don’t work anymore. While researching blocking scenarios, I usually use the below SQL (for single instance databases), as it visually displays the hierarchy between blocking and blocker sessions. However, when I run it, it just hangs:

SQL> WITH sessions AS
           sid, blocking_session, row_wait_obj#, sql_id, sql_child_number, event, program
      FROM v$session)
SELECT LPAD(' ', LEVEL ) || sid sid, object_name, event, s.sql_id, s.sql_child_number,
       substr(sql_text,1,55) sql_text, program
  FROM sessions s 
  LEFT OUTER JOIN dba_objects 
       ON (object_id = row_wait_obj#)
  LEFT OUTER JOIN v$sql sq on (s.sql_id = sq.sql_id and sq.child_number = s.sql_child_number)       
 WHERE sid IN (SELECT blocking_session FROM sessions)
    OR blocking_session IS NOT NULL
 CONNECT BY PRIOR sid = blocking_session
 START WITH blocking_session IS NULL; 

<<-- SQL hangs

The problem is that the table transformation with subquery factoring (= the WITH clause) causes the resultset to be materialized, and thus it’s gets blocked. It’s similar to using the “/*+materialize*/” hint. As a workaround we can instruct Oracle to not materialize the resultset in the WITH subquery by using the /*+inline*/ hint. After doing that, we get the result:

SQL> WITH sessions AS
   (SELECT /*+inline*/
           sid, blocking_session, row_wait_obj#, sql_id, sql_child_number, event, program
      FROM v$session)
SELECT LPAD(' ', LEVEL ) || sid sid, object_name, event, s.sql_id, s.sql_child_number,
       substr(sql_text,1,55) sql_text, program
  FROM sessions s 
  LEFT OUTER JOIN dba_objects 
       ON (object_id = row_wait_obj#)
  LEFT OUTER JOIN v$sql sq on (s.sql_id = sq.sql_id and sq.child_number = s.sql_child_number)       
 WHERE sid IN (SELECT blocking_session FROM sessions)
    OR blocking_session IS NOT NULL
 CONNECT BY PRIOR sid = blocking_session
 START WITH blocking_session IS NULL; 

SID        OBJECT_NAME      EVENT                                    SQL_ID          CH# SQL_TEXT                                                PROGRAM
---------- ---------------- ---------------------------------------- ------------- ----- ------------------------------------------------------- ----------------------------------------
 10                         rdbms ipc message                                          0                                                         [email protected] (LGWR)
  13                        log file switch (archiving needed)                                                                                   [email protected] (W000)
  31                        log file switch (archiving needed)                                                                                   [email protected] (W003)
  34                        log file switch (archiving needed)       7t74hdvmxrdn6     0 INSERT INTO T2 SELECT * FROM T2                         [email protected] (TNS V1-V3)
  43                        log file switch (archiving needed)                                                                                   [email protected] (W007)
  248      C_OBJ#           log file switch (archiving needed)       9wncfacx0nj9h     0 insert into smon_scn_time (thread, time_mp, time_dp, sc [email protected] (SMON)
  254                       log file switch (archiving needed)       g4gp07gt2z920     0 update sys.scheduler$_job set  last_start_date = :1, ru [email protected] (J002)
   15      SCHEDULER$_JOB   buffer busy waits                        g4gp07gt2z920     0 update sys.scheduler$_job set  last_start_date = :1, ru [email protected] (J003)
  261                       log file switch (archiving needed)                                                                                   [email protected] (W002)
  264                       log file switch (archiving needed)       g4gp07gt2z920     0 update sys.scheduler$_job set  last_start_date = :1, ru [email protected] (J000)
   282     SCHEDULER$_JOB   buffer busy waits                        g4gp07gt2z920     0 update sys.scheduler$_job set  last_start_date = :1, ru [email protected] (J001)
  265                       log file switch (archiving needed)                                                                                   [email protected] (W004)
  269                       log file switch (archiving needed)                                                                                   [email protected] (W001)
  271                       log file switch (archiving needed)                                                                                   [email protected] (W005)
  278                       log file switch (archiving needed)                                                                                   [email protected] (W006)
  283                       log file switch (archiving needed)       fz15hxcj83qzk     0 INSERT INTO T1 SELECT * FROM T1 WHERE ROWNUM <= 100     [email protected] (TNS V1-V3)

16 rows selected.



SID=10 belongs to the logwriter process, which is in turn blocking quite a few other sessions. We can also see that SID=34 (the one inserting in an infinite loop) is trying to execute “INSERT INTO T2 SELECT * FROM T2”. More important, SID=283 is a blocked “sqlplus / as sysdba” login attempt, and it’s trying to execute the SQL “INSERT INTO T1 SELECT * FROM T1 WHERE ROWNUM <= 100” from the trigger. Knowing the blocked SQL text could give us a clue about the root cause of the issue. Unfortunately there’s no information in v$sql or v$session about the PL/SQL object (trigger) that the SQL is being called from:

FROM   v$session
WHERE  sid = 283;
  2    3
--------------------- ------------------------- --------------- -------------------

1 row selected.


  from   v$sql
 where  sql_id = 'fz15hxcj83qzk';
---------- -------------
         0             2

1 row selected.

SQL> select owner, object_name
  from   dba_objects
 where  object_id = 0;
  2    3
no rows selected



Another way to check the issue could be to use Tanel Poder’s ash_wait_chains script, which uses Active Session History (ASH) to show the whole chain of waiters and (final) blockers. This can be especially useful in case the blocking/waiting scenario is already resolved, and we’re researching the issue after the fact:

SQL> @ash/ash_wait_chains session_id||':'||event2||sql_opname session_id=34 sysdate-1/24/60 sysdate

-- Display ASH Wait Chain Signatures script v0.6 BETA by Tanel Poder ( )

%This     SECONDS        AAS #Blkrs WAIT_CHAIN                                                                                                                                                                                                                                                                                                   FIRST_SEEN          LAST_SEEN
------ ---------- ---------- ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------- -------------------
  97%          56         .9      1 -> 34:log file switch (archiving needed) INSERT -> [idle blocker 1,10,46017 ([email protected] (LGWR))]                                                                                                                                                                                                 2022-08-22 16:15:00 2022-08-22 16:15:59
   2%           1          0      1 -> 34:log file switch (archiving needed) INSERT -> 10:LGWR all worker groups  -> [idle blocker 1,11,19550 ([email protected] (LG00))]                                                                                                                                                                   2022-08-22 16:15:03 2022-08-22 16:15:03
   2%           1          0      1 -> 34:log file switch (archiving needed) INSERT -> 10:ON CPU                                                                                                                                                                                                                                                 2022-08-22 16:15:04 2022-08-22 16:15:04

3 rows selected.


LGWR is again shown as the blocker of our session SID=34.

More advanced research techniques

Let’s assume we don’t know the issue was related to the logon trigger – we just got alerted about a FRA full situation, checked the alert log, and can’t login to the database with sqlplus or rman. As the database seems hung, we can follow the advice from the MOS note “SRDC – How to Collect Standard Information for a Database Hang (Doc ID 1594905.1)“. A preliminary sqlplus connection can be created, and we could run the “oradebug hanganalyze” command giving us more clues about the cause of the blocking scenario.

Following the advice in the MOS document, I attach to an existing instance’s process with oradebug. Even though the note says to attach to PMON/SMON process, I’ll try to stay away from critical background processes if possible, as crashing them will bring the whole instance down. Instead I’ll connect to the MMON process, and try to execute the hanganalyze command:

[[email protected] ~]$ ps -ef |grep mmon
oracle   32221     1  0 16:57 ?        00:00:03 ora_mmon_orcl

[[email protected] ~]$ sqlplus -prelim / as sysdba

SQL*Plus: Release - Production on Mon Aug 22 20:55:31 2022

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL> oradebug setospid 32221
Oracle pid: 28, Unix process pid: 32221, image: [email protected] (MMON)
SQL> oradebug tracefile_name
SQL> oradebug hanganalyze 3
Statement processed.
SQL> exit
Disconnected from ORACLE
[[email protected] ~]$

Checking the orcl_mmon_32221.trc trace and the session’s trace file, there’s unfortunately no hanganalyze information. That’s the issue which was mentioned the above MOS document, and which is the very reason we’re attaching to a process with an already established session context. The reason for this behavior, a workaround, and an explanation about preliminary connections is available in this Tanel Poder’s post.

Repeating the above steps, but this time using “oradebug dump hanganalyze 3″, we get a usable hanganalyze file:

$ sqlplus -prelim / as sysdba
SQL> oradebug setospid 32221
SQL> oradebug tracefile_name
SQL> oradebug dump hanganalyze 3
Statement processed.

This MOS note provides more information on how to interpret the trace file: Interpreting HANGANALYZE Trace Files to Diagnose Hanging and Performance Problems for 9i and 10g. (Doc ID 215858.1). Inspecting the orcl_mmon_32221.trc file, we can search for the SID indicated as “LEAF” (or LEAF_NW, meaning “LEAF not in a wait”, so on CPU), as those indicate the final blocker. It’s SID=10:

State of LOCAL nodes

As we’re interested in our PID=1767/SID=283 (remember we found those two in the listings above), we can search the trace file for that process. It’s in lines 7 and 9 in the below snippet:

Chain 13:
    Oracle session identified by:
                instance: 1 (orcl.orcl)
                   os id: 1767
              process id: 37, [email protected] (TNS V1-V3)
              session id: 283
        session serial #: 55959
             module name: 0 ([email protected] (TNS V1-V3))
    is waiting for 'log file switch (archiving needed)' with wait info:
            time in wait: 1 min 40 sec (last interval)
            time in wait: 234 min 15 sec (total)
           timeout after: never
                 wait id: 6
                blocking: 0 sessions
          current sql_id: 4018450667
            wait history:
              * time between current wait and wait #1: 0.000000 sec
              1.       event: 'latch free'
                 time waited: 0.000191 sec
                     wait id: 187              p1: 'address'=0x7f353c20
                                               p2: 'number'=0x3
                                               p3: 'why'=0x23
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'log file switch (archiving needed)'
                 time waited: 7.664251 sec (last interval)
                 time waited: 232 min 35 sec (total)
                     wait id: 6
              * time between wait #2 and #3: 0.000000 sec
              3.       event: 'latch: redo writing'
                 time waited: 0.000156 sec
                     wait id: 186              p1: 'address'=0x6006d480
                                               p2: 'number'=0x16f
                                               p3: 'why'=0x10
    and is blocked by 'instance: 1, os id: 20115, session id: 10',
    which is a member of 'Chain 1'.

Chain 13 Signature: 'rdbms ipc message'<='log file switch (archiving needed)'
Chain 13 Signature Hash: 0xfc07a352

“is blocked by ‘instance: 1, os id: 20115, session id: 10’, which is a member of ‘Chain 1’.” indicates the blocker, that’s the LGWR process:

=> Oracle session identified by:
               instance: 1 (orcl.orcl)
                  os id: 20115
             process id: 20, [email protected] (LGWR)
             session id: 10
       session serial #: 46017

What we can’t see in the above trace is the SQL text our blocked sqlplus session is waiting to execute – there’s “current sql_id: 4018450667”, but no SQL text. Usually the sql_id line in trace file is followed by the SQL text and “short stack” lines, but in this case it’s missing.  One option to obtain more information is to repeat the steps of logging in with a preliminary connection and execute a systemstate dump with “oradebug dump systemstate 258”, though on systems with large shared memory structures, the produced tracefiles can be very large. Instead of a systemstate dump, we can opt for a more contained and focused dump, and thus generate a processtate dump, which limits the output to just the targeted process, or even only run “oradebug dump errorstack 3”:

$ sqlplus -prelim / as sysdba
SQL> oradebug setospid 1767  <<-- OSPID of our blocked sqlplus session
Oracle pid: 37, Unix process pid: 1767, image: [email protected] (TNS V1-V3)
SQL> oradebug tracefile_name
SQL> oradebug dump processstate 10
Statement processed.
(optionally also):
SQL> oradebug dump errorstack 3

If looking at the process state dump file, search for “Session Cursor Dump” (if looking at the systemstate dump, we first have to locate the State Object for the process 1767). The currently executing cursor is “Current cursor: 2”:

----- Session Cursor Dump -----
Current cursor: 2, pgadep=2

Searching the trace file for the text “Cursor#2”, the SQL which the session is trying to execute is:


Scrolling down the trace file, there’s a section with a more user-readable format:

SQL Information
user_id=107 user_name=U1 [email protected] (TNS V1-V3) action=
sql_id=fz15hxcj83qzk plan_hash_value=-458591652 problem_type=0 command_type=2
----- Current SQL Statement for this session (sql_id=fz15hxcj83qzk) -----
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x71d4c6e8         2  anonymous block

The SQL is being executed from an anonymous block, so we don’t see a direct connection with the U1.TRG trigger. Checking the trace file for the reported “object handle=0x71d4c6e8” (line 10 above), there’s a clue in the below excerpt:

        ObjectName:  Name=begin
  insert into t1 select * from t1 where rownum <= 100;
          FullHashValue=3c74be99dd87d9064bd57f96bbed98a1 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=3152910497 OwnerIdn=107 
        Statistics:  InvalidationCount=1 ExecutionCount=22 LoadCount=3 ActiveLocks=1 TotalLockCount=22 TotalPinCount=1 
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=21 HandleInUse=21 HandleReferenceCount=0 
        Concurrency:  DependencyMutex=0x6ff6aff0(0, 2, 0, 0) Mutex=0x6ff6b090(0, 151, 0, 0) 
        Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] 
        Timestamp:  Current=08-22-2022 15:30:59 
        HandleReference:  Address=0x6ff6b148 Handle=(nil) Flags=[00] 
        LibraryObject:  Address=0x71d4c9c8 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] Flags3=[0000] PublicFlags=[0000] 
          ChildTable:  size='16' 
            Child:  id='0' Table=0x71d4d840 Reference=0x71d4d318 Handle=0x71d4c6e8 
          Parent Cursor:  sql_id=4rpbzkuxyv651 parent=0x71d4ca98 maxchild=1 plk=y ppn=n prsfcnt=0 obscnt=0

Line 17 reports “Handle=0x71d4c6e8” as a “Child” for the state object in lines 1-3, which is the actual body of the trigger. That could give us clues for further research of the INSERT statement origin, as the whole trigger body is printed. In a “real” trigger there would probably be more code than just a simple INSERT, possibly even with comments. But again, there’s no direct connection to the U1.TRG state object.

For completeness – there actually is a state object for the U1.TRG trigger in the tracefile, though I couldn’t find a connection leading from the above PL/SQL block to the trigger’s state object.


In specific situations, after logon triggers can block all users from creating a session, and thus care has to be taken when implementing or deploying them. The research techniques presented above post can hopefully be useful to the reader when diagnosing similar, or even different, issues.

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Oracle Database Consultant
Jure Bratina is an experienced Oracle DBA currently working as a Database Consultant with The Pythian Group. The areas he most likes to work on are troubleshooting and performance tuning of Oracle databases. Jure is also a member of the Executive Committee at the Slovenian Oracle User Group (SIOUG).

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *