REGEXP_REPLACE: Auto-Translate SQL Literals to Bind Vars

Posted in: Technical Track

Most DBAs are probably aware of the cursor_sharing feature in the Oracle optimizer that replaces all literals in the where clause of a SQL statement by bind variables. This allows Oracle to calculate a Signature from the resulting SQL string that can be used for various tasks, like sharing “like” statement cursors and identifying corresponding sql_profiles for statements. I’ve come to appreciate the bind variables even more when it comes to get some aggregate run-time statistics for statements. We can group by the Signature for this purpose, and that covers all statements, not just those that were submitted with bind vars. This Signature is very handy – it is also stored in various internal tables. (See Maris Elsins’ blog post here for more details about the SQL signature.)

Despite the convenience that the Signature provides, there are several situations where we would like to have the ability to convert an SQL string that contains literals into a statement that only contains bind variables. (Read the whole post, you will see what I mean.) I searched for an Oracle-supplied way to do this and could not find one. So, I decided to see if I could get the SQL strings transformed in a way that resembles the result of cursor_sharing. Using the regexp_replace function, I could get far enough to give me the result I wanted, that is, all string literals (strings within two single quotes) and numeric literals replaced by a generic bind variable name (:STR and :NUM). That was enough for my needs. I did not venture to try to get a different variable name for each literal. It would not have provided me any additional benefit, and I have no clue if it’s feasible at all using a single “non-procedural” expression.

Here is an example of a statement before and after translation:

Before: select 'username:'||substr(USERNAME,+1, 3), -1234.56 NEG_NUM from dba_users   where username like 'SYS%'
After : select ':STR'||substr(USERNAME,+:NUM, :NUM), -:NUM NEG_NUM from dba_users where username like ':STR'

Now I can get aggregated stats for any “similar” queries. I just need to supply the appropriate SQLl string source to the expression. :)

The basic regular expressions that I use are the following :

1- To remove duplicate white space and all carriage returns and linefeeds (this flattens the query into one line):

   regexp_replace(sql_text, '[[:space:]]+',' ' )

2- To transform all numeric literals into “:NUM”. The premise is that all numeric literals must be preceded by either one of "(=+, -" Otherwise, it is considered to be an identifier or a string literal:

   regexp_replace('([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM')

3- To transform all string literals into “‘:STR'”. For example, the string “select ‘abc’ from dual” would be translated into “select ‘:STR’ from dual” :

   regexp_replace('''(.)*?''',''':STR''' )

The final Oracle function call looks like this (three embedded regexp_replace):

   regexp_replace(
   regexp_replace(
   regexp_replace(sql_text, '[[:space:]]+',' ' ),
                  '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
                  '''(.)*?''',''':STR''' )

Here’s an example using faked queries and stats:

- create a holding table for faked queries and stats
create table billette_sqltext
as
select sql_id,
       sorts,
       executions,
       disk_reads,
       buffer_gets,
       USER_IO_WAIT_TIME,
       ROWS_PROCESSED,
       sql_fulltext
from v$sqlarea
where rownum < 1
;

- Generate 500 faked queries and stats
begin
 for x in 1..5 loop
  insert into billette_sqltext
   select sql_id, sorts, executions, disk_reads, buffer_gets, USER_IO_WAIT_TIME, ROWS_PROCESSED, statement
     from
    (select sql_id, sorts, executions, disk_reads, buffer_gets, USER_IO_WAIT_TIME, ROWS_PROCESSED, rownum rnum
       from v$sqlarea where sql_id not in (select sql_id from billette_sqltext) and rownum<101) sa,
    (select 'select '''||table_name||''' from '||owner||'.'||table_name||' where a = '||rownum statement, rownum rnum
       from dba_tables where owner = 'SYS' and rownum < 101) st
    where sa.rnum = st.rnum
    ;
 end loop;
commit;
end;
/

-- Get some aggregate stats for similar queries
col normalized_statement for a100
select sum(sorts) sorts,
       sum(executions) sum_executions,
       sum(disk_reads) sum_disk_reads,
       sum(buffer_gets) sum_buffer_gets,
       round(sum(user_io_wait_time)/1000000) sum_user_io_wait_time_s,
       sum(rows_processed) sum_rows_processed,
       regexp_replace(
       regexp_replace(
       regexp_replace(to_char(sql_fulltext), '[[:space:]]+',' ' ),
                      '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
                      '''(.)*?''',''':STR''' ) normalized_statement
  from billette_sqltext
 where sql_fulltext like '%SYS.APPLY%'
 group by regexp_replace(
          regexp_replace(
          regexp_replace(to_char(sql_fulltext), '[[:space:]]+',' ' ),
                         '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
                         '''(.)*?''',''':STR''' )
order by normalized_statement, sum_disk_reads Desc
;

     SORTS SUM_EXECUTIONS SUM_DISK_READS SUM_BUFFER_GETS SUM_USER_IO_WAIT_TIME_S SUM_ROWS_PROCESSED NORMALIZED_STATEMENT
---------- -------------- -------------- --------------- ----------------------- ------------------ ----------------------------------------------------------------------------------------------------
         3             12              1            3847                       0                 24 select ':STR' from SYS.APPLY$_CHANGE_HANDLERS where a = :NUM
       155              2         475018         1373125                     231                 24 select ':STR' from SYS.APPLY$_CONF_HDLR_COLUMNS where a = :NUM
        42             19             19            1217                       0                 19 select ':STR' from SYS.APPLY$_CONSTRAINT_COLUMNS where a = :NUM
        33              4              1            1547                       0                  4 select ':STR' from SYS.APPLY$_DEST_OBJ where a = :NUM
       350              4            306           22066                       0                 61 select ':STR' from SYS.APPLY$_DEST_OBJ_CMAP where a = :NUM
         3              5             48            3124                       0                  4 select ':STR' from SYS.APPLY$_DEST_OBJ_OPS where a = :NUM
        95              4            167            4593                       0                 29 select ':STR' from SYS.APPLY$_ERROR where a = :NUM
        94              5             74            2325                       0                  5 select ':STR' from SYS.APPLY$_ERROR_HANDLER where a = :NUM
        93              4           6930          356589                       5                 10 select ':STR' from SYS.APPLY$_ERROR_TXN where a = :NUM
       222              6            106           41380                       0                 10 select ':STR' from SYS.APPLY$_SOURCE_OBJ where a = :NUM
        63              3            169            2710                       0                  3 select ':STR' from SYS.APPLY$_SOURCE_SCHEMA where a = :NUM
       144              4             13           10185                       0                 23 select ':STR' from SYS.APPLY$_VIRTUAL_OBJ_CONS where a = :NUM

12 rows selected.

Here is a real-life example:

I have used this regular expression transformation to compare the “logical” OBIEE queries from its report tracking table. I could generate some interesting run-time statistics by statement version. (Note that OBIEE actually runs something other in the db than these logical queries. Also, the one report I looked at below actually runs 9 different versions of a query, depending on the parameters supplied by the user and that could be useful to identify which of these executions runs slower or is more popular etc.).

  set linesize 120 pagesize 1000
  select /*+parallel(8)*/
         count(*) cnt,
         sum(case when total_time_sec  < 8 then 1 else 0 end) num_lt_8s,
         sum(case when total_time_sec >= 8 then 1 else 0 end) num_ge_8s,
         round(avg(total_time_sec),2) avg_time,
         min(total_time_sec) min_time,
         max(total_time_sec) max_time,
         min(start_dt) first_run,
         max(start_dt) last_run,
         regexp_replace(
         regexp_replace(
         regexp_replace(query_text, '[[:space:]]+',' ' ),
                        '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
                        '''(.)*?''',''':STR''' ) normalized_statement
    from SIEBLE_U.S_NQ_ACCT
   where saw_src_path = '/Path/to/the/report'
   group by
         regexp_replace(
         regexp_replace(
         regexp_replace(query_text, '[[:space:]]+',' ' ),
                        '([\(\=\<\>\+, -])[0-9]+\.?[0-9]*','\1:NUM'),
                        '''(.)*?''',''':STR''' )
  order by count(*) desc
  ;

         CNT  NUM_LT_8S  NUM_GE_8S   AVG_TIME   MIN_TIME   MAX_TIME FIRST_RUN          LAST_RUN
  ---------- ---------- ---------- ---------- ---------- ---------- ------------------ ------------------
  NORMALIZED_STATEMENT
  ------------------------------------------------------------------------------------------------------------------------
      535187     449674      85513      11.04          0       5142 11-OCT-11 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" = :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw_4, saw_5, saw_6, saw_7, saw_8, saw_9, s
  aw_10, saw_11, saw_12, saw_13

       84210      18901      65309      35.34          0       4243 28-FEB-12 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" = :NUM) AND (Advertiser."Advertiser ID" = :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw
  _4, saw_5, saw_6, saw_7, saw_8, saw_9, saw_10, saw_11, saw_12, saw_13

       72378      53443      18935      32.87          0      11967 24-NOV-11 00:00:00 15-JAN-13 00:00:00
  SELECT WGET."Member ID" saw_0, WGET."Merchant ID" saw_1, WGET."Merchant Name" saw_2, WGET."Order ID" saw_3, evaluate(':S
  TR',':STR',cast("Transaction Period".Date as char)) saw_4, WGET."Transaction Time" saw_5, WGET."SKU Number" saw_6, evalu
  ate(':STR',':STR',cast(WGET."Sales($)" as char(:NUM))) saw_7, WGET.Quantity saw_8, evaluate(':STR',':STR',cast(WGET."Com
  missions($)" as char(:NUM))) saw_9, evaluate(':STR',':STR',cast("Process Period".Date as char)) saw_10, WGET."Process Ti
  me" saw_11, WGET.TID saw_12, ':STR' saw_13 FROM TABLE1 WHERE ("Process Period"."Date Range" BETWEEN date ':STR' AND date ':
  STR') AND (Network."Network ID" <> :NUM) ORDER BY saw_0, saw_1, saw_2, saw_3, saw_4, saw_5, saw_6, saw_7, saw_8, saw_9,
  saw_10, saw_11, saw_12, saw_13

  ...

  9 rows selected.

In Summary:

Aggregations, similar to the above, can be done with any SQL source table, including the AWR and v$ views. It can be used on substrings of the statements if the whole statement is over 4000 characters (regexp_replace is bound to the varchar2 limit) or if you want to aggregate queries that are too different from each other. In other words, it’s a generic string parser and it’s up to you to supply the strings that suits your needs, and that makes it very flexible. :)

Some caveats:

– It may replace more strings than you want. For example, a column named "col-1" would be translated to "col-:NUM" because 1 is preceded by a minus sign (dash). You can prevent that by removing the minus from the regular expression

– It may miss some numeric values if it is preceded by a character I didn’t add in the list. You can add the character to the list if you find such case. Always test the result to ensure it doesn’t break the translation. Regexp can be very picky and unpredictable…

– 4000 characters is all a regexp_replace can handle. Any queries larger than that will need to be truncated or transformed to fit in 4000 characters.

Hope you will find this regexp_replace expression as useful as I did.

Cheers,
Marc

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

1 Comment. Leave new

Great use of regular expressions.
The only thing I struggle to understand is how regexp_replace('''(.)*?''',''':STR''' ) replaces “‘A’ and ‘B'” by “‘:STR’ and ‘:STR'” instead of single “‘:STR'”.
Must be something wrong with my regexp skills…

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *