OEM 13c Monitoring Features – Part 4, Corrective Actions

Posted in: Technical Track

This is the fourth in a series of blog posts covering the most important topics you should know to completely monitor your IT infrastructure with Oracle Enterprise Manager Cloud Control 13c. The previous posts covered how to set up basic monitoring, how to use extended metrics and how to create reports based on metric data.

Oracle Enterprise Manager (OEM) Corrective Actions allow OEM to take immediate action as soon as an event is raised, such as a metric alert. In the 13c version, Oracle already includes a corrective action to automatically add space to a tablespace whenever it meets certain criteria, such as free space on the diskgroup / file system, and so on.

With corrective actions, it is possible to significantly reduce the amount of work a human has to perform whenever alerts are raised. Besides the one mentioned above, PL/SQL packages and shell scripts are also accepted by corrective actions to perform a wide variety of automated actions. For example, an extended metric that uses a simple shell script to query a particular URL, and whenever the result of this script indicates an issue with the URL, OEM will raise an alert which will then trigger a corrective action to automatically restart the HTTP server.

There is a wide range of possibilities for the combination of extended metrics and corrective actions. Using the extended metric example presented previously, I’ll create a corrective action that will automatically list all the pending calls and send it by e-mail whenever the threshold for the extended metric is reached. This is just an example to show how corrective actions and its notifications works, but as explained above, these powerful tools may be used to monitor virtually anything in the IT environment. Just be creative!

Go to “Enterprise -> Monitoring -> Corrective Actions”

Select what type of command should be used for the corrective action and click on the “Go” button:

Define a name and which type of events may trigger this corrective action:

In the parameters tab, input the SQL or PL/SQL code for the corrective action. This is just a simple example, using a regular query:

On the table to the left named “Target and Event Properties”, OEM displays all the variables available for use. These variables are related to the targets / events that triggered the corrective action. The example above uses the %TargetName% variable in the “SELECT” statement. Another very useful variable is the “%key_value_name%”. This variable holds the name of the key record that triggered the alert. For example, when dealing with a “Tablespace Full (%)” metric, the “%key_value_name%” variable will hold the name of the actual tablespace that triggered the alert and “%key_value% will hold the actual utilization percentage for that tablespace.

More details on the environment variables can be found at Oracle® Enterprise Manager Cloud Control Administrator’s Guide.

As mentioned before, OEM 13c offers a default corrective action to automatically add new datafile(s) to a tablespace, but in previous versions, a personalized PL/SQL code was required to do that. One example of such code is displayed below:

SET SERVEROUTPUT ON;
WHENEVER SQLERROR EXIT SQL.SQLCODE;

DECLARE

 v_dg         VARCHAR2(100);
 v_pctfree_dg NUMBER(20);
 v_tablespace VARCHAR2(100);
 v_stmt       VARCHAR2(100);

BEGIN

  v_tablespace := '%key_value_name%' ;

  SELECT DISTINCT SUBSTR(file_name, 2,INSTR(file_name,'/',1,1)-2)
    INTO v_dg 
    FROM dba_data_files;

  SELECT ((free_mb / DECODE (type,'NORMAL',2,
                                  'HIGH',3,
                                  'EXTERN',1)) / 
          (total_mb / DECODE (type,'NORMAL',2,
                                   'HIGH',3,
                                   'EXTERN',1)))*100 
    INTO v_pctfree_dg 
    FROM v$asm_diskgroup 
   WHERE name = v_dg;


  IF ( v_pctfree_dg > 5 ) THEN

    v_stmt := 'ALTER TABLESPACE ' || v_tablespace || ' ADD DATAFILE SIZE 1g AUTOEXTEND ON NEXT 512m ';
    EXECUTE IMMEDIATE v_stmt;
    dbms_output.put_line('Successfully added datafile to: ' || v_tablespace );

  ELSE

    dbms_output.put_line('Not enough free space to add datafile to: ' || v_tablespace );
    RAISE_APPLICATION_ERROR (-20200, 'Not enough free space on diskgroup ' || v_dg || ' (' || v_pctfree_dg || ' MB free),
                                      to add datafile to: ' || v_tablespace );
  END IF;

END;
/

Note: This code was created for an environment that used a single ASM diskgroup for datafiles but didn’t have the “db_create_file_dest” parameter set, so we just query the name of the diskgroup directly from the “DBA_DATA_FILES” view.

When using a PL/SQL, it is important to include the “SET SERVEROUTPUT ON” at the first line of the “SQL Script” text box or Oracle won’t display any output. Also, don’t forget to include the “/” at the end of it, otherwise the PL/SQL code will not be executed.
Next, on the “Access” tab, select whether to grant access to other administrators so they can change / use this corrective action. After setting all these parameters, click the “Save to Library” button.
The newly created corrective action will be displayed on the main page:
In version 13c, corrective actions need to be published before being used, so select the action and click on the “Publish” button after creating it. It will ask for confirmation.

Although adding the corrective action to the library is not mandatory, in versions prior to 13c, it’s good practice to have all corrective actions stored in the library.

Now that the corrective metric is in place, OEM needs to know when to execute it. To do so, go back to the target / monitoring template to which the corrective action will be added. To keep a healthy and easily maintainable OEM environment, is always better to use monitoring templates for the different target types and different environments setup. Here we’ll add the corrective action to the “Production DB Monitoring Template” and then apply this template to all the Production DBs. Go to the monitoring template page (Enterprise -> Monitoring -> Monitoring Templates), select the desired template and click on the “Edit” button:

Now go to the “Metric Thresholds” tab, locate the desired metric to add the corrective action – in this case “Pending Calls” – and edit it by clicking on the pencil button at the left.
The next page displays an empty corrective action section. Initially, we’ll execute a corrective action only when the critical threshold is reached, so click on the “Add” button right in front of the “Critical”:

Oracle will display the corrective actions that match the “Event” and “Target” types:

Select the credentials to be used, and click on “Continue”. Back at the metric edit page, the newly selected corrective action is now displayed for the “Critical” threshold:

Click on the “Continue” button to save the settings and go back to the edit page of the “Monitoring Template”, then click on the “OK” button to save the “Monitoring Template”.

Now it’s important to include the “Corrective Actions” in the incident rules, otherwise OEM wot’t send notifications when this action is executed, to enable notifications go to “Setup -> Incidents -> Incident Rules”.

The corrective action notification can be added to an existing rule set. Or simply create a new one to gather only corrective actions events. Below is an example of how to add these notifications to an existing set:

Select the desired “Rule Set” and click on the “Edit” button, then select the specific rule that will notify whenever the corrective action is executed and click on “Edit” again, or simply add a new rule to the existing rule set.

Once inside the rule edit, the existing metric for which OEM should send notifications when the “Corrective action” is executed:
Now select if you want to be notified about either successful or failed executions of the corrective action and for which thresholds :
After selecting the correct options, click the “OK” button, and now OEM will display the following:
Check the next parameters of the rule by clicking on the “Next” button and then save it by clicking on the “Continue” button on the last page. After that, save the “Rule Set” by clicking the “Save” button. Don’t forget this step, otherwise all changes to the “Rule Set” will be lost.
Now OEM is all set to send notifications whenever a corrective action is executed for the “Pending Calls” metric. Just push the monitoring template to the desired targets and then whenever an event is raised for that metric, OEM will send the following notifications:
The first one is just the alert based on the threshold we have set, and the following one is the “Corrective Action” (CA), that was executed, displaying the name of the action “LIST PENDING CALLS”, the status “Successful”, and the alert that triggered the action “The amount of CALLS is: 78”:
……
Stay tuned for the next blog post which will cover specific recommendations from Oracle for best practices and Automatic Incident Creation.
email

Interested in working with Fernando? Schedule a tech call.

Senior Oracle Database Consultant

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *