I’m sure one of the most popular scripts for Apps DBAs on My Oracle Support is cmclean.sql from MOS Article ID 134007.1 “Concurrent Processing – CMCLEAN.SQL – Non Destructive Script to Clean Concurrent Manager Tables”. DBAs usually use the script to clean up stale data from concurrent processing tables (FND_CONCURRENT_%) after incidents like a crash of the database or concurrent processing node. This script sets correct completion phase and status codes for terminated concurrent requests and sets correct control codes for terminated concurrent manager processes. Despite the assuring “Non Destructive” claim in the title of the MOS Article there is a possibility to lose concurrent request schedules when cmclean.sql is executed.
First of all it’s important to understand how scheduled concurrent requests are executed and resubmitted. A simplified process of the execution is:
- Concurrent manager process (e.g. FNDLIBR in case of Standard Manager) queries the FND_CONCURRENT_REQUESTS table for pending requests.
- When a pending request is found, the manager process updates the PHASE_CODE=R (Running) and STATUS_CODE=R (Running).
- The next step is to start the executable of the concurrent program. If it’s a PL/SQL procedure – FNDLIBR connects to the DB and executes the PL/SQL code, if it’s a java program – FNDLIBR starts up a java process to execute the java class, etc.
- FNDLIBR catches the exit codes from the executable of the concurrent program and updates the statuses in FND_CONCURRENT_REQUESTS accordingly – PHASE_CODE=C (Completed) and STATUS_CODE = C (Normal), G (Warning) or E (Error).
- FNDLIBR checks if the concurrent request has a schedule and needs to be resubmitted. If yes – it resubmits a new concurrent request with the same parameters.
But what happens if the FNDLIBR process crashes, terminates or gets killed while it’s running a concurrent request? Who takes care of the statuses in FND_CONCURRENT_REQUESTS table and how the request is resubmitted if the concurrent manager process is not there anymore?
It appears the Internal Concurrent Manager (ICM) takes care of these tasks. It checks the running requests periodically (every two minutes by default) and if it finds any that are missing the concurrent manager process and the DB session, it updates the statuses for the concurrent request and also resubmits it if it has a schedule. This action is followed by a log entry in the ICM log file:
Process monitor session started : 17-JUL-2013 04:24:24 Found running request 5829148 attached to dead manager process. Setting request status to completed. Found dead process: spid=(15160), cpid=(2032540), ORA pid=(35), manager=(0/0) Starting STANDARD Concurrent Manager : 17-JUL-2013 04:24:25 Process monitor session ended : 17-JUL-2013 04:24:25
Interesting to note, if the Internal Concurrent Manager is terminated at the same time with the manager process and is restarted later by the reviver process or by running “adcmctl.sh start” manually, the ICM performs the same check of running requests as part of the startup sequence, but this time it restarts the request instead of terminating and resubmitting it. The log of the ICM contains the following lines:
Found running request 5829146 attached to dead manager process. Attempting to restart request.
The concurrent request is started again with exactly the same request_id as the previous time it was terminated, and the log file of the request will contain information from 2 executions – the 1st which didn’t complete and then the 2nd which probably completed. I think this scenario is very confusing and instead of restarting the request it should better be terminated and a new one should be submitted.
Let’s get back to the problem with cmclean.sql! The worst thing that can be done is running cmclean.sql after the crash of the concurrent processing node before starting up the concurrent managers. Why? Because cmclean.sql cleans up data in FND_CONCURRENT_REQUESTS by executing one simple update statement to change the phase and status of any “Running” or “Terminating” request to “Completed/Error”:
UPDATE fnd_concurrent_requests SET phase_code = ‘C’, status_code = ‘E’ WHERE status_code =’T’ OR phase_code = ‘R’;
Cmclean.sql does not resubmit the request if it has a schedule. Execute it and you risk to lose some scheduled programs without any warning.
Similarly – never run cmclean.sql if you stopped the concurrent managers using “adcmctl.sh abort” or “kill -9” on concurrent manager processes to speed up the shutdown procedure. There’s the same risk to lose some scheduled requests.
Despite the risks, cmclean.sql is still a useful tool in case concurrent managers don’t come up after a failure or there are some stale data that is otherwise not cleaned up. But please, be careful when you run it! Check closely the list of requests reported in the following section of the outputs from cmclean.sql, because these requests have to be resubmitted manually if they had schedules.
— Updating any Running or Terminating requests to Completed/Error Request ID Phase Status ———- —— —— 6607 R W 6700 R W 893534056 R R 3 rows updated.
“Concurrent Manager Recovery” wizard is even worse! (Added on Jul 21, 2013)
After posting this article I started thinking about whether the “Concurrent Manager Recovery” Wizard available from Oracle Applications Manager in e-Business Suite was any better then cmclean.sql or not. As I didn’t have much experience with it I decided to give it a try. This is what I did:
- I scheduled 2 concurrent programs (“CP Java Regression Test” and “CP PLSQL Regression Test”) to restart in 1 minute after the previous execution completes. These are simple test concurrent programs which sleep for some time and then complete.
- I made sure both programs were running and terminated all concurrent manager process and DB sessions for these concurrent programs.
- The termination of the processes and sessions left the rows in FND_CONCURRENT_REQUESTS with PHASE_CODE=R and STATUS_CODE=R
- I executed the “Concurrent Manager Recovery” wizard which fixed the status codes of the concurrent manager processes, but didn’t touch the statuses of the concurrent requests – I thought this was a good thing (I expected the ICM to clean up the statuses and resubmit the requests at its startup phase)
- I started up the concurrent managers, but ICM didn’t clean up the 2 stale records in FND_CONCURRENT_REQUESTS table. The 2 requests appeared as they would be running, while in fact they didn’t have any OS processes or DB sessions.
I didn’t have much time to look into the details, but it looks like the ICM is only cleaning up requests attached to dead managers (“Active” status in the FND_CONCURRENT_PROCESSES table and no OS processes running). Here, the Wizard updated the statuses of the manager processes as if they completed normally, so the ICM couldn’t identify them as being “dead”.
This actually means that the “Concurrent Manager Recovery” wizard can cause serious issues too – it doesn’t clear up the concurrent_request statuses and it prevents ICM from doing it too, so once we start up the system the terminated requests appear as if they were running. And because of this, the Conflict Resolution Manager might prevent execution of some other programs with the incompatibility rules against the terminated requests. You will need to stop the managers and run cmclean.sql to fix the statuses (and loose the schedules) to get out of this situation.
So what should we do to clean up the concurrent processing tables after crashes or cloning? (Added on Jul 21, 2013)
It appears to me that no reliable way exists to clean up the tables properly. The cmclean.sql can remove some schedules without warning. The “Concurrent Manager Recovery” wizard may leave some requests in the running state even if they were terminated.
I’m going to open a SR for Oracle to request a proper solution, but meanwhile I’d suggest to use the cmclean.sql. However, make sure to check its outputs carefully and reschedule any requests which got cleaned up (as described above).
P.S. The description of the behavior of ICM in this blog post is a result of investigation performed on R12.1.3. I believe it behaves the same way in R12.1 and probably even in R12.0 and 11i, but I didn’t check. MOS Article ID 134007.1 which contains the cmclean.sql script is valid for Applications versions 10.7 to 12.1.3 – be careful when using it independently from the version of your e-Business Suite installation.
19 Comments. Leave new
Fantastic Maris…Informative…cmclean is the instrument most of the time for impatient DBA’s who do not wait for a complete CM shutdown. But in production environments, most of the time the CP schedule would be repetitive, So, when cmclean misses one schedule, the next scheduled run should be running subsequently. If it was not the case, in a environment with 15000 concurrent programs/day , even a single cmclean would badly hit. But cmclean has never disturbed the CP schedules because of repititiveness .Not sure, if i am correct..But beleive thats how we having been escaping with the with cmclean drawback you explained :)
Not sure if I understood you correctly, but what I’m trying to say here is that cmclean is able to “unschedule” the scheduled requests without even letting you know about it, that is – cmclean can remove that repetitiveness you are talking about.
Interesting that you haven’t ever been hit by it, it might have to do something with the way how the shutdown of the managers is speeded up. Could you elaborate a bit more on this?
Maris,
I started to use cmclean.sql to avoid errors like
“An error occurred while attempting to establish an Applications File Server connection with the node FNDFS_”
on cloned environment.
But last cloning magically hits to lost schedulled requests as you described (cmclean script worked so long this time).
So I doubt of using cmclean.sql with clonning procedure?
Hi Viacheslav,
I agree cmclean is a good thing to run after cloning to clean up things. This is what I’ve been doing too. Luckily for me most of my clients don’t want to leave all the requests scheduled so in most cases we put them on hold anyway (may be this is why we escaped the issue with cmclean so far).
I haven’t tested this, but I think you could try removing “UPDATE fnd_concurrent_requests SET phase_code = ‘C’, status_code = ‘E’ WHERE status_code =’T’ OR phase_code = ‘R’;” statement from the cmclean.sql before running it in the cloned database. The concurrent requests should get cleaned up and properly rescheduled when the ICM comes up.
Maris, thank you for tips!
I’ll try.
In my stupid situation consultants prefer run schedulled requests because their clients prefer “forget” run requests manually :)
My suggestion was not good. see the explanation in “Concurrent Manager Recovery wizard is even worse!” section I just added. You’ll end up in the same situation if you comment the update statement on FND_CONCURRENT_REQUESTS.
Thanks for taking your time Maris and describe the issue. It sounds like a nasty bug to me if the things are working the way you say. I would:
1. Log a bug with ATG team asking to fix the cmclean.sql. It could be that the things worked slightly differently in earlier versions and it is a historical heritage.
2. Strongly suggest to all DBAs to comment CR table update in the script until Oracle fix the issue. It sound like ICM can process the impacted CRs in a bit more better manner (well there is a place for other bug here as ICM doesn’t process CRs the same way after restart and normal processing)
3. Let’s appeal to people who have connections in ATG development team to make sure they get this message and it doesn’t “stuck” in MOS queues for years :)
– we have Vasu on the CAB and other connection ourselves + others I’m sure will help
Great work my friend! Let’s make EBS a better product to work with :)
Yury
Yury, step 2. on the suggestions list is not an option, it will make things worse, see the updated sections of the blogpost for details.
Maris: Issue well captured and well explained. Did you guys get any promising answer from Oracle yet? Why do you think commenting CR table update in cmclean.sql is dangerous?
Regards
Sandesh Achar
Hi Sandesh,
the Internal Concurrent Manager (ICM) only reschedules and cleans up requests that are “Running” and are tied to a concurrent manager process that has an “ACTIVE” status in FND_CONCURRENT_PROCESSES, but the process itself doesn’t exist (has crashed/got killed).
if you comment the update on the CR table in cmclean.sql, the script will update the statuses of concurrent processes to “completed”, but the requests will remain “running” – and, when you start up the managers, these requests will net be cleaned up by the ICM, because they are not tied to an “active” manager process anymore. These requests will remain “running” forever (until you clean the, up manually) and they can prevent other requests from starting if incompatibility rules exist between the stuck requests and the new ones. This is why the update to CR table should not be commented in the cmclean.sql
…and to answer the 2nd half of the question – cmclean.sql is not supported by Oracle Support, so it’s impossible to get answers about it. The recommendation is to test the script before using it and blame yourself if it didn’t do what you thought it would do. Funny right?
The rumors are (not sure how ) – a new tool is being developed, it would be properly supported and it would provide functionality to manage the concurrent managers and probably would also perform the cleanup functionality. I have absolutely no idea when and if it would see the daylight.
See
Bug 17198256 CMCLEAN.SQL SHOULD TAKE CARE OF RESCHEDULING TERMINATED CONCURRENT REQUESTS
Hi Maris, the new concurrent manager recover process provided in 12.2 is also not helpful on many occasions. Is there any best way deal with such situation where scheduled requests can run again after CM/db crash.
Hi Niyaz,
I agree this area is still not perfectly covered by the provided tools.
I’ve been doing the following a few times:
1) run the unsupported cmclean.sql (record the list of request ids that it updates)
2) Update the failed concurrent requests (list from the previous step) to set their phase_code=’P’ and status_code=’I’ (Pending/Normal).
This way the same requests are picked up again by the concurrent managers and they are rescheduled when they complete.
It’s still not perfect and neither supported, but at least it’s simple and efficient.
Maris
Hi,
1. Can anyone tell me what happen When we bounce CM,what happens to the requests which were running and which
were pending and scheduled ?
2.When we run cmclean after shutting down the CM and start again,what happens to the running requests,does it error out? Also what about to the requests which are pending and scheduled request ?
Hi Ajay,
I’ll try to answer your questions.
> what happen When we bounce CM.
By default the shutdown of CMs will prevent new requests from starting, and it will wait until all running requests complete. So unless you use “abort” option to shut down the CMs, the state of requests is going to be unaffected and they will complete normally. The new requests and other scheduled requests will be picked up when the CMs are started again.
> When we run cmclean after shutting down the CM…
If the CMs and requests were stopped gracefully (without termination), the requests will complete and the CM processes will stop “cleanly” – at this point cmclean does not do anything, as it’s intention is to clean up only the incorrect statuses. (I.e. if processes are terminated in a way that they don’t update the exit statuses in the respective tables).
However if the processes / requests are forcefully terminated, the cmclean will update their statuses to “Completed – Error” (and will not re-schedule any scheduled requests that were in this state.)
Maris
Hi Maris,
Thnx for the reply,
Can u explain if the fndlibr process are terminated then, what will be the status of the running requests at that time will be set but ICM.
And after cmclean when we start the CM, YOU said that running request status will be completed/error.
So the running requests will not be resubmited again?
Hi,
> Can u explain if the fndlibr process are terminated then, what will be the status of the running requests at that time will be set but ICM.
If you kill an FNDLIBR process, the ICM will notice it’s terminated and will mark the Manager process as terminated. If that manager was running a concurrent request at the time the FNDLIBR was terminated – the concurrent request will be updated as “Completed – Error”, and it will NOT be re-scheduled.
> And after cmclean when we start the CM, YOU said that running request status will be completed/error.
Yes, that’s how it works.
> So the running requests will not be resubmited again?
That’s correct.
Maris
Hi Maris,
Thnx for the reply, it was really helpful.