21-Hour Phantom SELECT Causing ORA-1555 ‘snapshot too old’


One of our clients hit an ORA-1555 “snapshot too old” error two nights in a row. The quick-and-dirty fix would be to increase the undo retention (UNDO_RETENTION) and, potentially, the size of the UNDO tablespace.

I was looking at it together with Dave, my new teammate, and a small detail popped up right away: Query Duration=76584. That's more than 21 hours. I checked the undo retention, and it was 2 hours, so the dirty fix would probably fail unless it was very dirty indeed: bumping undo retention to something like a day and blowing up the UNDO tablespace, still with no guarantee that the query would finish within a day. (In fact, I rather think it would run for 42 hours. But I digress.)
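For reference, one way to check the configured undo retention; the value is in seconds, so 7200 corresponds to the 2 hours we saw:

-- UNDO_RETENTION is expressed in seconds (7200 = 2 hours)
SELECT value
FROM   v$parameter
WHERE  name = 'undo_retention';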

The query was simple (names obscured):

SELECT *
FROM   TABLE1
WHERE
       (COL_DATE >=
                (SELECT MAX(COL_DATE)
                   FROM TABLE2
                  WHERE FILTER_COL = :"SYS_B_0")
                );

TABLE1 has millions of rows, so my first reaction was to check whether there were histograms on this column, whether it had an index, and when statistics were last collected. There was no index and there were no histograms, so the execution plan couldn't be anything other than a full table scan. Even so, a full table scan shouldn't really keep a query running for more than 21 hours.
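A minimal sketch of those data dictionary checks, using the obscured names TABLE1 and COL_DATE:

-- is there an index on the column?
SELECT index_name
FROM   dba_ind_columns
WHERE  table_name = 'TABLE1'
AND    column_name = 'COL_DATE';

-- are there histograms on it?
SELECT histogram, num_buckets
FROM   dba_tab_col_statistics
WHERE  table_name = 'TABLE1'
AND    column_name = 'COL_DATE';

-- when were statistics last gathered?
SELECT last_analyzed
FROM   dba_tables
WHERE  table_name = 'TABLE1';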

We couldn’t easily get hold of the client’s developers to ask which batch this statement belonged to, but it was definitely part of a batch, and it seemed to run regularly. So, assuming the duration was correct, the query was supposed to be running at that very moment. I checked V$SESSION for sessions with STATUS='ACTIVE' OR LAST_CALL_ET > 3600 and got nothing back. Dave recalled a bug affecting the query duration reported in the ORA-1555 error and in the V$UNDOSTAT view. Fair enough; we located the Metalink Note and used another query with a workaround, which gave the same result.
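The V$SESSION check was roughly the following; the one-hour threshold was our own choice:

-- sessions either active right now, or sitting in their
-- current call for more than an hour
SELECT sid, serial#, status, last_call_et, sql_hash_value
FROM   v$session
WHERE  status = 'ACTIVE'
OR     last_call_et > 3600;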

The next step was to look at V$SQL and see whether the query was still there and running. And, bingo! there it was! V$SQL provided the hash value, and V$SESSION.SQL_HASH_VALUE pointed to the guilty session, with status ‘INACTIVE’ and LAST_CALL_ET in the range of a few minutes. A quick peek into V$SESSION_WAIT showed that the session had been waiting on SQL*Net message from client for several minutes. I was already 99% sure what was going on, and I just needed a few minutes to prove it with a 10046 trace, so we enabled the 10046 event at level 8.
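A sketch of that hunt, with placeholder values; DBMS_SYSTEM.SET_EV is one common (if unsupported) way to set the event in another session, and 145/7 stand in for the real SID and SERIAL#:

-- find the statement and its hash value in the shared pool
SELECT hash_value, sql_text
FROM   v$sql
WHERE  sql_text LIKE '%TABLE1%';

-- locate the session running it and see what it is waiting on
SELECT s.sid, s.serial#, s.status, s.last_call_et,
       w.event, w.seconds_in_wait
FROM   v$session s, v$session_wait w
WHERE  s.sql_hash_value = :hash_value
AND    w.sid = s.sid;

-- enable a level-8 (waits) 10046 trace in that session
EXEC DBMS_SYSTEM.SET_EV(145, 7, 10046, 8, '')

The content of the trace file speaks for itself: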

*** 2007-05-15 09:33:12.418
WAIT #1: nam='SQL*Net message from client' ela= 256877226 p1=1413697536 p2=1 p3=0
WAIT #1: nam='db file scattered read' ela= 8144 p1=12 p2=39721 p3=12
WAIT #1: nam='SQL*Net message to client' ela= 4 p1=1413697536 p2=1 p3=0
WAIT #1: nam='SQL*Net more data to client' ela= 175 p1=1413697536 p2=2001 p3=0
FETCH #1:c=0,e=10805,p=12,cr=4,cu=0,mis=0,r=100,dep=0,og=4,tim=6190387721660
*** 2007-05-15 09:37:34.942
WAIT #1: nam='SQL*Net message from client' ela= 256356355 p1=1413697536 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1413697536 p2=1 p3=0
WAIT #1: nam='SQL*Net more data to client' ela= 156 p1=1413697536 p2=2001 p3=0
FETCH #1:c=0,e=1462,p=0,cr=2,cu=0,mis=0,r=100,dep=0,og=4,tim=6190644080126

So the query was not exactly alive and kicking, and that is why we didn’t catch it at the beginning. Note that the ela figures in the trace are in microseconds, so each SQL*Net message from client wait of ela= 256877226 is roughly 257 seconds. I should also mention that the program (as indicated in V$SESSION) was DTSRun.exe, so it seems that the culprit is some kind of data-extraction job that fetches 100 rows (r=100 in the FETCH lines) and processes them for some 257 seconds before going for another fetch. Well, what would you expect from a Microsoft tool? ;-)

Anyway, since DTS is not our responsibility, we had to hand this issue over to the relevant group with the definitive diagnosis: “It’s not a database problem.”
