Scalar subquery caching in action

Posted in: Technical Track

The trouble with a feature that kicks in without announcement

So, there I was the other day creating a procedure to obfuscate data for one of our customers, you know: GDPR is coming!!

The idea is pretty simple: a PL/SQL BULK COLLECT FOR ALL UPDATE that will obfuscate by replacing the actual data with random generated stuff. The obfuscation rules are provided by the customer and we created a package “to rule them all”.

Nice and neat and the process to update 500K rows was running in 4 seconds. But there was a small problem with the code in terms of performance, see if you spot it:

DECLARE c_limit PLS_INTEGER := 50000;

	CURSOR to_obfuscate IS
	SELECT rowid
	FROM owner.table_to_obfuscate;
	TYPE ids_t IS TABLE OF to_obfuscate%ROWTYPE INDEX BY PLS_INTEGER;
	l_ids ids_t;
	rcount INTEGER := 0;

BEGIN 
	OPEN to_obfuscate;

	LOOP FETCH to_obfuscate BULK COLLECT INTO l_ids LIMIT c_limit;
	EXIT WHEN l_ids.COUNT=0;

	FORALL indx IN INDICES OF l_ids SAVE EXCEPTIONS
		UPDATE owner.table_to_obfuscate c
		SET column_to_obfuscate=(SELECT remap_utils.maskcolumn()  FROM dual)
		WHERE c.rowid=l_ids(indx).rowid;
	rcount:=rcount+SQL%ROWCOUNT;
	COMMIT;
	DBMS_OUTPUT.PUT_LINE('Updated columns: '||rcount||' at '||sysdate);
	END LOOP;
	CLOSE to_obfuscate;
EXCEPTION 
	WHEN OTHERS THEN 
		IF SQLCODE = -24381 THEN
		FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT 
			LOOP DBMS_OUTPUT.put_line ('Error updating row #' || 
				SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX |
				'. Error code is ' || 
				SQL%BULK_EXCEPTIONS (indx).ERROR_CODE);
		END LOOP;
		ELSE RAISE;
		END IF;
END;
/

Not easy to spot as is quite a lot of code but the thing (see the highlighted line #18), and I have to thank the Oracle tech guys here at Pythian for pointing out this one to me, is that the code is using a SELECT FROM DUAL to obtain the random value.

What happened next?

After I was advised to change that small part of the code, boom, the performance went to 10x slower, running in some 45 seconds what before was running in 4, what the … is going on here?

After some research, and thanks to Luke here for his help, it turned out to be my new friend “scalar subquery caching” preventing my call to SELECT FROM DUAL from actually getting a new value. The data was obfuscated, yes and pretty fast, but the resulting data was not random at all and the cardinality was complete askew for the obfuscated columns.

So, lessons learned, beware of nice performance features that may save your day most of the time but can also give you trouble in some cases.

 

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Data Project Engineer
First of all father, then husband and finally Oracle database consultant. I love technology in general and managing data in particular. Trying to learn one new thing every day.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *