The trouble with a feature that kicks in without announcement
So, there I was the other day creating a procedure to obfuscate data for one of our customers, you know: GDPR is coming!!
The idea is pretty simple: a PL/SQL BULK COLLECT FOR ALL UPDATE that will obfuscate by replacing the actual data with random generated stuff. The obfuscation rules are provided by the customer and we created a package “to rule them all”.
Nice and neat and the process to update 500K rows was running in 4 seconds. But there was a small problem with the code in terms of performance, see if you spot it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
DECLARE c_limit PLS_INTEGER := 50000; CURSOR to_obfuscate IS SELECT rowid FROM owner.table_to_obfuscate; TYPE ids_t IS TABLE OF to_obfuscate%ROWTYPE INDEX BY PLS_INTEGER; l_ids ids_t; rcount INTEGER := 0; BEGIN OPEN to_obfuscate; LOOP FETCH to_obfuscate BULK COLLECT INTO l_ids LIMIT c_limit; EXIT WHEN l_ids.COUNT=0; FORALL indx IN INDICES OF l_ids SAVE EXCEPTIONS UPDATE owner.table_to_obfuscate c SET column_to_obfuscate=(SELECT remap_utils.maskcolumn() FROM dual) WHERE c.rowid=l_ids(indx).rowid; rcount:=rcount+SQL%ROWCOUNT; COMMIT; DBMS_OUTPUT.PUT_LINE('Updated columns: '||rcount||' at '||sysdate); END LOOP; CLOSE to_obfuscate; EXCEPTION WHEN OTHERS THEN IF SQLCODE = -24381 THEN FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP DBMS_OUTPUT.put_line ('Error updating row #' || SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX | '. Error code is ' || SQL%BULK_EXCEPTIONS (indx).ERROR_CODE); END LOOP; ELSE RAISE; END IF; END; /
Not easy to spot as is quite a lot of code but the thing (see the highlighted line #18), and I have to thank the Oracle tech guys here at Pythian for pointing out this one to me, is that the code is using a SELECT FROM DUAL to obtain the random value.
What happened next?
After I was advised to change that small part of the code, boom, the performance went to 10x slower, running in some 45 seconds what before was running in 4, what the … is going on here?
After some research, and thanks to Luke here for his help, it turned out to be my new friend “scalar subquery caching” preventing my call to SELECT FROM DUAL from actually getting a new value. The data was obfuscated, yes and pretty fast, but the resulting data was not random at all and the cardinality was complete askew for the obfuscated columns.
So, lessons learned, beware of nice performance features that may save your day most of the time but can also give you trouble in some cases.
Interested in working with Jose? Schedule a tech call.