I saw a question on the OTN Exadata forum asking whether or not to use BIGFILE temporary tablespaces in an Exadata system. BIGFILE tablespaces have been around since Oracle 10g and remove the roughly 32G datafile size limit of SMALLFILE tablespaces, but at the cost of allowing only a single datafile per tablespace. In effect, they trade the flexibility of multiple datafiles for the simplified management of one large file.
The default Exadata database “dbm” comes with a single BIGFILE tablespace, and I’ve found this approach to work well in most cases. However, the OTN question reminded me of a situation I encountered last year where I couldn’t use BIGFILE tablespaces for performance reasons, and I thought I’d share some of the details with Pythian blog readers.
The environment was an Exadata system undergoing pre-production stress testing. I used Real Application Testing (RAT) to capture a highly concurrent OLTP workload, then replayed it with the synchronization parameter set to FALSE, effectively increasing concurrency beyond that of the original test system.
AWR showed a large volume of buffer-busy activity:
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                          Avg
                                                         wait   % DB
Event                                Waits     Time(s)   (ms)   Time Wait Class
----------------------------- ------------ ----------- ------ ------ -------------
gc buffer busy release         110,700,450  12,544,451    113   58.9 Cluster
buffer busy waits              112,565,811   3,950,102     35   18.5 Concurrency
latch: cache buffers chains    113,162,321   2,335,055     21   11.0 Concurrency
enq: SS - contention            20,026,343   1,208,422     60    5.7 Configuration
gc buffer busy acquire          11,244,505     720,003     64    3.4 Cluster
To drill down, I used ASH (active session history) information. Given the raw volume of sessions and data, though, querying dba_hist_active_sess_history is very slow. I therefore took advantage of Exadata’s hybrid columnar compression feature to create a smaller, flattened, and compressed work table containing observations from the RAT replay period. The resulting table is much faster to query.
create table marc_ash nologging compress for query high
parallel 128 as
select /*+ parallel(128) */ *
from dba_hist_active_sess_history
where snap_id between 384 and 423;
Then, I identified the P1, P2, and P3 parameters of the top wait events to see where the waits were happening, as well as how much time was spent in each combination. This is probably a good time for me to mention my standard disclaimer regarding dba_hist_active_session_history: The data is composed of samples of the system state, so it doesn’t have the same granularity as, say, session traces. But with a large sample like the one I have here, I can still get a good idea of what’s going on.
select event, p1, p2, p3, count(*), sum(tm_delta_time)
from marc_ash
group by event, p1, p2, p3
order by sum(tm_delta_time) desc;

EVENT                                 P1  P2  P3 COUNT(*) SUM(TM_DELTA_TIME)
--------------------------- ------------ --- --- -------- ------------------
gc buffer busy release               501   2  13  2336685     57421543417904
buffer busy waits                    501   2  13   736241     16902873986356
latch: cache buffers chains 262188190496 155   0   260182      6197021780073
latch: cache buffers chains 230512806688 155   0   217646      5831016580939
enq: SS - contention          1397948422  41   2   183510      2378750008606
gc buffer busy acquire               501   2  13   130649      3265344853224
What’s interesting here is that a common set of parameters came up in buffer busy waits: P1 = 501, P2 = 2, P3 = 13. P3 lists the block type, and there’s a great summary of values on Charles Hooper’s blog. 13 maps to “file header block”.
P1 is the file number. The number 501 is above my DB_FILES value of 500, indicating that it’s a tempfile, and in this case, the one very large BIGFILE of the TEMP tablespace. P2 is the block number. According to MOS note 109630.1, block 2 is the space management bitmap in a locally-managed tablespace.
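The file-number arithmetic can be verified from the data dictionary. Here’s a quick sketch, assuming this system’s DB_FILES value of 500: wait-event parameters report tempfiles as v$tempfile.file# plus DB_FILES, so absolute file 501 maps back to tempfile 1.

```sql
-- Check the DB_FILES threshold; any wait-event file number above it is a tempfile.
show parameter db_files

-- Map each tempfile to the file number that wait events would report for it
-- (500 here is this system's DB_FILES value).
select f.file#,
       f.file# + 500 as wait_event_file#,
       f.name,
       t.name as tablespace_name
from   v$tempfile  f
join   v$tablespace t on t.ts# = f.ts#;
```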
My Oracle Support note 465840.1 has some good information about addressing tempfile contention in RAC. It mentions that “buffer busy wait” events on the file header block can be encountered in RAC because every instance allocates space in the same tempfile, and it recommends creating at least as many tempfiles as there are instances in the cluster. To accommodate multiple files in a tablespace, we must create a SMALLFILE tablespace and work within the 32G per-file limit. In our case, that meant creating a new temporary tablespace with 24 32G tempfiles for a total capacity of 768G, making it the default temporary tablespace, and dropping the old one.
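The swap looks roughly like this. The tablespace name and the “+DATA” disk group are illustrative placeholders, and only the pattern of the 24 tempfile additions is shown:

```sql
-- New SMALLFILE temporary tablespace; each tempfile stays under the 32G limit.
create temporary tablespace temp_multi
  tempfile '+DATA' size 32767M;

-- Repeat until 24 tempfiles (roughly 768G total) are in place.
alter tablespace temp_multi
  add tempfile '+DATA' size 32767M;
-- ... 22 more ADD TEMPFILE statements ...

-- Switch the database over and remove the old BIGFILE temp tablespace.
alter database default temporary tablespace temp_multi;
drop tablespace temp including contents and datafiles;
```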
After flashing back the database and re-running the same RAT workload, the waits on tempfile 501 and the SS enqueue contention disappeared entirely.
What are your experiences, blog readers? Do you normally use BIGFILE temporary tablespaces? Have you encountered similar issues?
Nice post, Marc. It’s good to see that folks still discover and document the objects that need to be partitioned with RAC. Good find.
Also, Oracle Managed Files was introduced in Oracle9i. I recall that timeframe because I was in Veritas development working on another new 9i feature called Oracle Disk Manager.
The irony is that, had BIGFILE tablespaces not existed, my 768G temp tablespace would already have been spread across multiple tempfiles, and I wouldn’t have encountered this performance issue in the first place.
Ah yes, I see!
Did you try this workload with a temporary tablespace group having several bigfile temp tablespaces?
Thanks for stopping by!
I did briefly consider temporary tablespace groups, but I noticed a blog post by Riyaj Shamsudeen (https://orainternals.wordpress.com/2012/02/13/temporary-tablespaces-in-rac) where he encountered TS/SS enqueue contention (though not buffer busy waits) reallocating temp space in RAC. Quoting:
“Temporary tablespace groups is of no use since the contention will be at Cross Invocation Call. In fact, there is a potential for temporary tablespace groups to cause more issues since the free space in one temp tablespace can not be reassigned to another temp tablespace dynamically, even if they are in the same tablespace group. In theory, it is possible to have more SS, CI locking contention with temp tablespace groups.”
So at the time I considered it safer to simply add more tempfiles. Riyaj’s case didn’t have the added factor of buffer busy waits, though, so it is conceivable that temporary tablespace groups would have resolved the buffer busy wait issue.
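For readers who want to test that alternative themselves, a temporary tablespace group is built simply by tagging each temp tablespace with a group name. This is a hypothetical sketch; the names and the “+DATA” disk group are illustrative:

```sql
-- Two temp tablespaces assigned to the same group.
create temporary tablespace temp_a
  tempfile '+DATA' size 32767M
  tablespace group temp_grp;

create temporary tablespace temp_b
  tempfile '+DATA' size 32767M
  tablespace group temp_grp;

-- The group name can be used anywhere a temporary tablespace name can.
alter database default temporary tablespace temp_grp;
```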
Hi Marc. I’m interested in whether this same pattern of buffer busy waits would show up when using ASM exclusively rather than file systems. IOW: is it common to all disk/file mappings, or is it something specific to OS file systems?
The block that experienced buffer busy waits was block 2, the first locally-managed space management bitmap. Since this same bitmap exists in non-ASM locally-managed tablespaces, I would expect this same issue to happen in non-ASM storage. One of the design goals of ASM was actually to make it as close as possible to raw files from a performance perspective.
One of the many items on my todo list is to check to see if there are any significant differences (apart from hitting database limits) between using multiple tempfiles with a single tablespace and using multiple tablespaces with a tablespace group.
There used to be a bug with the latter strategy that resulted in dictionary cache contention and extreme executions of a particular sys-recursive SQL statement, so I’ve never got around to looking at it.
Have you had any thoughts on the topic ?
Interesting question. In my particular case I only tested the single-tempfile configuration and the single-tablespace, multiple-tempfile configuration, and multiple tempfiles did resolve the issue for me, which seems to indicate that the contention was at the tempfile level rather than the tablespace level.
I don’t have much direct experience with temporary tablespace groups, but a quick google search brought up a blog post at
https://oracle-randolf.blogspot.com.au/2009/06/temporary-tablespace-groups.html where a single session only used one of the tablespaces in the tablespace group. If that’s still the case in newer versions, that would be a major difference between the single-tablespace and multiple-tablespace use cases.
Also, MOS note 1469347.1 (last updated July 2012) indicates that one issue with TS$ contention was resolved in a later patchset, but another similar issue was marked as expected behavior, with enhancement request 6052597 outstanding to fix it. The workaround is not to use temporary tablespace groups.
When creating databases on Exadata, I always use a group of BIGFILE temporary tablespaces, usually 8. This step completely prevents temporary tablespace concurrency.