Oracle feature request #94739: CTAS optimizer statistics

Posted in: Technical Track

When creating an index, Oracle versions 10g and above automatically compute optimizer statistics. And even before that, 9i had a COMPUTE STATISTICS clause to accomplish the same thing. Not only does it save the time and effort of running DBMS_STATS, but it also saves the disk I/O involved in such an operation, since all rows are available in the course of index creation.

Now why can’t this happen on a CREATE TABLE AS SELECT command? Most if not all of the statistics gathering steps (high/low value, number of distinct values, and even possibly histograms) can be gathered over the course of a simple table read, which is happening anyway.

Oracle 12.1 maybe?

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Marc is a passionate and creative problem solver, drawing on deep understanding of the full enterprise application stack to identify the root cause of problems and to deploy sustainable solutions. Marc has a strong background in performance tuning and high availability, developing many of the tools and processes used to monitor and manage critical production databases at Pythian. He is proud to be the very first DataStax Platinum Certified Administrator for Apache Cassandra.

4 Comments. Leave new

And there should be a clause to tell oracle to not to gather stats during CTAS.

My feature wish list include Read-only partitions in a table.

For now, in my data warehouse, I assign partitions to their own tablespaces, so that I could make those tablespaces read only. It would be great to mark partitions read only, and RMAN should also recognize that to improve backup and recovery.

regards

Reply
Christo Kutrovsky
August 19, 2010 9:38 am

Histograms can be quite heavy to compute. It requires grouping and sorting. Doing this for every column can consume significant sorting space.

Same for distinct values, unless the new method is used, which uses an advanced hashing technique to quite accurately approximate the number of distinct values.

Reply
Marc Fielding
August 19, 2010 10:59 am

Hi Christo,

For the distinct values, it’s jsut a call to the ndv functions. Since a CTAS reads all the rows anyways, this should be straightforward to calculate on the fly.

For histograms, there may be additional work needed to first determine what buckets to use and then populate them. Nonetheless, calculating this during the CTAS avoids at least one pass through the data, so should certainly save I/O.

Reply
David Aldridge
August 19, 2010 10:17 am

You could certainly simulate this to some extent I think, though it would be the sort of work that you would only undertake if this was a regular process that you do quite frequently.

Other than histograms, the significant metrics for the table and columns can be calculated with analytic functions as part of the select, and a multitable insert could be constructed to insert table data into the target table and the calculated statistics into another table (one row, of course). You could then use the statistical data as the basis for setting statistics for the table and columns.

It’d be pretty straightforward for the table I expect, and a pain for the columns.

However, one reason why it might not be worth doing is that estimating statistics can often be both very fast and very accurate, particularly when a table has just been created and you don’t need to worry about the variability in the number of rows per block that comes about from normal delete operations. Block-based low percentage estimation is worth a look.

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *