When you create an index, Oracle 10g and later automatically compute optimizer statistics; even before that, 9i offered a COMPUTE STATISTICS clause to accomplish the same thing. Not only does this save the time and effort of running DBMS_STATS, it also saves the disk I/O such an operation would incur, since all of the rows are read anyway in the course of building the index.
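A minimal illustration (table and index names are made up):

```sql
-- 10g and later gather index statistics automatically during the build;
-- on 9i the COMPUTE STATISTICS clause requests the same behaviour.
CREATE INDEX emp_name_ix ON emp (last_name)
  COMPUTE STATISTICS;  -- redundant from 10g onward, required on 9i

-- The statistics are available immediately, with no DBMS_STATS call:
SELECT num_rows, distinct_keys, last_analyzed
FROM   user_indexes
WHERE  index_name = 'EMP_NAME_IX';
```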
Now why can’t the same happen on a CREATE TABLE AS SELECT (CTAS)? Most if not all of the statistics (high/low values, number of distinct values, possibly even histograms) could be gathered during the full read of the source data, which is happening anyway.
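Today that means two passes over the same rows; a sketch with made-up names:

```sql
-- Pass 1: the CTAS reads every row of the source...
CREATE TABLE sales_2012 AS
  SELECT * FROM sales WHERE sale_year = 2012;

-- Pass 2: ...and DBMS_STATS then has to read them all over again.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'SALES_2012');
END;
/
```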
Oracle 12.1 maybe?
And there should be a clause to tell Oracle not to gather stats during a CTAS.
My feature wish list includes read-only partitions in a table.
For now, in my data warehouse, I assign partitions to their own tablespaces so that I can make those tablespaces read-only. It would be great to mark individual partitions read-only, and RMAN should also recognize that to improve backup and recovery.
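The workaround described above looks roughly like this (hypothetical partition and tablespace names):

```sql
-- Each closed partition lives in its own tablespace,
-- so the whole tablespace can be frozen:
ALTER TABLE sales MOVE PARTITION sales_2010 TABLESPACE ts_sales_2010;
ALTER TABLESPACE ts_sales_2010 READ ONLY;
```

With backup optimization enabled (`CONFIGURE BACKUP OPTIMIZATION ON;` in RMAN), RMAN can skip read-only datafiles that already have a good backup, which is the recovery-side benefit being asked for at the partition level.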
Histograms can be quite heavy to compute: they require grouping and sorting, and doing that for every column can consume significant sort space. The same goes for distinct values, unless the newer one-pass method is used, which applies a hashing technique to approximate the number of distinct values quite accurately.
For the distinct values, it’s just a call to the NDV functions. Since a CTAS reads all the rows anyway, this should be straightforward to calculate on the fly.
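What a single pass over the rows can already yield, sketched against a made-up table:

```sql
-- One full scan produces the core column statistics for AMOUNT.
-- COUNT(DISTINCT ...) here is exact; the approximate-NDV hashing
-- method gets close to the same answer without the sort.
SELECT COUNT(*)               AS num_rows,
       COUNT(amount)          AS num_not_null,
       COUNT(DISTINCT amount) AS ndv,
       MIN(amount)            AS low_value,
       MAX(amount)            AS high_value
FROM   sales;
```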
For histograms, there may be additional work needed to first determine which buckets to use and then populate them. Nonetheless, calculating this during the CTAS avoids at least one extra pass through the data, so it should certainly save I/O.
You could simulate this to some extent, I think, though it is the sort of work you would only undertake for a regular process that you run quite frequently.
Other than histograms, the significant table and column metrics can be calculated with analytic functions as part of the SELECT, and a multitable insert could be constructed to route the data rows into the target table and the calculated statistics into another table (one row, of course). You could then use that statistical data as the basis for setting statistics on the table and columns.
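A sketch of that multitable-insert idea, with made-up table and column names. Note one assumption: a multitable insert needs its targets to exist, so the CTAS becomes an insert into a pre-created copy:

```sql
-- Scratch table to catch the single statistics row:
CREATE TABLE my_stats (num_rows NUMBER, ndv_cust NUMBER,
                       lo_amt NUMBER, hi_amt NUMBER);

INSERT ALL
  -- every row goes into the data copy...
  WHEN 1 = 1 THEN
    INTO sales_copy (cust_id, amount) VALUES (cust_id, amount)
  -- ...and exactly one row carries the computed statistics:
  WHEN rn = 1 THEN
    INTO my_stats (num_rows, ndv_cust, lo_amt, hi_amt)
    VALUES (cnt, ndv, lo, hi)
SELECT cust_id, amount,
       ROW_NUMBER() OVER (ORDER BY NULL)  AS rn,
       COUNT(*) OVER ()                   AS cnt,
       COUNT(DISTINCT cust_id) OVER ()    AS ndv,
       MIN(amount) OVER ()                AS lo,
       MAX(amount) OVER ()                AS hi
FROM   sales_src;
```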
It’d be pretty straightforward for the table, I expect, and a pain for the columns.
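The asymmetry shows in the DBMS_STATS calls. Table-level statistics are one call; column-level statistics require packing the low/high values into a STATREC first. A sketch with assumed values and a hypothetical SALES_COPY table:

```sql
-- Table level: a single call.
BEGIN
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER, tabname => 'SALES_COPY',
    numrows => 1000000, numblks => 12000);
END;
/

-- Column level: the fiddly part is preparing the STATREC
-- that carries the low and high values.
DECLARE
  srec    DBMS_STATS.STATREC;
  numvals DBMS_STATS.NUMARRAY;
BEGIN
  numvals  := DBMS_STATS.NUMARRAY(1, 5000);  -- low value, high value
  srec.epc := 2;
  DBMS_STATS.PREPARE_COLUMN_VALUES(srec, numvals);
  DBMS_STATS.SET_COLUMN_STATS(
    ownname => USER, tabname => 'SALES_COPY', colname => 'AMOUNT',
    distcnt => 4000, nullcnt => 0, srec => srec, avgclen => 4);
END;
/
```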
However, one reason it might not be worth doing is that estimated statistics can often be both very fast and very accurate, particularly on a freshly created table, where you don’t have to worry about the variability in rows per block that normal delete activity introduces. Block-based, low-percentage estimation is worth a look.
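For example, with a hypothetical freshly built SALES_2012 table (the percentage here is an illustration, not a recommendation):

```sql
-- A low-percentage, block-based sample: on a newly created table the
-- rows-per-block distribution is uniform, so even 1% of the blocks
-- tends to give an accurate picture, at a fraction of the I/O.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'SALES_2012',
    estimate_percent => 1,
    block_sample     => TRUE);
END;
/
```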