Although many electrons have been expended discussing Exadata’s features, storage indexes tend to figure last, with a vague mention of row elimination in heavily clustered data. Even Oracle’s Exadata software user guide devotes barely half a page to them. Unlike the better known smart scanning features though, storage indexes have an important advantage: rather than offloading workload to storage cells, they eliminate the need to do the I/O at all.
Here are some sample statistics taken from an actual production system:[sql] SQL> select name,value from v$sysstat where name in (‘physical read total bytes’,’cell physical IO bytes saved by storage index’);
physical read total bytes 468779565615616
cell physical IO bytes saved by storage index 251319174832128
That’s right, over a third of all I/O was avoided entirely because of storage indexes. Interested now?
Storage indexes work by dividing to storage on each ASM griddisk into chunks. The first time an entire chunk is read (from a full-table scan, for example), the storage server keeps track of the highest and lowest values for each column in the chunk, and stores them in an in-memory structure. Whenever an insert or update comes in, each column value is checked against min and max values, and the values are updated if necessary. Deletes are a bit trickier, but I suspect the storage indexes are left as-is rather than undergoing the overhead of reading the entire chunk to determine new min-max values.
The next time a table scan request comes along, the storage server compares fixed predicated in the WHERE clause against the min and max values, and if the requested rows fall outside the range, they skip I/O against the chunk entirely.
Now you might think that typical column values in a chunk of data 1 MB or larger would have a wide range, and that storage indexes would give very little benefit. And it is true that a chunk with a single very high and a single very low value in a given column will match most queries and thus requiring the whole chunk to be read. However a lot of data, particularly the incrementing ID values so beloved of data architects, naturally tends to be correlated with the time the row was inserted into the database. And if you delete old rows in bulk or not at all, the storage chunks tend to have data inserted in the same time period. This correlation means that many values, ID value especially, tend to be clustered together within storage chunks.
When compared to regular indexes, storage indexes have a few important advantages. They consume no storage space at all, require negligible overhead to maintain or to use, and they cover all commonly queried columns of every table, even in combination.
Storage indexes however do have some drawbacks. First, they are designed for use with full-table scans. Index-based access paths will see fewer benefits because they tend to read fewer rows in the first place. Second, as in-memory structures, storage indexes do not survive a storage server shutdown, and must be rebuilt on subsequent startup. I suspect this drawback will disappear in future versions though, since saving storage index values to disk should be a simple exercise. And third, storage indexes don’t store min and max values for encrypted tables, as this could inadvertently disclose some of the data.
But overall storage indexes can provide important performance benefits with almost zero overhead, and, joining Netezza and InfiniDB, expect to see more and more DBMS products offering them.
There’s a couple of (possible) implications to this, i think.
What I think it implies is that smart designers should be considering applying a sort as part of the (direct path) insert that loads the table, on columns that are frequently subject to predicates that could benefit from storage indexing. Rather as you can do to reduce the clustering factor on an index.
And speaking of indexes, clustering factor is of course taken into account in costing some index-based access methods — does the same apply to storage indexing? In other words, does the optimizer have any information from the storage indexing that can influence its selection of an execution plan? My thought is that it probably doesn’t, although if the column has an index with a stored clustering factor on it maybe that could theoretically be leveraged. Maybe the SQL Tuning Advisor takes it into account.
Indeed storage indexes open up opportunities to optimize data clustering with physical layout. If your data loads are large, infrequent, and insert-only (the best kind), then physically ordered data brings the same benefits as partition pruning, but without the column and other data-model restrictions of partitioning. You just need to be very disciplined to make sure the physical ordering stays consistent: no more manual updates to fix data.
I did a quick and totally unscientific comparison of the same large table in Exadata and non-Exadata platforms, and I couldn’t see any differences in relative costing between more or less-clustered columns. So while the optimizer shows it knows about storage indexes with the “TABLE ACCESS STORAGE FULL” step on all full-table scans, I haven’t seen it actually affect plan selection.
I want one of those machine !
A few million bucks, and you can have one of your own. I’m putting together a future blog post about Exadata packaging and pricing.
[…] 14-How does exadata storage indexes work? Marc Fielding-Exadata’s Best Kept Secret: Storage Indexes […]
[…] combined with smart and innovative software. One of these innovative features is the Storage Index (see also Marc’s post here) – a game changing feature in my […]
[…] Storage Indexes […]
your query is only determining on one node. Should you be using gv$sysstat to determine for the entire exadata
The query was meant for illustrative purposes only to show what kind of I/O benefits storage indexes can provide. Since it queries a single instance on a single database, it’s not intended to show the performance of the entire database machine. Whether to use v$ or gv$ views it a bit outside the scope of this blog post, as there are additional considerations. For example, is this database multi-node RAC? Is the workload being measured running across all nodes?
Now if you wanted to sum this metric for an entire multi-database DBM, it gets a bit more involved. DB links to query gv$sysstat is one option, or simply a script that connects to databases in order and uses script-level logic to do the calculations. Perhaps a user-defined Enterprise Manager metric, and aggregating the results right off the EM repository database.