750G Disks are BAHD for DBs: A Call To Arms

Posted in: Technical Track

I was reading the morning newspaper with a cup of coffee, well, actually I was reading slashdot.org, and I tripped across this story about some new 750G disks at 7200 RPM soon to be released by Seagate. This filled me with a sense of dread about having to, once again, go through the process of convincing purchasing managers at various customer sites that actually, no, they cannot just buy three of these and RAID-5 them together into a huge storage area for their terabyte database.

But, but, why, you may ask?

Think of a disk array as a warehouse. No, not a data warehouse, an actual brick and mortar warehouse. Imagine it as a big building in which you store physical stuff, like books or paper forms or cases of wine or something. Visualize the warehouse as having several loading docks for delivering new stuff or for loading up containers to ship stuff out. Then, imagine the access road or, for a large warehouse, various access roads leading to the loading docks. Are you with me so far?
Now, let’s map this analogy back to the array:

The square footage of your warehouse is the size of your array in gigabytes.

The loading docks are the separate disks in your array.

The access roads are the controllers in your server servicing those disks.

So now, tell me, what happens when you use very big disks for high-performance applications? You have way, way too many square feet to service with far, far too few loading docks (and usually only one access road!!!).

In the “good old days” when 9G disks were big, we didn’t have this problem; it has crept up on us since then. Back then, if we wanted 200G of RAID-1 storage, we needed about 45 of those disks. Controllers could only handle 7 of them, you see (the 8th device on the bus was the controller itself), and that meant we had proportionally lots more access roads, and lots more loading docks per square foot of warehouse space, than you typically have today.

Now, some may look it up and say that Ultra-320 SCSI has 32 times the bandwidth capability of those ancient controllers! But note the following: in 1998, Storage Review’s editor’s choice for enterprise hard disk was the Seagate Cheetah 9LP. This drive featured 10,000 RPM, a 5.4 ms seek time, and could deliver 10 MB/sec to the controller. Now compare that to the specs of these newly announced Barracudas: 7200 RPM, unpublished seek times, and a 100 MB/sec maximum theoretical peak delivery to the controller. For reference, the 7200.7 had an 8.5 ms seek, the 7200.8 an 8 ms seek, and the 7200.9 an 11 ms seek, so we are likely looking at seek times roughly 1.5x slower than my 9GB reference.

I note:

  • The seek time is actually slower today.
  • The bandwidth is at best only 10 times better.
  • Your controller is no more than 32 times faster.
  • The disk, however, is about 83 times bigger!

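To put rough numbers on the comparison above, here is a minimal back-of-envelope sketch in Python. The capacity and seek figures are the ones quoted above; the rotational latency is approximated as half a revolution, and the new drive’s seek is assumed to match the 7200.7’s 8.5 ms, so treat the output as illustrative rather than spec-sheet fact.

    # Back-of-envelope: 1998 Cheetah 9LP vs. an assumed 750GB 7200 RPM Barracuda.
    # Seek times and capacities come from the text above; rotational latency is
    # approximated as half a revolution. All figures are illustrative only.

    def avg_rotational_latency_ms(rpm):
        # Average rotational latency = time for half a revolution, in milliseconds.
        return (60_000.0 / rpm) / 2

    def random_iops(avg_seek_ms, rpm):
        # Rough random I/Os per second for a single spindle.
        return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))

    drives = {
        # name: (capacity in GB, average seek in ms, spindle speed in RPM)
        "Cheetah 9LP (1998)":        (9,   5.4, 10_000),
        "750GB Barracuda (assumed)": (750, 8.5, 7_200),
    }

    for name, (gb, seek_ms, rpm) in drives.items():
        iops = random_iops(seek_ms, rpm)
        print(f"{name:28s} ~{iops:4.0f} random IOPS   ~{iops / gb:6.2f} IOPS per GB")

    # The Cheetah works out to roughly 119 random IOPS (about 13 per GB);
    # the 750GB drive to roughly 79 random IOPS (about 0.11 per GB).

Per gigabyte, the big new drive serves roughly a hundred times fewer random I/Os than the old 9GB spindle, which is the too-few-loading-docks problem expressed as a single number.
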
Now, I admit things got faster as they got bigger, but they did not get faster as fast as they got bigger.

That’s why I’m founding a new club, in the spirit of BAARF, the Battle Against Any Raid Five.

I’m calling it the Battle Against Huge Disks for Databases, or “BAHD for DBs”. You can either join me in my battle against huge disks for databases. Or not. Together, we can relegate these monsters to their intended purpose, whatever that may be. Or not.

To join, simply post a comment to this article with a story of how you have fought the BAHD for DB. I will keep the following list of charter members up to date:

1. Paul Vallee: Wrote the original BAHD for DBs call to arms.
2. Mark Brinsmead: …short story made long, here’s the problem: disks are now almost 200x bigger than they were back then, but they are nowhere near 200x faster! Not if you use the entire disk, anyway. (Perhaps I’ll elaborate on that another day…)
3. Jonathan Gennick: You mean I should use more than one drive?
4. Doug Burns: Then again, if I were to put 5 of those 750GB disks in my server at home… Mmmm.
5. Pete Scott: Well, big disks could be useful to hold an online disk backup of a database. But other than that…
6. Stephen Booth: In the long run (due to better throughputs &c) using lots of smaller, faster disks will be much, much cheaper, but it could put up the initial implementation costs by 5%. As the implementation team have no responsibility for the long-term running costs but do for the initial costs, they go for whatever’s cheapest.
7. Connor McDonald: As long as they’re in a SAN, it will all be okay. Doesn’t matter what crap disk it is, apparently; if it’s in a SAN, disk performance will be magically awesome. All the SAN vendors tell me this all the time.
8. Mogens Nørgaard: This is very good. Please make me a member.
   PS: The cash you pay for the cache will of course remove (alleviate? is that the word?) all problems you might encounter with big disks.
9. Thet Win: Absolutely! Call for arms, it is. Count me in.
10. Marco Gralike: What, there aren’t 200MB disks any more? Where did they go? You traded them in for only a few 750GB disks? Heh, that’s not an army! No wonder you were written out of the script.
11. Carel-Jan Engel: Well, this should make me the 11th member, the same seq# as I have at the BAARF. We need a Small Disk Liberation Army.
12. Andrey Kriushin: History evolves in spirals indeed. Does anybody remember “magnetic drums”? Seems they are coming back ;-)
13. This space intentionally left blank. ;-)
14. Joel Garry: I remember when carrying around 20MB disk platter stacks was a decent workout. Now I lose thumb drives. There’s no substitute for cubic inches, but you need to get the power to the ground. Bandwidth über alles.
15. Jay Miller: I’ve lost this fight many times in the past and expect to lose it many times in the future. Them: But we’re giving you much faster CPUs, that should make up for it! Me: We’re not CPU bound, we’re I/O bound!
16. James Morle: I’m holding out for the 1TB drive. It would make the most perfect ironic bedfellow for the 3-disk RAID-5 volume (The Most Ridiculous Configuration In The World). Conversely, if we could get these storage densities into a 73GB 15K drive (thus minimising seeks), that might be a nice drive.
42. Jared Still: It appears that soon we may have multi-petabyte disks to contend with, and we’ll be using storage virtualization software to manage many databases in our 2-disk SANs (RAID 1, you know). What could make life easier than that? Sign me up please, and make me member #42, as Mogens didn’t ask for it.

About the Author

As Pythian’s Chief Executive Officer, Paul leads this center of excellence for expert, outsourced technical services for companies whose systems are directly tied to revenue growth and business success. His passion and foresight for using data and technology to drive business success has helped Pythian become a high-growth global company with over 400 employees and offices in North America, Europe, and Asia. Paul, who started his career as a data scientist, founded Pythian when he was 25 years old. In addition to driving the business, Paul is a vocal proponent of diversity in the workplace, human rights, and economic empowerment. He supports his commitment through Pythian’s hiring and retention practices, his role as board member for the Basic Income Canada Network, and as a supporter of women in technology.

28 Comments

Mark Brinsmead
April 21, 2006 3:14 pm

*Sigh*. I’m afraid I’m with you — for the most part, at least.

To be sure, there are legitimate uses for really huge disks, but few — if any — of these relate to the class(es) of problems for which we customarily turn to databases. In fact, I’ll go out on a limb here, and suggest that most applications that perform really well with enormous disk drives will probably perform (almost) equally well using tape.

As Paul has already pointed out, disk capacities keep growing (by orders of magnitude) but their ability to serve random I/O requests has remained fixed for some time. That is to say, RIOPs (random I/Os per second) is almost purely a function of average (or maybe track-to-track) seek time and rotational latency. But we’ve all watched disks grow from 9GB, to 18GB, to 36GB, to … now 700GB with little or no improvement in rotational latency or seek times. (Well, okay, it is possible that 15kRPM disks might have appeared with the advent of 18GB devices, but I don’t recall seeing anything faster since.)

So, as the capacity of individual disk devices continues its exponential climb, the usable random I/O capability (measured in RIOPs/GB) asymptotically approaches zero! Basically, very large disk drives are turning into little more than tape drives with very small start-stop latency and no rewind time. Note, however, that I did not say no seek time! Yes, seeks on these very large disks may be faster than typical tape drives, but they are sadly far from zero…
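
A quick sketch of that trend, assuming (purely for illustration, not as a measured figure) a fixed ~120 random IOPS per spindle while capacities keep doubling:

    # If random IOPS per spindle stays roughly fixed while capacity doubles each
    # generation, random I/O per gigabyte collapses toward zero.
    ASSUMED_IOPS_PER_SPINDLE = 120  # illustrative round number, not a measurement

    capacity_gb = 9
    while capacity_gb <= 750:
        print(f"{capacity_gb:4d} GB spindle: "
              f"{ASSUMED_IOPS_PER_SPINDLE / capacity_gb:6.2f} RIOPs per GB")
        capacity_gb *= 2

    # 9 GB works out to ~13.3 RIOPs/GB; by 576 GB it is down to ~0.21, and a
    # 750 GB spindle would sit near 0.16, roughly two orders of magnitude less
    # random I/O per gigabyte.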

My memory may go back even farther than Paul’s. I remember building a very large database in the early 90’s, composed of (or planned to use) about 600 2GB disk drives. Sure, the platters spun at only 4800RPM, and the seek times were nothing to write home about — at least by today’s standards — but this number of disks (with the accompanying 50+ SCSI channels to attach them) was capable of handling far more RIOPs than any computer we could build at the time could possibly ever throw at them. (At the time, a very large database server might have somewhere between 10 and 20 CPUs — the ones I was using each had 12, and they were very fast for the time: 50MHz 486’s!)

I recall at the time having feelings of great trepidation when my manager (oh my, *there* is a subject for a blog, all in itself, but don’t expect to see that one anytime soon) told me that we needed to move from 2GB to 4GB disks. I’ll tell you — I was seriously concerned over that. Although the aggregate random I/O capacity for this system was (close to) colossal, we were certainly not without some rather serious bottlenecks. We had spent months configuring and benchmarking the I/O system. (Okay, I wasn’t personally involved in that effort, but that is probably to my credit rather than to my detriment; another interesting blog that you’ll probably never see…) It took some serious work — and maybe even a few liberties with the truth — to convince the customer that the system could ever reach its performance targets with the smaller disks.

As it happened, my fears at the time were (mostly) unfounded. Along with the move from 2GB to 4GB spindles, the rotation rates went from 4800RPM to 7200RPM (or was it 3600 to 4800? It was so long ago!) and the seek times improved rather markedly too.

In the end, it turned out that the 4GB disks provided almost equal performance characteristics to twice as many 2GB disks. In short, we got lucky. Very, very lucky.

But, short story made long, here’s the problem: disks are now almost 200x bigger than they were back then, but they are nowhere near 200x faster! Not if you use the entire disk, anyway. (Perhaps I’ll elaborate on that another day…)

Yeah, I think I’m ready to sign up for BAHD.

Reply
Jonathan Gennick
April 21, 2006 5:57 pm

You mean I should use more than one drive?

<grin>?</grin>

Sign me up Paul.

Reply

Doug Burns

Sign me up ;-)

Then again, if I were to put 5 of those 750GB disks in my server at home… Mmmm.

Reply

Pete Scott

Well, big disks could be useful to hold an online disk backup of a database. But other than that…
In the data warehouse systems I design/develop/support it is not very common to seek single rows; reads of millions of rows are common. Large disks reduce the number of cabinets in the data centre – 4TB worth of 9GB disks does take up a lot of floor space, but at least there were enough disk controllers to ensure good throughput on table scans.

My real fear is that these huge disks disappear into network storage devices and only the high priests of SAN admin know how the physical disk is divvied up between LUNs.

Reply
Stephen Booth
April 22, 2006 5:30 am

Please sign me up.

I think that a big part of the problem, everywhere I’ve worked at least, is that the team (as far up as the executive sponsor) that puts in a system has no real connection to the ongoing running of the system but does have a very tight budget. In the long run (due to better throughputs &c) using lots of smaller, faster disks will be much, much cheaper, but it could put up the initial implementation costs by 5%. As the implementation team have no responsibility for the long-term running costs but do for the initial costs, they go for whatever’s cheapest.

A particular problem where I’m working now is that project managers can’t tell the difference between contractors (whose job it is to implement a working system) and salespeople (whose job it is to sell you as much kit as possible and then disappear as soon as the cheque clears).

Stephen

Reply

Connor McDonald

As long as they’re in a SAN, it will all be OK… Doesn’t matter what crap disk it is, apparently; if it’s in a SAN, disk performance will be magically awesome… All the SAN vendors tell me this all the time…

Reply
Mogens Nørgaard
April 22, 2006 11:38 am

Paul,

This is very good. Please make me a member.

Mogens

PS: The Cash you pay for the Cache will of course remove (alleviate? is that the word?) all problems you might encounter with big disks.

Reply

Thet Win

Absolutely! Call for arms, it is. Count me in.

tw-

Reply
Marco Gralike
April 23, 2006 6:18 pm

Yeah, let’s do it, call for arms, brother Tuck! The more the merrier. What, there aren’t 200MB disks any more?
Where did they go? You traded them in for only a few 750GB disks? Heh, that’s not an army! No wonder you were written out of the script… (https://www.imdb.com/title/tt0787985/) How are we now going to battle at the Seagate…?

Reply
Andrey Kriushin
April 24, 2006 3:19 am

History evolves in spirals indeed. Does anybody remember “magnetic drums”? Seems they are coming back ;-)

Please, sign me up.
Andrey

Reply
Carel-Jan Engel
April 24, 2006 2:51 am

Well, this should make me the 11th member, the same seq# as I have at the BAARF.
We need a Small Disk Liberation Army.

Reply

Joel Garry

I remember when carrying around 20MB disk platter stacks was a decent workout. Now I lose thumb drives. There’s no substitute for cubic inches, but you need to get the power to the ground. Bandwidth über alles.

Reply

Jay Miller

Please sign me up as well. I’ve lost this fight many times in the past and expect to lose it many times in the future.

Them: But we’re giving you much faster CPUs, that should make up for it!
Me: We’re not CPU bound, we’re I/O bound!

Reply

Jared Still

It appears that soon we may have multi-petabyte disks to contend with, and we’ll be using storage virtualization software to manage many databases in our 2-disk SANs (RAID 1, you know).
What could make life easier than that?

Sign me up please, and make me member #42, as Mogens didn’t ask for it.

Reply
Marco Gralike
April 25, 2006 9:36 am

Forgot to mention: please sign me up ;-)

Reply
Chris Gralike
April 25, 2006 11:17 am

I do agree with the point of view submitted in the above section, even though I don’t know very much about the effects this might have on the performance and data consistency of databases.

I do know that it is still “size that matters” with the “users”; they don’t concern themselves with performance. And next to that, it is true that applications with the same functionality are growing in size, and no one seems to bother with why that might be, or seems to care that bigger apps mean more I/O and (usually also overkill) CPU time / threads. Luckily there are still people around who prove that it is still possible to write “ass-breaking” code without the need for too much space, I/O, or CPU. Problem is that the market still thinks they can’t make money on it :(

https://en.wikipedia.org/wiki/Demoscene

Therefore, sign me up, Scotty ;-)

Reply
Mark Brinsmead
April 25, 2006 11:23 am

Hmmm… Andrey Kriushin’s comment about drum storage is quite interesting. This is, in fact, one of the special cases I had made reservations about, where stinking huge disks ™ actually can be useful.

If you can convince the powers that be to allow you to use only the outer 1% to 10% of the disk, you now (almost) effectively have a drum. And a pretty fast one, too; back in the 60’s and 70’s when people actually manufactured these things, I’m pretty sure nobody was crazy enough to try and manufacture one that spins at 15,000 RPM! (Did they even do 1,000 RPM? I’m afraid I don’t remember, as they are well before my time…)

But then, that’s the whole issue with BAHD, isn’t it? As far as the “decision makers” are concerned, even though I have managed to build myself a very fast (and very cheap!) 10GB “drum” using the outer edge of a 750GB disk, all they can see is 740GB of “perfectly good” storage that has “gone to waste”. And you just know they’re gonna use it for something stupid. Surely, there can’t be a problem with placing the corporate e-mail system on the unused space on the “drum” you’re using for online REDO, right?

Still, a good quality 750GB disk should cost less than $10k. The outer track is probably pretty big. Maybe if we could get the storage vendors to sell these devices as 2GB “virtual drums”, so our pointy-haired bosses wouldn’t know that there is “wasted” space on them, we could achieve something pretty cool.

Personally, I blame most of this mess on the craze for “consolidation”. Server consolidation, and storage consolidation. Ultimately, it is the slavish devotion to the idea (planted by hardware vendors, of course) that it is somehow “evil” to allow a CPU to go less than 90% utilised, or to “waste” a single kilobyte of storage that seems to be at the root of these problems…

But then, on the storage side, there is also the lack of basic understanding. Few managers seem to understand the differences between capacity, throughput, and response time; perhaps this is because many hail from the days when purchasing sufficient capacity almost always assured you of also having enough of the other two. Or not… (Oddly, though, many of the ones I know did understand these things before they were managers…)

Reply
Christo Kutrovsky
April 25, 2006 12:14 pm

I will join the dark side. It’s just not fair. What if we use those for the archived redo logs? Only in extremely rare cases would you archive more than 1 log at a time, and it only takes 3 disk drives to max out a 2 Gbit Fibre Channel link (3 x 70 MB/sec = 210 MB/sec; fibre is ~240 MB/sec).

You could also use them for the hot backup. You could argue that the bandwidth won’t be sufficient, but consider them natural bandwidth-limitation devices.

If you ask me, those are perfect for the flash recovery area, as long as the $ per GB is there.

With 10 of these I could flashback to 3 months ago!

Big fat tapes with decent random access speed. How is that bad?

Reply

Doug Burns

“You could also use them for the hot backup.”

My favourite, most likely and reasonable use. I’ve spent some time recently debating this with a sys admin who is too frightened to put in an appearance here ;-) (Hi, Mike!)

They will have perfectly valid uses.

However, we all know that some of those horrible, slimy big things will inevitably slide into the system where they are not welcome!

Reply
John (Sing-Cheong) Chen
April 25, 2006 4:34 pm

Barracuda 7200.10 series drives are for desktops, and they are available in UltraATA and SATA models only. It will take several years before iSCSI, FC, or SCSI models become available. 120 GB 15k FC disks took about 3 years longer than SATA and UltraATA to release. It takes another 1-2 years to integrate them into SAN and NAS storage subsystems.

Therefore, this battle is not valid at all at the moment, at least not for another 3 years.

I believe 750 GB of storage is certainly required for desktop databases. Firstly, expansion for desktops is limited. Unless it uses a SAN or external storage, a desktop using EIDE and SATA connections will require big HDs to achieve a terabyte database.

Moreover, this hard disk can be used for hot backups, archive logs, database duplication, (low-cost) standby databases, database archival, RMAN catalog databases, OEM databases, OID databases, AS10g App Server metadata repositories, low-priority standby disks, user storage, storage areas for data warehouse data loads, multimedia storage, audio/video streaming servers, and voice (media) recorder databases.

In most OLTP databases, the heavily accessed tables are always limited; mostly fewer than 10 tables are very active. With careful planning of the data distribution, 10 units of 750 GB HDs, proper indexing, and the SGA, it is still possible to achieve good response times. Moreover, it is possible to implement partitioning to move some old data off, so that table access can be faster. This will require effort from the developers and users as well as the DBA and storage engineer.

I don’t agree so much that it is suitable to hold online redo logs, unless the write (update, insert, delete) activity is low, or the storage subsystem’s cache size is near 30% of the online redo log size.

I checked the HP StorageWorks EVA 4000, an entry-level SAN, but it doesn’t support SATA disks or EIDE disks. Therefore, it is very likely that these drives cannot be used in a storage subsystem with a decent RAID controller, 8 GB of cache (HP StorageWorks EVA 8000), and high redundancy.

In short, the battle should not be now. Even now, reasonable performance is achievable with careful planning. High-performance storage systems, equipped with multiple channels and 32 GB of cache, are suitable for high-performance databases, and a big HD is still logical.

Reply

> I’ve spent some time recently debating this with a sys admin who is too frightened to put in an appearance here ;-) (Hi, Mike!)

[Mike pulls flame-proof body armour from the cupboard and struggles into it, noting that it doesn’t fit like it did last year. He makes a mental note to get some exercise – perhaps some running, but little does he realise that he may be running sooner than he thinks…]

Once upon a time when I was a junior sysadmin, we’d just taken delivery of our first 4GB drives. Quantity was absolutely required, as it was for storing seismic datasets. At the time, I opposed the rollout on the basis that 4GB was an awful lot of data to lose at one go and have to restore from 4mm DAT, should the drive fail (this was in the days before easy mirroring). Of course… I lost that debate – the engineers required the larger dataset in order to do their job, and we simply couldn’t stick to 2GB drives.

I see a lot of that situation in this for/against argument over the 750GB drives – the circumstances are somewhat different, but I don’t think it’s acceptable to have a blanket dislike for a large-capacity drive simply because it has limitations compared to the previous generation (in this case, the capacity is increasing faster than the overall performance).

There is *of course* opportunity for the inadequacies of the drive/interconnect to be abused and for that to cause performance problems – technical staff WILL stick a pair of 750GB drives into a 1U-high machine and run the OS, DBMS, and app from the same mirrored pair of spindles. Is this so much of a crime? A nice small system with a tiny power/space footprint – it might not run as fast as it *could*, but sometimes 75% of potential performance is sufficient (it’s probably also worth mentioning that a huge number of performance issues are due to application inadequacies, bottlenecking performance before it gets anywhere near the database).

As I said to Doug.. “Do *all* parts of *all* databases have to run at phenomenal speed? – what happens when quantity is more important than delivery?”

Of course, there are absolutely exceptions to this – some databases really do merit performant environments, which doesn’t necessarily preclude them from these large-capacity spindles; it just requires that the topography and usage of the drives be managed with a lot more care.

The physical characteristics of these drives also need to be considered. I suspect that in many situations, the DBA team is (WARNING: vast generalisation alert…) one degree of separation away from the physical datacentre requirements. This is a particularly hot topic at the current client, but I know of many, many places that are in similar situations.

We have a number of Symmetrix arrays currently configured with 73GB drives. We also have business requirements for more storage, and we’ve got next to no power available in our datacentres. If we can swap our 73GB drives for 500GB drives, with a relatively small increase in power draw, the benefits can be enormous.

Right – there you go. I said it.

[sound of running, wheezing sysadmin fades into the distance]

Reply
Niall Litchfield
April 26, 2006 5:42 am

John

A couple of points to pause for thought. First up, the same company already supplies 300GB Fibre Channel and U320 SCSI discs. The future isn’t as far away as you might think.

Second, a lot of purchasers will be looking at exactly that: ditching SCSI-based RAED arrays (where the E stands for Expensive) for SATA-based RAID arrays, where the I stands for Inexpensive. Things like the Apple: 7TB for 13k, what could possibly go wrong with that?

Reply
John (Sing-Cheong) Chen
April 27, 2006 2:38 pm

A little research on SAN and NAS support for SATA drives found that EMC has 2 models available that support SATA disks. The entry-level CLARiiON AX150 is a low-profile 2U enclosure that supports 12 disks; however, it has only 512 MB of cache, so I’m not interested in bringing it up here. The next, higher-end mid-level CLARiiON series is worth mentioning. The EMC CLARiiON CX300 NAS supports 500 GB SATA 7,200 rpm disks and up to 60 disk drives. This middle-class storage system has 2 processors, FC (2 Gb) or iSCSI (1 Gb), and 1 GB of RAM per processor. With 1 GB of cache, the performance of 700 GB SATA disks, if released, may not be as bad as expected.

Moreover, the CLARiiON CX700 supports cache up to 4 GB per processor, 2 processors, and 4 FC ports per processor. This highest-end model in the mid-class line can certainly keep a smile on the face of people with a tight budget, while keeping the performance.

Anyone who plans a quick migration or upgrade of a production database, go get a loaner unit of a SATA storage system. Here is a practical use of a high-capacity disk with good performance.

Note 1: The CLARiiON CX series comprises the CX300, CX500, and CX700.
Note 2: 700 GB SATA disks are not supported yet by EMC.
Note 3: Seagate hasn’t released a 15k rpm 300 GB SCSI disk yet; only a 10k rpm 300 GB is available under the Cheetah 10K.7 product line.

Ref: CLARiiON CX300 Spec Sheet https://www.emc.com/products/systems/clariion_cx300/pdf/C1078_cx300_ss_ldv.pdf
CLARiiON CX700 Spec Sheet
https://www.emc.com/products/systems/clariion_cx700/pdf/C1080_cx700_ss_ldv.pdf
EMC CLARiiON Family
https://www.emc.com/products/systems/clariion.jsp

Reply

What, you mean that BIG is not GOOD? You must be the only idiot in the world to think that ;-)) I don’t care how much experience you have with what. Storage vendors say we need only two 500GB disks. Rock on!!!

Reply
Liveblogging Larry Ellison’s Keynote
September 24, 2008 5:28 pm

[…] Sense. Refer to BAHD again. Man I feel a bit smart right now. These icons link to social bookmarking sites where […]

Reply
The Oracle Database Machine, In Partnership with HP.
September 24, 2008 5:37 pm

[…] Sense. Refer to BAHD again. Man I feel a bit smart right […]

Reply
Unveiling the OLTP Oracle Database Machine & Exadata v2 | Pythian Group Blog
September 16, 2009 7:35 am

[…] are 600 GB (+33%) and SATA disks are 2 TB which doubles what ODBM v1 offered. We all know that big disks are bad but that’s where Sun FlashWire technology comes into play with flash cache on the controllers […]

Reply
Big Discs are Bad « Martin Widlake’s Yet Another Oracle Blog
September 27, 2009 6:21 pm

[…] Tags: performance, Storage, system development, VLDB trackback I recently came across this article on large discs for database by Paul Vallee. The article is over 3 years old but is still incredibly valid. It’s a very […]

Reply
