Different Technology Stacks On Production and DR?

Last week, I was at the NetApp office in North Sydney for a presentation on NetApp SnapManager for Oracle. It was a good opportunity to learn more about NetApp snapshots while working on a project for one of our clients in Sydney. The topic was especially relevant to me because I have some experience using Veritas Checkpoints (see my presentation on test system refreshes), and I was curious to see what’s different and new in the NetApp implementation. But I digress.

I learned that NetApp can provide access to the same LUNs via either Fibre Channel (FC) or iSCSI. And this is when an interesting argument surfaced. Apparently, some companies aim to make the technology stack on their disaster-recovery site as different as possible from the primary production site. Their argument is that if one technology fails at the primary site (like FC to access storage), then a DR site using a different technology stack is more likely to be unaffected.

Hrm . . .  I had never thought about this, and when I consider it now, it still doesn’t appeal to me. If I design a highly-available solution with a disaster-recovery site in place, one of my priorities would be to switch between the sites comfortably at any time. The more differences two sites have, the lower my comfort level is.
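To put rough numbers on the two positions, here is a minimal Python sketch. It is only a toy model of my own, not anything from the presentation: it assumes a "common-mode" fault that takes out both sites when they share a technology, plus an independent failure chance per site. The function name and every probability below are made up purely for illustration.

```python
def p_both_sites_down(p_common: float, p_site: float) -> float:
    """Chance that production and DR are unavailable at the same time.

    p_common: chance of a fault that hits both sites at once
              (for example, a bug in a technology they share).
    p_site:   chance that a single site fails on its own,
              assumed independent between the two sites.
    """
    return p_common + (1.0 - p_common) * p_site * p_site


# Illustrative numbers only, not measurements.
# Identical stacks: more exposure to a shared fault, simpler operations.
identical = p_both_sites_down(p_common=0.01, p_site=0.02)

# Different stacks: less shared exposure, but assume the extra complexity
# makes each site somewhat more likely to fail on its own.
diverse = p_both_sites_down(p_common=0.001, p_site=0.05)

# Same diversity, but with a heavier complexity penalty per site.
diverse_and_messy = p_both_sites_down(p_common=0.001, p_site=0.10)

print(f"identical stacks:          {identical:.4f}")
print(f"diverse stacks:            {diverse:.4f}")
print(f"diverse, harder to manage: {diverse_and_messy:.4f}")
```

With these invented inputs, diversity helps in the second case and hurts in the third, so the answer depends entirely on how much complexity the second stack adds to day-to-day operations.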

The only reason I can think of for some companies to “demand” different storage technology stacks at production and DR is to justify a more convenient (cheaper?) implementation.

Thoughts? Comments?

About the Author

What does it take to be chief technology officer at a company of technology experts? Experience. Imagination. Passion. Alex Gorbachev has all three. He’s played a key role in taking the company global, having set up Pythian’s Asia Pacific operations. Today, the CTO office is an incubator of new services and technologies – a mini-startup inside Pythian. Most recently, Alex built a Big Data Engineering services team and established a Data Science practice. Highly sought after for his deep expertise and interest in emerging trends, Alex routinely speaks at industry events as a member of the OakTable.

15 Comments

Hi Alex,

We have the concept of “genetic diversity”. I don’t know if others recognise this term, but the basic idea is that if an O/S or a piece of software is subject to a security compromise, we can take that bit of the stack completely out of the equation if required.

We take this concept to the router, server, O/S and application software level.

Paranoid? You bet!

jason.

Reply
Alex Gorbachev
February 10, 2009 4:06 pm

Interesting. Thanks Jason — appreciate your feedback.

Reply

Security would be the major factor for such a choice.
As Jason said: genetic diversity. It’s one of the cornerstones of highly secure sites that require HA.

Does every HA site require that level of security? I don’t think so, particularly given that HA and secure access are not synonyms: the two are quite different and have little in common.

Not to say it can’t be done but the budget will take a hit, if not in the hardware, then in the consultancy fees…

Reply

Is it just me, or does accessing two technology stacks through the same manager kind of miss the point of having two technology stacks? Surely the security argument is invalidated…

Reply

Initial reaction – yuck!

I can even understand some of the technical reasons why you might want to do this, but my personal opinion is that HA without as much simplicity as you can manage is unlikely to be HA in practice.

Of course I don’t mean *technical* simplicity but administrative simplicity because this stuff will ultimately be managed by humans.

Reply
Alex Gorbachev
February 11, 2009 3:27 am

Noons,

Well, security often means more complexity (no?) and “complexity is the enemy of availability”. I think my experience with how people make things secure might be bad according to this.

Joel,

Or through the same application. There will always be some things that are the same; otherwise, it’s insane. :)

Reply

“security often means more complexity ”

No, not at all. Security often means more diversity.

If you make everything the same, all a hacker has to do is find a technique to break in once and it’s repeatable everywhere in that site.

If the hacker has to break into everything every step of the way and for each step it’s a different lock, then things are a lot harder for their kind – and a lot easier for us to find them by increasing the chances they step on a trip wire. The Cuckoo’s Egg book explains this at length.

Is that necessary everywhere? I don’t think so. Not every site needs mil-level security – when we talk genetic diversity that’s what it’s about.

For the vast majority of places, I would not bother going to that level: there are many other security options available with less impact before one jumps to the ultimate.

Reply
Alex Gorbachev
February 12, 2009 6:07 am

Thanks Doug.

If taken to the extreme, we will end up with two different applications (.NET vs Java), different DB vendors (MySQL or SQL Server vs Oracle), and local storage with cloud – for good measure. With this, “we cannot fail”. Eh?

Reply

““we cannot fail”. Eh?”

We *never* fail. At least not in those meetings when we discuss it ;-)

Reply
Alex Gorbachev
February 13, 2009 5:18 am

I imagine diversity adds complexity most often.

More skills, different procedures, different bugs, issues, a huge integration headache. I just don’t see how you can keep simplicity with increasing diversity. I guess we are going a bit sideways from the original topic… not that it matters. :)

Reply

“If taken to the extreme, … ”
You stopped far too early. Let’s reach for the really extreme; don’t stop at the technology stack.
In addition to diversity in pure technology, separate operational and engineering staff are also needed, so that the same person does not make the same systematic mistakes on both stacks.
This is also true for developers, the QA department and, ultimately, the business departments and the business itself.
To bring it to an end, bring diversity to management as well. Even a CEO can fail.
Perfect! We now have full diversity on ALL layers: 2 totally different companies!
But wait, these 2 totally separate companies both suffer from single points of failure all over again. That is nothing to fear, as we have a great method to solve this: diversity!

Reply

Depends on the threat model. If the most critical threat is one which will take down both sites (i.e. some kind of exploit that both are exposed to) then it could be rationalised, I suppose. But otherwise common sense arguments would prevail.

NetApp? Schmeh. Use ZFS!

Reply

The rules of economics and the customer’s ability to deliver technology outweigh the “tech” of the design.

If the customer’s DR design prevents them from using the DR equipment in an active fashion, then “cheaper” alternatives are selected due to the economics of the design. A 2x cost on infrastructure to support DR, where half is sitting idle, does not get approved. So alternatives and cost-cutting options prevail, from less compute power (less CPU) to less I/O performance capability. This is further defined by “If we are in a DR situation we will be able to run at partial capacity for 2-3, 5 months.”

Other economics prevent “false designs”, no different than “false positives”. A lot of customers, frankly, overpurchase technology, and this is due to the negotiating power of vendor sales reps and large OEMs. A simple example: larger companies never look at Oracle Standard Edition as a database option; they run Oracle Enterprise Edition at some 55% higher cost, annualized, in increased support costs. That cost goes into someone else’s pocket, and it is $$$ that could have been spent on the proper infrastructure, equipment, projects and technology to run their business. But they will all complain that “oracle” is too expensive. The same paradigm happens with hardware, where customers buy compute capacity because their skills and architecture cannot align with annualized planning to incrementally deploy scalable solutions. And to this I say, “It is a lot harder to incrementally add a $750k server.”

There is value in using different solutions in tiered approaches where the impact is isolated, like virus definitions. Clearly you cannot do the same thing in the “Application” design, like swapping out Weblogic for IIS.

I think that, just as the last year of the US economy has made families really look at their spending and the return on their $$$, the same will happen with IT investments. Cost-value additions like Cloud Computing, and IT purchasing that looks at annualized cost forecasting, will push customers toward making smarter decisions.

Reply
Alex Gorbachev
February 15, 2009 5:04 pm

@Martin: LOL. Good one.

@Toby: Agree on ZFS, but I wish it were available on all platforms!

@Neil: “annualized cost forecasting” reminds me of ROI criteria. The problem is that people can make such a forecast include whatever they think is appropriate to make the numbers look right. Until this forecasting is handed to an independent business unit and/or audited, there is no way to apply it flawlessly.

Reply
Muhammad Ashraf
March 23, 2014 9:51 pm

If RTO and RPO have been determined properly and accurately and SVA done efficiently, most of the job of designing HADR will be easier and more economical.

Reply
