Why your data hub belongs in the cloud


Whether your core infrastructure is on-prem or in the cloud, there’s only one choice for your data hub

The public cloud effectively gives you three options: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS). With IaaS, people often face a "rent vs buy" choice, and that's fair enough, as it really is a cost decision. When it comes to software choices, features and benefits are weighed alongside the cost of both the software and its ongoing support.

But a modern data hub needs to take advantage of PaaS: data warehousing such as Google BigQuery, Amazon Redshift or Microsoft Azure SQL Data Warehouse, or managed Hadoop environments such as Google Cloud Dataproc, Amazon EMR or Microsoft Azure HDInsight. For most companies, trying to recreate the features of these systems on-premises would be a poor decision, and here is why:

  1. Building a modern data hub that can take advantage of all the different types of data you need to ingest (sensor data, relational data, third-party data, etc.) is an incredibly complex job.
  2. Complexity means one thing: it’s going to be expensive to build.
  3. Actually, it means another thing: it’s going to take a long time. Even if you can find the necessary resources, these are projects that can take months or years.
  4. And sorry, it means one more thing on top of that: the ongoing support costs will be substantial. So, deep pockets are required.
  5. Going on-premises doesn't save you much in cloud costs anyway: storage is cheap, and you can spin up services as and when needed, whereas buying hardware at the right capacity will be a significant part of your budget.
  6. The PaaS platforms are genuinely future-proof. At Pythian, we're already seeing companies that invested in on-premises Hadoop data lakes start to move away from them. And although no one can really tell what's around the corner, the major cloud vendors are likely to stay ahead of the rest of us in building the technology.
  7. A lot of the data you will want to ingest is going to start in the cloud anyway, so do you really want to bring all that into your network to analyse?
  8. Security worries? I think we've moved past those concerns now. Cloud platforms aren't unhackable, but they are a lot more difficult to breach.
  9. Compliance worries? You can choose data centres in a huge number of countries, so that box is checked.
  10. And finally, in the future you will probably want to do some clever data science work, which is likely to involve a whole load of pre-packaged, cloud-based algorithms. Is that when you want to rebuild things to make the data available to those toolsets?

Want to learn more about how a cloud-based analytics solution can help your organization? Read our white paper, The data warehouse is dead. Long live the data platform.

Interested in finding out how Pythian builds modern data warehouses? Get in touch via www.pythian.com

Want to talk with an expert? Schedule a call with our team to get the conversation started.
