The Problem
As an IT Manager you may be thinking of acquiring a data warehouse to develop insights for your business. You are presented with a number of options like installing it in your own data center, using a cloud service or even purchasing a hardware/appliance ready for this purpose. This is a quick guide to help you compare certain options. The article will not describe any specific product.
Reference notes
This is a quick overview to get to know certain options in the market. It is not, however, a comprehensive comparison. Some of the features and characteristics mentioned in this post might change in the near future as cloud services improve in a fast paced manner. All of the mentioned information is current as of December 2016.
Glossary
QA: Quality assurance
BI: Business Intelligence
DW: Data Warehouse
MS: Microsoft
AWS: Amazon Web Services (known as Amazon Cloud)
Azure: Microsoft Cloud
R&D: research and development
OLTP: Online Transaction Processing
The Options
The following are your options for acquiring a data warehouse:
- Install your Own : trough business intelligence software from Microsoft, Oracle, so on, install a data warehouse server and some type of reporting tool (e.g tableau, MS Power BI) in your own data center.
- Use a Cloud Service: signup for any cloud service like AWS RedShift or Azure Data warehouse.
- Purchase a state of the art appliance: you can buy an appliance like Microsoft APS or Oracle Exadata, get it shipped to your data center and start using it.
Some Background Information
I often get asked, “How come you can’t just put your data warehouse information in a regular database service instead of a data warehouse service?”
The main thing is that the data organization for data warehouses is designed differently. The relations between tables works differently and most likely will be de-normalized, and also this means the amount of data is going to be a lot bigger than of the OLTP database (typically), access to this type of data organization needs to happen in a specific manner, also the way data is stored should be different as well to ensure optimal performance. So even though you can query your data warehouse and your main database with a SELECT statement, what happens underneath is quite different.
Among the “under the hood” characteristics that make a data warehouses different than an OLTP is the usage of columnar storage, compression and data shuffling. The explanation for all these terms need an article itself but I will just mention this so you can understand that having these features requires extra consumption of computing resources. This means in order to have a data warehouse, most likely you will need more CPU and better storage. This translates into the fact that a data warehouse service or appliance could be a bit more expensive than an OLTP solution depending on the angle you look at it.
The Options in Terms of Value to Your Company
Keep in mind 2 important terms, CAPEX and OPEX:
CAPEX: being your capital investment, in other words, the money or budget you are allocating for this project
OPEX: the cost of maintaining this solution, this value can be calculated in different forms taking into account variables like cooling costs, electricity costs, rent costs, server admin, database admin, application development cost, so on.
If you choose to install your own solution in a private data center, it will provide you with:
Higher complexity and weeks to get insights. Provisioning servers, installing OS, installing your DW and BI tools, loading your data, QA and production release will require a group of specialized individuals not just in the Data warehouse area but also in server management, networking, so on. ( high CAPEX and OPEX)
Average to good industry level performance. Depending on the power of your hardware, your architecture and data warehouse design you will get a very good performance. This will be impacted by the tools, design, type and amount of data and hardware configuration ( high CAPEX and OPEX)
If you choose to use a cloud solution, it will provide you with:
Low complexity and days to get insights. You don’t need to install servers, tools, software, using a service will provide to you everything ready for usage, backups, OS patches, antivirus and all the operational costs will be included in your price rate, so you don’t need to worry about that.( very low CAPEX and very low OPEX).
Average to very good industry level performance. With cloud services you are able to define how much power you are going to buy, if you start in the lower configurations your performance is not going to be the best compared to industry standards (is not going to be bad neither), as you increase the power and buy some more credits/units/space/nodes/etc your performance will start increasing in the industry spectrum ( very low CAPEX and very low OPEX).
If you choose to buy an appliance, it will provide you with:
Lower complexity and weeks/months to get insights. You don’t need to install servers, tools, software, as the appliance will arrive all installed and configured according to best practices, however, you do need to plug it into your network and do various configurations. The cost of this appliance is very high and therefore, normally depends on the approval and signature of several individuals making the purchase process longer than usual. Because of the price, knowledgeable technicians are not that easy to find, since not a lot of companies buy these, not a lot of professionals with experience use it. Very High CAPEX and low to medium OPEX).
High industry level performance. The data warehouse appliance will provide top industry performance, hence the expensive cost, you are paying for great throughput, so depending on the type of project you want to start this might or might not make sense to your company. Most companies acquiring an appliance already own a data ware house, have realized they need more power and after making a cost/benefit analysis, reach to the conclusion that they need to invest in an appliance. (Very High CAPEX and low to medium OPEX).
Recommendations
- Let’s talk about elasticity. With cloud services you are able to allocate a small data warehouse, do your initial proof of concept, run this for a month and if your managers, users, and stakeholders are not really feeling the solution, then you can just turn down your server and off you go the next idea. On the other hand, should you be in the right direction, you can start buying more power and increasing the system resources as you start seeing more and more users, or get more and more data in. The elasticity allows you to do this, increase and decrease depending on the business needs. If you are an agile company this should align very good to the principle of failing fast and providing value in short time-boxed iterations.
- If you are a mature company with already multiple data warehouse and business intelligence solutions and you are mainly worried about performance, you will need to estimate the cost of acquiring a top configuration in the cloud and the process of moving your data to it against buying a DW appliance. Make sure to factor in the scalability, scaling an appliance is not that easy and also expensive if scaling is an important factor seriously think of using high-end cloud configurations.
- Check your software licenses and repositories, you might already have in your hand a business intelligence tool, or licenses, like in the case of SQL Server it comes with the BI tools as well, so you might have the ability to use it without paying any extra. If you feel you already have the tools, the technical team and some good candidates for servers, you might as well just use what you have in hand to at least get a proof of concept rolling. A case like this might provide to you faster execution as the red tape should be less to get it approved.
- If your company already owns cloud services accounts like in AWS or Azure, check what you have purchased and available, most of the times it makes sense to leverage this situation to get your system in the cloud from scratch.
- Analyze the amount of data, security regulations for transferring it and also add into this formula the location of the data sources, this will indicate you if you are going to use a complete cloud solution or hybrid, you might have some systems in the cloud , others on premise, so what is going to be better for you? Consolidate them in the cloud? On-premise? Or both?
- If you don’t know where to start, begin by checking your cloud solution provider terms. Most have free trials or X amount of free credits for you to test their various services. Depending on the provider you might be able to run a test without cost leaving you with physical proof to show your business and make a case for your desired project.
- The trend is clear, everything is pointing to the cloud these days, however, check your company culture, regulations and needs. You might realize the cloud is not ready for your enterprise or vice versa, then pushing for other mentioned alternatives might make sense.
- Start-up? Or R&D? Using cloud services for testing ideas, and controlling your limited budget is the best option out there. This is a great environment for proof of concepts, demos, or even your complete low cost IT infrastructure.
Closing Point
As you can probably already see, making this decision, installing it, configuring it and maintaining it requires expert advice and technical experience. Lots of companies trust in Pythian for this role and also for the added value of being a one-stop shop for database administration, system administration, cloud services administration, and more. This is what we call the Yin and Yang of IT, and you can find out more about our vision here.
No comments