Five things to know before migrating your data warehouse to Google BigQuery


 

In September, Google announced enhancements to BigQuery, the Google Cloud Platform service for large-scale data analytics. So how do you know if Google BigQuery is the right tool for what you want to do—and if it is, what’s the best way of migrating your data into it? We’ve compiled some tips to help companies answer these and related questions. Here are five things you should know before migrating your data warehouse to Google BigQuery:

 

1. Know your requirements

It’s important to appreciate what Google BigQuery is and what it isn’t. It is a powerful analytics engine for processing big datasets, which means that if you’re working with a small dataset, you won’t realize its full potential. It also lacks some specific database functions: it’s not an OLTP database, and it doesn’t support locking, multi-row or multi-table transactions, primary keys, or referential integrity. Make sure your data and your data processing goals are well aligned with BigQuery’s capabilities.

2. Validate your assumptions

If BigQuery seems like the right fit, test it out. Identify a “lighthouse” project—some kind of leading initiative or an area with substantial cost or performance impact—to put BigQuery through its paces. As you do, set measurable goals for performance, cost and usability, and see how Google BigQuery delivers.

3. Think integration

Google BigQuery will give you a powerful analytics engine, but you’ll most likely still want to draw on your existing tools for data transformation, visualization, and so on. Confirm ahead of time how BigQuery will integrate with the rest of your data environment and determine how you’ll need to adjust your existing data pipeline to make it fit.

4. Factor your costs

Make sure you understand Google’s pricing structure and which option will work best for your enterprise. For Google Cloud Platform, computational resource and storage costs are usage-based and calculated independently. Storage is always volume-based, with some automatic discounts for idle data (the idle threshold is currently set at more than 90 days). Computational costs are usage-based by default, but firms that prefer a predictable monthly cost can reserve computational resources instead.
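To make the pricing structure concrete, here is a minimal Python sketch of how a monthly estimate might be put together, with storage and compute calculated independently and idle data billed at a discounted rate. The class name, function names, and every rate in it are placeholders invented for illustration, not Google’s actual prices; substitute the figures from the current pricing page before relying on the result.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices. These are NOT Google's actual rates."""
    active_storage_per_gb: float   # per GB per month, recently changed data
    idle_storage_per_gb: float     # per GB per month, data past the idle threshold
    query_per_tb: float            # per TB processed by usage-based queries


def monthly_cost(active_gb: float, idle_gb: float, scanned_tb: float, rates: Rates) -> dict:
    """Estimate one month of cost, keeping storage and compute separate."""
    storage = (active_gb * rates.active_storage_per_gb
               + idle_gb * rates.idle_storage_per_gb)
    compute = scanned_tb * rates.query_per_tb
    return {"storage": storage, "compute": compute, "total": storage + compute}


def cheaper_compute_option(usage_based_compute: float, reserved_monthly_fee: float) -> str:
    """Compare usage-based compute against a fixed reserved-capacity fee."""
    return "reserved" if reserved_monthly_fee < usage_based_compute else "usage-based"


if __name__ == "__main__":
    # All numbers below are made up for the example.
    example_rates = Rates(active_storage_per_gb=0.02,
                          idle_storage_per_gb=0.01,
                          query_per_tb=5.0)
    estimate = monthly_cost(active_gb=2000, idle_gb=8000, scanned_tb=30,
                            rates=example_rates)
    print(estimate)
    print(cheaper_compute_option(estimate["compute"], reserved_monthly_fee=10000.0))
```

Running the same calculation against a few months of your real query volume is a quick way to see whether usage-based or reserved compute suits your workload.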

5. Look beyond migration

Get familiar with the ways Google BigQuery lets you monitor and analyze service usage to ensure you’ll have the analytics you need to evaluate performance, resource demand and cost over time. BigQuery supports Stackdriver integration, audit logging, and more, so it should yield insights that you can convert into action for your organization, but confirm up front that it covers what you need.

In addition to these considerations, you will need to consider storage for things like landing zones, archiving, staging, etc. And with Google’s new Cloud Storage offerings (Multi-Region, Nearline and Coldline – https://cloud.google.com/storage/pricing), adopting a Google Cloud Platform solution may be even more attractive. The new storage offerings will benefit any organization that recognizes that data is key to driving better business outcomes. They provide more storage options to help companies create optimal performance/price mixes, well-priced low-latency storage options, and multi-regional storage to improve reliability and responsiveness for companies with geographically dispersed consumers.

 

If you’d like a more in-depth look at BigQuery and the migration process, download our white paper A Framework for Migrating Your Data Warehouse to Google BigQuery.


About the Author

Big Data Principal Consultant
Vladimir is currently a Big Data Principal Consultant at Pythian, and is well known for his expertise in a variety of big data and machine learning technologies including Hadoop, Kafka, Spark, Flink, HBase, and Cassandra. As a big data expert with over 20 years of global experience, he has worked on projects for enterprise clients across five continents while being part of professional services teams for Apple Computers Inc., Sun Microsystems Inc., and Blackboard Inc. Throughout his career in IT, Vladimir has been involved in a number of startups. He was Director of Application Services for Fusepoint (formerly known as RoundHeaven Communications), which grew by over 1,400% in 5 years, and was recently acquired by CenturyLink. He also founded AlmaLOGIC Solutions Incorporated, an e-Learning Analytics company.
