Advice from analytics pros: “Plan for 1000 sources of data and don’t move it any more than you have to”

Posted in: Business Insights

“Plan for 1000 sources of data” and “Don’t move data any more than you have to.” These were just two of the many interesting best practices shared with a room of IT executives at TDWI’s Orlando analytics leadership summit earlier this week. Analytics leaders from companies like Red Hat, Disney, Macy’s, Skullcandy and Quicken Loans described their journeys toward becoming data-enabled organizations.

When it comes to architecting analytics platforms to make data available to users, thinking big while staying efficient was a recurring theme throughout the conference. While not all platforms actually bring in thousands of data sources today, the day is coming soon when they’ll have to handle that many. That means forward thinking and planning are critical when deciding today how to architect or buy a platform to handle your data.

Getting ALL THE DATA in one place is also a key component of a solid data enablement strategy. Speakers discussed a number of options for data platform architectures. All of them included data lakes in addition to data warehouses, along with a combination of open-source technologies. For example, in one scenario Hadoop was used for a company’s data lake and R was the solution for analytics, while commercial software like Tableau and SAS was used for visualizations. Most used a mix of cloud and on-premises environments. One speaker described his data lake on the cloud as “Deep Data” and his traditional data warehouse on-premises as “Shallow Data.” People spoke of the cloud as a facilitator of scale and agility and sometimes, but not always, cost savings.

The need for access to ALL THE DATA is driven by more than just the desire for a richer data set. It’s a response to the growing awareness that different user personas exist and that they all have different data access needs. While the majority of users will access nicely curated and governed data in the data warehouse, the growing ranks of data scientists want access to ALL THE DATA, even the messy, ungoverned data that logically sits in the data lake. And with the emergence of “Citizen Data Scientists,” power users who fall somewhere between business users and data scientists, another class of data is emerging: a lightly governed and curated subset of the raw data, also in the data lake. Data no longer has to be governed to be useful. With ALL THE DATA comes choice.
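To make the three personas a little more concrete, here is a minimal Python sketch of how they might map to the classes of data described above. The persona keys, zone names and the `can_access` helper are all hypothetical, and the assumption that citizen data scientists can also read the curated warehouse data is mine, not something the speakers stated.

```python
from enum import Enum


class Zone(Enum):
    """Classes of data described in the post (names are illustrative)."""
    WAREHOUSE_CURATED = "curated, governed data in the data warehouse"
    LAKE_LIGHTLY_GOVERNED = "lightly governed subset in the data lake"
    LAKE_RAW = "raw, ungoverned data in the data lake"


# Hypothetical persona-to-zone mapping. Assumption: each persona can also
# read everything that is more heavily governed than its own zone.
ACCESS = {
    "business_user": {Zone.WAREHOUSE_CURATED},
    "citizen_data_scientist": {
        Zone.WAREHOUSE_CURATED,
        Zone.LAKE_LIGHTLY_GOVERNED,
    },
    "data_scientist": set(Zone),  # ALL THE DATA
}


def can_access(persona: str, zone: Zone) -> bool:
    """Return True if the persona is mapped to the zone; unknown personas get nothing."""
    return zone in ACCESS.get(persona, set())


if __name__ == "__main__":
    for persona in ACCESS:
        allowed = sorted(z.name for z in Zone if can_access(persona, z))
        print(persona, "->", allowed)
```

A real platform would enforce this through its own governance and access-control tooling; the point here is only that one set of data can serve several audiences at different levels of curation.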

Cloud was a recurring thread in many of the discussions around ALL THE DATA, along with the realization that most of the data being brought into analytics platforms comes from outside the enterprise, and from the cloud. Moving data from a source on the cloud to a data lake on the cloud is simply more efficient than moving it out of the cloud and back again, leading to the best practice “Don’t move your data any more than you have to.” When you couple this advice with the knowledge that data flows not just from the data warehouse into the data lake but also from the data lake into the data warehouse, you can expect to see more data warehouses move to the cloud in the future, using services like Microsoft’s Azure SQL Data Warehouse, Google’s BigQuery and AWS’s Redshift. We are seeing this every day at Pythian, and we specifically designed our Kick Analytics as a Service offering to meet the growing demand for cloud-native analytics solutions.
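One rough way to reason about “Don’t move your data any more than you have to” is to count how many times data crosses the boundary between cloud and on-premises as it flows through a platform. The short Python sketch below does just that. The `boundary_crossings` function and the two example paths are hypothetical illustrations, not a description of any speaker’s architecture.

```python
def boundary_crossings(path):
    """Count hops where data moves between environments (e.g. cloud <-> on_prem)."""
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


# Each list gives where the data sits at each step (illustrative only):
# source -> data lake -> data warehouse -> back to the data lake
all_cloud = ["cloud", "cloud", "cloud", "cloud"]
mixed = ["cloud", "cloud", "on_prem", "cloud"]

if __name__ == "__main__":
    print("all cloud:", boundary_crossings(all_cloud))
    print("mixed:    ", boundary_crossings(mixed))
```

Under these assumptions, the all-cloud path never crosses the boundary, while the mixed path crosses it twice, once in each direction between the lake and the warehouse. Crossing count is only a proxy for cost and latency, but it shows why two-way flows between lake and warehouse pull the warehouse toward the cloud.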

About the Author

With more than two decades of experience in digital organizations, Lynda Partner is a seasoned thought leader with an exceptional record of driving businesses to financial, operational, and market success. Lynda is passionate about data and helping enterprises turn valuable insights from their data into revenue. She wears multiple hats at Pythian. As Vice President of Analytics, Lynda has been instrumental in growing and defining Pythian’s analytics practice and is the driving force behind its new Kick Analytics-as-a-Service solution. As Vice President of Marketing at Pythian, Lynda makes data-informed decisions every day, empowering the team to achieve greater results with measurable outcomes. Before joining Pythian, Lynda was Chief Communications Officer at publicly traded Redline Communications, where she and her team helped return the company to profitability after 15 straight years of losses. Lynda has led and founded several successful start-ups, including In-Touch Insights and GotMarketing (Campaigner.com). Lynda has been a digital marketer for over a decade and is a certified quantitative market researcher.
