Data is everything, and everything is data

Posted in: Business Insights

Exploring the phenomenon of “datafication”

In the year 2000, only a quarter of the world’s stored information was digital; the rest was on paper, film, and other analog media. Today, less than two percent of all stored information is nondigital. (1)

This is largely the result of “datafication”, a process that turns all aspects of life—preferences, opinions, telephone calls and sensor-driven information—into data.

Datafication is the driving force behind Big Data. It’s also causing a threefold shift in how we look for meaning in the information available to us: away from traditional sampling approaches, toward greater tolerance of messy, unstructured data, and into the search for correlations rather than absolute, singular causes to explain trends and events. These changes are already having major impacts in every area of our lives—from scientific research to business and finance, to healthcare, education and transportation.

Watch this video of Pythian President and CEO Paul Vallée, as he talks about datification and generating revenue.

From sampling to knowing

Representative sampling is based on the idea that, within a certain margin of error, we can make inferences about a total population from a small, randomized subset. This works well for simple questions like, “Which of our customers generate the most revenue?” but lacks the detail to effectively answer queries like, “Which customers are most profitable?” or, “Which customers are considering to leave us for another vendor?”

Inexpensive computer memory, powerful processors and sophisticated algorithms now allow us to analyze vast amounts of data rather than small samples. Using Big Data in this way has the considerable advantage of predictive capability—it can identify patterns and trends that aren’t detectable in a small sample, giving an unprecedented view of future behavior.

From clean to messy

What’s new about Big Data isn’t just that there’s lots of it. Because it comes from many different sources in many different formats, it’s not tidy like traditional datasets. Tolerating some inaccuracy may require data analysts to shift their outlooks a little, but when you’re trying to answer big, complex questions, the gain in data scope is a good trade-off against using smaller amounts of very exact data. Here’s an example.

In 2009, Google showed it’s possible to predict locations of seasonal outbreaks of the flu using nothing more than archived records of Google searches. The sheer size of the data set (think a billion searches a day in the U.S. alone) more than compensated for its messiness. After running nearly half a billion calculations against the data, Google identified 45 terms—words such as “headache” and “runny nose”—that had a strong correlation with the CDC’s data on flu outbreaks.

From cause to correlation

The Google example points to a third change brought about by datafication and Big Data: abandoning the search for certainty. Instead of looking for causes, innovative data users are looking for correlations. For example, automotive and aviation engineers are collecting and analyzing massive quantities of information on engines that have failed, looking for patterns that will help them predict when other engines might be at risk of failing in the future. They’re not seeking a single cause for a single event; they’re mapping correlations between huge numbers of events to recognize patterns that can be put to practical, preventative use.

The correlation approach has been used to spot infections in premature babies before overt symptoms appear and to predict everything from manhole cover failures to consumer purchasing habits.

Big Data insights require Big Thinking

Harnessing the powerful, often unpredictable, insights available from Big Data requires three things: as complete a dataset as possible, people with the skills required to collect, manage and analyze that data, and people who know how to ask unexpected, even visionary questions. It’s not just a matter of the right technologies—it’s about a fundamental shift in how we relate to data and what can be done with it.
Sources
1. https://www.foreignaffairs.com/articles/2013-04-03/rise-big-data

email

Want to talk with a technical expert? Schedule a tech call with our team to get the conversation started.

About the Author

Lynda Partner is a self-professed data addict and experiences the power of data every day as Pythian’s Vice President of Analytics-as-a-Service. The author of Pythian’s Love Your Data mantra, Lynda understands very well how data can transform companies into competitive winners and she was the driving force in adding an analytics practice to Pythian’s database focus. Lynda works with companies around the world and across industries to turn data into insights, predictions and products, and is the co-author of Designing Data Lakes in the Cloud.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *