I’ve been reading a book by my good friend Jeff Needham, “Disruptive Possibilities: How Big Data Changes Everything,” and it cemented some thoughts that had been forming in my head for a while and gave me a bunch of new insights. Jeff managed to pack an incredible amount of information into a very concise form. I thoroughly recommend getting a copy for yourself when it’s published. (I got an author’s copy from Hortonworks at StrataConf in Santa Clara a few weeks ago.)
Supercomputers, and later High Performance Computing (HPC), have been around for a while. However, they were only available to organizations and projects that were extremely well funded. Military defense and intelligence departments are a good example. Oil and gas exploration, which required an enormous amount of processing to predict with a high degree of certainty where to drill, is another good example. Those companies could afford to invest in supercomputers because the alternative was to drill expensive wells with little chance of hitting oil reserves. Since drilling a single well costs millions of dollars, companies could get a high return on their investment in supercomputing.
Modern commercial supercomputing in the age of Datafication is what we today call Big Data. I think a better term for it would be Data Supercomputing, but the industry has already spoken, so Big Data it is. The architecture shifted from environments built for massively parallel, compute-intensive number crunching to environments built for massively parallel, data-volume-intensive processing. Processor performance and storage media capacity have been growing much faster than storage and network throughput, while data volume growth has trumped all of them. This imbalance is what triggered the shift in supercomputing architecture.
Hadoop is the first modern commercial supercomputing platform — it’s here to stay and evolve.