Today’s blog post completes our three-part series with excerpts from our latest white paper, Microsoft Hadoop: Taming the Big Challenge of Big Data. In the first two posts, we discussed the impact of big data on today’s organizations, and its challenges….
Read More >Today’s blog post is the second in a three-part series with excerpts from our latest white paper, Microsoft Hadoop: Taming the Big Challenge of Big Data. In our first blog post, we revealed just how much data is being generated globally every minute…
Read More >Today’s blog post is the first in a three-part series with excerpts from our latest white paper, Microsoft Hadoop: Taming the Big Challenge of Big Data. As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways…
Read More >Virtually everyone in data space claims today that they are a Big Data vendor and that their products are Big Data products. Of course, if you are not in Big Data then you are legacy. So how do you know whether a product is a Big Data product?
Read More >Who: Hosted by Blackbird, with a speaking session by Alex Morrise, Chief Data Scientist at Pythian. What: TechTalk presentation, beer, wine, snacks and Q&A Where: Blackbird HQ – 712 Tehama Street (corner of 8th and Tehama) San Francisco, CA When:…
Read More >When I’m not working on Big Data infrastructure for clients, I develop a few internal web applications and side projects. It’s very satisfying to write a Django app in an afternoon and throw it on Heroku, but there comes a…
Read More >One of the well-known best practices for HDFS is to store data in few large files, rather than a large number of small ones. There are a few problems related to using many small files but the ultimate HDFS killer…
Read More >Yesterday, Cloudera released the score reports for their Data Science Challenge 2014 and I was really ecstatic when I received mine with a “PASS” score! This was a real challenge for me and I had to put a LOT of…
Read More >Of course, everyone knows Hadoop as the solution to Big Data. What’s the problem with Big Data? Well, mostly it’s just that Big Data is too big to access and process in a timely fashion on a conventional enterprise system….
Read More >For those who aren’t familiar, Apache Ambari is the best open source solution for managing your Hadoop cluster: it’s capable of adding nodes, assigning roles, managing configuration and monitoring cluster health. Ambari is HortonWorks’ version of Cloudera Manager and MapR’s Warden, and it has been steadily improving with every release. As of version 1.5.1, Ambari added support for a declarative configuration (called a Blueprint) which makes it easy to automatically create repeatable clusters in the cloud. I’ll give an example of how to use Ambari Blueprints, and compare them with existing one-touch deployment methods for other distributions.
Read More >