Memoization is a powerful technique that lets you improve the performance of repeated computations. Although it would be a pretty handy feature, there is no memoization or result cache for UDFs in Spark as of today. In fact it’s something…
We found an interesting bug during our latest performance tuning for Spark 2.2 (2.3 is also affected). It was a batch Spark job scheduled to run hourly and to process about 1 TB worth of…
Join Pythian and DBTA for a live roundtable webinar: Architecting a Modern Data Warehouse. Thursday, November 16, 2017, 11:00 am PT / 2:00 pm ET. Today, the world of decision-making, along with the data sources…
Harnessing the Hadoop Ecosystem: Live Roundtable. Thursday, August 24, at 11:00 am PT / 2:00 pm ET. With a stake at the center of how organizations are consuming and leveraging big data, Hadoop adoption in the enterprise is growing…
In this episode we discuss using Hadoop as the data store for a public-facing, web-based application. We talk about some of the challenges and how they were overcome.
If you’re thinking about moving from a traditional relational database management system (RDBMS), you should consider Apache™ Hadoop®, because your competitors probably are. According to Gartner, Hadoop joined the mainstream in 2016. And Allied Research says the Hadoop market will likely…
I started hearing the term ‘data lake’ a few years ago but didn’t pay a ton of attention to it. Today, the term’s still around and so is the hype. According to this article on Wikipedia the term is poorly…
Warner Chaves, Principal Consultant at Pythian and Microsoft MVP, explores and explains the fundamentals of Azure Data Lake.
Rohan Bhagat, Systems Administrator at Pythian, details his findings and experiences after deploying his own Hadoop cluster.
Lynda Partner, Vice President of Marketing at Pythian, describes how to explain the benefits of big data systems like Hadoop to people who have only ever known traditional data warehouses.