Spark is an open-source, distributed processing system used for big data workloads. Spark uses in-memory caching and optimized query execution to run fast analytic queries against data of any size. Simply put, Spark is used to process data on a very…
Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 15, 2019. In this post, I’ll describe a…
Spark Overview: Spark was created in 2009 as a response to difficulties with MapReduce in Hadoop, particularly in supporting machine learning and other interactive data analysis. Spark simplifies the processing and analysis of data, reducing the number of steps and…
Introduction: The best definition you are going to find for data is that data is the new oil in today’s world. Starting from that, we can define a new horizon and a new way of looking at how we treat…
Memoization is a powerful technique that allows you to improve the performance of repeated computations. Although it would be a pretty handy feature, there is no memoization or result cache for UDFs in Spark as of today. In fact, it’s something…
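To make the idea concrete, here is a minimal sketch of memoization in plain Python using the standard library's `functools.lru_cache`. It is not a Spark feature and not the approach from the post itself: the function `normalize_country` and the sample values are hypothetical, standing in for an expensive UDF body. The sketch only shows that repeated inputs are served from the cache instead of being recomputed.

```python
from functools import lru_cache

# Counts how many times the "expensive" body actually runs.
calls = {"count": 0}


@lru_cache(maxsize=1024)
def normalize_country(code: str) -> str:
    """Hypothetical stand-in for an expensive, deterministic UDF body."""
    calls["count"] += 1
    return code.strip().upper()


# Five inputs, but only two distinct values.
values = ["us", "de", "us", "us", "de"]
results = [normalize_country(v) for v in values]

print(results)
print("body executed", calls["count"], "times")
print(normalize_country.cache_info())
```

Memoization like this is only safe for deterministic functions, and the cache lives in the process that runs the function, so it would not be shared across separate worker processes.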
I was working on an instrumentation framework for Scala UDFs in Spark when I noticed a subtle difference in the execution plan depending on whether I used wrappers or not. It looked like some code was added or was not…
We found an interesting bug during our latest performance tuning for Spark 2.2 (2.3 is also affected). It involved a batch Spark job scheduled to run hourly and to process about 1 TB worth of…
With the extensive adoption of Elasticsearch as a search and analytics engine, we increasingly build data pipelines that interact with Elasticsearch. More often than not, the processing framework of choice is Apache Spark. Although reading data from Elasticsearch and…
Organizations are tapping into increasingly sophisticated analytics techniques to improve opportunities for growth, innovation and competitive advantage. The analytics…