Tag: Spark

How to deploy machine learning on Google Cloud Platform

In this blog post, I will describe a few takeaways on how to deploy or submit Machine Learning (ML) tasks on Google Cloud Platform (GCP). If you have limited experience as an ML engineer or if you are a solution…


Big Data on Microsoft Azure – HDInsight

Introduction: The best definition you are going to find for data is that it is the new oil of today’s world. Starting from that, we can define a new horizon and a new way of looking at how we treat…


Spark UDF memoization

Memoization is a powerful technique that lets you improve the performance of repeated computations. Although it would be a pretty handy feature, there is no memoization or result cache for UDFs in Spark as of today. In fact, it’s something…


Spark Scala UDF primitive type bug

I was working on an instrumentation framework for Scala UDFs in Spark when I noticed a subtle difference in the execution plan depending on whether I used wrappers or not. It looked like some code was added or was not…


Spark performance regression with sum aggregations

We found an interesting bug during our latest round of performance tuning for Spark 2.2 (2.3 is also affected). It was a batch Spark job scheduled to be executed hourly and to process about 1 TB worth of…


Updating Elasticsearch indexes with Spark

With the extensive adoption of Elasticsearch as a search and analytics engine, we increasingly build data pipelines that interact with Elasticsearch. Most often, the processing framework of choice is Apache Spark. Although reading data from Elasticsearch and…


Grow and innovate your business with powerful analytics with Cassandra and Spark (part 1)

Organizations are tapping into increasingly sophisticated analytics techniques to improve opportunities for growth, innovation and competitive advantage. The analytics…
