Author: Valentin Nikotin

Scala: generics, primitive types and nulls

While working on a Scala (2.12.4) project, I made a typo in my code and received an interesting output: While it was an absolutely useless construction, it still causes such misleading output as seen above: I would say that this…

Read More >

Scala TreeSet: mutable vs immutable

I was exploring immutable RedBlackTree tree implementation in Scala (2.12.4) when I noticed something that wasn’t clear to me: Comparing keys via compare is followed by explicit key not “equals” condition. I compared similar parts of mutable RedBlackTree implementation and…

Read More >

Spark UDF memoization

Memoization is a powerful technique that allows you to improve performance of repeatable computations. Although it would be a pretty handy feature, there is no memoization or result cache for UDFs in Spark as of today. In fact it’s something…

Read More >

Spark Scala UDF primitive type bug

I was working on an instrumentation framework for Scala UDFs in Spark when I noticed a subtle difference in the execution plan depending on whether I used wrappers or not. It looked like some code was added or was not…

Read More >

Spark performance regression with sum aggregations

There is an interesting bug that was found during the latest performance tuning we performed for Spark 2.2 (2.3 is also affected). It was a batch Spark job scheduled to be executed hourly and to process about 1Tb worth of…

Read More >

Building a custom routing NiFi processor with Scala

In this post we will build a toy example NiFi processor which is still quite efficient and has powerful capabilities. Processor logic is straightforward: it will read incoming files line by line, apply given function to transform each line into…

Read More >

Minimal Twitter to Google Pub/Sub example with Scala

Recently I was looking for a simple Twitter to Pub/Sub streaming pipeline and ended up with own implementation in Scala. I tried to make it as compact as possible. So I chose the dispatch and Google Pub/Sub client libraries for…

Read More >

Apache Beam pipelines with Scala: part 3 – dynamic processing

In the third part of the series we will develop a pipeline to transform messages from “data” Pub/Sub using messages from the “control” topic as source code for our data processor. The idea is to utilize Scala toolBox. It’s much…

Read More >

Apache beam pipelines With Scala: Part 2 – Side Input

In the second part of this series we will develop a pipeline to transform messages from “data” Pub/Sub topic with the ability to control the process via “control” topic. How to pass effectively non-immutable input into DoFn, is not obvious,…

Read More >

Apache beam pipelines with Scala: part 1 – template

In this 3-part series I’ll show you how to build and run Apache Beam pipelines using Java API in Scala. In the first part we will develop the simplest streaming pipeline that reads jsons from Google Cloud Pub/Sub, convert them…

Read More >
Page 1 of 212