Updating Elasticsearch indexes with Spark

With the extensive adoption of Elasticsearch as a search and analytics engine, we increasingly build data pipelines that interact with it. Most often, the processing framework of choice is Apache Spark. Although reading data from Elasticsearch and…

Minimal Twitter to Google Pub/Sub example with Scala

Recently I was looking for a simple Twitter to Pub/Sub streaming pipeline and ended up with my own implementation in Scala. I tried to make it as compact as possible, so I chose the dispatch and Google Pub/Sub client libraries for…

Apache Beam pipelines with Scala: part 3 – dynamic processing

In the third part of the series we will develop a pipeline to transform messages from the “data” Pub/Sub topic, using messages from the “control” topic as source code for our data processor. The idea is to utilize the Scala ToolBox. It’s much…

Apache Beam pipelines with Scala: part 2 – side input

In the second part of this series we will develop a pipeline to transform messages from the “data” Pub/Sub topic, with the ability to control the process via the “control” topic. How to effectively pass non-immutable input into a DoFn is not obvious,…

Apache Beam pipelines with Scala: part 1 – template

In this 3-part series I’ll show you how to build and run Apache Beam pipelines using the Java API in Scala. In the first part we will develop the simplest streaming pipeline that reads JSON messages from Google Cloud Pub/Sub, converts them…

Architecting a Modern Data Warehouse – Live Webinar

Join Pythian and DBTA for a live roundtable webinar, Architecting a Modern Data Warehouse, on Thursday, November 16, 2017, at 11:00 am PT / 2:00 pm ET. Today, the world of decision-making, along with the data sources…

Datascape podcast episode 15 – machine learning primer for enterprise IT with Paul Spiegelhalter

Joining us today we have my esteemed colleague Paul Spiegelhalter. Paul is a data scientist and machine learning specialist with expertise in predictive analytics and algorithmic modeling across a number of industries, including computer vision, online advertising and user analysis,…

Supervised machine learning: a conversational guide for executives and practitioners

This post gives a systematic overview of the vital points to consider when building supervised learning models. In Q&A style, we address some of the key decisions and issues to go over when building a machine learning or deep learning model. Whether you…

Demystifying deep learning

Learning is a non-trivial task. How we learn deep representations as humans is high up there as one of the great enigmas of the world. What we consider trivial (and to some others natural) is a complex web of fine-grained…

Join Pythian and DBTA on August 24, 2017 for a live roundtable webinar: harnessing the Hadoop ecosystem

Harnessing the Hadoop Ecosystem: a live roundtable on Thursday, August 24 at 11:00 am PT / 2:00 pm ET. With a stake at the center of how organizations are consuming and leveraging big data, Hadoop adoption in the enterprise is growing…
