Apache Beam pipelines with Scala: part 3 – dynamic processing

In the third part of the series we will develop a pipeline to transform messages from the “data” Pub/Sub topic, using messages from the “control” topic as source code for our data processor. The idea is to utilize the Scala ToolBox. It’s much…

Apache Beam pipelines with Scala: part 2 – side input

In the second part of this series we will develop a pipeline to transform messages from the “data” Pub/Sub topic with the ability to control the process via the “control” topic. How to effectively pass non-immutable input into a DoFn is not obvious,…

Apache Beam pipelines with Scala: part 1 – template

In this 3-part series I’ll show you how to build and run Apache Beam pipelines using the Java API in Scala. In the first part we will develop the simplest streaming pipeline that reads JSON messages from Google Cloud Pub/Sub, converts them…

Architecting a Modern Data Warehouse – Live Webinar

Join Pythian and DBTA for a live roundtable webinar, Architecting a Modern Data Warehouse, on Thursday, November 16, 2017, at 11:00 am PT / 2:00 pm ET. Today, the world of decision-making, along with the data sources…

Datascape podcast episode 15 – machine learning primer for enterprise IT with Paul Spiegelhalter

Joining us today we have my esteemed colleague Paul Spiegelhalter. Paul is a data scientist and machine learning specialist with expertise in predictive analytics and algorithmic modeling across a number of industries, including computer vision, online advertising and user analysis,…

Supervised machine learning: a conversational guide for executives and practitioners

This post gives a systematic overview of the vital points to consider when building supervised learning models. We address, in Q&A style, some of the key decisions and issues to go over when building a machine learning or deep learning model. Whether you…

Demystifying deep learning

Learning is a non-trivial task. How we learn deep representations as humans is high up there as one of the great enigmas of the world. What we consider trivial (and to some others natural) is a complex web of fine-grained…

Join Pythian and DBTA on August 24, 2017 for a live roundtable webinar: harnessing the Hadoop ecosystem

Harnessing the Hadoop Ecosystem: a live roundtable on Thursday, August 24 at 11:00 am PT / 2:00 pm ET. With a stake at the center of how organizations are consuming and leveraging big data, Hadoop adoption in the enterprise is growing…

Datascape podcast episode 10 – getting transactional with Hadoop

In this episode we discuss using Hadoop as the data store for a public-facing, web-based application. We talk about some of the challenges and how they were overcome.

Blazing Disk IO with Java

It’s well known that mmap helps to improve performance in particular use cases (especially if your working set fits in available memory). This doesn’t mean that the file itself has to fit in memory. mmap provides real benefits over the other IO…
