Tag: Big Data

Snowflake System Function Error: Argument 0 to Function SYSTEM$PIPE_STATUS Needs to Be Constant

I recently encountered the above issue, which prompted me to write this blog post so I can easily reference the solution whenever I need it. However, I also hope it might help anyone out there who hits a similar issue….

Consuming Tweets Using Apache Beam on Dataflow

Apache Beam is an SDK (software development kit) available for Java, Python, and Go that allows for a streamlined ETL programming experience for both batch and streaming jobs. It’s the SDK that GCP Dataflow jobs use and it comes with…

Dipping Your Toes Into Building an Analytics Platform on Google Cloud Platform

“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…

Can you run Hadoop in the cloud?

As a solutions architect at Pythian, I often get questions from clients about the many solutions available to them to address their big data needs. Between Hadoop, cloud-based, and hybrid solutions, finding the best option for their unique needs can…

Three reasons why you need a customer data platform right now

A Twitter user recently turned to the platform to issue an appeal to her bank: “Please don’t send me emails asking if I’m ready to buy a house ten minutes after emailing me an overdraft notice.” That one tweet neatly…

Data modeling for cloud DW

In this blog post, I would like to share some options you can consider when modeling your cloud DW for better query performance. With a traditional EDW, we would typically come up with a star, snowflake, or similar schema. These…

Big Data on Microsoft Azure – HDInsight

Introduction: The best definition you are going to find for data is that data is the new oil in today’s world. Starting from that, we can define a new horizon and a new way of looking at how we treat…

Streaming Oracle to Kafka – stories from the message bus stop

Fascinated by streaming data pipelines, I have been looking at different ways to get data out of a relational database like Oracle and into Apache Kafka. I have presented on this topic at a number of conferences. There is a…

Building a custom routing NiFi processor with Scala

In this post, we will build a toy NiFi processor that is nonetheless quite efficient and has powerful capabilities. The processor logic is straightforward: it will read incoming files line by line, apply a given function to transform each line into…

Updating Elasticsearch indexes with Spark

With the extensive adoption of Elasticsearch as a search and analytics engine, we increasingly build data pipelines that interact with Elasticsearch. Most often, the processing framework of choice is Apache Spark. Although reading data from Elasticsearch and…