Tag: Big Data

Why your Cassandra cluster scales poorly

Apache Cassandra promises linear scalability and workload distribution, among many other features—and rightly so. However, as with many good things in life, these benefits come with a set of upfront conditions. When the use case aligns with the architectural limitations,…

Data Streaming with Kafka and Flink on AWS – Part 2

Apache Kafka and Apache Flink are popular platforms for data streaming applications. However, provisioning and managing your own clusters can be challenging and incur operational overhead. Amazon Web Services (AWS) provides a fully managed, highly available version of these platforms…

Data Streaming with Kafka and Flink on AWS – Part 1

Apache Kafka and Apache Flink are popular platforms for data streaming applications. However, provisioning and managing your own clusters can be challenging and incur operational overhead. Amazon Web Services (AWS) provides a fully managed, highly available version of these platforms that…

Datascape Episode 57: Building Big Data Platforms in Azure With Luan Moreno

Episode 57 Shownotes: Welcome to another episode of the Datascape Podcast. On today’s show, Big Data expert and Microsoft data platform MVP Luan Moreno tunes in from Brazil. Luan talks about the start of his career as a DBA, his…

Python: Using Dataclasses to Model Your Data

Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in Python today. The problem: As a Python developer, you’ve almost certainly run into code that looks like the…

Snowflake System Function Error: Argument 0 to Function SYSTEM$PIPE_STATUS Needs to Be Constant

I recently encountered the above issue, which prompted me to write this blog post so I can easily reference the solution whenever I need it. However, I also hope it might help anyone out there who hits a similar issue….

Consuming Tweets Using Apache Beam on Dataflow

Apache Beam is an SDK (software development kit) available for Java, Python, and Go that allows for a streamlined ETL programming experience for both batch and streaming jobs. It’s the SDK that GCP Dataflow jobs use and it comes with…

Dipping Your Toes Into Building an Analytics Platform on Google Cloud Platform

“We have many disparate data sources and we’re having a hard time getting a global view of all our data across our organization.” “Our data is currently all in <enter data warehouse name here> and we want to migrate it…

Can you run Hadoop in the cloud?

As a solutions architect at Pythian, I often get questions from clients about the many solutions available to them to address their big data needs. Between Hadoop, cloud-based, and hybrid solutions, finding the best option for their unique needs can…

Three reasons why you need a customer data platform right now

A Twitter user recently turned to the platform to issue an appeal to her bank: “Please don’t send me emails asking if I’m ready to buy a house ten minutes after emailing me an overdraft notice.” That one tweet neatly…
