GoldenGate 12.2 big data adapters: part 1 – HDFS

Gleb Otochkin, Principal Consultant and Certified Oracle Expert at Pythian, discusses the HDFS adapter for the newest version of GoldenGate.

Google Cloud Dataproc in ETL pipeline – part 1 (logging)

Vladimir Stoyak, Big Data Principal Consultant at Pythian, talks about Google Cloud Dataproc and provides an in-depth look at logging in this technical blog post.

Configure high availability – load balancing for Hiveserver2

Manoj Kukreja, Pythian Big Data Consultant, walks through the steps to keep your Hive system highly available and performing smoothly under increased workloads.

How to deploy a cluster

Zunaira Jamil, a Pythian co-op student in our big data practice, walks you through the lessons she learned and the solutions she found when installing a cluster with Cloudera Manager for the first time.

Issues with triggers in Cloudera Manager

Valentin Nikotin explains why triggers in Cloudera Manager are a very useful feature, and how you can set them up to monitor the many available metrics using the tsquery language.

Recursion in Hive – part 1

In Part 1 of this series, Valentin Nikotin teaches you about migrating from an RDBMS to Hive while maintaining the simplicity and flexibility of a SQL approach.

Comparing schemas between Hive clusters

When running several different Hive instances, we found the process of maintaining and confirming synchronization to be quite time-consuming. While several tools made available by Cloudera and other frameworks do provide a visual interface for an individual instance of Hive, there was…

Magic of “\d” in Vertica

A quick, neat way to list important and frequently needed information such as the names of databases, schemas, users, tables, and projections. We can also use patterns with ‘\d’ to narrow down the results. Let’s see it in action:

Mongostat – A nifty tool for Mongo DBAs

One of a MongoDB DBA’s main tasks is to monitor the usage of the MongoDB system and its load distribution. This may be needed for proactive monitoring, troubleshooting during performance degradation, root cause analysis, or capacity planning. Mongostat is a nifty…

Ingest a single table from Microsoft SQL Server Data into Hadoop

This blog describes the best-practice approach to ingesting data from SQL Server into Hadoop. The scenario is as follows: single-table ingestion (no joins); no partitioning; complete data ingestion (trash old and replace new)…
