Tag: python

Map Upstream Dependencies using N-grams in Python

I.e. Which option would you prefer to receive when asked to make many changes throughout a complex system you’re unfamiliar with? “fix this” (many times), or “fix this, which might be better known as a name, most likely by making…

Read More >

Generating Documentation for Your Python Code Using Cloud Build and Sphinx

Documentation. All developers want it, but no one wants to build or maintain it. Today, we’ll be looking at automating the documentation process using Sphinx, Cloud Build and Google Cloud Storage. Step 1: Configuring Your Project The…

Read More >

Python: Using Dataclasses to Model Your Data

Here at Pythian, we love our data. Our code is no exception (pun sort of intended), so I’ll be covering dataclasses in Python today. The problem As a Python developer, you’ve almost certainly run into code that looks like the…

Read More >

How to Implement Airflow Best Practices From a Data Scientist’s Perspective

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on August 8, 2019. This blog post is a compilation…

Read More >

Docker Orientation

This weekend, I gave an orientation to Docker for a developer friend of mine who works in an enterprise environment and was preparing to take on new development projects using Docker. I have given several Docker 101 workshops, but it’s…

Read More >

How to create Kubernetes jobs with Python

In this blog post I will do a quick guide, with some code examples, on how to deploy a Kubernetes Job programmatically, using Python as the language of choice. For this I’m using GKE (Google Kubernetes Engine), logging via StackTrace…

Read More >

Creating dynamic tasks using Apache Airflow

What’s Airflow? Apache Airflow is an open-source scheduler built on Python. It uses a topological sorting mechanism, called a DAG (Directed Acyclic Graph), to generate dynamic tasks for execution according to dependency, schedule, dependency task completion, data partition and/or…

Read More >

Why the Python data model could be for you

David Salmela explains what is so great about the Python data model being so simple. Learn how Python will allow you to hit the ground running with its ease-of-use framework.

Read More >

Comparing schemas between Hive clusters

When running several different Hive instances, we found the process of maintaining/confirming synchronization to be quite time-consuming. While several tools made available by Cloudera and other frameworks do provide a visual interface for an individual instance of Hive, there was…

Read More >

Expanding the Couchbase Collector for Diamond

The code: For the impatient ones, the Couchbase collector can be found on GitHub: Couchbase Collector. Follow the instructions in the README file to get it installed under your Diamond! Intro: If you have been involved with metric collections at…

Read More >