A great feature of Python is its simplicity. This simplicity allows developers to start coding immediately with very little formality. Python is also very good at doing ‘quick and dirty’ tasks. The result of this ability to get up and running with Python so quickly leads to many programmers using Python without necessitating a deeper understanding of the language. In many cases this is fine, but a deeper dive has some nice rewards.
Named Tuples are a nice lightweight construct that makes things easier when dealing with data that might otherwise be stored in a regular tuple or dict.
In a recent project we encountered named tuples when returning table schema from Hive Thrift API:
# Define the fieldschema tuple
FieldSchema = collections.namedtuple('FieldSchema', ['comment','type', 'name'])
# Create some fields for a table schema
customer_table= customer_name = FieldSchema('customer name', 'varchar(100)','name')
customer_addr = FieldSchema('address', 'varchar(100)','addr1')
customer_dob = FieldSchema('birthdate ', 'int','dob')
last_updated = FieldSchema('date last updated', 'datetime','last_updated')
# create two customer tables that have slightly different schemas
customer_table_a=[customer_name, customer_addr, customer_dob] customer_table_b = [customer_name, customer_addr, customer_dob, last_updated]
The Python Data Model
In Python, everything is an object and each object has an identity, a type and a value. Built in methods like id(obj) returns the object’s identity and type(obj) returns the object’s type. The identity of an object can never change. Objects can be compared with the “is” keyword. Example: X is Y.
Special or “dunder” methods.
Special or ‘Magic’ method names are always written with leading and trailing double underscores (i.e., __getitem__). So when you use the syntax myobj[key] it is really calling the __getitem__ special method behind the scenes. So the interpreter calls obj.__getitem(key) in order to evaluate myobj[key]. Because of the leading and trailing double underscores, these methods are sometimes referred to as ‘dunder’ (double-underscore) methods. These ‘dunder’ methods are invoked by special syntax. For example using the index  on a collections object invokes the __getitem__(key) special method. Example:
my_list = ['a','b','c']
Generally, these ‘dunder’ methods are meant to be used by the interpreter, and not by you. But the advantage of knowing something about special methods is that it makes class design more fluid. Here’s an example where we override the __sub__ method (subtract) so we can compare dataframes by subtracting one from another:
import pandas as pd
self.df = pd.DataFrame(tabledef)
def __sub__(self, other):
oj = other.df.merge(self.df, on=”name”, how=”outer”)
df_diff = oj[(pd.isnull(oj[“type_x”])) | (pd.isnull(oj[“type_y”]))] return df_diff
# Create Hive Tables
t = HiveTable([customer_name, customer_addr, customer_dob])
t2 = HiveTable([customer_name, customer_addr, customer_dob, last_updated])
# Compare them by ‘subracting’ them using the “-” operator.
date customer last updated
While Python is easy to get started with, it is also ‘deep all the way down’, continues to be interesting and reward with a deeper understanding. By implementing special methods in your own classes, your objects can behave more like the built in ones, allowing for ‘pythonic’ coding for your own classes.
Fluent Python by Luciano Ramalho
Interested in working with David? Schedule a tech call.