Though the nature of applications has changed greatly over the years, some of the basic concepts of working with data are foundational, and do not, or at least should not, change.
One of the precepts of Data Modeling is normalization; that is to ensure that you know what data is needed for an application, and how best to organize it.
Entity Relationship Diagrams seem to be somewhat out of vogue these days. For transactional systems however the requirements for well organized data do not change.
Let’s consider a simple database, one of phone numbers and the people that use them. This should be very simple, right?
Maybe something like this would suffice?
Simple to design, simple to implement.
Here’s the table
create table person_phones (
id number(12) not null,
last_name varchar2(30) not null,
first_name varchar2(30) not null,
work_phone varchar2(20),
mobile_phone varchar2(20),
home_phone varchar2(20)
)
/
Do you seen any problems with this implementation?
Let’s consider a few problems caused by this simple implementation.
Adding a new phone type requires adding a column, which in turn requires an outage.
Later adding country code requires adding 1 new column for each phone. Another outage.
This use of multiple columns to represent the same type of data is denormalized, which greatly increase application code complexity.
– changes to phone numbers after initial data creation are always an update; inserting, deleting and modifying a phone number are all updates
– the code must hardcode which column to update
Following some of the principals of normalization, this would be a much more effective schema.
Here’s one possible implementation. This is tables only for brevity.
create table phone_types (
phone_type_id number(2) not null,
type_name varchar2(10) not null
)
/
create table phones (
phone_number varchar2(20) not null
)
/
create table people (
person_id number (12) not null,
last_name varchar2(30)
first_name varchar2(30)
)
/
create table people_phones (
person_id number(12) not null,
phone_number varchar2(20) not null,
phone_type_id number(2) not null
)
Is more effort required to design and build it? Yes, more effort is required.
Nothing however comes for free. Flexibility in the design is more work upfront, but the rewards are great.
Let’s consider the same changes we looked at for the denormalized design.
Adding a new phone type is now a simple matter of adding a row to the table phone_types.
Adding a country code still requires adding a column, this time to the table phones. The difference here is that only one column is needed, and it only needs to be done once. Adding new phone types in the future is simply inserting a new row in a table.
We may even improve the design with a country-code table. Even then there is only a single new column added to the phones table.
Changes to phone numbers are now simple DML operations; delete, insert and update. No hardcoding is required in the application to differentiate betweeen the different operations.
Some of you may be wondering why there is a table for phone numbers that has only a single column.
There are a few reasons why this can be useful:
– Phone numbers should always be unique
– Phone numbers may be shared with multiple people
Another reason has to do with a slight performance advantage. With some data, such as that found in the PEOPLE table, names may change, and so are not good candidates for a primary key.
The ID column guarantees that all rows referencing a row in PEOPLE will always have the correct row.
With the phone number however, we will never change the actual value in the PHONE_NUMBERS table; we will only insert or delete. The optimization comes when a Foreign Key is added to the PEOPLE_PHONES table. As the phone number is the key, there is no need to actually read the PHONE_NUMBERS table.
That may change with inclusion of the country code. There are however multiple options for doing that, so that change may or may not require a PHONES.ID column.
Certainly this is fodder for further discussion, but not really necessary for this demonstration.
So, why normalize?
– application code is simplified
– future business changes become much simpler to implement
So there you have it; some things do remain the same, regardless of the enabling technologies.
3 Comments. Leave new
Thanks Jared!
Very well explained Normalization!!!
In addition to the foreign key constraint: the thing I often miss when using “domain tables” (in this case the phone_type table) is a unique constraint on the column ‘type_name’. This prevents duplicate phone types / confusion / future trouble
I’d added that this approach also allows to easily have different SCD types (https://en.wikipedia.org/wiki/Slowly_changing_dimension) for different entities/relations.
Btw, we can use denormalization sometimes to improve performance (reduce lookups or avoid joins). Imho, one of the easiest examples is bitmap join indexes