Database migration using Apache NiFi


Introduction

Apache NiFi is a powerful tool for data migration. It provides an easy-to-use interface for connecting to a database server and performing data ingestion and extraction. This blog post explains a sample NiFi flow that migrates database tables from one database server (the source database) to another (the destination database).

Database Connections

To start with, create two DBCPConnectionPool controller services, one for the source database and one for the destination database. Set the database properties: ‘Database Connection URL’, ‘Database Driver Class Name’, ‘Database User’, and ‘Password’. Make sure the database driver JARs are available in ‘NIFI_HOME/lib/’; alternatively, the location of the driver JARs can be specified in the ‘Database Driver Location(s)’ property. A sample configuration is sketched below.
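
For illustration, here is a hypothetical configuration for a MySQL source and a PostgreSQL destination. The pool names, hosts, database name, and credentials are placeholders; the driver class names are the standard ones shipped with MySQL Connector/J and the PostgreSQL JDBC driver.

    source-db-pool (DBCPConnectionPool):
        Database Connection URL:    jdbc:mysql://source-host:3306/sales_db
        Database Driver Class Name: com.mysql.cj.jdbc.Driver
        Database User:              nifi_user
        Password:                   ********

    destination-db-pool (DBCPConnectionPool):
        Database Connection URL:    jdbc:postgresql://dest-host:5432/sales_db
        Database Driver Class Name: org.postgresql.Driver
        Database User:              nifi_user
        Password:                   ********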

NiFi Flow

The NiFi flow for database migration has four phases: Get Table List, Create Table, Extract Data, and Ingest Data.

Get Table List

  • ListDatabaseTables: There may be a need to migrate all tables or only some of them, so the initial task is to list the tables to migrate. Use the ListDatabaseTables processor to fetch this list. Edit the processor’s properties to connect to the source database, and use the ‘Catalog’, ‘Schema Pattern’, ‘Table Name Pattern’, and ‘Table Types’ properties to filter the list of tables. The processor outputs a single flowfile for each table in the list and adds the following attributes to each flowfile: db.table.name, db.table.catalog, db.table.schema, and db.table.fullname. Each flowfile thus carries the details of the source database and the table it represents.
  • UpdateAttribute: The ListDatabaseTables processor adds table attributes for the source database only. To add the destination database properties, use the UpdateAttribute processor and add two attributes: ‘destination_table_catalog’ and ‘destination_table_schema’. The table name stays the same as in the source database (db.table.name). A sample configuration for both processors is sketched after this list.
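
As a sketch, and assuming the hypothetical connection pools above, the two processors might be configured as follows; the schema pattern and destination values are placeholders to adapt:

    ListDatabaseTables:
        Database Connection Pooling Service: source-db-pool
        Schema Pattern:                      sales_db
        Table Name Pattern:                  %
        Table Types:                         TABLE

    UpdateAttribute (user-defined properties):
        destination_table_catalog: sales_db
        destination_table_schema:  public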

Create Table

  • ExecuteSQL: Once each flowfile carries the information about the source database, the destination database, and the table to migrate, the next step is to create the table in the destination database. Use the ExecuteSQL processor to fetch the table schema (column names and types) from the source database. The SQL statement for fetching the table schema varies with the type of source database; for many databases it is a query against information_schema.columns. ExecuteSQL outputs an Avro flowfile that contains the table schema.
  • ConvertAvroToJSON: The Avro flowfile is converted to JSON format using the ConvertAvroToJSON processor.
  • ExecuteScript: Use the ExecuteScript processor to generate the Data Definition Language (DDL) statement that creates the table in the destination database. Through the ExecuteScript processor, a scripting language such as Python (Jython), Groovy, or Ruby can be used to read the JSON-format table schema and build the DDL statement (see the sketch after this list).
  • PutSQL: Use the PutSQL processor to execute the DDL statement and create the table in the destination database.
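
To make the ExecuteScript step concrete, below is a minimal Jython sketch. It assumes the earlier ExecuteSQL step queried something like information_schema.columns, so that the JSON content is a list of records with COLUMN_NAME and DATA_TYPE fields; those field names, and the naive one-to-one type mapping, are assumptions to adapt to your source and destination databases.

    import json
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    class BuildDdl(StreamCallback):
        # Replaces the flowfile content (a JSON column list) with a CREATE TABLE statement.
        def __init__(self, table_name):
            self.table_name = table_name

        def process(self, inputStream, outputStream):
            # Assumed input shape: [{"COLUMN_NAME": "id", "DATA_TYPE": "bigint"}, ...]
            columns = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
            column_defs = ", ".join(
                "%s %s" % (c["COLUMN_NAME"], c["DATA_TYPE"]) for c in columns)
            ddl = "CREATE TABLE IF NOT EXISTS %s (%s)" % (self.table_name, column_defs)
            outputStream.write(bytearray(ddl.encode("utf-8")))

    # 'session' and 'REL_SUCCESS' are bindings provided by ExecuteScript.
    flowFile = session.get()
    if flowFile is not None:
        table_name = "%s.%s.%s" % (
            flowFile.getAttribute("destination_table_catalog"),
            flowFile.getAttribute("destination_table_schema"),
            flowFile.getAttribute("db.table.name"))
        flowFile = session.write(flowFile, BuildDdl(table_name))
        session.transfer(flowFile, REL_SUCCESS)

The resulting flowfile content is plain DDL text, which PutSQL can execute directly. In practice, mapping types between different database engines (for example, MySQL to PostgreSQL) usually needs more care than a one-to-one copy.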

Extract Data

  • ExecuteSQL: Use the ExecuteSQL processor to fetch the table data from the source database. Set the SQL select statement to ‘select * from ${db.table.catalog}.${db.table.schema}.${db.table.name}’. This processor outputs an Avro flowfile containing all the data of the source table (a configuration sketch follows). Alternatively, the GenerateTableFetch and QueryDatabaseTable processors can be used to extract data, which will be discussed in the next blog post.
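
As a configuration sketch, again assuming the hypothetical pool names from earlier:

    ExecuteSQL:
        Database Connection Pooling Service: source-db-pool
        SQL select query: select * from ${db.table.catalog}.${db.table.schema}.${db.table.name}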

Ingest Data

  • PutDatabaseRecord: The PutDatabaseRecord processor reads the Avro data, builds a batch insert, and uploads the data into the table created in the destination database. Data ingestion can also be performed with the ConvertAvroToJSON and PutSQL processors; however, PutDatabaseRecord is the better and faster way to upload data. The PutSQL route may still be useful if the extracted data requires some transformation first. A configuration sketch follows.
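
A sketch of the PutDatabaseRecord configuration, assuming an AvroReader record reader controller service and the hypothetical destination pool from earlier:

    PutDatabaseRecord:
        Record Reader:                       AvroReader
        Statement Type:                      INSERT
        Database Connection Pooling Service: destination-db-pool
        Catalog Name:                        ${destination_table_catalog}
        Schema Name:                         ${destination_table_schema}
        Table Name:                          ${db.table.name}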

This blog post discussed an elementary NiFi flow for migrating tables from one database to another. The upcoming blog post will cover handling large database tables and other issues, such as datatype inconsistencies.

I hope this post helped you understand the NiFi flow and the processors used to migrate database tables. If you have any questions or comments, please post them in the comments section.

