Some time ago I published a post about the Database Migration Service (DMS) on AWS. The service had just appeared at that time and looked very promising. It is still appealing and has good uses, along with plenty of potential to mature and improve. I was able to test and try it in several cases, moving data to AWS and between RDS databases inside AWS. During those exercises I ran into some issues, and I want to make other people aware of a few things to keep in mind when starting or planning to use DMS. Most of my experience with DMS is related to migrations to and from Oracle databases, so the following tips are about Oracle migrations and replications.
Before planning any kind of logical replication based on transaction logs, please check what data types you have and whether you have primary keys on all your replicated tables. In some cases a primary key is not only desirable but required to replicate the objects properly. As an example, if you want to replicate a table with LOB columns using DMS, you must have a primary key on the table; if you don't, the LOB columns will be excluded from replication and you will end up with null values instead of the LOBs on the target database.
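A quick way to spot such tables up front is a dictionary query along these lines (SCOTT is just a placeholder for your replicated schema):
-- Tables in the replicated schema that have no primary key constraint.
SELECT t.table_name
  FROM dba_tables t
 WHERE t.owner = 'SCOTT'
   AND NOT EXISTS (SELECT 1
                     FROM dba_constraints c
                    WHERE c.owner = t.owner
                      AND c.table_name = t.table_name
                      AND c.constraint_type = 'P');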
Check the data types used in your tables. All logical replication tools have limitations on the data types they can handle, and you may find that some of your tables can only be replicated with restrictions, or not replicated at all.
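The exact list of supported and restricted types is in the DMS documentation; as a rough first pass, a query like the one below flags columns whose types typically need a closer look (the set of types listed here is only illustrative):
-- Columns whose data types often need special handling in logical replication.
SELECT owner, table_name, column_name, data_type
  FROM dba_tab_columns
 WHERE owner = 'SCOTT'
   AND data_type IN ('LONG', 'LONG RAW', 'BFILE', 'ROWID', 'UROWID')
 ORDER BY table_name, column_name;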
Now, about DMS itself. When you plan your DMS replication tasks, keep in mind that you can combine the replication of several schemas into one task. That may significantly reduce the load on the source system, because each migration or replication task can put a considerable load on the source. In my experience, we hit 100% CPU and maxed-out IO running 8 replication tasks against a source m3.large system.
Remember that you cannot change a task once it has been created. You are able to stop, start or resume the task, but you cannot alter any rules or parameters for it. This may change soon, but currently it is not possible.
If you choose the truncate option for your migration and replication task, it may change your metadata. You may find your indexes and constraints renamed, and you can even lose some indexes. In my case, the primary key and unique key constraints were renamed and a couple of indexes were lost, so please be careful. After that incident I switched to the “do nothing” mode and prepared the target for data migration and replication myself.
You can use an RDS snapshot to instantiate your data if you plan to set up replication between two RDS instances of the same type. In our case it was done to minimize downtime when a database was migrated to encrypted storage. When you use an RDS snapshot, you can use the snapshot creation time as the “Custom CDC start time” for your DMS replication task.
If you use one universal user for your endpoints to replicate multiple schemas, you will need transformation rules, because by default the replication uses the schema from the endpoint as the destination. In other words, if you set up a target endpoint with the user “DMS” and try to replicate the schema SCOTT, the data will go to the schema “DMS” unless you define transformation rules.
You have to enable minimal supplemental logging on the database level and supplemental logging for all columns explicitly for each replicated table. Even if you enable supplemental logging for all columns on the database level using “alter database”, you still need to add it on the table level. Without it, the DMS task will be aborted, complaining about the lack of supplemental logging for all columns for the table.
If you create a new table in one of your replicated schemas, you need to enable supplemental logging for it as well, either in the CREATE TABLE statement itself or with an explicit “ALTER TABLE … ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;”. In that case the table and its data will be replicated.
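For reference, the statements look roughly like this (SCOTT.EMP is just an example table):
-- Minimal supplemental logging on the database level.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Supplemental logging for all columns, explicitly per replicated table.
ALTER TABLE scott.emp ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Check which tables already have an ALL-columns supplemental log group.
SELECT owner, table_name
  FROM dba_log_groups
 WHERE owner = 'SCOTT'
   AND log_group_type = 'ALL COLUMN LOGGING';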
It is better to use the AWS CLI to get the most out of the service, since new features and additions appear in the CLI first. As an example, if you want debug logging for your replication task, you have to enable it through the AWS CLI.
If the only unique index on a table is a function-based index, your data migration and replication task can fail. It may also fail if you have more than one unique index on a table.
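You can spot both situations in advance with dictionary queries like these (SCOTT is a placeholder again):
-- Unique function-based indexes in the replicated schema.
SELECT table_owner, table_name, index_name, index_type
  FROM dba_indexes
 WHERE table_owner = 'SCOTT'
   AND uniqueness = 'UNIQUE'
   AND index_type LIKE 'FUNCTION-BASED%';

-- Tables with more than one unique index.
SELECT table_owner, table_name, COUNT(*) AS unique_index_count
  FROM dba_indexes
 WHERE table_owner = 'SCOTT'
   AND uniqueness = 'UNIQUE'
 GROUP BY table_owner, table_name
HAVING COUNT(*) > 1;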
I hit some issues with the monitoring pages. The idea is good, but the implementation needs some fixing: for me, they did not work correctly in Firefox and Safari.
The status of a task may not tell you everything. Sometimes it shows the state “ERROR” while the task nevertheless keeps working and replicating data behind the scenes, which can be a bit misleading. I look at the statistics and monitoring pages for the task to get the full picture.
In summary, the service deserves attention and is a valuable option to consider when you plan your migration strategy, and the AWS DMS team is working hard to make it better. Happy migrations!
17 Comments
Thanks Gleb for sharing your experience about AWS DMS. The article is very useful and well articulated in terms of pros and cons of this utility.
You're very welcome, Krishna. Glad you found it helpful.
Cheers,
Gleb
Thanks Gleb, I hope someone at AWS is reading this.
I had a good email thread and a phone conversation with a member of the AWS DMS team about the service and some of the issues, and I sent them a short summary. They are actively working on implementing new features and fixing issues.
Hi Gleb,
Thanks for the nice insights on DMS. I am a bit confused about the following point around supplemental logging:
You have to enable minimal supplemental logging on the database level and supplemental logging for all columns explicitly for each replicated table. Even if you enable supplemental logging for all columns on the database level using “alter database”, you still need to add it on the table level. Without it, the DMS task will be aborted, complaining about the lack of supplemental logging for all columns for the table.
If I enable minimal supplemental logging at the database level and supplemental logging for all primary key columns in the database, will I still need to enable it for all columns on every table?
alter database add supplemental log data;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
I am assuming all my tables have primary keys (in addition to unique keys) and no FKs. The AWS doc says to enable table-level supplemental logging on primary key columns if the table has a PK.
Hi Raj,
As you can see, the blog post was written more than a year ago, and DMS has been getting better and implementing new features pretty quickly. In my experience back then, the engine required explicit supplemental logging for each table. With the current replication engines (either 1.9.0 or 2.4.0) you can define it at the database level for primary keys and it will work just fine.
Maybe I need to write a new blog or add an update to this one to reflect the changes.
Cheers,
Gleb
Thanks Gleb.
I wanted to check on the Oracle Binary Reader option of DMS for CDC of LOBs. It says it needs the ASM user and Oracle user credentials in the endpoint if the source DB is using ASM. Does that mean ASMSNMP with the SYSASM role and the Oracle OS user, or an Oracle user with SYSDBA or SYSOPER privileges? It's very appalling that the AWS DMS documentation has just one line about this. https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.LogMiner.Privileges
Imagine if you don't own your on-prem DB environment: it's very difficult to specify the exact requirement, which has to pass through the security policy of the on-prem security team.
If you can clarify this, it will be great.
Hi Raj,
Of course, the first question that comes to my mind is why you want to use Binary Reader instead of LogMiner, but I believe you have your reasons. For Binary Reader you have to have an ASM user with a password. It is the same as for any replication tool that does not use an Oracle LogMiner session to read the redo logs. In other words, you have to be able to connect to the ASM instance with a username and password. It makes sense, since the tool has to read the redo log files and parse them itself. In the case of LogMiner, you just connect to the Oracle database and a LogMiner session does it for you.
Cheers,
Gleb
Hi Gleb,
Thanks for your valuable input. It's a Hybris DB and it makes extensive use of the LOB data type, updating the LOBs as often as it needs. Most of them are CLOBs storing Java messages. We must be able to capture the updates on LOB data, hence the Binary Reader option. I also came across the GoldenGate classic capture mode doc, where it uses the ASM instance to read archived logs directly (it seems DMS does the same). We are having a hard time convincing the on-prem admins to allow ASM access.
Hi Raj,
Yes, it makes sense in that case. And speaking of LOBs, I would be careful and do proper testing before relying on the replicated data on the target. In some cases in the past I saw malformed CLOBs on targets, though I should mention that the last time I hit that was about a year ago.
Good luck,
Gleb
Hi Gleb,
This is much-needed input for us. I have to migrate from RDS to on-premises. Is it possible to do it with Data Pump, or is DMS the only option?
You can use Data Pump with AWS RDS. To achieve that you will need to use Oracle Data Pump together with the DBMS_FILE_TRANSFER package to move the dump files in and out of the instance.
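A rough sketch of the file-transfer step, assuming you have already created a dump file DUMP1.DMP in DATA_PUMP_DIR on the RDS instance (for example with the DBMS_DATAPUMP package) and have a database link TO_RDS from the on-premises database pointing at RDS; all of these names are placeholders:
-- Run on the on-premises database: pull the dump file from RDS over the
-- database link into the local DATA_PUMP_DIR, then import it with impdp.
BEGIN
  DBMS_FILE_TRANSFER.GET_FILE(
    source_directory_object      => 'DATA_PUMP_DIR',
    source_file_name             => 'DUMP1.DMP',
    source_database              => 'TO_RDS',
    destination_directory_object => 'DATA_PUMP_DIR',
    destination_file_name        => 'DUMP1.DMP');
END;
/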
Regards,
Gleb
Hey,
We are moving a large amount of data from an Oracle database to AWS Aurora with the AWS DMS utility, and there will be ongoing replication between these databases with the same tool. For the initial load the AWS DMS default path is not very efficient and will take days. Is there any other AWS utility we can use? I heard about AWS Snowball for moving petabytes of data, but I could not find any test case where AWS DMS and Snowball were used together for that purpose. Any tips?
Thanks
Hi,
Agreed, depending on the initial data volume and data types, the load by DMS can take quite a long time.
Snowball can help in the case of a big dataset: you upload the data to the Snowball, ship it, it gets connected on the AWS side, and the data is imported to the target.
The real question is how you plan to offload the data and import it into Aurora. If you have simple data types, you can use plain CSV files. But keep in mind that you need a consistent export as of some predefined timestamp to be able to start your DMS replication exactly from that point in time.
So, it boils down to two tasks: export/import with data type conversion, and consistency of the export.
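For simple data types, one way to get a consistent extract is a flashback query as of a fixed timestamp (or SCN), spooled to CSV; the same point in time is then used as the CDC start position for the DMS task. A small illustration, with SCOTT.EMP and the timestamp as placeholders:
-- Consistent point-in-time extract of one table using a flashback query.
SELECT *
  FROM scott.emp
    AS OF TIMESTAMP TO_TIMESTAMP('2018-01-15 10:00:00', 'YYYY-MM-DD HH24:MI:SS');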
Thanks,
Gleb
Hey Gleb,
Thanks for your reply. I have another question, about AWS DMS with BinaryReader. Our database is on RAC and we use ASM disgroup for FRA for onlinelogs and archivelogs. The tool wants “create any directory” privilege to create OS directories (accessed through oracle directories) on the DB server which is very difficult to approve from security point of view. They also promote the idea of creating the directory manually and give then READ/Write access to manipulate the logs. This does not work per their code logic. Have you any experience with AWS DMS utility with BinaryReader on Oracle RAC system as source?
Thanks in advance,
Quamrul Polash
Hi,
DMS needs access to your logs if you are using binary mode, so it wants to create an Oracle directory. It creates a directory in Oracle with read access to copy the archived logs from ASM using DBMS_FILE_TRANSFER, and it has to copy the logs somewhere in order to parse them. That is why it asks for a temporary directory on the filesystem (the logs are deleted later using the DBMS_FILE_GROUP package).
That is the reason for all those grants; it is, if you will, the price you pay for using binary mode with DMS. Maybe I need to write a blog post comparing the two modes.
Cheers,
Gleb
“If the only unique index on a table is a function-based index, your data migration and replication task can fail. It may also fail if you have more than one unique index on a table.”
Is this for the source or target? If it is for the source, are there any workarounds?
I have a table with two unique indexes, one of them function-based, and CDC is failing with the message that “SYS_NC000X” is not supported.