Replicating in Google Cloud SQL using Tungsten

Ivan Groenewold

December 12, 2017

While investigating alternatives to migrate to Google Cloud SQL, I encountered a lack of support for external masters. However, it's possible to overcome this limitation by replicating into Google Cloud SQL using Tungsten replicator. Cloud SQL is Google's database-as-a-service solution, similar to RDS for Amazon Web Services. You can get a fully managed database in only a few clicks (or API calls). At the time of writing this, the only supported databases are MySQL and Postgres.

Cloud SQL alternatives

Google offers two different options for MySQL deployments. 1st generation instances:

Only MySQL versions 5.5 and 5.6 can provisioned
Max memory is limited to 16 Gb
Max of 250 Gb storage (up to 500 Gb with Silver or higher support package)
MyISAM and InnoDB
Asynchronous replication for read replicas

2nd generation instances:

Only MySQL versions 5.6 and 5.7 can be deployed
Maximum memory is limited to 205 Gb
Maximum of 10 Tb storage
InnoDB storage engine only
Semi-sync replication only
GTID replication only
No support for external master

There are some limitations that are common to both flavors:

no SUPER privilege
no triggers
no performance schema
no replication between 1st and 2nd generation instances is possible

From the above, it is quite obvious most production deployments would want to use 2nd gen instances. The problem is there is no migration path that doesn't involve stopping application activity to be able to take a dump of the data, due to the fact that external masters are not supported on 2nd gen instances. So how do we migrate our database to Google Cloud SQL while keeping downtime as low as possible?

The solution

The answer is to use Tungsten Replicator, so that replication is completely external to the database. Note that since we only need the replicator, the FOSS version available on Github is enough for our purposes. There is no need to buy Tungsten commercial version, which includes the cluster functionality and official support. We will need to install two different Tungsten processes: One will attach to the source database to read transactions from the binary logs, while the second will apply those transactions to the Cloud SQL instance. This is what it looks like:

Preparing the environment

The first thing you will need is a place to install the replicator. I suggest to provision a dedicated instance (instance-1 in the diagram) in the same zone as your Cloud SQL instance. At minimum you would want an n1-standard-1 size, as Tungsten does consume its fair share of memory. Tungsten needs some packages (do check out the complete requirements list here) so let's go ahead and install them: apt-get install ruby default-jre Now we need a database user for Tungsten on the source and target databases: GRANT ALL ON *.* TO tungsten@'%' IDENTIFIED BY 'secret'; It is usually a good idea to have a dedicated OS user as well: useradd -m -d /opt/continuent tungsten Get the replicator package from GitHub and extract the contents as tungsten OS user:

 wget https://github.com/continuent/tungsten-replicator/releases/download/5.2.1/tungsten-replicator-5.2.1.tar.gz tar zxf tungsten-replicator-5.2.1.tar.gz

Tungsten installation

Prepare the Tungsten config files on instance-1 as follows. instance-1 is the server where Tungsten will run from, and instance-2 is the server that has the source database. Process that will read from MySQL vi /etc/tungsten/tungsten-mysqlreader.ini

[defaults]
 replication-user=tungsten
 replication-password=secret
 skip-validation-check=MySQLUnsupportedDataTypesCheck
 skip-validation-check=MySQLPermissionsCheck
 skip-validation-check=MySQLMyISAMCheck
 
 [mysqlreader]
 install-directory=/opt/continuent/mysqlreader
 master=instance-1
 members=instance-1
 datasource-host=instance-2
 datasource-user=tungsten
 datasource-password=secret

Process that will write to Cloud SQL vi /etc/tungsten/tungsten-writetocloudsql.ini

[defaults]
 replication-user=tungsten
 replication-password=secret
 
 [writetocloudsql]
 datasource-type=mysql
 install-directory=/opt/continuent/writetocloudsql
 master=instance-1
 members=instance-1
 topology=master-slave
 datasource-host=cloudsql_ip_address
 datasource-user=tungsten
 datasource-password=secret
 privileged-slave=false
 skip-validation-check=InstallerMasterSlaveCheck
 skip-validation-check=MySQLPermissionsCheck
 skip-validation-check=MySQLBinaryLogsEnabledCheck
 rmi-port=10002
 master-thl-port=2112
 master-thl-host=instance-1
 thl-port=2113

Note the use of privileged-slave=false and the various validation checks that need to be skipped for the applier process. That allows us to get past the SUPER requirement and the other Cloud SQL limitations. Since we are running both extractor and applier processes on the same instance, we need to manually specify the ports on the second process so that there are no conflicts. Now we are ready to install the replicators, by running the following as tungsten OS user: cd tungsten-replicator-5.2.1/tools tpm install

Replicating into Google Cloud SQL using Tungsten

At this point you would start the reader process using trepctl online command to start capturing events from the source instance, and use something like mysqldump to dump & load your dataset into a Google Cloud SQL instance. Make sure you have the binlog coordinates of that dump available. Once the Cloud SQL instance is properly seeded, we would start the applier process from the appropriate position e.g.

 /opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl online -from-event 'mysql-bin.000011:0000000000002552;0'

Checking the status of each process:

 tungsten@instance-1:~$ /opt/continuent/mysqlreader/tungsten/tungsten-replicator/bin/trepctl status Processing status command... NAME VALUE ---- ----- appliedLastEventId : mysql-bin.000007:0000000000000520;112 appliedLastSeqno : 1 appliedLatency : 0.355 autoRecoveryEnabled : false autoRecoveryTotal : 0 channels : 1 clusterName : mysqlreader currentEventId : mysql-bin.000007:0000000000000520 currentTimeMillis : 1511354659857 dataServerHost : instance-2 extensions : host : instance-2 latestEpochNumber : 0 masterConnectUri : thl://localhost:/ masterListenUri : thl://instance-1:2112/ maximumStoredSeqNo : 1 minimumStoredSeqNo : 0 offlineRequests : NONE pendingError : NONE pendingErrorCode : NONE ndingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: NONE pipelineSource : jdbc:mysql:thin://instance-2:3306/tungsten_mysqlreader?noPrepStmtCache=true relativeLatency : 3.857 resourcePrecedence : 99 rmiPort : 10000 role : master seqnoType : java.lang.Long serviceName : mysqlreader serviceType : local simpleServiceName : mysqlreader siteName : default sourceId : instance-2 state : ONLINE timeInStateSeconds : 72.806 timezone : GMT transitioningTo : uptimeSeconds : 74.46 useSSLConnection : false version : Tungsten Replicator 5.2.1 Finished status command...

 tungsten@instance-1:/etc/tungsten$ /opt/continuent/writetocloudsql/tungsten/tungsten-replicator/bin/trepctl status Processing status command... NAME VALUE ---- ----- appliedLastEventId : NONE appliedLastSeqno : -1 appliedLatency : -1.0 autoRecoveryEnabled : false autoRecoveryTotal : 0 channels : -1 clusterName : writetocloudsql currentEventId : NONE currentTimeMillis : 1510939133227 dataServerHost : cloudsql extensions : host : cloudsql latestEpochNumber : -1 masterConnectUri : thl://localhost:/ masterListenUri : thl://instance-1:2113/ maximumStoredSeqNo : -1 minimumStoredSeqNo : -1 offlineRequests : NONE pendingError : Replicator configuration failed pendingErrorCode : NONE pendingErrorEventId : NONE pendingErrorSeqno : -1 pendingExceptionMessage: Unable to translate property value: key=serverId value = 3555962359 pipelineSource : UNKNOWN relativeLatency : -1.0 resourcePrecedence : 99 rmiPort : 10002 role : master seqnoType : java.lang.Long serviceName : writetocloudsql serviceType : unknown simpleServiceName : writetocloudsql siteName : default sourceId : 35.184.133.21 state : OFFLINE:ERROR timeInStateSeconds : 431.65 timezone : GMT transitioningTo : uptimeSeconds : 433.596 useSSLConnection : false version : Tungsten Replicator 5.2.1 Finished status command...

I've discovered there is a bug with Tungsten 5.2, where high values of server-id parameter prevent replicator from working. CloudSQL sets very high server-ids by default, and this cannot be modified by a user. I have already reported this to Continuent so hopefully they will come up with a way to fix this soon. In the meantime the only way to get past this is to open a ticket with Google support, and have them modify the server-id on Cloud SQL instance for you.

Conclusion

Cloud SQL is a very interesting platform for those wanting a fully managed database solution. Until 2nd gen instances have the ability to replicate from an external master, replicating into Google Cloud SQL using Tungsten after the initial load is one valid alternative. By doing so, you can keep data in sync until you are ready to do the cutover to the new platform.

Insight and analysis of technology and business strategy