Backup and data streaming with xbstream, tar, socat, and netcat

On April 4th, 2012, Xtrabackup 2.0 was released into GA by Percona along with a new streaming feature called xbstream. This new tool allowed for compression and parallelism of streaming backups when running xtrabackup or innobackupex, without having to stream using tar, pipe to gzip or pigz, and then pipe to netcat or socat to send the backup to the recipient server. This simplified the command structure a great deal, and xbstream quickly became the preferred way of streaming backups from an origin server to its destination.
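As a rough sketch of that simplification, the two command shapes can be put side by side. The function names, the host and port arguments, and the choice of gzip and nc are placeholders for illustration. --compress and --parallel are innobackupex options, but check them against the version you run.

```shell
#!/usr/bin/env bash
# Illustrative only: both functions expect a destination host and port,
# and a listener already waiting on the other side.

# Older shape: tar stream, external compressor, then the network tool.
stream_with_tar() {
  innobackupex --stream=tar ./ | gzip | nc "$1" "$2"
}

# xbstream shape: compression and parallel copy threads are handled by
# the backup tool itself, so the pipeline has one stage less.
stream_with_xbstream() {
  innobackupex --stream=xbstream --compress --parallel=4 ./ | nc "$1" "$2"
}
```

Neither function is called here. They only show where the extra pipeline stage disappears.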

In recent months we’ve had discussions internally as to whether xbstream would be a better way of streaming large amounts of data between servers for use cases outside of xtrabackup. And which is better, socat or netcat? So I decided to put this to the test.

To test this I created two m5.xlarge EC2 instances, as this instance type provides an “up to 10 gigabit” level of network performance. I put both instances in the same availability zone to reduce the chance of poor networking skewing my results. I then installed Percona XtraDB Server 5.6 and Xtrabackup 2.4.9, and created a simple database with a data set size of 90GB.

For my first test I took a streaming backup of the entire data set using both the xbstream and tar streaming methods. Compression was not used, so as to evaluate the streaming methods equally. Both socat and netcat were evaluated.
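The output that follows shows only the sending side. The receiving side is not part of this post, so what follows is a sketch under assumptions: the helper name receiver_cmd and the /data/backup directory are made up, and the listener has to be started on the destination before the sender runs. The helper only prints the pipeline so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the receiver-side pipeline for a transport (socat|netcat) and a
# stream format (xbstream|tar). Nothing is executed; the output is meant
# to be reviewed and then run on the destination host.
receiver_cmd() {
  local transport=$1 format=$2 port=$3 dest=$4 listen unpack
  case "$transport" in
    socat)  listen="socat -u TCP-LISTEN:${port},reuseaddr stdio" ;;
    # Some netcat builds want "nc -l -p PORT" instead.
    netcat) listen="nc -l ${port}" ;;
    *) echo "unknown transport: ${transport}" >&2; return 1 ;;
  esac
  case "$format" in
    xbstream) unpack="xbstream -x -C ${dest}" ;;
    # -i keeps tar reading past the zero blocks in innobackupex tar streams.
    tar)      unpack="tar -xif - -C ${dest}" ;;
    *) echo "unknown format: ${format}" >&2; return 1 ;;
  esac
  printf '%s | %s\n' "$listen" "$unpack"
}

# Port 10001 matches the sender commands in this post; the path is hypothetical.
receiver_cmd socat xbstream 10001 /data/backup
receiver_cmd netcat tar 10001 /data/backup
```

The first call prints `socat -u TCP-LISTEN:10001,reuseaddr stdio | xbstream -x -C /data/backup`, which is the counterpart of the `socat -u stdio TCP:...:10001` sender used in the tests.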

XBSTREAM / NETCAT TESTS

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | nc 172.31.55.250 10001
171228 15:11:13 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 15:25:22 completed OK!
real 14m9.385s
user 3m27.392s
sys 3m34.420s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | nc 172.31.55.250 10001
171228 15:38:50 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 15:50:42 completed OK!
real 11m51.915s
user 3m31.808s
sys 3m34.740s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | nc 172.31.55.250 10001
171228 15:38:50 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:07:28 completed OK!
real 11m51.923s
user 3m27.836s
sys 3m30.088s

XBSTREAM / SOCAT TESTS

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:13:51 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:26:55 completed OK!
real 13m3.911s
user 3m8.208s
sys 2m35.160s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:28:16 innobackupex: Starting the backup operation
.....
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:40:08 completed OK!
real 11m51.984s
user 3m8.148s
sys 2m28.860s

[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 16:44:54 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 16:56:46 completed OK!
real 11m51.916s
user 3m7.460s
sys 2m24.968s

TAR / NETCAT TEST

[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | nc 172.31.55.250 10001
171228 17:02:26 innobackupex: Starting the backup operation
.......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 17:16:09 completed OK!
real 13m42.910s
user 3m19.696s
sys 3m47.672s

TAR / SOCAT TEST

[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
171228 17:19:59 innobackupex: Starting the backup operation
......
xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
171228 17:33:03 completed OK!
real 13m3.940s
user 2m59.468s
sys 2m29.388s

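Here is a summary of the real (wall-clock) times noted above, rounded to the nearest second.

Method              1 thread   2 threads   4 threads
xbstream / netcat   849        712         712
xbstream / socat    784        712         712
tar / netcat        823        n/a         n/a
tar / socat         784        n/a         n/a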

You’ll notice that the xbstream method outperformed the tar method once we started introducing parallel threads. You may also note that performance gains ended after 2 threads were in use, most likely because we hit a networking bottleneck. Another interesting thing to note is that with a single thread socat outperformed netcat, but with multiple threads they were about equal.
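To put that plateau in perspective, the wall-clock times can be converted into approximate throughput. The sketch below assumes the stream carried roughly the full 90 GB data set (decimal gigabytes), which the backup output does not confirm. The helper names to_seconds and gbit_per_sec are made up for illustration.

```shell
#!/usr/bin/env bash
# Convert a `time` value such as 14m9.385s into seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN { split(t, p, "m"); sub(/s$/, "", p[2]); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# Approximate throughput in Gbit/s for a size in GB and a `time` value.
gbit_per_sec() {
  awk -v gb="$1" -v s="$(to_seconds "$2")" 'BEGIN { printf "%.2f\n", gb * 8 / s }'
}

ASSUMED_GB=90   # assumption: the whole 90 GB data set crossed the wire

# label:elapsed pairs taken from the "real" lines above
for run in "xbstream/netcat p=1:14m9.385s" "xbstream/netcat p=2:11m51.915s" "tar/socat p=1:13m3.940s"; do
  label=${run%%:*}
  elapsed=${run##*:}
  echo "$label  $(to_seconds "$elapsed") s  $(gbit_per_sec "$ASSUMED_GB" "$elapsed") Gbit/s"
done
```

Under that assumption the two-thread run works out to roughly 1 Gbit/s, well short of the advertised “up to 10 gigabit”. That would be consistent with a fixed rate cap somewhere in the path, although these numbers alone cannot say where the cap sits.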

So what does this mean for moving data outside of xtrabackup / innobackupex? For my next test I decided to focus on just the large data files that I created in the test schema directory, the main reason being that xbstream can handle files but not directories, and cannot act recursively. First I used xbstream and then tried again using tar. Again, compression was not used so we could look at just the streaming method. Both netcat and socat were evaluated.

XBSTREAM / NETCAT TESTS

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | nc 172.31.55.250 10001
real 12m25.439s
user 0m20.928s
sys 3m43.492s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | nc 172.31.55.250 10001
real 12m28.086s
user 0m22.996s
sys 3m50.972s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | nc 172.31.55.250 10001
real 13m15.775s
user 0m21.460s
sys 3m50.336s

XBSTREAM / SOCAT TESTS

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | socat -u stdio TCP:172.31.55.250:10001
real 11m47.781s
user 0m17.132s
sys 2m38.168s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | socat -u stdio TCP:172.31.55.250:10001
real 11m47.707s
user 0m15.816s
sys 2m22.884s

[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | socat -u stdio TCP:172.31.55.250:10001
real 11m47.805s
user 0m16.796s
sys 2m36.588s

TAR / NETCAT TEST

[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | nc 172.31.55.250 10001
real 11m47.942s
user 0m5.260s
sys 2m32.048s

TAR / SOCAT TEST

[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | socat -u stdio TCP:172.31.55.250:10001
real 11m47.914s
user 0m4.860s
sys 1m37.632s

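Here is a summary of the real (wall-clock) times noted above, rounded to the nearest second.

Method              1 thread   2 threads   4 threads
xbstream / netcat   745        748         796
xbstream / socat    708        708         708
tar / netcat        708        n/a         n/a
tar / socat         708        n/a         n/a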

In this test we can see that almost all the methods worked equally well, the only less efficient one being the xbstream / netcat combination. Keep in mind that changing the number of parallel threads with the xbstream -p option didn’t seem to have an effect, because xbstream will not leverage parallel threads on its own. It needs to be working with another tool, like xtrabackup, that is able to take advantage of the parallelism.

CONCLUSION

When working with xtrabackup / innobackupex, it looks like xbstream and socat are the way to go. If you’re streaming backups and are not taking advantage of multiple threads, you should consider using them.

For large data copies from one server to another, it looks like you’re safe using xbstream or tar, so long as the combination of xbstream and netcat is avoided. Considering that xbstream will not work with directories or act recursively on its own, it may just be easier to stick with tar.
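As a small illustration of that last point, tar walks a directory tree by itself. The sketch below uses a local pipe in place of the network hop, and the directory and file names are throwaway examples, not the test schema used above.

```shell
#!/usr/bin/env bash
# Build a small nested tree to copy (names are examples only).
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/schema/archive"
echo "table data" > "$src/schema/t1.ibd"
echo "older data" > "$src/schema/archive/t0.ibd"

# tar recurses into "schema" without a file list. The pipe stands in for
# the sender's "| nc HOST PORT" (or socat) and the listener on the far side.
tar -C "$src" -cf - schema | tar -C "$dest" -xf -

# Count what arrived, including the file in the nested directory.
copied=$(find "$dest" -type f | wc -l | tr -d ' ')
echo "files copied: $copied"

rm -rf "$src" "$dest"
```

Over a real network the left half of the pipe runs on the origin server and the right half sits behind a listener on the destination.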

About the Author

Internal Principal Consultant
Peter Sylvester is one of the Internal Principal Consultants in the Open Source Database Consulting Group at Pythian. He has been with Pythian since January of 2015 and has been working with MySQL since 2008. Apart from work, Peter is an avid ice hockey player to stay in keeping with the stereotypical Canadian lifestyle, playing typically no less than twice a week!

1 Comment

Hey, you really did a very good analysis. I really appreciate the info you provided. I am a newbie. I have set up Galera Cluster MariaDB 10.2 and would like to use mariabackup, but I haven’t used it since I don’t know how. Could you please give the commands to stream a backup from the server to a backup machine using socat, with parallel=2? Could you post the command with a bit about how it works? Basically I think it will be like opening a port on the server and streaming the file through it to the remote backup server, but I am not sure how it’s done. Could you show me? Any help will be much appreciated.
