Agile development has become more popular in recent years, thanks in part to continuous integration practices and to frameworks such as Ruby on Rails that support them. These frameworks encourage quick testing and make it easy to deploy across clusters. As deployments happen more often, we look for ways to minimize the disruption to users. Cluster architecture already contains the components we need: with multiple servers for each role, we can update a subset of the system while the rest continues to serve the business.
We recently worked with the DevOps team at one of our clients to develop an architecture that allows them to run rolling changes through the cluster using the Ruby on Rails framework. There are two principal components of change: application code on the application servers and database structure (DDL) on the database servers. The application servers can easily be targeted specifically through deploy.rb. The database side of things, however, is a bit more complicated.
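For illustration, this is how application servers can be singled out with Capistrano roles in deploy.rb; the host names and the restart task here are hypothetical, not the client's actual configuration:

```ruby
# deploy.rb -- illustrative role layout (host names are hypothetical)
role :app, "app1.example.com", "app2.example.com"
role :db,  "db-master.example.com", :primary => true

# A task restricted to the application servers only
desc "restart application servers"
task :restart_app, :roles => :app do
  run "touch #{current_path}/tmp/restart.txt"
end
```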
In order to achieve zero downtime, half of the application servers are taken offline and updated; the DDLs are then applied to the standby master/slave database pair, and traffic is switched over to these servers.
HAProxy is put in between the application and the MySQL database servers to act as a router for database traffic (see https://palominodb.com/blog/2011/12/01/using-haproxy-mysql-failovers for specifics), providing the ability to flip the active and standby roles of the two server pairs (as well as providing a High Availability solution). Since there are slave relationships, we needed to be able to pause replication through the deployment application (Capistrano). We were able to accomplish this by adding a file, deployutil.rake, to the Rake subsystem under lib/tasks:
namespace :deployutil do
  desc 'Checks the replication status of the primary database'
  task :replication_test => :environment do
    # find the state of replication
    mysql_res = ActiveRecord::Base.connection.execute("SHOW SLAVE STATUS")
    mysql_res.each_hash do |row|
      if row['Slave_IO_Running'] == "Yes" and row['Slave_SQL_Running'] == "Yes" and
         row['Seconds_Behind_Master'].to_s == "0"
        puts "ReplicationGood"
      elsif row['Seconds_Behind_Master'].blank?
        puts "ReplicationBroken"
      else
        puts "ReplicationBehind_" + row['Seconds_Behind_Master'].to_s
      end
    end
    mysql_res.free
  end

  task :start_slave => :environment do
    ActiveRecord::Base.connection.execute("START SLAVE")
  end

  task :stop_slave => :environment do
    ActiveRecord::Base.connection.execute("STOP SLAVE")
  end
end
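The classification logic in replication_test can be exercised outside of Rails as well; here is a minimal standalone sketch (the classify_replication helper name is ours, not part of the client's code), assuming the same SHOW SLAVE STATUS fields:

```ruby
# Classify a SHOW SLAVE STATUS row the same way the rake task does.
# The helper name and the standalone form are illustrative only.
def classify_replication(row)
  behind = row['Seconds_Behind_Master']
  if row['Slave_IO_Running'] == "Yes" and row['Slave_SQL_Running'] == "Yes" and
     behind.to_s == "0"
    "ReplicationGood"
  elsif behind.nil? || behind.to_s.empty?  # stand-in for Rails' blank?
    "ReplicationBroken"
  else
    "ReplicationBehind_" + behind.to_s
  end
end
```

This single-word output is what deploy.rb captures from the remote rake run and branches on.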
We can then create tasks in deploy.rb to call these:
desc "shows the current passive slave replication status"
task :get_slave_replication_status, :roles => :cron do
  dbrails_env = fetch(:rails_env) + '_passiveslave'
  # find the state of replication
  set :slave_replication_status,
      capture("cd #{latest_release} ; RAILS_ENV=#{dbrails_env} rake deployutil:replication_test").chomp
end

desc "stop passive slave replication"
task :stop_passiveslave_repl, :roles => :cron do
  dbrails_env = fetch(:rails_env) + '_passiveslave'
  run "cd #{latest_release} ; RAILS_ENV=#{dbrails_env} rake deployutil:stop_slave"
end
etc.
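Tying these together, the deploy flow can gate on replication health before stopping the passive slave. A sketch for deploy.rb; the prepare_passive_db task name is hypothetical, and only get_slave_replication_status and stop_passiveslave_repl come from the tasks shown above:

```ruby
# deploy.rb -- illustrative orchestration of the tasks above
desc "verify the passive slave is caught up, then stop replication for DDL"
task :prepare_passive_db, :roles => :cron do
  get_slave_replication_status
  unless fetch(:slave_replication_status) == "ReplicationGood"
    abort "Passive slave not healthy: #{fetch(:slave_replication_status)}"
  end
  stop_passiveslave_repl
end
```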
We also want to be able to limit changes to specific databases so that the changes won't go into the binary logs and propagate when the slaves are turned back on. See https://palominodb.com/blog/2011/11/21/rails-and-database-session-variables for details on how to do this through an extension of ActiveRecord. A word of caution here: setting sql_log_bin=0 to skip logging these changes will invalidate using the binlogs for point-in-time recovery, so you will need a full backup after the change.
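As described in the linked post, the session variable can be set on the connection before running the DDL. A minimal sketch; the SET statements are standard MySQL, but the surrounding placement and the example ALTER TABLE are illustrative only:

```ruby
# Disable binary logging for this session only, so the DDL applied to the
# standby pair is not replayed when replication is restarted.
# Requires the SUPER privilege; invalidates point-in-time recovery from binlogs.
ActiveRecord::Base.connection.execute("SET sql_log_bin = 0")
ActiveRecord::Base.connection.execute(
  "ALTER TABLE users ADD COLUMN nickname VARCHAR(64)"  # hypothetical DDL
)
ActiveRecord::Base.connection.execute("SET sql_log_bin = 1")
```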