ABOUT THE SETUP
I operate a busy OpenStack environment that is used by lots of people making changes everyday. A typical day will see over 100 images spawned and terminates, but some of these last longer than a day. Because OpenStack maintains a stateDB mapping the OpenStack layers to the KVM/Libvirt layers it’s critical to have not only backups, but mysql replication in place. A failure 18 hours after my last backup could leave me scrambling through over 100 instances looking and performing some horrible manual fixes.
With that in mind I’m building a second controller for OpenStack which will keep a replicated copy of the database. The plan is to add an HAProxy in to the mix to manage all connections from the Compute Nodes to the Controllers as well as incoming https connections from my users into the Controllers.
In the end, that’s where I’m going. This article is just about the first part. MySQL Replication. BTW: This is Ubuntu 12.04
Like all of the documentation from MySQL it’s written at a pretty high-level and is 100% factually correct while also being slightly unclear and missing steps. What is wrong with examples? Why not add some?
Setup Simple Replication
In my case, ctrl01 is the MASTER and ctrl02 is the SLAVE or standby server. The process is essentially some basic config file setup followed by locking out all writes to the MASTER while you copy the DB over the the SLAVE, then starting the SLAVE and unlocking the MASTER. NB: unique numeric server-ids:
[mysqld] # # * Master Replication Config # log-bin=mysql-bin server-id=1 # Set to 0 to refuse slaves innodb_flush_log_at_trx_commit=1 sync_binlog=1
On the SLAVE setup like below. A lot of instructions say you only need the server-id which is true. however, in the event of a catastrophic fail all you need to do is promote the SLAVE if you maintain the additional harmless settings in your secondary. Your recovery process would be promote the SLAVE to MASTER and rebuild the failed MASTER as the SLAVE.
[mysqld] # # * Slave Replication Config # log-bin=mysql-bin server-id=2 # Set to 0 refuse MASTER data innodb_flush_log_at_trx_commit=1 sync_binlog=1
Build out the replicant user and the sqldumper user. The sqldumper is a temporary user that I’m creating to clone the data from the MASTER to the SLAVE. It’s only used for the initial copy and can be removed later:
# Setup a user for the remote systems to be authorized through. CREATE USER 'replication'@'<>' IDENTIFIED BY 'password'; GRANT REPLICATION SLAVE ON *.* TO 'replication'@'ipaddress'; flush privileges;
# Setup a user for the remote systems to be authorized through. CREATE USER 'sqldumper'@'<>' IDENTIFIED BY 'password'; GRANT ALL PRIVILEGES ON *.* TO 'sqldumper'@'ipaddress'; flush privileges;
I’m using IP address in my GRANT statement to limit the users and locations that have access to the stateDB.
Next we start 4 different connection, two per server. On the first MASTER connection we grab some data that we need in subsequent steps and populate a little cut and paste down below. We are getting pretty close to a place where you’ll be taking the MySQL server to read-only mode so this will be your last time to prep your scripts and minimize the outage.
# On the first connection let’s grab our log file; mysql> SHOW MASTER STATUS; +------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +------------------+----------+--------------+------------------+ | mysql-bin.000001 | 6147196 | | | +------------------+----------+--------------+------------------
Take this info and let’s populate a few steps down where it’s bolded in RED
OUTAGE NOTICE – This step is going to lock the DB so that keystone cannot create tokens. On second connection we lock the DB and leave this connection persistent. Exiting will unlock the DB, so don’t do that. We cannot make a clone of a system that is changing.
# On the MASTER/second Connection. Purging isn’t necessary FLUSH TABLES WITH READ LOCK; PURGE MASTER LOGS TO 'mysql-bin.000001';
Now that the DB is quiesced we can make a copy and populate the new MySQL DB. Note that I’m doing this on the remote system which combines my dump process with the data transmission:
# On the SLAVE mysqldump --host=ctrl01.myopenstack.com -usqldumper -ppassword --all-databases--events --master-data > ctrl01.db
In my case this is a relatively fast process. If your OpenStack DB is larger than 1GB I’d be surprised. So plan on a couple of minutes. When done import the data to the SLAVE:
mysql -uroot -p < ctrl01.db
Now, let’s roll fast. The clock is ticking:
# Restart mysql as NOT a SLAVE service mysql stop /usr/sbin/mysqld --skip-slave-start --log-warnings & # Get in there and make sure it’s not a slave mysql -uroot -p stop slave; # Point SLAVE to MASTER CHANGE MASTER TO MASTER_HOST='ctrl01.mystack.com', MASTER_USER='replication', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4661249; # Start the SLAVE START SLAVE; # Grab some status show slave status \G; show master status;
Looking good? Maybe check the log files for errors, remembering that we started with –log-warnings. If it’s good let’s restart mysql and release the MASTER so that it can take new table writes:
# Make sure it’s really dead. A failure here is going to get messy. /etc/init.d/mysql stop ps -ef |grep mysql service mysql restart # On the Master where you performed the READ LOCK: UNLOCK TABLES;
Now everything should be working which means the OUTAGE is over and you can make changes in OpenStack. So spin up a new instance with a unique name like ‘replicationtest’. You expect this data to be written to the first MASTER DB and replicated across to the SLAVE. Check both DBs with for that new instance. I haven’t explored what happens when the systems get out of sync. I’d like to hear from you is you have notes about syncing, failover, HAProxy with OpenStack or just want to school me in how to do things better.
select * from nova.instances where hostname LIKE '%replicationtest%' \G;