MHA Failover Process Analysis
DBA Team
March 2013
Document Revision History
Date Version Description Author Reviewer
2013-03-27 邱伟胜
Contents
1. MHA Scenario
2. MHA Failover Process
2.1 Phase 1: Configuration Check Phase
2.2 Phase 2: Dead Master Shutdown Phase
2.3 Phase 3: Master Recovery Phase
2.4 Phase 4: Slaves Recovery Phase
2.5 Phase 5: New master cleanup phase
1. MHA Scenario
In the cluster below, the master and the slaves were deliberately made inconsistent by manual intervention. For example, table qwsh holds four rows on the master, while 10.0.0.75 holds only one:
10.0.0.13 (current master)
+--10.0.0.74
+--10.0.0.11
+--10.0.0.75
Server      Role                       Table   Column   Rows
10.0.0.13   Master                     qwsh    Aa int   1,2,3,4
10.0.0.11   Slave                      qwsh    Aa int   1,2,3
10.0.0.74   Slave (candidate master)   qwsh    Aa int   1,2
10.0.0.75   Slave                      qwsh    Aa int   1
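The inconsistency in the table can be read as a set difference per server. The following is a minimal Python sketch that only restates the table; `rows` and `missing_rows` are illustrative names and are not part of MHA.

```python
# Rows of table qwsh on each server, restated from the table above.
rows = {
    "10.0.0.13": {1, 2, 3, 4},  # current master
    "10.0.0.11": {1, 2, 3},
    "10.0.0.74": {1, 2},        # candidate master
    "10.0.0.75": {1},
}


def missing_rows(rows, master):
    """For every slave, list the rows present on the master but absent on the slave."""
    return {
        host: sorted(rows[master] - have)
        for host, have in rows.items()
        if host != master
    }


gaps = missing_rows(rows, "10.0.0.13")
for host, gap in gaps.items():
    print(host, "is missing rows", gap)
```

The failover has to close exactly these gaps: every surviving server should end up with rows 1 to 4.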
2. MHA Failover Process
The following walks through a manual failover to examine each phase in detail:
2.1 Phase 1: Configuration Check Phase
This phase mainly checks the state of each node:
First, whether each node is dead or alive;
Second, which slave is the primary candidate for the new master, and so on.
2.2 Phase 2: Dead Master Shutdown Phase
First, check whether the dead master is still reachable over SSH.
Second, perform cleanup on the dead master, such as disabling the VIP or shutting down the host.
2.3 Phase 3: Master Recovery Phase
2.3.1 Phase 3.1: Getting Latest Slaves Phase
Based on how far each slave has replicated, MHA identifies the latest slaves (mysql-bin.000034:250773) and the oldest slaves (mysql-bin.000034:250405).
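The comparison behind "latest" and "oldest" can be sketched as ordering binlog coordinates by file number first and position second. This is an illustrative Python sketch, not MHA's own code; the per-host coordinates are the ones reported later in this walkthrough, and `binlog_key` and `latest_and_oldest` are hypothetical helper names.

```python
import re


def binlog_key(coord):
    """Turn 'mysql-bin.000034:250773' into a sortable (file_number, position) tuple."""
    match = re.fullmatch(r".+\.(\d+):(\d+)", coord)
    if match is None:
        raise ValueError(f"unrecognized binlog coordinate: {coord!r}")
    return int(match.group(1)), int(match.group(2))


# Master log coordinates received by each slave (values from this walkthrough).
received = {
    "10.0.0.11": "mysql-bin.000034:250773",
    "10.0.0.74": "mysql-bin.000034:250589",
    "10.0.0.75": "mysql-bin.000034:250405",
}


def latest_and_oldest(received):
    """Return the hosts holding the most advanced and the least advanced coordinates."""
    keys = {host: binlog_key(coord) for host, coord in received.items()}
    newest, oldest = max(keys.values()), min(keys.values())
    latest_hosts = sorted(h for h, k in keys.items() if k == newest)
    oldest_hosts = sorted(h for h, k in keys.items() if k == oldest)
    return latest_hosts, oldest_hosts


latest_hosts, oldest_hosts = latest_and_oldest(received)
print("latest:", latest_hosts, "oldest:", oldest_hosts)
```

Comparing tuples rather than raw strings keeps the ordering correct even when slaves are on different binlog files.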
2.3.2 Phase 3.2: Saving Dead Master's Binlog Phase
If the dead master is still reachable over SSH, MHA saves the binlog events that the latest slave has not yet received from the master (starting at mysql-bin.000034:250773):
save_binary_logs --command=save --start_file=mysql-bin.000034
--start_pos=250773 --binlog_dir=/data/mysql/arch
--output_file=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
--handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.55
The content of the corresponding binlog is as follows:
[root@db-13 ~]# mysqlbinlog /var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 10:40:31 server id 1 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31 at startup
ROLLBACK/*!*/;
BINLOG '
H7lPUQ8BAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAfuU9REzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 175
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 264
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 291
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
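One detail worth noticing in this output is that the `# at` offsets are local to the saved file, while `end_log_pos` still carries the dead master's original positions (250841 through 250976, continuing from the start position 250773). The Python sketch below extracts the event headers to make that visible. The `dump` string is an excerpt of the output above reduced to its header lines, and `event_headers` is a hypothetical helper, not an MHA or MySQL tool.

```python
import re

# Excerpt of the mysqlbinlog output above, reduced to offsets and event headers.
dump = """\
# at 107
#130325 14:18:47 server id 1 end_log_pos 250841 Query
# at 175
#130325 14:18:47 server id 1 end_log_pos 250930 Query
# at 264
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
# at 291
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
"""

# Matches lines such as: #130325 14:18:47 server id 1 end_log_pos 250841 Query
HEADER = re.compile(
    r"^#\d{6}\s+\d{1,2}:\d{2}:\d{2} server id (\d+)\s+end_log_pos (\d+)\s+(\w+)",
    re.M,
)


def event_headers(text):
    """Return (server_id, end_log_pos, event_type) for each event header line."""
    return [(int(sid), int(pos), kind) for sid, pos, kind in HEADER.findall(text)]


events = event_headers(dump)
for server_id, end_pos, kind in events:
    print(server_id, end_pos, kind)
```

All four events carry server id 1, which confirms that they originate from the dead master rather than from a slave.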
2.3.3 Phase 3.3: Determining New Master Phase
MHA checks whether the latest slave has all the relay logs needed to recover the other slaves (oldest position: mysql-bin.000034:250405). It then selects the new master according to the candidate rules, taking into account settings such as candidate_master=1 and no_master=1:
apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000034
--latest_rmlp=250773 --target_mlf=mysql-bin.000034
--target_rmlp=250405 --server_id=3 --workdir=/var/tmp
--timestamp=20130325143805 --manager_version=0.55
--relay_log_info=/data/mysql/data/relay-log.info
--relay_dir=/data/mysql/data/
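The candidate rules can be approximated as follows. This is a simplified Python sketch under stated assumptions: hosts with no_master=1 are never chosen, hosts with candidate_master=1 are preferred, and otherwise the most advanced slave wins. MHA may apply further checks that are omitted here, and the `Slave` class and `choose_new_master` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    host: str
    received_pos: int            # position received within mysql-bin.000034
    candidate_master: bool = False
    no_master: bool = False


def choose_new_master(slaves):
    """Skip no_master hosts, prefer candidate_master hosts, then pick the most advanced one."""
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        raise RuntimeError("no slave is eligible to become the new master")
    candidates = [s for s in eligible if s.candidate_master]
    pool = candidates or eligible
    return max(pool, key=lambda s: s.received_pos)


# Positions and roles as described in this walkthrough.
slaves = [
    Slave("10.0.0.11", 250773),
    Slave("10.0.0.74", 250589, candidate_master=True),
    Slave("10.0.0.75", 250405),
]
new_master = choose_new_master(slaves)
print("new master:", new_master.host)
```

In this scenario the sketch picks 10.0.0.74 even though 10.0.0.11 is more advanced, which is why the next phase has to generate a differential log for the new master.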
2.3.4 Phase 3.4: New Master Diff Log Generation Phase
The candidate master is compared with the latest slave to decide whether a differential log must be generated (10.0.0.74 received relay logs up to mysql-bin.000034:250589; the latest slave, 10.0.0.11, is up to mysql-bin.000034:250773):
apply_diff_relay_logs --command=generate_and_send --scp_user=root
--scp_host=10.0.0.74 --latest_mlf=mysql-bin.000034
--latest_rmlp=250773 --target_mlf=mysql-bin.000034
--target_rmlp=250589 --server_id=3
--diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
--workdir=/var/tmp
--timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.55
--relay_log_info=/data/mysql/data/relay-log.info
--relay_dir=/data/mysql/data/
The content of the corresponding binlog is as follows:
[root@db-11 ~]# mysqlbinlog /var/tmp/relay_from_read_to_latest_10.0.0.74_3306_20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
2.3.5 Phase 3.5: Master Log Apply Phase
First, wait until all relay logs are applied.
Second, merge the logs from the latest slave and the dead master. Because some events in these logs may be incomplete, they are checked during the merge. MHA then reports: "All apply target binary logs are concatinated at /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog".
The content of the corresponding log is as follows:
[mysql@db-74 ~]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.74_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 410
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 437
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 456
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 524
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 613
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 640
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Third, record the new master's log file and position:
All other slaves should start replication from here. Statement should be:
CHANGE MASTER TO MASTER_HOST='10.0.0.74', MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=475,
MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fourth, execute the master IP activation script.
Fifth, set read_only=0 on the new master.
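Note that the recorded coordinates (mysql-bin.000003:475) belong to the new master's own binary log, not to the dead master's mysql-bin.000034. As an illustration, the statement from the third step can be assembled from those recorded values. This is a minimal Python sketch; `change_master_sql` is a hypothetical helper, and it performs no quoting or escaping, so it is only suitable for trusted values.

```python
def change_master_sql(host, port, log_file, log_pos, user, password):
    """Build the CHANGE MASTER TO statement that points a slave at the new master."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={int(port)}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={int(log_pos)}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}';"
    )


# Values recorded for the new master in this walkthrough.
stmt = change_master_sql("10.0.0.74", 3306, "mysql-bin.000003", 475, "repl", "xxx")
print(stmt)
```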
2.4 Phase 4: Slaves Recovery Phase
2.4.1 Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
MHA determines whether each slave's relay logs differ from those of the latest slave. Where they do, it runs the following command on the latest slave to generate a differential relay log file and copies it to the corresponding slave via scp:
(Server 10.0.0.75 received relay logs up to: mysql-bin.000034:250405.
Need to get diffs from the latest slave(10.0.0.11) up to:
mysql-bin.000034:250773)
apply_diff_relay_logs --command=generate_and_send --scp_user=root
--scp_host=10.0.0.75 --latest_mlf=mysql-bin.000034
--latest_rmlp=250773 --target_mlf=mysql-bin.000034
--target_rmlp=250405 --server_id=3
--diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog
--workdir=/var/tmp
--timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.55
--relay_log_info=/data/mysql/data/relay-log.info
--relay_dir=/data/mysql/data/
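The decision of which slaves need a differential relay log reduces to comparing each slave's received position with that of the latest slave. The Python sketch below illustrates this for the remaining slaves in this walkthrough; `diff_plan` is a hypothetical helper, and the resulting pair corresponds to the --target_rmlp and --latest_rmlp values in the command above.

```python
LATEST_POS = 250773  # position of the latest slave (10.0.0.11) in mysql-bin.000034

# Positions received by the slaves that remain after 10.0.0.74 became the new master.
received = {"10.0.0.11": 250773, "10.0.0.75": 250405}


def diff_plan(received, latest_pos):
    """Map each lagging slave to the (start, end) range of master log positions it lacks."""
    return {
        host: (pos, latest_pos)
        for host, pos in received.items()
        if pos < latest_pos
    }


plan = diff_plan(received, LATEST_POS)
for host, (start, end) in plan.items():
    print(f"{host}: needs relay log diff from {start} to {end}")
```

Only 10.0.0.75 appears in the plan; 10.0.0.11 is itself the latest slave and needs no differential relay log.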
2.4.2 Phase 4.2: Starting Parallel Slave Log Apply Phase
First, wait until all relay logs are applied.
Second, check whether the slave already has the latest relay log, then merge and apply the required logs.
10.0.0.11 has the latest relay log, so only the dead master's binlog is applied:
apply_diff_relay_logs --command=apply --slave_user='root'
--slave_host=10.0.0.11 --slave_ip=10.0.0.11 --slave_port=3306
--apply_files=/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
--workdir=/var/tmp --target_version=5.5.27-log
--timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.55 --slave_pass=xxx
10.0.0.75 does not have the latest relay log, so the differential relay log and the dead master's binlog must be merged:
apply_diff_relay_logs --command=apply --slave_user='root'
--slave_host=10.0.0.75 --slave_ip=10.0.0.75 --slave_port=3306
--apply_files=/var/tmp/relay_from_read_to_latest_10.0.0.75_3306_20130325143805.binlog,/var/tmp/saved_master_binlog_from_10.0.0.13_3306_20130325143805.binlog
--workdir=/var/tmp --target_version=5.5.27-log
--timestamp=20130325143805 --handle_raw_binlog=1 --disable_log_bin=0
--manager_version=0.55 --slave_pass=xxx
The content of the corresponding log is as follows:
[mysql@db-75 data]$ mysqlbinlog /var/tmp/total_binlog_for_10.0.0.75_3306.20130325143805.binlog
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET
@OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 11:03:52 server id 3 end_log_pos 107 Start: binlog v 4, server
v 5.5.27-log created 130325 11:03:52
BINLOG '
mL5PUQ8DAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#700101 8:00:00 server id 1 end_log_pos 0 Rotate to
mysql-bin.000034 pos: 107
# at 150
#130325 10:40:31 server id 1 end_log_pos 0 Start: binlog v 4, server
v 5.5.27-log created 130325 10:40:31
BINLOG '
H7lPUQ8BAAAAZwAAAAAAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 253
#130325 14:09:57 server id 1 end_log_pos 250473 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191797/*!*/;
SET @@session.pseudo_thread_id=21/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0,
@@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1,
@@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET
@@session.character_set_client=33,@@session.collation_connection=33,@
@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 321
#130325 14:09:57 server id 1 end_log_pos 250562 Query
thread_id=21 exec_time=0 error_code=0
use test/*!*/;
SET TIMESTAMP=1364191797/*!*/;
insert into qwsh values(2)
/*!*/;
# at 410
#130325 14:09:57 server id 1 end_log_pos 250589 Xid = 2423
COMMIT/*!*/;
# at 437
#130325 14:12:19 server id 1 end_log_pos 250657 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
BEGIN
/*!*/;
# at 505
#130325 14:12:19 server id 1 end_log_pos 250746 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364191939/*!*/;
insert into qwsh values(3)
/*!*/;
# at 594
#130325 14:12:19 server id 1 end_log_pos 250773 Xid = 2424
COMMIT/*!*/;
# at 621
#130325 14:12:36 server id 3 end_log_pos 250938 Stop
# at 640
#130325 14:18:47 server id 1 end_log_pos 250841 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
BEGIN
/*!*/;
# at 708
#130325 14:18:47 server id 1 end_log_pos 250930 Query
thread_id=21 exec_time=0 error_code=0
SET TIMESTAMP=1364192327/*!*/;
insert into qwsh values(4)
/*!*/;
# at 797
#130325 14:18:47 server id 1 end_log_pos 250957 Xid = 2425
COMMIT/*!*/;
# at 824
#130325 14:19:42 server id 1 end_log_pos 250976 Stop
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
Third, execute CHANGE MASTER so that the slave replicates from the new master.
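The order of the files passed to --apply_files in this phase matters: the differential relay log (if any) comes first, followed by the binlog saved from the dead master, so that events replay in their original order. The following Python sketch reconstructs the file list for each slave. The paths follow the naming seen in the commands in this phase, and `apply_files` is a hypothetical helper rather than part of MHA.

```python
TIMESTAMP = "20130325143805"
SAVED_MASTER_BINLOG = (
    f"/var/tmp/saved_master_binlog_from_10.0.0.13_3306_{TIMESTAMP}.binlog"
)


def apply_files(host, needs_relay_diff, port=3306):
    """Return the files to apply on a slave, oldest events first."""
    files = []
    if needs_relay_diff:
        # Events the slave missed relative to the latest slave.
        files.append(
            f"/var/tmp/relay_from_read_to_latest_{host}_{port}_{TIMESTAMP}.binlog"
        )
    # Events that no slave received before the master died.
    files.append(SAVED_MASTER_BINLOG)
    return files


print(",".join(apply_files("10.0.0.11", needs_relay_diff=False)))
print(",".join(apply_files("10.0.0.75", needs_relay_diff=True)))
```

For 10.0.0.75 this yields the same comma-separated value as in the command for that host, which is why its merged log contains the inserts of rows 2 and 3 before the insert of row 4.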
2.5 Phase 5: New master cleanup phase
Resetting slave info on the new master