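Introduction
MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It can complete an automatic database failover within 0 to 30 seconds and, provided the master server itself has not crashed, it can largely guarantee data consistency.
It consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slaves. MHA Node runs on every MySQL node. MHA Manager probes the cluster's master at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to master and then repoints all other slaves to the new master.
During automatic failover MHA tries to save the binary logs of the failed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; the failover still happens, but the most recent data is lost. Combining MHA with the semi-synchronous replication introduced in MySQL 5.5 reduces this risk.
The MHA software consists of two toolkits, Manager and Node, described below: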
MHA Manager:
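1. masterha_check_ssh: checks MHA's SSH configuration
2. masterha_check_repl: checks the MySQL replication status
3. masterha_manager: starts MHA
4. masterha_check_status: checks the current running state of MHA
5. masterha_master_monitor: detects whether the master is down
6. masterha_master_switch: controls failover (automatic or manual)
7. masterha_conf_host: adds or removes configured server entries
8. masterha_stop: stops MHA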
MHA Node:
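save_binary_logs: saves and copies the master's binary logs
apply_diff_relay_logs: identifies the differing relay log events and applies them to the other slaves
filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs: purges relay logs (without blocking the SQL thread)
The following scripts have to be customized:
1. master_ip_failover: manages the VIP during automatic failover
2. master_ip_online_change: manages the VIP during a manual (online) master switch
3. masterha_secondary_check: when MHA Manager finds the master unavailable, it uses this script to confirm the failure from other hosts, which lowers the risk of a false failover
4. send_report: sends an alert through this script when a failover happens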
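Cluster information
Role | IP address | SSH port | server-id | Type |
master | 192.168.170.10 | 22 | 10 | read-write |
candidate master | 192.168.170.11 | 22 | 11 | read-only |
slave | 192.168.170.12 | 22 | 12 | read-only |
monitor | 192.168.170.13 | 22 | 13 | monitors the cluster |
vip | 192.168.170.14 |
Note: all hosts run CentOS 6.9. The master serves writes, the candidate master serves reads, and the slave serves reads as well. If the master goes down, the candidate master is promoted to be the new master and the slave is repointed to it.
[Figure: final architecture of the cluster]
Testing. Note that this test builds on the MySQL master-slave replication setup that was tested successfully in the previous post.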
1. Add the following entries to /etc/hosts on all four hosts so that they can reach each other by hostname
[root@master ~]#vim /etc/hosts
192.168.170.10 mha-master
192.168.170.11 mha-slave1
192.168.170.12 mha-slave2
192.168.170.13 mha-manager
2. Then, on the master node, set up key-based SSH so that any machine can log in to any other machine without a password
[root@master ~/.ssh]#ls
authorized_keys known_hosts
[root@master ~/.ssh]#ssh-keygen
[root@master ~/.ssh]#ls
authorized_keys id_rsa id_rsa.pub known_hosts
[root@master ~/.ssh]#cat id_rsa.pub >> authorized_keys
[root@master ~/.ssh]#scp -P 22 authorized_keys id_rsa mha-slave1:/root/.ssh
[root@master ~/.ssh]#scp -P 22 authorized_keys id_rsa mha-slave2:/root/.ssh
[root@master ~/.ssh]#scp -P 22 authorized_keys id_rsa mha-manager:/root/.ssh
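Testing confirms that all four nodes can log in to the other hosts without a password. Note that this setup keeps SSH on the default port 22 on every node, because MHA depends heavily on SSH. (MHA's parameter list does include an ssh_port setting, so a non-default port should be possible; check the documentation for your version before relying on it.)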
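The "every node to every other node" requirement above can be checked by hand before MHA is installed. Below is a minimal Python sketch that only builds the ssh commands for every ordered pair of hosts; the function name ssh_check_commands is made up for this post, and the hostnames are the ones from the /etc/hosts entries. It is not how masterha_check_ssh is implemented.

```python
from itertools import permutations

# Hostnames taken from the /etc/hosts entries earlier in this post.
HOSTS = ["mha-master", "mha-slave1", "mha-slave2", "mha-manager"]


def ssh_check_commands(hosts, user="root", port=22):
    """Return one nested ssh command per ordered (source, target) pair.

    ssh_check_commands is a hypothetical helper, not part of MHA.
    """
    commands = []
    for source, target in permutations(hosts, 2):
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        inner = f"ssh -p {port} -o BatchMode=yes {user}@{target} true"
        commands.append(
            f"ssh -p {port} -o BatchMode=yes {user}@{source} '{inner}'"
        )
    return commands


for command in ssh_check_commands(HOSTS):
    print(command)
```

Each printed command logs in to the source host and, from there, to the target host. Any command that asks for a password or exits with a non-zero status points at a pair that still needs fixing.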
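3. Install MHA Node on all nodes
The site below hosts the MHA downloads, where the latest packages are available. Version 0.56 is used throughout this post.
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
On the MySQL servers, install the Perl module that MHA Node needs (DBD::mysql)
yi perl-DBD-MySQL
Install the node package
yi mha4mysql-node-0.56-0.el6.noarch.rpm
(yi is a shell alias on these machines, presumably for "yum install -y".)
4. Deploy MHA Manager on the monitor host
yi perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yi mha4mysql-manager-0.56-0.el6.noarch.rpm
If the site above cannot be opened, use a pre-downloaded copy of the packages.
5. Synchronize the time on all nodes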
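The master-slave replication setup below has already been completed.
Run on all four nodes
mysql> grant replication slave,replication client on *.* to 'repl'@'192.168.170.%' identified by '234567';
The remaining replication steps are covered in the other posts.
The following steps still need to be done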
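1. See what the rpm packages install
[root@master ~]#rpm -ql mha4mysql-manager
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
[root@master ~]#rpm -ql mha4mysql-node
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs
These rpm packages are really just collections of script files.
2. Run the following on the master node; the slaves replicate and execute the statement automatically. This account is what the monitor node uses to operate on the databases
grant all privileges on *.* to 'monitor'@'192.168.170.%' identified by '345678';
3. Open port 3306 in the firewall on the master and on both slaves, because once the master goes down and slave1 becomes the new master, the other nodes need to replicate from it
-A INPUT -s 192.168.170.0/24 -p tcp --dport 3306 -j ACCEPT
4. The master, slave1 and slave2 all need the binlog and the relay log enabled, because a master that goes down and is restarted comes back as a slave and then needs the relay log. At initialization, however, the slaves must be set read_only; the manager uses this to tell which node is the master and which are slaves. When a slave is promoted to master, read_only is turned off automatically
innodb_file_per_table
skip_name_resolve=1
log-bin=master-bin
relay-log=relay-bin
server_id=2 #must be unique on every node
read_only=1 #set on the slaves only; this is just the initial state, afterwards it is no longer controlled by the configuration file
relay_log_purge=0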
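Configure MHA
[root@master /etc/mha]#cat mha.conf
[server default]
user=monitor//management user
password=345678//password of the management user
manager_workdir=/etc/mha//working directory of the manager
manager_log=/var/log/mysqld/mha.log//log file of the manager
ping_interval=1//interval in seconds between pings sent to the master; the default is 3, and automatic failover starts after three attempts get no response
remote_workdir=/tmp//where binlogs are saved on the remote MySQL host during a switch
ssh_user=root//SSH login user, with key-based authentication
repl_user=repl//replication user of the replication setup
repl_password=234567//password of the replication user
master_binlog_dir=/data/mysql_binlog//where the master stores its binlogs, so that MHA can find them
master_ip_failover_script=/etc/mha/master_ip_failover//switch script for automatic failover; listed at the end of this post and called automatically
master_ip_online_change_script=/etc/mha/master_ip_online_change//switch script for a manual switch (the masterha_master_switch command); also called automatically
secondary_check_script=/etc/mha/masterha_secondary_check -s 192.168.170.11 -s 192.168.170.12 --user=root --master_host=192.168.170.10 --master_ip=192.168.170.10 --master_port=3306//keep this on a single line; when monitoring from MHA to the master runs into a problem,
//MHA Manager checks whether the other two slaves can still connect to port 3306 on master_ip
shutdown_script=""//script that powers off the failed host after a failure (its main purpose is to prevent split brain)
report_script=/etc/mha/send_report//sends a report to the user once the failover is complete
[server1]
hostname=192.168.170.10
port=3306
candidate_master=1
[server2]
hostname=192.168.170.11
port=3306
candidate_master=1//marks the host as a candidate master; with this set, this slave is promoted on a master switch even if it is not the most up-to-date slave in the cluster
check_repl_delay=0
//by default MHA does not pick a slave that is more than 100M of relay logs behind the master, because recovering such a slave takes a long time;
//with check_repl_delay=0 MHA ignores replication delay when it chooses the new master. This is very useful for a host with
//candidate_master=1, because it guarantees that the candidate is the one promoted during the switch
[server3]
hostname=192.168.170.12
port=3306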
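To make the interplay of candidate_master and check_repl_delay easier to follow, here is a small Python sketch of the selection rule as the annotations describe it. It is only an illustration under stated assumptions: pick_new_master and the dictionary keys are names invented for this post, the 100M limit is taken from the annotation, and MHA's real selection logic has more rules than this.

```python
# The "100M of relay logs" threshold quoted in the annotation above.
RELAY_LAG_LIMIT = 100 * 1024 * 1024


def pick_new_master(slaves):
    """Return the host to promote, or None if no slave qualifies.

    Each slave is a dict with the hypothetical keys "host",
    "relay_lag_bytes", "candidate_master" and "check_repl_delay".
    """
    def usable(slave):
        # check_repl_delay=0 means the lag is ignored for this host.
        if not slave.get("check_repl_delay", True):
            return True
        return slave["relay_lag_bytes"] <= RELAY_LAG_LIMIT

    pool = [slave for slave in slaves if usable(slave)]
    if not pool:
        return None
    preferred = [slave for slave in pool if slave.get("candidate_master")]
    if preferred:
        return preferred[0]["host"]
    # No candidate left: fall back to the slave with the least lag.
    return min(pool, key=lambda slave: slave["relay_lag_bytes"])["host"]


slaves = [
    {"host": "192.168.170.11", "relay_lag_bytes": 200 * 1024 * 1024,
     "candidate_master": True, "check_repl_delay": False},
    {"host": "192.168.170.12", "relay_lag_bytes": 0},
]
print(pick_new_master(slaves))
```

With check_repl_delay switched off on 192.168.170.11 the lagging candidate is still chosen; with the default setting it would be skipped and 192.168.170.12 would be promoted instead.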
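Note:
1> When you edit this file, be sure to remove the annotations that follow the values; MHA does not treat that text as a comment.
2> The configuration file sets master_ip_failover_script, secondary_check_script, master_ip_online_change_script and report_script; the corresponding files are listed at the end of this post.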
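Removing the annotations by hand is error-prone, so here is a minimal Python sketch that does it. It assumes every annotation starts with // and that no real value contains //, which holds for the file in this post; strip_annotations is a name made up here.

```python
def strip_annotations(text):
    """Drop trailing '//...' annotations and the lines they leave empty."""
    cleaned = []
    for line in text.splitlines():
        # Keep only the part before the first "//" (assumed annotation marker).
        line = line.split("//", 1)[0].rstrip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


annotated = """[server default]
user=monitor//management user
manager_workdir=/etc/mha//working directory of the manager
//a line that holds only an annotation
[server1]
hostname=192.168.170.10
"""
print(strip_annotations(annotated))
```

Paths such as /etc/mha survive because they contain single slashes only. Run the result through masterha_check_repl afterwards, as shown later in this post, to confirm that MHA accepts the file.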
Make the script files above executable
[root@master /etc/mha]#chmod +x master_ip_failover master_ip_online_change masterha_secondary_check send_report
[root@master /etc/mha]#ll
total 36
-rwxr-xr-x 1 root root 5165 Aug 7 10:37 masterha_secondary_check
-rwxr-xr-x 1 root root 4195 Aug 7 10:29 master_ip_failover
-rwxr-xr-x 1 root root 10534 Aug 7 10:34 master_ip_online_change
-rw-r--r-- 1 root root 784 Aug 7 11:17 mha.conf
-rwxr-xr-x 1 root root 2397 Aug 7 11:02 send_report
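Set up relay log purging on every slave node
Set the relay log purge mode (on every slave). Because the master may itself become a slave later, apply the setting below on all three nodes. Automatic purging has already been turned off in the configuration file; the statement is shown here to explain what it does
mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
When MHA performs a switch, the recovery of the slaves relies on relay log information, so automatic relay log purging has to be set to OFF and the relay logs purged manually instead.
By default, the relay logs on a slave are deleted automatically once the SQL thread has executed them. In an MHA environment these relay logs may still be needed to recover other slaves, so automatic purging must be disabled and replaced by a periodic manual purge of the relay logs the SQL thread has already applied.
On an ext3 file system, deleting a large file takes a noticeable amount of time, which can cause serious replication delay. On Linux the usual workaround is to delete large files by way of a hard link.
Set up a script that purges relay logs periodically
MHA Node includes the purge_relay_logs script. It creates hard links for the relay logs, executes set global relay_log_purge=1, waits a few seconds so that the SQL thread can switch to a new relay log, and then executes set global relay_log_purge=0.
Usage of the script:
# purge_relay_logs --user=monitor --password=345678 --disable_relay_log_purge --workdir=/data/tmp
where
--user: MySQL user name
--password: password of the MySQL user
--host: address of the MySQL server
--workdir: where the hard links to the relay logs are created; the default is /var/tmp. Creating a hard link across partitions fails, so point this at a directory on the same partition as the relay logs.
--disable_relay_log_purge: by default the script exits immediately if relay_log_purge=1. With this option it does not exit; it purges the relay logs and leaves relay_log_purge set to 0 afterwards.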
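The hard-link trick behind --workdir can be tried out in isolation. The Python sketch below is an illustration of the idea only, not the code of purge_relay_logs; unlink_via_hard_link is a name invented for this post, and the "relay log" is a throwaway file in a temporary directory.

```python
import os
import tempfile


def unlink_via_hard_link(path, workdir):
    """Hard-link path into workdir, remove the original name, return the link."""
    link_path = os.path.join(workdir, os.path.basename(path))
    # os.link raises OSError when workdir is on a different file system,
    # which is why workdir must share a partition with the relay logs.
    os.link(path, link_path)
    # Only a directory entry is removed here; the data blocks stay on disk
    # because the hard link still refers to them.
    os.unlink(path)
    return link_path


with tempfile.TemporaryDirectory() as base:
    workdir = os.path.join(base, "tmp")
    os.mkdir(workdir)
    relay_log = os.path.join(base, "relay-bin.000001")
    with open(relay_log, "w") as handle:
        handle.write("pretend relay log")

    link = unlink_via_hard_link(relay_log, workdir)
    print(os.path.exists(relay_log), os.path.exists(link))
    # The space is actually freed only when the last link goes away.
    os.unlink(link)
```

Removing the original name is quick regardless of the file size, so the slow part of the deletion is moved away from the directory MySQL is working in.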
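The commonly used commands are shown below
Set up crontab to purge relay logs periodically
MHA calls the mysqlbinlog command directly during a switch, so the directory that contains mysqlbinlog has to be on the PATH environment variable.
In my test environment the relay logs live under /data/, which is mounted as a separate partition, so I set the working directory to /data/tmp; it has to be on the same partition as the relay logs. The output directory of the cron job and the working directory must be created beforehand
# vim /etc/crontab
0 4 * * * root /usr/bin/purge_relay_logs --user=monitor --password=345678 --host=192.168.170.12 --disable_relay_log_purge --workdir=/data/tmp >> /var/log/mysqld/purge_relay_logs.log 2>&1
Note: it is best to run this job at a different time on each slave server.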
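To give each slave its own time slot, the entries can be generated instead of typed. This is a minimal Python sketch; staggered_cron_lines is a made-up helper, the ten-minute step is an arbitrary choice, and the root user field assumes the line goes into /etc/crontab, which expects a user name before the command.

```python
COMMAND = (
    "/usr/bin/purge_relay_logs --user=monitor --password=345678 "
    "--host={host} --disable_relay_log_purge --workdir=/data/tmp "
    ">> /var/log/mysqld/purge_relay_logs.log 2>&1"
)


def staggered_cron_lines(hosts, hour=4, step_minutes=10, user="root"):
    """Map each host to an /etc/crontab line with its own start time."""
    lines = {}
    for index, host in enumerate(hosts):
        # Shift every further host by step_minutes, wrapping around midnight.
        run_hour, run_minute = divmod(hour * 60 + index * step_minutes, 60)
        schedule = f"{run_minute} {run_hour % 24} * * *"
        lines[host] = f"{schedule} {user} {COMMAND.format(host=host)}"
    return lines


hosts = ["192.168.170.10", "192.168.170.11", "192.168.170.12"]
for host, line in staggered_cron_lines(hosts).items():
    print(f"# on {host}")
    print(line)
```

Each line belongs in the crontab of the host it names; the three hosts then purge at 4:00, 4:10 and 4:20.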
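The following commands are run on the manager node only; the other nodes do not need them
Check the SSH configuration, that is, whether every node can connect to the other nodes; run this on the monitor node
[root@master ~]#masterha_check_ssh --conf=/etc/mha/mha.conf
Sun Aug 6 20:33:12 2017 - [info] All SSH connection tests passed successfully.
Check whether the replication state of the whole cluster meets MHA's requirements
[root@master ~]#masterha_check_repl --conf=/etc/mha/mha.conf
MySQL Replication Health is OK.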
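Start MHA Manager monitoring, that is, start the service. Here it runs in the background; for testing you can run it in the foreground to watch what happens. After MHA completes one master failover, the MHA process exits on its own. You then have to repair the failed machine by hand, add it back to the cluster, and start the MHA process again.
[root@master ~]#nohup masterha_manager --conf=/etc/mha/mha.conf --remove_dead_master_conf --ignore_last_failover &>> /var/log/mysqld/manager.log &
where
remove_dead_master_conf: after a master switch, the entry of the old master is removed from the configuration file.
ignore_last_failover: by default, MHA creates the file mha.failover.complete in its working directory after a switch. If that file is still there at the next switch and less than 8 hours have passed since the previous one, the switch is not allowed unless the file was deleted by hand after the first switch. This option tells MHA to ignore the file left by the previous failover.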
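The rule behind ignore_last_failover can be written down in a few lines. The Python sketch below is a simplified model of the description above, not MHA's implementation; failover_blocked is a name invented here, and using the modification time of the marker file as "time of the last failover" is an assumption.

```python
import os
import tempfile
import time

MIN_INTERVAL_SECONDS = 8 * 60 * 60  # the 8-hour window described above


def failover_blocked(workdir, app_name="mha", now=None,
                     ignore_last_failover=False):
    """Return True if a recent <app_name>.failover.complete blocks a failover."""
    marker = os.path.join(workdir, f"{app_name}.failover.complete")
    if ignore_last_failover or not os.path.exists(marker):
        return False
    now = time.time() if now is None else now
    # Assumption: the marker's mtime stands in for the last failover time.
    return now - os.path.getmtime(marker) < MIN_INTERVAL_SECONDS


with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "mha.failover.complete"), "w").close()
    print(failover_blocked(workdir))
    print(failover_blocked(workdir, ignore_last_failover=True))
```

In a test environment with repeated failovers, passing --ignore_last_failover or deleting the marker file are the two ways around the block.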
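Running the process in the foreground shows that it exits as soon as one master switch has completed, so you need a script that detects whether the process has exited and whether it has to be started again
Check whether MHA Manager monitoring is working
[root@master ~]#masterha_check_status --conf=/etc/mha/mha.conf
mha (pid:21956) is running(0:PING_OK), master:192.168.170.10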
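A watchdog for the manager can be built on this status line. The Python sketch below parses exactly the output format shown above and treats anything else as "not healthy"; parse_status and manager_is_healthy are names made up for this post, and other output formats of masterha_check_status are not covered.

```python
import re

STATUS_PATTERN = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)


def parse_status(line):
    """Parse one masterha_check_status line; return None if it does not match."""
    match = STATUS_PATTERN.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["code"] = int(info["code"])
    return info


def manager_is_healthy(output):
    """True if any line reports a running manager in state PING_OK."""
    for line in output.splitlines():
        info = parse_status(line)
        if info is not None and info["state"] == "PING_OK":
            return True
    return False


sample = "mha (pid:21956) is running(0:PING_OK), master:192.168.170.10"
print(parse_status(sample))
print(manager_is_healthy(sample))
```

A cron job or a monitoring agent could feed the command's output into manager_is_healthy and raise an alert, or restart masterha_manager, when it returns False.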
Stop the MHA Manager monitoring process, that is, stop the service
[root@master ~]#masterha_stop --conf=/etc/mha/mha.conf
Stopped mha successfully.
The official master_ip_failover, master_ip_online_change and send_report scripts are only samples; the switching logic has to be defined by you. Many people are not familiar with perl and do not know where to start, but the perl script can simply call another script, for example one written in python or shell.
[root@node4 ~]# cat test.pl
#!/usr/bin/perl
use strict;
my $cmd='python /root/test.py';
system($cmd);
[root@node4 ~]# cat test.py
#!/usr/bin/python
print("hello,python")
[root@node4 ~]# perl test.pl
hello,python
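As a slightly more realistic case, here is a Python sketch of a report script that accepts the same options the sample send_report at the end of this post reads (orig_master_host, new_master_host, new_slave_hosts, subject, body). It only formats the report text; build_report is a name made up here, and how the text is delivered (mail, chat, monitoring system) is left open.

```python
import argparse
import sys

OPTIONS = ("orig_master_host", "new_master_host", "new_slave_hosts",
           "subject", "body")


def build_report(argv):
    """Turn the command-line options into a plain-text report."""
    parser = argparse.ArgumentParser(description="format a failover report")
    for name in OPTIONS:
        parser.add_argument(f"--{name}", default="")
    # Unknown options are tolerated so that extra arguments do not abort the run.
    args, _unknown = parser.parse_known_args(argv)
    lines = [
        f"Subject: {args.subject}",
        f"Dead master: {args.orig_master_host}",
        f"New master: {args.new_master_host or '(none)'}",
        f"New slaves: {args.new_slave_hosts or '(none)'}",
        "",
        args.body,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report(sys.argv[1:]))
```

Saved as an executable file, a script like this could be called from a small perl wrapper as shown above, or referenced directly if your setup allows it.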
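Notes
During testing you can follow the MHA log file, which is very detailed; at the end, MHA sends the user an email saying the failover succeeded.
After MHA completes one master failover, the MHA process exits; repair the host and then start the MHA process again. The MHA process should itself be monitored, for example with zabbix, so that the administrator is notified when it goes down.
The whole master failover runs automatically, without manual intervention. Once the old master's machine is back up after the crash, follow the master-slave replication post: copy the SQL dump of the current master to the old master so that it can replicate from the current master, and set read_only=1 on it. The new cluster is then in place. Use the MHA scripts to check the configuration and the network, then start MHA for the next round of monitoring.
The VIP has to be added to the master by hand at the beginning; every later move of the VIP is done by the scripts below, so make sure you understand their configuration
Script files
Below is the master_ip_failover script. The part to change is my $vip = '192.168.170.14'; then add the path of the script to the configuration file above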
[root@master /etc/mha]#cat master_ip_failover
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,          $ssh_user,        $orig_master_host,
  $orig_master_ip,   $orig_master_port, $new_master_host,
  $new_master_ip,    $new_master_port, $new_master_user,
  $new_master_password
);

my $vip = '192.168.170.14';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP on the old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      # print "Creating app user on the new master..\n";
      # FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip(){
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
  `ssh $ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
  return 0 unless ($ssh_user);
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
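The VIP handling in this script comes down to three shell commands that are run over SSH. The Python sketch below rebuilds those command strings so that you can see what would be executed for a given VIP; vip_commands and vip_move_steps are names invented for this post, and the eth0 interface, alias key 2 and /24 prefix are the values hard-coded in the script.

```python
def vip_commands(vip, key="2", interface="eth0", prefix=24):
    """Build the three commands the sample script uses to move the VIP."""
    alias = f"{interface}:{key}"
    return {
        "start": f"/sbin/ifconfig {alias} {vip}/{prefix}",
        "stop": f"/sbin/ifconfig {alias} down",
        # Gratuitous ARP so that neighbours learn the VIP's new location.
        "garp": f"/sbin/arping -U {vip} -I {interface} -c 1",
    }


def vip_move_steps(ssh_user, orig_master_host, new_master_host, vip):
    """List the ssh calls in the order stop (old master), start, garp (new master)."""
    commands = vip_commands(vip)
    stop, start, garp = commands["stop"], commands["start"], commands["garp"]
    return [
        f'ssh {ssh_user}@{orig_master_host} "{stop}"',
        f'ssh {ssh_user}@{new_master_host} "{start}"',
        f'ssh {ssh_user}@{new_master_host} "{garp}"',
    ]


for step in vip_move_steps("root", "192.168.170.10", "192.168.170.11",
                           "192.168.170.14"):
    print(step)
```

Running the printed commands by hand in a test environment is a quick way to confirm that the interface name and the arping path match your hosts before MHA has to rely on them.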
Below is the master_ip_online_change script. The part to change is my $vip = '192.168.170.14'; then add the path of the script to the configuration file above
[root@master /etc/mha]#cat master_ip_online_change
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;

my $vip = '192.168.170.14';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I eth0 -c 1";

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub start_vip(){
  `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
  `ssh $new_master_ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
  `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      # print current_time_us() . " Dropping app user on the orig master..\n";
      #drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## Dropping the VIP
      print "Disabling the VIP on the old master: $orig_master_host \n";
      &stop_vip();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      # create_app_user($new_master_handler);
      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
Below is the masterha_secondary_check script. Its content does not need to be changed; everything is passed in through arguments
[root@master /etc/mha]#cat masterha_secondary_check
#!/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

use strict;
use warnings FATAL => 'all';

use English qw(-no_match_vars);
use Getopt::Long;
use Pod::Usage;
use MHA::ManagerConst;

my @monitoring_servers;
my (
  $help,        $version,         $ssh_user,  $ssh_port,
  $ssh_options, $master_host,     $master_ip, $master_port,
  $master_user, $master_password, $ping_type
);
my $timeout = 5;

$| = 1;
GetOptions(
  'help'              => \$help,
  'version'           => \$version,
  'secondary_host=s'  => \@monitoring_servers,
  'user=s'            => \$ssh_user,
  'port=s'            => \$ssh_port,
  'options=s'         => \$ssh_options,
  'master_host=s'     => \$master_host,
  'master_ip=s'       => \$master_ip,
  'master_port=i'     => \$master_port,
  'master_user=s'     => \$master_user,
  'master_password=s' => \$master_password,
  'ping_type=s'       => \$ping_type,
  'timeout=i'         => \$timeout,
);

if ($version) {
  print "masterha_secondary_check version $MHA::ManagerConst::VERSION.\n";
  exit 0;
}

if ($help) {
  pod2usage(0);
}

unless ($master_host) {
  pod2usage(1);
}

sub exit_by_signal {
  exit 1;
}
local $SIG{INT} = $SIG{HUP} = $SIG{QUIT} = $SIG{TERM} = \&exit_by_signal;

$ssh_user    = "root" unless ($ssh_user);
$ssh_port    = 22     unless ($ssh_port);
$master_port = 3306   unless ($master_port);

if ($ssh_options) {
  $MHA::ManagerConst::SSH_OPT_CHECK = $ssh_options;
}
$MHA::ManagerConst::SSH_OPT_CHECK =~ s/VAR_CONNECT_TIMEOUT/$timeout/;

# 0: master is not reachable from all monitoring servers
# 1: unknown errors
# 2: at least one of monitoring servers is not reachable from this script
# 3: master is reachable from at least one of monitoring servers
my $exit_code = 0;

foreach my $monitoring_server (@monitoring_servers) {
  my $ssh_user_host = $ssh_user . '@' . $monitoring_server;
  my $command =
"ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \"perl -e "
    . "\\\"use IO::Socket::INET; my \\\\\\\$sock = IO::Socket::INET->new"
    . "(PeerAddr => \\\\\\\"$master_host\\\\\\\", PeerPort=> $master_port, "
    . "Proto =>'tcp', Timeout => $timeout); if(\\\\\\\$sock) { close(\\\\\\\$sock); "
    . "exit 3; } exit 0;\\\" \"";
  my $ret = system($command);
  $ret = $ret >> 8;
  if ( $ret == 0 ) {
    print
"Monitoring server $monitoring_server is reachable, Master is not reachable from $monitoring_server. OK.\n";
    next;
  }
  if ( $ret == 3 ) {
    if ( defined $ping_type
      && $ping_type eq $MHA::ManagerConst::PING_TYPE_INSERT )
    {
      my $ret_insert;
      my $command_insert =
          "ssh $MHA::ManagerConst::SSH_OPT_CHECK -p $ssh_port $ssh_user_host \'"
        . "/usr/bin/mysql -u$master_user -p$master_password -h$master_host "
        . "-e \"CREATE DATABASE IF NOT EXISTS infra; "
        . "CREATE TABLE IF NOT EXISTS infra.chk_masterha (\\`key\\` tinyint NOT NULL primary key,\\`val\\` int(10) unsigned NOT NULL DEFAULT '0'\) engine=MyISAM; "
        . "INSERT INTO infra.chk_masterha values (1,unix_timestamp()) ON DUPLICATE KEY UPDATE val=unix_timestamp()\"\'";
      my $sigalrm_timeout = 3;
      eval {
        local $SIG{ALRM} = sub {
          die "timeout.\n";
        };
        alarm $sigalrm_timeout;
        $ret_insert = system($command_insert);
        $ret_insert = $ret_insert >> 8;
        alarm 0;
      };
      if ( $@ || $ret_insert != 0 ) {
        print
"Monitoring server $monitoring_server is reachable, Master is not writable from $monitoring_server. OK.\n";
        next;
      }
    }
    print "Master is reachable from $monitoring_server!\n";
    $exit_code = 3;
    last;
  }
  else {
    print "Monitoring server $monitoring_server is NOT reachable!\n";
    $exit_code = 2;
    last;
  }
}

exit $exit_code;

# ############################################################################
# Documentation
# ############################################################################

=pod

=head1 NAME

masterha_secondary_check - Checking master availability from additional network routes

=head1 SYNOPSIS

masterha_secondary_check -s secondary_host1 -s secondary_host2 .. --user=ssh_username --master_host=host --master_ip=ip --master_port=port

See online reference (http://code.google.com/p/mysql-master-ha/wiki/Parameters#secondary_check_script) for details.

=head1 DESCRIPTION

See online reference (http://code.google.com/p/mysql-master-ha/wiki/Parameters#secondary_check_script) for details.
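The exit code of this script is what decides whether MHA goes ahead with the failover, so it helps to see the decision on its own. The Python sketch below restates the loop over the monitoring servers and the TCP probe that the script runs remotely; both function names and the three result labels are invented for this post, and the optional INSERT check (ping_type) is left out.

```python
import socket


def master_port_open(host, port=3306, timeout=5):
    """TCP probe comparable to the IO::Socket::INET one-liner in the script."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def secondary_check_exit_code(results):
    """Combine per-server results into the script's exit code.

    results holds one label per monitoring server, in order:
    "master_down", "master_up" or "monitor_unreachable" (hypothetical labels).
    """
    for result in results:
        if result == "master_down":
            continue   # this server agrees the master is gone, ask the next one
        if result == "master_up":
            return 3   # master reachable from at least one monitoring server
        return 2       # the monitoring server itself could not be reached
    return 0           # master not reachable from any monitoring server


print(secondary_check_exit_code(["master_down", "master_down"]))
print(secondary_check_exit_code(["master_down", "master_up"]))
```

Only the first case, where every monitoring server fails to reach the master, confirms the outage; as soon as one server can still connect, the script reports that the master is alive.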
Below is the send_report script. Change the mail settings in it, then add the script to the configuration file above
[root@master /etc/mha]#cat send_report
#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.ym.163.com';
my $mail_from='a@163.com';
my $mail_user='a@163.com';
my $mail_pass='xxxxx';
my $mail_to=['b@163.com'];

GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
  my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
  open my $DEBUG, "> /tmp/monitormail.log"
    or die "Can't open the debug file:$!\n";
  my $sender = new Mail::Sender {
    ctype       => 'text/plain; charset=utf-8',
    encoding    => 'utf-8',
    smtp        => $smtp,
    from        => $mail_from,
    auth        => 'LOGIN',
    TLS_allowed => '0',
    authid      => $user,
    authpwd     => $passwd,
    to          => $mail_to,
    subject     => $subject,
    debug       => $DEBUG
  };

  $sender->MailMsg(
    {
      msg   => $msg,
      debug => $DEBUG
    }
  ) or print $Mail::Sender::Error;
  return 1;
}

# Do whatever you want here
exit 0;
Reference: MHA architecture