pt-heartbeat should detect master-server-id

Bug #1365024 reported by Matthew B
This bug affects 2 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Opinion
Importance: Undecided
Assigned to: Unassigned

Bug Description

The documentation says:

"examines the replicated heartbeat record from its immediate master or the specified --master-server-id..."

This leads one to believe that, in the case of simple Master->Slave replication, pt-heartbeat should grab the master's server_id from SHOW SLAVE STATUS.

Only if there is doubt, for example when the Master_Server_Id is not found by SELECT * FROM heartbeat WHERE server_id = XX, should the script abort with the error about not being able to determine the master's ID.

Yes, there is an existing wishlist item for this, #1097997. However, this seems to be more of a bug, as the app doesn't behave the way the documentation suggests.

In any case, pt-heartbeat --monitor/--check should FIRST get the master's server ID from SHOW SLAVE STATUS, then query the heartbeat table. If multiple server IDs are found, fail and ask the user to specify one. If the server ID is not found, fail and ask. If it is found and there is exactly 1 row, continue as normal.

The idea here is to make pt-heartbeat easier to use, not harder/more complicated.
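
The lookup order proposed in this description can be sketched roughly as follows. This is only an illustration of the requested behavior, not pt-heartbeat's actual code (the tool itself is written in Perl); the function name `resolve_master_id`, the exception `MasterIdError`, and the two inputs are hypothetical and assume the caller has already read Master_Server_Id from SHOW SLAVE STATUS and the server_id values from the heartbeat table.

```python
class MasterIdError(Exception):
    """Raised when the master's server ID cannot be chosen automatically."""


def resolve_master_id(slave_status_master_id, heartbeat_server_ids):
    """Pick the server_id to check against, per the rules proposed in this report.

    slave_status_master_id: Master_Server_Id from SHOW SLAVE STATUS,
        or None if the server is not replicating (hypothetical input).
    heartbeat_server_ids: server_id of every row in the heartbeat table
        (hypothetical input).
    """
    # Step 1: the master ID has to come from SHOW SLAVE STATUS first.
    if slave_status_master_id is None:
        raise MasterIdError("no master found; specify --master-server-id")

    # Step 2: compare it against the heartbeat table.
    distinct_ids = set(heartbeat_server_ids)
    if len(distinct_ids) > 1:
        raise MasterIdError(
            "multiple server IDs in heartbeat table; specify --master-server-id"
        )
    if slave_status_master_id not in distinct_ids:
        raise MasterIdError(
            "master's server ID not in heartbeat table; specify --master-server-id"
        )

    # Exactly one row and it matches the master: continue as normal.
    return slave_status_master_id
```

Under these rules a heartbeat table holding a single row whose server_id matches the master resolves without any option, while a table holding rows for two different server IDs still asks the user for --master-server-id.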

Nilnandan Joshi (nilnandan-joshi) wrote :

I was unable to reproduce this. After I ran pt-heartbeat on the slave, I was able to see the server ID in the heartbeat table.
Here, 1 is the master's server ID and 101 is the slave's server ID.

nilnandan@Dell-XPS:~$ pt-heartbeat -D test --check h=localhost,u=root,p=msandbox,S=/tmp/mysql_sandbox20083.sock
105.00
nilnandan@Dell-XPS:~$
mysql> select * from heartbeat;
+---------------------+-----------+------+----------+-----------------------+---------------------+
| ts                  | server_id | file | position | relay_master_log_file | exec_master_log_pos |
+---------------------+-----------+------+----------+-----------------------+---------------------+
| 2014-09-15 13:06:26 |         1 | NULL |     NULL | NULL                  |                NULL |
| 2014-09-15 13:06:26 |       101 | NULL |     NULL | NULL                  |                NULL |
+---------------------+-----------+------+----------+-----------------------+---------------------+
2 rows in set (0.01 sec)

Can you please explain what exactly you want us to check? A test case with steps would be helpful.

Changed in percona-toolkit:
status: New → Incomplete
Matthew B (utdrmac) wrote :

This is a simple example. Extremely simple. I have a master on which I used innobackupex to create a new slave, this one. pt-heartbeat was already running on the master. After starting up the slave and starting replication, I wanted to check the delay. pt-heartbeat should have seen that there was only 1 row in the heartbeat table, compared that to the Master_Server_Id from SHOW SLAVE STATUS, and said "Bingo! There's my master. No need to ask the user for information I already know."

[mboehm@Master-DB ~]$ pt-heartbeat --monitor --database percona
The --master-server-id option must be specified because the heartbeat table `percona`.`heartbeat` uses the server_id column for --update or --check but the server's master could not be automatically determined.
Please read the DESCRIPTION section of the pt-heartbeat POD.

[mboehm@Master-DB ~]$ mysql -e "SELECT * FROM percona.heartbeat"
+----------------------------+-----------+---------------------+-----------+-----------------------+---------------------+
| ts                         | server_id | file                | position  | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+---------------------+-----------+-----------------------+---------------------+
| 2014-09-16T17:13:30.000640 |      1316 | mysql-binlog.001260 | 111425973 | NULL                  |                NULL |
+----------------------------+-----------+---------------------+-----------+-----------------------+---------------------+

I just don't see how the "server's master could not be automatically determined" when SHOW SLAVE STATUS (S.S.S) clearly gives the master's ID. Only in the case of multiple rows in the heartbeat table, or a single row not matching the S.S.S ID, should the program abort with an error.

In your example, why would you run pt-heartbeat in update mode on a slave? Slaves are supposed to be read-only, and running pt-h in update mode changes data on the slave. Running in --check or --monitor mode should not write anything to the table, so I don't understand how your heartbeat table got a row with the slave's server_id.

Yes, if you have master->slaveA->slaveB you might want to run pt-h on master and slaveA. But the behavior of pt-h should still be pretty automatic. If you run --check/--monitor on slaveA, that's a simple case: get the master ID from S.S.S and compare it against the table. If you run --check/--monitor on slaveB, it shouldn't say "cannot determine master id"; it should say something more appropriate, like "Your direct master is slaveA, but slaveA also has a master. Please indicate which one you would like to monitor against."

Or heck, just have it output a column for each master if run on slaveB. *shrug*
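
As a rough illustration of the "one value per master" idea, the delay for every row in the heartbeat table could be computed in a single pass. This is a sketch under stated assumptions, not pt-heartbeat's behavior: the function name `lag_per_master` is hypothetical, each row is assumed to be a (ts, server_id) pair, and the ts strings and `now` are assumed to be in the same time zone.

```python
from datetime import datetime


def lag_per_master(rows, now):
    """Return {server_id: seconds behind} for each heartbeat row.

    rows: iterable of (ts, server_id) pairs, where ts is an ISO-style
        timestamp string as stored in the heartbeat table.
    now: the current time as a datetime, in the same time zone as ts.
    """
    return {
        server_id: (now - datetime.fromisoformat(ts)).total_seconds()
        for ts, server_id in rows
    }
```

On slaveB in a master->slaveA->slaveB chain, this would give one delay for the master's row and one for slaveA's row, which could then be printed side by side.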

Launchpad Janitor (janitor) wrote :

[Expired for Percona Toolkit because there has been no activity for 60 days.]

Changed in percona-toolkit:
status: Incomplete → Expired
Matthew B (utdrmac) wrote :

Post to keep bug alive.

Changed in percona-toolkit:
status: Expired → Opinion
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PT-1238
