Percona Toolkit moved to https://jira.percona.com/projects/PT

Check PXC/Galera-specific replication latency

Registered by Daniel Nichter on 2013-03-27

Tools such as pt-osc (pt-online-schema-change) and pt-table-checksum check whether any slaves are behind and throttle themselves accordingly. They should do the equivalent for PXC/Galera.

For Galera there are a few things that are important:

- Galera uses 'flow control' as a replication lag feedback loop. If the replication queue gets too large on any node, it will use flow control to slow down writes. This causes write-stalls (by design). These tools should avoid that.
- The default queue limit (gcs.fc_limit, measured in pending transactions) is 16, although the effective default varies somewhat with the number of nodes. It can be tuned up to several hundred. Typically, any queue size > 0 may indicate some amount of lag on the slaves.
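
To compare a queue length against the configured limit, a tool first needs to know gcs.fc_limit. The following is a minimal sketch, assuming the limit is read from the wsrep_provider_options variable and that the value is a string of "key = value" pairs separated by semicolons. The function names and the sample string are hypothetical and for illustration only.

```python
def parse_provider_options(options: str) -> dict:
    """Split a 'key = value; key = value' string into a dict of strings."""
    parsed = {}
    for item in options.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed fragments
            parsed[key.strip()] = value.strip()
    return parsed


def fc_limit(options: str, default: int = 16) -> int:
    """Return gcs.fc_limit from the options string, or the default if absent."""
    return int(parse_provider_options(options).get("gcs.fc_limit", default))


# Illustrative sample only, not captured from a real server.
sample = "base_port = 4567; gcs.fc_factor = 0.5; gcs.fc_limit = 16; "
print(fc_limit(sample))
```

In a real tool the string would come from something like SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'. The sketch returns the configured base value only and does not model how the effective limit changes with cluster size.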

There are several status variables that should be useful here:
- wsrep_flow_control_paused -- fraction of time (between 0 and 1) that flow control was in effect since the last SHOW GLOBAL STATUS
- wsrep_flow_control_sent -- FC messages SENT by a node (indicates the node that is laggy). This might be better since it's a global counter, but you'd need to check all nodes for this.
- wsrep_flow_control_recv -- FC messages received (from anywhere in the cluster) -- just checking the local node for this should be sufficient.
- wsrep_local_recv_queue -- current size of the recv queue
- wsrep_local_recv_queue_avg -- average queue size since last SHOW GLOBAL STATUS
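
One way a tool might combine these variables is sketched below. It assumes two consecutive SHOW GLOBAL STATUS samples from the local node are available as dicts of strings. The function name should_throttle and the max_queue threshold of 8 are hypothetical choices, not anything the tools implement today.

```python
def should_throttle(prev: dict, curr: dict, max_queue: int = 8) -> bool:
    """Return True if the tool should pause before writing its next chunk.

    prev and curr map status variable names to their values (as strings,
    the way SHOW GLOBAL STATUS returns them) from two consecutive samples.
    """
    # wsrep_flow_control_recv is a counter, so only the change between
    # samples tells us whether flow control fired recently.
    fc_delta = (int(curr["wsrep_flow_control_recv"])
                - int(prev["wsrep_flow_control_recv"]))
    # wsrep_local_recv_queue is the current queue length on this node only.
    queue = int(curr["wsrep_local_recv_queue"])
    return fc_delta > 0 or queue > max_queue


before = {"wsrep_flow_control_recv": "10", "wsrep_local_recv_queue": "0"}
after = {"wsrep_flow_control_recv": "12", "wsrep_local_recv_queue": "3"}
print(should_throttle(before, after))  # flow control fired between samples
```

This only inspects the local node. Catching a laggy remote node before it triggers flow control would mean checking wsrep_local_recv_queue (or wsrep_flow_control_sent) on every node, as noted above.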

Blueprint information

Status: Not started

Approver: None

Priority: Medium

Drafter: None

Direction: Needs approval

Assignee: None

Definition: New

Series goal: Accepted for 2.2

Implementation: Not started

Milestone target: None

Subscribers

Jay Janssen

Mrten