Initial feature set

Registered by Vlad Lesin on 2012-07-03

===Common description===

percona-playback is a tool for replaying queries against an SQL server. Currently it can read queries from MySQL query-log and tcpdump files and play them back on a MySQL server. It has a plugin architecture and can be extended with plugins.

There are four categories of plugins for percona-playback:

"input" - where the input data comes from,
"db" - where the queries are played,
"report" - how the results are presented,
"other" - plugins that do not belong to the previous categories.

Each plugin can have its own set of command line options, which are usually documented in the help message.

At the moment the following plugins are implemented:

1) "input"
a) query_log - reads queries from query-log files
b) tcpdump - reads queries from tcpdump files

2) "db"
a) libmysqlclient - plays queries on a MySQL server
b) null - does not play queries anywhere, but is useful for testing

3) "report"
a) simple_report - outputs information about executed queries in a simple form

The engine's architecture is "thread-per-connection". Each "db" thread has its own query queue. The "input" plugin parses the input data and passes the parsed queries to the engine, which pushes each query to the queue of the corresponding "db" thread. The queue size can be limited with the --queue-depth command line option. If the limit is reached, the engine suspends the "input" plugin thread until the queue size drops below the limit.
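
The queueing behaviour described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the names DbThread, dispatch and QUEUE_DEPTH are hypothetical, a shared list stands in for a "db" plugin, and a blocking put() on a bounded queue stands in for suspending the "input" plugin thread.

```python
import queue
import threading

QUEUE_DEPTH = 4  # hypothetical stand-in for --queue-depth; the value is arbitrary


class DbThread(threading.Thread):
    """One worker per client connection, draining its own bounded queue."""

    def __init__(self, connection_id, executed):
        super().__init__(daemon=True)
        self.connection_id = connection_id
        self.queries = queue.Queue(maxsize=QUEUE_DEPTH)
        self.executed = executed  # shared list standing in for a "db" plugin

    def run(self):
        while True:
            query = self.queries.get()
            if query is None:  # sentinel: the input is exhausted
                break
            self.executed.append((self.connection_id, query))


def dispatch(parsed_queries):
    """Route (connection_id, query) pairs to per-connection worker threads."""
    executed = []
    threads = {}
    for connection_id, query in parsed_queries:
        worker = threads.get(connection_id)
        if worker is None:
            worker = threads[connection_id] = DbThread(connection_id, executed)
            worker.start()
        # put() blocks while the queue is full, which pauses the input side
        # until the worker has consumed at least one query.
        worker.queries.put(query)
    for worker in threads.values():
        worker.queries.put(None)
    for worker in threads.values():
        worker.join()
    return executed
```

Within one connection the queries are executed in the order they were read, while different connections progress independently of each other.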

Input data can be played several times in a row. The number of repeats can be set with the --loop command line option (NYI).

===Tcpdump plugin===

The main purpose of this plugin is to parse MySQL queries from tcpdump files. Currently the plugin does not support "prepare" and "execute" statements. It also does not parse MySQL thread ids, because they are sent only during the handshake, whereas tcpdump can be started in the middle of a session. That is why the thread id shown by "report" plugins is a hash of the client ip-port pair. Currently only IPv4 connections are parsed.
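
As an illustration of deriving a thread id from the client ip-port pair, here is a small Python sketch. The hash function the plugin really uses is not stated here; CRC32 and the name connection_key are assumptions made only for this example.

```python
import ipaddress
import zlib


def connection_key(client_ip: str, client_port: int) -> int:
    """Derive a stable pseudo thread id from a client IPv4 address and port.

    CRC32 is an arbitrary choice for this sketch, not the plugin's real hash.
    """
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError for non-IPv4 input
    if not 0 <= client_port <= 0xFFFF:
        raise ValueError("port out of range")
    # 4 address bytes followed by 2 port bytes, both in network byte order.
    raw = ip.packed + client_port.to_bytes(2, "big")
    return zlib.crc32(raw)
```

The same ip-port pair always maps to the same id, so all queries of one captured session are attributed to one thread even when the handshake was not captured.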

The plugin has two modes of operation:
"accurate" - preserves query execution times and the pauses between queries, which makes it possible to play back the load recorded on production with some accuracy.
"fast" - plays queries as fast as possible.

Usage example:

Play back percona_playback/test/tcpdump_accuracy.dump on a MySQL server in "accurate" mode with a query queue limit of 10k elements:
---
bin/percona_playback --input-plugin=tcpdump --tcpdump-file=percona_playback/test/tcpdump_accuracy.dump --tcpdump-mode=accurate --db-plugin=libmysqlclient --mysql-host=some_host --mysql-port=3307 --mysql-username=test_user --mysql-password=blablabla --mysql-schema=test1 --queue-depth 10000
---
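
The difference between the "accurate" and "fast" modes can be sketched in Python as follows. This is a simplified model, not the plugin's code: it only derives the pauses between queries from their capture timestamps and ignores query execution time. The function name playback_delays is hypothetical.

```python
def playback_delays(timestamps, mode):
    """Return the pause (in seconds) to insert before each query.

    `timestamps` are capture times of consecutive queries of one connection.
    """
    if mode == "fast":
        # No pauses at all: queries are sent back to back.
        return [0.0] * len(timestamps)
    if mode != "accurate":
        raise ValueError("unknown mode: %s" % mode)
    delays = []
    previous = None
    for ts in timestamps:
        # The first query starts immediately; later ones keep the recorded gap.
        delays.append(0.0 if previous is None else max(0.0, ts - previous))
        previous = ts
    return delays
```

For queries captured at 10.0, 10.5 and 12.0 seconds, "accurate" mode yields pauses of 0.0, 0.5 and 1.5 seconds, while "fast" mode yields no pauses.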

===Query_log plugin===

Parses queries from query log files. It can preserve query execution time with the --query-log-preserve-query-time option. The --query-log-read-count option allows the query log file to be replayed several times (NYI). The intended difference between this option and --loop is that --loop reports at the end of each execution, whereas --query-log-read-count reports once after all executions.
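
Since both options are not yet implemented, the intended reporting difference can only be illustrated. The Python sketch below is hypothetical: replay and report_each_run are names invented for this example, and a report is reduced to the number of queries it covers.

```python
def replay(queries, runs, report_each_run):
    """Replay `queries` `runs` times and return the list of emitted reports.

    Each report is simply the number of queries it covers.
    """
    reports = []
    pending = 0
    for _ in range(runs):
        for _query in queries:
            pending += 1  # a real run would execute the query here
        if report_each_run:  # behaviour described for --loop
            reports.append(pending)
            pending = 0
    if not report_each_run:  # behaviour described for --query-log-read-count
        reports.append(pending)
    return reports
```

With two queries and three runs, the --loop style produces three reports of two queries each, and the --query-log-read-count style produces a single report covering six queries.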

Usage example:

Run ./percona_playback/test/basic-slow.log with the default libmysqlclient plugin settings:
---
bin/percona_playback --db-plugin=libmysqlclient --slow-query-log-file=./percona_playback/test/basic-slow.log
---

Descriptions of the other options can be found in the "help" message.

===Future development===

My vision of future development is the following:

The most important tasks, which should be done for the nearest release:
1) Obtain clean Jenkins builds on all target platforms.
2) Implement repeated execution.
3) Make the tcpdump and query_log plugin options consistent. For example, if the tcpdump plugin has a mode selection option, query_log should have one too.
4) Implement tests that verify the whole "input-dispatch-output-report" chain.
5) Test the tool on real data and fix bugs.

Tasks for the next releases:
1) Support IPv6 in the tcpdump plugin.
2) Support "online" playback in the tcpdump plugin (that is, capture data from a live interface and copy the load to another server).
3) Refactor the tcpdump plugin (I think it is possible to implement an FSM instead of a set of conditions).
4) Move away from the "thread-per-connection" engine architecture. I think we could significantly increase the tool's performance with several worker threads using asynchronous IO. We could also experiment with "green" threads to simplify work with db connections.
5) It would be great if input plugins were not suspended by the dispatcher when some db thread has a lot of slow queries in its queue.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None
