Redesign pt-query-digest

Registered by Baron Schwartz

SUPERSEDED: We did almost all of this in 2.2.

This spec is to be researched and drafted more fully by Daniel. Here is a start:

1. Simplify.

Remove features that we don't need, such as Postgres log parsing, --daemonize, --execute, --execute-throttle, --log, --mirror, --pid, --report-all, --table-access, and --apdex-threshold.

2. Make log format auto-detected.

See the Postgres and Syslog parsers for examples of how this can be done.
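As a rough illustration of what auto-detection could look like, here is a minimal sketch that sniffs the first few lines of the input and returns the first format whose signature matches. It is written in Python for brevity and is not the tool's actual code. The function name, the format labels, and above all the regular expressions are assumptions made for this example; real patterns would have to be checked against real log samples.

```python
import re

# Hypothetical signatures, one per input format. These patterns are
# illustrative guesses, not patterns taken from the tool.
SIGNATURES = [
    ("slowlog", re.compile(r"^# (?:Time: |User@Host: |Query_time: )")),
    ("binlog", re.compile(r"^(?:/\*!\d+ SET |# at \d+$|DELIMITER )")),
    ("tcpdump", re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+ IP ")),
    ("genlog", re.compile(
        r"^(?:\d{6}\s+\d{1,2}:\d\d:\d\d\s+)?\s*\d+ "
        r"(?:Connect|Query|Quit|Init DB)\b")),
]


def detect_log_format(lines, max_lines=50):
    """Return the name of the first format whose signature matches one of
    the first max_lines lines, or None if nothing matches."""
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        line = line.rstrip("\n")
        for name, pattern in SIGNATURES:
            if pattern.search(line):
                return name
    return None
```

Checking only a bounded number of lines keeps detection cheap on large inputs, and returning None lets the caller fall back to an explicit option instead of guessing.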

3. Remove legacy things.

We have a lot of legacy stuff, such as a feature that attempts to cope with bugs in Percona Server that caused very large Query_time values. Re-assess these things and remove them if they are not beneficial.

4. Clarify the "pipeline" architecture.

It's a nightmare of complexity right now, and the tail of it (the reports) is just awful code. Figure out something maintainable for the future.

5. Simplify reporting options.

I don't think we need timeline reporting, for example. We basically need the tool to do one type of reporting, and do it well. Also, some things like apdex scores can probably go away.

6. Make the tcpdump parser respect sequence numbers.

This will prevent problems such as spurious long-running queries when tcpdump drops packets.
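To make the idea concrete, here is a minimal Python sketch of gap detection for one direction of one TCP connection. It assumes the parser has already extracted each segment's sequence number and payload length; the Segment type and the find_gaps function are hypothetical names for this example, not part of the existing parser. A segment that starts after the expected next byte means tcpdump dropped something, so the query in flight can be discarded instead of being reported with a bogus duration.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


@dataclass
class Segment:
    seq: int     # sequence number of the first payload byte
    length: int  # payload length in bytes


def find_gaps(segments):
    """Return (expected_seq, actual_seq) pairs for every place where a
    segment starts after the byte we expected next."""
    gaps = []
    expected = None
    for seg in segments:
        if expected is not None:
            delta = (seg.seq - expected) % SEQ_MOD
            if 0 < delta < SEQ_MOD // 2:
                # Bytes are missing between expected and seg.seq.
                gaps.append((expected, seg.seq))
            elif delta != 0:
                # Segment starts before the expected byte: treat it as a
                # retransmission (a simplifying assumption) and skip it.
                continue
        expected = (seg.seq + seg.length) % SEQ_MOD
    return gaps
```

For example, find_gaps([Segment(1000, 100), Segment(1100, 50), Segment(1300, 20)]) reports a single gap of (1150, 1300). Partially overlapping retransmissions are not handled in this sketch.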

7. Fix processlist polling.

8. Implement the ability to get queries from the slow log table.

9. Improve performance if possible.

10. Implement proper "query review" features.

The query review feature is ugly to use and the code is ugly. The underlying problem is a broken data model: the tool stores the query review data in a non-normalized way. Design a properly normalized schema that supports these requirements: a proper entity-relationship between query, fingerprint, a "run" of the tool, and a server; the ability to do time-series querying; and other things that may come up as Daniel and I talk about it. With that schema I should be able to run the tool many times, store data from many servers in one set of tables, and then answer questions such as "what is the difference between the run I did this morning at 10am and the run at 10:15am", "what is the difference between this server's queries and that server's", "what kind of trend does this query have over time", and "when was this query first seen". Percona customer issue 19683 has some discussion of this, and a Skype call with that customer to discuss it wouldn't be out of place.

It would also be good to simplify this. For example, remove the ability to do a "review" and print a report at the same time. Make it either-or, not both-and. That might help keep the code simple, although it is not necessary if we can find a better way to do it.
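As a starting point for discussion, here is a minimal sketch of one possible normalized layout, using Python's sqlite3 module so it can be run anywhere. Every table, column, and function name here is an assumption made for illustration; none of it is the existing review schema, and the real design would target MySQL. It separates server, run, fingerprint, and query sample, keeps per-run statistics in their own table, and shows two of the questions above as queries.

```python
import sqlite3

# Hypothetical schema: one row per server, per run of the tool, per
# fingerprint, per sample query, and per (run, fingerprint) statistics.
SCHEMA = """
CREATE TABLE server (
    server_id INTEGER PRIMARY KEY,
    hostname  TEXT NOT NULL UNIQUE
);
CREATE TABLE run (
    run_id     INTEGER PRIMARY KEY,
    server_id  INTEGER NOT NULL REFERENCES server(server_id),
    started_at TEXT NOT NULL
);
CREATE TABLE fingerprint (
    fingerprint_id INTEGER PRIMARY KEY,
    checksum       TEXT NOT NULL UNIQUE,
    fingerprint    TEXT NOT NULL
);
CREATE TABLE query_sample (
    query_id       INTEGER PRIMARY KEY,
    fingerprint_id INTEGER NOT NULL REFERENCES fingerprint(fingerprint_id),
    run_id         INTEGER NOT NULL REFERENCES run(run_id),
    sample         TEXT NOT NULL
);
CREATE TABLE fingerprint_run_stats (
    run_id           INTEGER NOT NULL REFERENCES run(run_id),
    fingerprint_id   INTEGER NOT NULL REFERENCES fingerprint(fingerprint_id),
    query_count      INTEGER NOT NULL,
    total_query_time REAL NOT NULL,
    PRIMARY KEY (run_id, fingerprint_id)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def first_seen(conn, checksum):
    """Start time of the earliest run that saw this fingerprint, or None."""
    row = conn.execute(
        """SELECT MIN(r.started_at)
           FROM fingerprint_run_stats s
           JOIN run r USING (run_id)
           JOIN fingerprint f USING (fingerprint_id)
           WHERE f.checksum = ?""",
        (checksum,),
    ).fetchone()
    return row[0]


def diff_runs(conn, run_a, run_b):
    """Per-fingerprint query counts in two runs, as (checksum, a, b)."""
    return conn.execute(
        """SELECT f.checksum,
                  SUM(CASE WHEN s.run_id = ? THEN s.query_count ELSE 0 END),
                  SUM(CASE WHEN s.run_id = ? THEN s.query_count ELSE 0 END)
           FROM fingerprint_run_stats s
           JOIN fingerprint f USING (fingerprint_id)
           WHERE s.run_id IN (?, ?)
           GROUP BY f.checksum
           ORDER BY f.checksum""",
        (run_a, run_b, run_a, run_b),
    ).fetchall()
```

Because each run is tied to a server and a start time, comparing servers or plotting a trend for one fingerprint becomes a join on run rather than a special case in the tool.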

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: Superseded
Series goal: None
Implementation: Not started
Milestone target: None
Completed by: Daniel Nichter
