TokuDB - background auto analyze and recount rows

Registered by George Ormond Lorch III

Implement a means to have tables automatically analyzed in the background based on a measured change in data.

Blueprint information

Status:
Complete
Approver:
George Ormond Lorch III
Priority:
High
Drafter:
George Ormond Lorch III
Direction:
Approved
Assignee:
George Ormond Lorch III
Definition:
Approved
Series goal:
Accepted for 5.6
Implementation:
Implemented
Milestone target:
milestone icon 5.6.27-76.0
Started by
George Ormond Lorch III
Completed by
George Ormond Lorch III

Related branches

Sprints

Whiteboard

ST-57094 : Cardinality and estimated rows are inaccurate
FT-684 : estimated # of rows in a table could become inaccurate after deletes
DB-254 : speed up analyze table
DB-306 : tokudb index statistics are not computed automatically
DB-825 : Statistics, analyze table, and data change
DB-848 : NULL cardinality on partitioned tables
DB-849 : Index stats only samples left side of fractal trees

Overview:
---------
* Implement background job manager that can perform operations on a background thread.
  - Jobs must be identifiable and cancellable.
  - Jobs must be 'testable' to prevent multiple schedulings of the same job on the same target (table).
  - Jobs must be reported with their complete scheduling and status informtion within an information_schema table.

* Implement a means of automatically performing a table status analysis based on some measurable table change value, allow the analysis to run either in the context of the calling thread or in a background thread.

* Implement variable index status scaling factor similar to the hard coded InnoDB '/2'.

* Implement a means of manually recounting the logical row set within a table and updating the persisted row counts within the PerconaFT library.

* Fix table analysis functionality:
  - Report status values to server immediately upon completion of any analysis. Current behavior only serializes table status values to storage, meaning that new status values will not be used until the table has been closed and re-opened.
  - Fix NULL status, tables/partitions that have no statistics should return a 1 for all status values, not 0.
  - Fix status scaling. InnoDB returns all status values as 50% more unique than it actually measures to coerce the optimizer to favor an index use over a table scan. TokuDB has no such scaling.
  - Implement timed and throttled operation of table analysis.
  - Implement half time direction reversal of analysis.
    = Currently TokuDB uses a PerconaFT cursor, beginning with the first/leftmost row in the table and progresses rightwards through the table until either all rows have been analyzed or the tokudb_analyze_time has been reached.
    = If a tokudb_analyze_time is in effect and the analysis has not reached the half way point of the index by the time tokudb_analyze_time/2 has been reached: stop the forward progress and restart the analyzis from the last/rightmost row in the table, progressing leftwards and keeping/adding to the status information accumulated from the first half of the scan.

* Clean TOKUDB_SHARE, begin morphing it into an InnoDB type data dictionary and C++ize it.

* Implement new tokudb.sys_vars mtr suite, adding basic tests for existing and new analyze variables.

* Implement new mtr tests for cardinality scaling, background job control and auto analyze.

* Correct existing TokuDB mtr results to reflect new cardinality and index selectivity.

New/altered TokuDB system variables:
------------------------------------
* tokudb_analyze_in_background : THDVAR, BOOL, def=FALSE
  - When TRUE, dispatches any ANALYZE TABLE job to a background process and returns immediately, otherwise ANALYZE_TABLE will run in foreground/client context.

* tokudb_analyze_mode : THDVAR, ENUM, def=TOKUDB_ANALYZE_STANDARD
  - Controls the function of ANALYZE TABLE, all control values are mutually exclusive:
    = TOKUDB_ANALYZE_CANCEL : Cancel any running or scheduled job on the specified table.
    = TOKUDB_ANALYZE_STANDARD : Use existing analysis algorithm.
    = TOKUDB_ANALYZE_RECOUNT_ROWS : Recount logical rows in table and update persistent count.

* tokudb_analyze_throttle : THDVAR, ULONGLONG, def=0, min=0, max=~0, 0=no throttle
  - Maximum number of keys to visit per second when performing either a TOKUDB_ANALYZE_STANDARD or TOKUDB_ANALYZE_RECOUNT_ROWS.

* tokudb_analyze_time : THDVAR, ULONGLONG, def=5, min=0, max=~0, 0=indefinite
  - Remains unchanged from current behavior, maximum time in seconds to execute analysis on each index.

* tokudb_auto_analyze : THDVAR, UINT, def=0, min=0, max=~0, 0=disabled
  - Percentage of table change as inserts/updates/deletes to trigger an analyze using the current session tokudb_analyze_in_background, tokudb_analyze_mode, tokudb_analyze_throttle, and tokudb_analyze_time settings.
  - If enabled and tokudb_analyze_in_background=FALSE, analysis will be performed directly within the client thread context that triggered the analysis.
  - Note: InnoDB enabled this functionality by default when they introduced it. Due to the potential unexpected new load it might place on a server, it is disabled by default in TokuDB.

* tokudb_cardinality_scale_percent : SYSVAR, INT, def=50, min=0, max=100
  - Percentage to scale table/index statistics when sending to the server to make an index appear to be either more or less unique than it actually is.
  - InnoDB has a hard coded scaling factor of 50%. So if a table of 200 rows had an index with 40 unique values, InnoDB would return 200/40/2 or 2 for the index. The new TokuDB formula is the same but factored differently to use percent, for the same table.index (200/40 * tokudb_cardinality_scale) / 100, for a scale of 50% the result would also be 2 for the index.

TOKUDB_ANALYZE_STANDARD vs. TOKUDB_ANALYZE_RECOUNT_ROWS
-------------------------------------------------------
* TOKUDB_ANALYSE_RECOUNT_ROWS is a new mechanism that is used to perform a logical recount of all rows in a table and persist that as the basis value for the table row estimate.
  This mode was added for tables that have been upgraded from an older version of TokuDB/PerconaFT that only reported physical row counts and never had a proper logical row count.
  Newly created tables/partitions will begin counting logical rows correctly from their creation and should not need to be recounted unless some odd edge condition causes the logical count to become inaccurate over time.
  This analysis mode has no effect on the table cardinality counts.
* TOKUDB_ANALYZE_STANDARD is the standard table cardinality analysis mode used to obtain cardinality statistics for a tables and its indices.

Auto analysis:
--------------
* A ‘delta’ value will be maintained in memory for each table.

* The delta value will NOT be persisted anywhere and will be reset to 0 on server start.

* The delta will be incremented for each insert/update/delete and ignore the impact of transactions (rollback specifically).

* When this delta value exceeds the tokudb_auto_analyze percentage of rows in the table an analysis is performed according to the current session's settings. Other analysis for this table will be disabled until this analysis completes. When this analysis completes, the delta is reset to 0 to begin recalculating table changes for the next potential analysis.

* For small ratios of table_rows / tokudb_auto_analyze, auto analysis will be run for almost every change.

* The trigger formula is: if (table_delta >= ((table_rows * tokudb_auto_analyze) / 100)) { analyze }

* If a user manually invokes an ANALYZE TABLE and tokudb_auto_analyze is enabled and there are no conflicting background jobs, the users ANALYZE TABLE will behave exactly as if the delta level has been exceeded in that the analysis is executed and delta reset to 0 upon completion.

Background jobs:
----------------
* TOKUDB_ANALYZE_STANDARD will take the currently set session values for tokudb_analyze_time, tokudb_analyze_in_background, and tokudb_analyze_throttle at the time of its scheduling, either via a user invoked ANALYZE TABLE or an auto schedule as a result of tokudb_auto_analyze threshold being hit. Changing the global or session instances of these values after scheduling will have no effect on the scheduled job.

* TOKUDB_ANALYZE_RECOUNT_ROWS will take the currently set session values for tokudb_analyze_in_background, and tokudb_analyze_throttle. Changing the global or session instances of these values after scheduling will have no effect on the job.

* Scheduled jobs that not been started or are currently running can be cancelled by setting the tokudb_analyze_mode to TOKUDB_ANALYZE_CANCEL and issuing an ANALYZE TABLE on the table for which you want to cancel all jobs for.

* There will be a single thread for handling these background jobs.

* Scheduled and running background jobs can be viewed by querying INFORMATION_SCHEMA.tokudb_background_job_status.

* Background jobs and schedule are transient in nature and will not be persisted anywhere. Any currently running job will be terminated on shutdown and all scheduled jobs will be forgotten about on server restart.

* There can not be two jobs on the same table same table scheduled or running at any one point in time. If a user manually invokes an ANALYZE TABLE that conflicts with either a pending or running job, the running job will be canceled and the users task will run immediately in the foreground.

* Any DDL will any running or pending background job on the table.

information_schema.tokudb_background_job_status:
------------------------------------------------
Table definition:
  id bigint(0) NOT NULL DEFAULT '0', # Simple monotonically incrementing job id, resets to 0 on server start.
  database_name varchar(256) NOT NULL DEFAULT '', # Database name
  table_name varchar(256) NOT NULL DEFAULT '', # Table name
  job_type varchar(256) NOT NULL DEFAULT '', # Type of job, either TOKUDB_ANALYZE_STANDARD or TOKUDB_ANALYZE_RECOUNT_ROWS
  job_params varchar(256) NOT NULL DEFAULT '', # Param values used by this job in string format. Ex: “TOKUDB_ANALYZE_DELETE_TIME=1.0; TOKUDB_ANALYZE_TIME=5; TOKUDB_ANALYZE_THROTTLE=2048;”
  scheduler varchar(32) NOT NULL DEFAULT '', # Either "USER" or "AUTO" to indicate if the job was explicitly scheduled by a user or if it was scheduled as an automatic trigger
  scheduled_time datetime NOT NULL DEFAULT '0000-00-00 00:00:00', # Time the job was scheduled
  started_time datetime DEFAULT NULL, # Time the job was started
  status varchar(256) DEFAULT NULL # Current job satus if running. Ex: "analyze table standard db.tbl.idx 3 of 5 50% rows 10% time scanning forward"
If upgrading an existing TokuDB installation, the TokuDB plugin must be disabled and re-enabled via ps_tokudb_admin to register this new plugin. Alternately the plugin may be enabled via `INSTALL PLUGIN tokudb_background_job_status SONAME 'ha_tokudb.so'`

(?)

Work Items

Dependency tree

* Blueprints in grey have been implemented.

This blueprint contains Public information 
Everyone can see this information.

Subscribers

No subscribers.