TokuDB - Merge status files into a single file

Registered by George Ormond Lorch III on 2015-10-20

For each table, TokuDB keeps one status/metadata database, one primary (main) database, and one database for each index. I propose an in-place ("dirty") merge of these scattered status files into a single file that holds the status/metadata for all tables.
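
As a rough count of the file-system savings, consider only the dictionaries named above. Read "each index" as each secondary index, and assume T tables where table t has k_t secondary indexes. The number of PerconaFT files before and after the merge is then:

$$F_{\text{before}} = \sum_{t=1}^{T} (2 + k_t), \qquad F_{\text{after}} = 1 + \sum_{t=1}^{T} (1 + k_t), \qquad F_{\text{before}} - F_{\text{after}} = T - 1.$$

For example, 10,000 tables with two secondary indexes each would go from 40,000 files to 30,001.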

Blueprint information

George Ormond Lorch III
Needs approval
George Ormond Lorch III
Series goal: Accepted for 5.7
Milestone target:
Completed by George Ormond Lorch III on 2017-03-22

The TokuDB status files contain the following information:
- old version
- capabilities flags
- max auto increment
- 'create table' auto increment
- key name
- .frm file data copy
- new version
- cardinality statistics
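
In each status file, every one of these fields is stored under a simple integer key. The minimal C++ sketch below shows what such a key enum might look like. The names `StatusKey` and `key_id` are hypothetical, and the numeric values only follow the order of the list above. They are not taken from the TokuDB source.

```cpp
#include <cstdint>

// Hypothetical names for the per-table status file keys. The numeric values
// follow the order of the list above; that ordering is an assumption.
enum class StatusKey : std::uint32_t {
    OldVersion = 0,       // old version
    Capabilities,         // capabilities flags
    MaxAutoIncrement,     // max auto increment
    CreateAutoIncrement,  // 'create table' auto increment
    KeyName,              // key name
    FrmData,              // .frm file data copy
    NewVersion,           // new version
    CardinalityStats,     // cardinality statistics
};

// In a per-table status file the key is just this integer; which table the
// row belongs to is implied by which file it is stored in.
constexpr std::uint32_t key_id(StatusKey k) {
    return static_cast<std::uint32_t>(k);
}
```

Because the table's identity lives in the file name rather than in the key, these keys cannot simply be copied into one shared file without extending them.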

Each status file is a PerconaFT file, and for most tables the data it holds is small, usually well under 8 KB.
On systems with large numbers of tables, these files bloat the file system.
Because the files are extremely small and hold small amounts of frequently accessed data, they may also cause avoidable problems for the PerconaFT library, which has to manage this mix of tiny and huge nodes in the cachetable.

There are also several unsubstantiated reports of these status files going missing. Nothing in the code directly indicates that this can happen, but given the way these files are handled transactionally, I can imagine a crash at just the wrong moment destroying them.

I believe that by combining this data into a single PerconaFT file for all tables, we can increase the data locality and selectability within the cachetable and reduce the number of disk files required for TokuDB.

* The current FT key for these files is a simple unsigned integer `id`, i.e. one value of an enum of the fields listed above.

* I propose changing the FT key to a character array of the form `id::database::table`.

* The code that manages this data is currently scattered loosely around the TokuDB code base. As part of this change, I would encapsulate all status/metadata access and mutation in a single class.

* Reading data - when any field value is read, first check whether an individual status file exists for the table. If it does, note that for later and read the data from the status file; if not, read the data from the new combined metadata file.

* Writing data - when any field is written or updated, check whether the data was originally read from an individual status file. If it was, write the entire contents of that status file to the new master metadata file and remove the individual file. Over time, this should migrate all status data into the master metadata file with little visible overhead.

* Upgrades are handled in place as described above.

* Downgrades would not be directly possible, but a very simple tool could be provided to perform basic maintenance such as a downgrade split, CSV export, import, and gathering and merging into a new metadata file.

* This metadata could also become the basis for implementing InnoDB-like information_schema.innodb_sys_* tables.
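
The key format and the read/write migration rules above can be sketched as the single class this proposal calls for. This is a minimal C++17 sketch under stated assumptions, not TokuDB's implementation. `std::map` stands in for the PerconaFT dictionaries (the combined file and the per-table status files). The names `StatusKey`, `make_combined_key`, `LegacyFile`, `LegacyFiles`, and `TableMetadata` are hypothetical. Legacy files are looked up by a hypothetical `database/table` path, and the `id` part of the key is assumed to be written as decimal text. A real version would go through the PerconaFT API and perform the migration inside a transaction.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical field ids, mirroring the keys of a per-table status file.
enum class StatusKey : std::uint32_t {
    OldVersion = 0, Capabilities, MaxAutoIncrement, CreateAutoIncrement,
    KeyName, FrmData, NewVersion, CardinalityStats,
};

// Proposed key layout in the combined file: "id::database::table".
// Writing the id as decimal text is an assumption.
std::string make_combined_key(StatusKey id, const std::string& db,
                              const std::string& table) {
    return std::to_string(static_cast<std::uint32_t>(id)) + "::" + db +
           "::" + table;
}

// Stand-ins for PerconaFT dictionaries (assumption, for illustration only).
using CombinedFile = std::map<std::string, std::string>;
using LegacyFile = std::map<StatusKey, std::string>;
using LegacyFiles = std::map<std::string, LegacyFile>;  // keyed by "db/table"

// Single class that owns all status/metadata access for one table.
class TableMetadata {
public:
    TableMetadata(CombinedFile& combined, LegacyFiles& legacy, std::string db,
                  std::string table)
        : combined_(combined), legacy_(legacy), db_(std::move(db)),
          table_(std::move(table)) {
        // Check once whether an individual status file exists; note it for later.
        has_legacy_ = legacy_.count(legacy_path()) != 0;
    }

    // Read from the individual status file if it exists, else the combined file.
    std::optional<std::string> read(StatusKey id) const {
        if (has_legacy_) {
            const LegacyFile& file = legacy_.at(legacy_path());
            auto it = file.find(id);
            if (it == file.end()) return std::nullopt;
            return it->second;
        }
        auto it = combined_.find(make_combined_key(id, db_, table_));
        if (it == combined_.end()) return std::nullopt;
        return it->second;
    }

    // A write first migrates the whole individual file (if still present)
    // into the combined file and removes it, then stores the new value.
    void write(StatusKey id, const std::string& value) {
        if (has_legacy_) {
            auto file = legacy_.find(legacy_path());
            if (file != legacy_.end()) {
                for (const auto& [key, val] : file->second)
                    combined_[make_combined_key(key, db_, table_)] = val;
                legacy_.erase(file);
            }
            has_legacy_ = false;
        }
        combined_[make_combined_key(id, db_, table_)] = value;
    }

private:
    std::string legacy_path() const { return db_ + "/" + table_; }

    CombinedFile& combined_;
    LegacyFiles& legacy_;
    std::string db_;
    std::string table_;
    bool has_legacy_ = false;
};
```

For example, a table whose individual status file holds `MaxAutoIncrement` and `FrmData` is served from that file until the first write. That write copies both entries to keys such as `2::db1::t1` and `5::db1::t1`, removes the individual file, and later reads go to the combined file. If database or table names can contain `::`, the key would need escaping to stay unambiguous.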

