Column Compression with optional Predefined Dictionary

Registered by Yura Sorokin

https://github.com/percona/percona-server/pull/735

Column-level compression for Percona Server

Full spec:
https://docs.google.com/a/percona.com/document/d/1Dg5Cq1s5A2Pt9LpNY2mF6awTJqBBNW7DmUZ6arKvb-E/edit?usp=sharing

****************************************
Feature highlights:

SET @dictionary_data = 'one' 'two' 'three' 'four';

CREATE COMPRESSION_DICTIONARY numbers (@dictionary_data);

CREATE TABLE t1(
  id INT,
  a BLOB COLUMN_FORMAT COMPRESSED,
  b BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY numbers
) ENGINE=InnoDB;

SET @json_value =
  '[\n'
  ' {\n'
  ' "one" = 0,\n'
  ' "two" = 0,\n'
  ' "three" = 0,\n'
  ' "four" = 0\n'
  ' },\n'
  ' {\n'
  ' "one" = 0,\n'
  ' "two" = 0,\n'
  ' "three" = 0,\n'
  ' "four" = 0\n'
  ' },\n'
  ' {\n'
  ' "one" = 0,\n'
  ' "two" = 0,\n'
  ' "three" = 0,\n'
  ' "four" = 0\n'
  ' },\n'
  ' {\n'
  ' "one" = 0,\n'
  ' "two" = 0,\n'
  ' "three" = 0,\n'
  ' "four" = 0\n'
  ' }\n'
  ']\n'
;
INSERT INTO t1 VALUES(0, @json_value, @json_value);

SELECT COUNT(*) FROM t1 WHERE a = @json_value;
SELECT COUNT(*) FROM t1 WHERE b = @json_value;

DROP TABLE t1;
DROP COMPRESSION_DICTIONARY numbers;

****************************************
Limitations:
* 'COLUMN_FORMAT COMPRESSED' can only be specified for 'BLOB' / 'TEXT' / 'VARCHAR' / 'VARBINARY' columns (and their 'LONG' / 'MEDIUM' variants).
* Compressed columns are not allowed in the key list
* 'ALTER TABLE ... DISCARD/IMPORT TABLESPACE' is not supported for tables with compressed columns.

****************************************
Added two new server variables:
- innodb_compressed_columns_zip_level -
  "Compression level used for compressed columns. 0 is no compression,"
  "1 is fastest and 9 is best compression. Default is 6."
- innodb_compressed_columns_threshold -
  "Compress column data if its length exceeds this value. Default is 96"

****************************************
'mysqldump' extension:
1. By default, with no additional options, 'mysqldump' will generate 'compatible' SQL output.
All "/*!50633 COLUMN_FORMAT COMPRESSED */" and "/*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <dictionary> */" will be stripped out.

2. When a new option '--enable-compressed-columns' is specified, all "/*!50633 COLUMN_FORMAT COMPRESSED */" will be left intact and all "/*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <di
ctionary> */" will be transformed into "/*!50633 COLUMN_FORMAT COMPRESSED */". In this mode no "CREATE/DROP COMPRESSION_DICTIONARY ..." statements will ever be generated.

3. When a new option '--enable-compressed-columns-with-dictionaries' is specified, all compressed column attributes and compression dictionary references will be preserved.
Moreover, the following dictionary creation fragments will be added before "CREATE TABLE" statements which are going to use these dictionaries for the first time.
-------------------------------------------------
/*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary>; */
/*!50633 CREATE COMPRESSION_DICTIONARY <dictionary>(...); */
-------------------------------------------------

4. Two new options '--add-drop-compression-dictionary' / '--skip-add-drop-compression-dictionary' will control if "/*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary> */" from part from (3) will
l be skipped or not. By default, '--add-drop-compression-dictionary' mode will be used.

5. When both '--enable-compressed-columns-with-dictionaries' and '--tab=<dir>' (separate file for each table) options are specified, necessary compression dictionaries will be created in each output file using the following fragment (regardless of the value of '--add-drop-compression-dictionary' / '--skip-add-drop-compression-dictionary' option)
-------------------------------------------------
/*!50633 CREATE COMPRESSION_DICTIONARY IF EXISTS <dictionary>(...); */
-------------------------------------------------

****************************************
Downgrade scenario
If it is necessary to perform Percona Server downgrade from a version >= 5.6.33 to a version <5.6.33 and user databases have one or more table with compressed columns, there are two options to do this safely:
1. Use 'mysqldump' in 'compatible' mode (no compressed columns extensions must be specified).
2. Manually remove COMPRESSED attribute from all columns which have it via
"ALTER TABLE ... MODIFY ... COLUMN_FORMAT DEFAULT" before updating server binaries.
In this case, the downgraded server can start safely with old data files.

Blueprint information

Status:
Complete
Approver:
None
Priority:
High
Drafter:
Yura Sorokin
Direction:
Approved
Assignee:
Yura Sorokin
Definition:
Approved
Series goal:
Accepted for 5.6
Implementation:
Implemented
Milestone target:
milestone icon 5.6.33-79.0
Started by
Yura Sorokin
Completed by
Yura Sorokin

Related branches

Sprints

Whiteboard

(?)

Work Items

Dependency tree

* Blueprints in grey have been implemented.

This blueprint contains Public information 
Everyone can see this information.

Subscribers

No subscribers.