Drizzledump support/conversion for MySQL

Registered by Andrew Hutchings

Modify drizzledump to connect to MySQL servers and obtain the dump in a drizzle compatible format.

1. Have drizzledump connect to the drizzle server to copy MySQL->drizzle
2. Convert types using regex on SHOW CREATE TABLE

Other things to consider:
1. Verify copy is correct
2. Have the parser accept types and convert them
3. Using MySQL's Information Schema

Blueprint information

Needs approval
Andrew Hutchings
Series goal:
Accepted for 7.0
Milestone target:
milestone icon 2010-09-27
Started by
Andrew Hutchings
Completed by
Andrew Hutchings


I plan to do this in two phases. First to modify drizzledump to detect and parse MySQL tables/data, second to connect to a remote server and push the content.

As a possible future expantion of this it could easily be made multi-threaded in a similar way to mydumper.

Phase 1:
MySQL detect/convert support

This will be a conversion broken down into several stages, beginning with the MySQL detection. We will only work with MySQL 5.1 onwards due to information_schema tables provided. To detect MySQL version we will use server_version_string() and parse the results. If for some reason this fails (such as a hacked/modified MySQL or Drizzle) then a force option can be used --force-server-type=[mysql|drizzle].

If MySQL is detected (or forced) the following process will be used:

2. Use "set names" to set to utf8 so output from tables in other charsets work.
3. Use information_schema tables and columns to obtain the table schemas
3a. convert to drizzle types, output stderr warning if default date of 0000-00-00 and do the conversion
3b. engine conversion. MyISAM -> InnoDB
3c. data conversion of date (0000-00-00) and time data.
4. Use information_schema table_constraints to get foreign key data
5. Dump tables schemas to output
6. Dump table data to output, convert dates if needed

During step 3 the following conversions are required:

MyISAM -> InnoDB
FullText -> drop it (with stderr warning)
int unsigned -> bigint
tinyint -> int
smallint -> int
mediumint -> int
tinytext -> text
mediumtext -> text
longtext -> text
tinyblob -> blob
mediumblob -> blob
longblob -> blob
time -> int (of seconds)
year -> int
set -> text (can't guess length so safer than varchar, may need to normalise in later versions)
date/datetime default 0000-00-00 -> default NULL *(It currently appears that ALL date columns have their DEFAULT set to NULL on migration)
date/datetime NOT NULL columns -> NULL
any date data containing 0000-00-00 -> NULL

Phase 2:
Setting the destination

The whole idea of this is that drizzledump will connect to a drizzle server as a destination and do all the conversion on the fly. Therefore I propose the following option:


The "stdout" option will be the default, "files" will be for a file-per-table output and "database" will be the database injector output.

The following option will only work with "files" to set a path for the output:


These options will be to set the destination drizzle database when "database" is used:

--destination-database=... (for single database)

I would like the destination stuff to use a separate thread using some form of job buffer so the reader is not hanging around waiting for the writer.

The future:
There are several possible things we could do after this:

1. After phase 2 the framework is already there to talk directly to a drizzle database server as an output. We could use this tool to read from a file instead of another database.
2. Multi-threaded support. We could add a pool of thread to grab from multiple tables simultaneously from multiple connections.
3. As an extension of 1, import dump files from other databases (ie. Postgres).


Work Items

Dependency tree

* Blueprints in grey have been implemented.

This blueprint contains Public information 
Everyone can see this information.