Drizzledump support/conversion for MySQL

Registered by Andrew Hutchings

Modify drizzledump to connect to MySQL servers and obtain the dump in a Drizzle-compatible format.

1. Have drizzledump connect to the Drizzle server to copy MySQL -> Drizzle
2. Convert types using regex on SHOW CREATE TABLE (a rough sketch follows)
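
As a rough illustration of the regex idea in item 2 (using std::regex here for brevity; the real tool may well use a different regex library), a couple of the rewrites could look like this:

    #include <regex>
    #include <string>

    // Illustrative only: rewrite a couple of MySQL type/engine names in
    // SHOW CREATE TABLE output so the DDL is accepted by Drizzle.
    std::string convert_create_table(std::string ddl)
    {
      // MySQL's narrow integer types widen to Drizzle's 32-bit int.
      ddl = std::regex_replace(ddl,
          std::regex("\\b(tiny|small|medium)int(\\([0-9]+\\))?",
                     std::regex::icase),
          "int");
      // MyISAM is not available in Drizzle, so switch the engine.
      ddl = std::regex_replace(ddl,
          std::regex("ENGINE=MyISAM", std::regex::icase),
          "ENGINE=InnoDB");
      return ddl;
    }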

Other things to consider:
1. Verify copy is correct
2. Have the parser accept types and convert them
3. Using MySQL's Information Schema

Blueprint information

Status:
Complete
Approver:
None
Priority:
Medium
Drafter:
None
Direction:
Needs approval
Assignee:
Andrew Hutchings
Definition:
Approved
Series goal:
Accepted for 7.0
Implementation:
Implemented
Milestone target:
2010-09-27
Started by:
Andrew Hutchings
Completed by:
Andrew Hutchings

Whiteboard

I plan to do this in two phases: first, modify drizzledump to detect and parse MySQL tables/data; second, connect to a remote server and push the content.

As a possible future expansion, this could easily be made multi-threaded in a similar way to mydumper.

Phase 1:
MySQL detect/convert support

This conversion will be broken down into several stages, beginning with MySQL detection. We will only support MySQL 5.1 onwards, because of the information_schema tables it provides. To detect the server version we will use server_version_string() and parse the result. If for some reason this fails (such as on a hacked/modified MySQL or Drizzle), a force option can be used: --force-server-type=[mysql|drizzle].
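
As a sketch of the detection step (the 2009+ cut-off is my assumption, based on Drizzle's date-based version numbers such as 2010.09.1802 versus MySQL's 5.1.x):

    #include <cstdlib>
    #include <string>

    enum server_type { SERVER_MYSQL, SERVER_DRIZZLE };

    // Guess the server type from the version string the server reports;
    // a date-based major version is far above anything MySQL returns,
    // so it is a safe discriminator for this sketch.
    server_type detect_server_type(const std::string& version)
    {
      long major = std::strtol(version.c_str(), NULL, 10);
      return (major >= 2009) ? SERVER_DRIZZLE : SERVER_MYSQL;
    }

The --force-server-type switch would simply bypass this guess.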

If MySQL is detected (or forced) the following process will be used:

1. START TRANSACTION WITH CONSISTENT SNAPSHOT (or LOCK TABLES?)
2. Use "SET NAMES utf8" so that output from tables in other character sets works
3. Use the information_schema tables and columns views to obtain the table schemas
3a. Convert to Drizzle types; emit a warning on stderr for a default date of 0000-00-00 and do the conversion
3b. Engine conversion: MyISAM -> InnoDB
3c. Data conversion of date (0000-00-00) and time data
4. Use information_schema.table_constraints to get foreign key data
5. Dump table schemas to output
6. Dump table data to output, converting dates if needed
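
The statements behind steps 1-4 might look roughly like this (the SELECT column lists are illustrative, not final; ? marks where the schema name is substituted):

    // Steps 1 and 2: snapshot plus a known character set for the session.
    const char *setup_queries[] = {
      "START TRANSACTION WITH CONSISTENT SNAPSHOT",
      "SET NAMES utf8",
    };

    // Step 3: one row per column, used to build converted CREATE TABLEs.
    const char *columns_query =
      "SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, COLUMN_DEFAULT,"
      "       IS_NULLABLE"
      "  FROM information_schema.columns"
      " WHERE TABLE_SCHEMA = ?"
      " ORDER BY TABLE_NAME, ORDINAL_POSITION";

    // Step 4: foreign keys (column details then come from
    // information_schema.key_column_usage).
    const char *fk_query =
      "SELECT TABLE_NAME, CONSTRAINT_NAME"
      "  FROM information_schema.table_constraints"
      " WHERE CONSTRAINT_TYPE = 'FOREIGN KEY' AND TABLE_SCHEMA = ?";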

During step 3 the following conversions are required:

MyISAM -> InnoDB
FullText -> drop it (with stderr warning)
int unsigned -> bigint
tinyint -> int
smallint -> int
mediumint -> int
tinytext -> text
mediumtext -> text
longtext -> text
tinyblob -> blob
mediumblob -> blob
longblob -> blob
time -> int (number of seconds)
year -> int
set -> text (can't guess length so safer than varchar, may need to normalise in later versions)
date/datetime default 0000-00-00 -> default NULL *(It currently appears that ALL date columns have their DEFAULT set to NULL on migration)
date/datetime NOT NULL columns -> NULL
any date data containing 0000-00-00 -> NULL
enum -> DEFAULT NULL
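
A table-driven sketch of the straightforward part of this mapping, keyed on the DATA_TYPE value from information_schema.columns (the unsigned, enum and 0000-00-00 date/default cases need context beyond the bare type name, so they are left out here):

    #include <map>
    #include <string>

    // Sketch: map the easy MySQL column types straight to their
    // Drizzle equivalents.
    static const std::map<std::string, std::string> type_map = {
      {"tinyint", "int"},   {"smallint", "int"},    {"mediumint", "int"},
      {"tinytext", "text"}, {"mediumtext", "text"}, {"longtext", "text"},
      {"tinyblob", "blob"}, {"mediumblob", "blob"}, {"longblob", "blob"},
      {"time", "int"},      {"year", "int"},        {"set", "text"},
    };

    // Types needing no rewrite fall through unchanged; the context-
    // sensitive conversions happen when the full column definition
    // is emitted.
    std::string convert_type(const std::string& mysql_type)
    {
      std::map<std::string, std::string>::const_iterator it =
          type_map.find(mysql_type);
      return (it != type_map.end()) ? it->second : mysql_type;
    }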

Phase 2:
Setting the destination

The whole idea of this is that drizzledump will connect to a Drizzle server as a destination and do all the conversion on the fly. I therefore propose the following option:

--destination=[stdout|files|database]

The "stdout" option will be the default, "files" will be for a file-per-table output and "database" will be the database injector output.

The following option will work only with "files", to set a path for the output:

--destination-path=...

These options will set the destination Drizzle database when "database" is used:

--destination-host=...
--destination-port=...
--destination-user=...
--destination-password=...
--destination-database=... (for single database)
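
If the option handling uses Boost program_options, the new switches might be declared along these lines (a sketch, not the actual declarations; defaults and help text are guesses):

    #include <boost/program_options.hpp>
    #include <cstdint>
    #include <string>

    namespace po = boost::program_options;

    // Sketch of the proposed destination switches.
    po::options_description destination_options()
    {
      po::options_description opts("Destination options");
      opts.add_options()
        ("destination", po::value<std::string>()->default_value("stdout"),
         "Send output to stdout, files or database")
        ("destination-path", po::value<std::string>(),
         "Directory for file-per-table output (files only)")
        ("destination-host", po::value<std::string>())
        ("destination-port", po::value<uint16_t>())
        ("destination-user", po::value<std::string>())
        ("destination-password", po::value<std::string>())
        ("destination-database", po::value<std::string>(),
         "Single destination database");
      return opts;
    }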

I would like the destination side to run in a separate thread, fed by some form of job buffer, so the reader is not left waiting for the writer.
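
A minimal sketch of that job buffer, with C++ standard-library threading primitives standing in for whatever drizzledump actually ends up using; the reader pushes finished statements, a single writer thread pops them:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // Jobs are finished statements (schema or row data) ready to send
    // to the destination.
    class JobBuffer
    {
      std::queue<std::string> jobs;
      std::mutex lock;
      std::condition_variable ready;
      bool done = false;

    public:
      void push(std::string job)
      {
        std::lock_guard<std::mutex> guard(lock);
        jobs.push(std::move(job));
        ready.notify_one();
      }

      // Returns false once the reader has finished and the queue is empty.
      bool pop(std::string& job)
      {
        std::unique_lock<std::mutex> guard(lock);
        ready.wait(guard, [this]{ return done || !jobs.empty(); });
        if (jobs.empty())
          return false;
        job = std::move(jobs.front());
        jobs.pop();
        return true;
      }

      void finish()
      {
        std::lock_guard<std::mutex> guard(lock);
        done = true;
        ready.notify_all();
      }
    };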

The future:
There are several possible things we could do after this:

1. After phase 2 the framework will already be there to talk directly to a Drizzle database server as an output. We could use this tool to read from a file instead of another database.
2. Multi-threaded support. We could add a pool of threads to grab from multiple tables simultaneously over multiple connections.
3. As an extension of 1, import dump files from other databases (e.g. PostgreSQL).

