Drizzledump support/conversion for MySQL
Modify drizzledump to connect to MySQL servers and obtain the dump in a drizzle compatible format.
1. Have drizzledump connect to the drizzle server to copy MySQL->drizzle
2. Convert types using regex on SHOW CREATE TABLE
Other things to consider:
1. Verify copy is correct
2. Have the parser accept types and convert them
3. Using MySQL's Information Schema
- Needs approval
- Andrew Hutchings
- Series goal:
- Accepted for 7.0
- Milestone target:
- Started by
- Andrew Hutchings on 2010-09-10
- Completed by
- Andrew Hutchings on 2010-09-25
I plan to do this in two phases. First to modify drizzledump to detect and parse MySQL tables/data, second to connect to a remote server and push the content.
As a possible future expantion of this it could easily be made multi-threaded in a similar way to mydumper.
MySQL detect/convert support
This will be a conversion broken down into several stages, beginning with the MySQL detection. We will only work with MySQL 5.1 onwards due to information_schema tables provided. To detect MySQL version we will use server_
If MySQL is detected (or forced) the following process will be used:
1. START TRANSACTION WITH CONSISTENT SNAPSHOT (or LOCK TABLES?)
2. Use "set names" to set to utf8 so output from tables in other charsets work.
3. Use information_schema tables and columns to obtain the table schemas
3a. convert to drizzle types, output stderr warning if default date of 0000-00-00 and do the conversion
3b. engine conversion. MyISAM -> InnoDB
3c. data conversion of date (0000-00-00) and time data.
4. Use information_schema table_constraints to get foreign key data
5. Dump tables schemas to output
6. Dump table data to output, convert dates if needed
During step 3 the following conversions are required:
MyISAM -> InnoDB
FullText -> drop it (with stderr warning)
int unsigned -> bigint
tinyint -> int
smallint -> int
mediumint -> int
tinytext -> text
mediumtext -> text
longtext -> text
tinyblob -> blob
mediumblob -> blob
longblob -> blob
time -> int (of seconds)
year -> int
set -> text (can't guess length so safer than varchar, may need to normalise in later versions)
date/datetime default 0000-00-00 -> default NULL *(It currently appears that ALL date columns have their DEFAULT set to NULL on migration)
date/datetime NOT NULL columns -> NULL
any date data containing 0000-00-00 -> NULL
enum-> DEFAULT NULL
Setting the destination
The whole idea of this is that drizzledump will connect to a drizzle server as a destination and do all the conversion on the fly. Therefore I propose the following option:
The "stdout" option will be the default, "files" will be for a file-per-table output and "database" will be the database injector output.
The following option will only work with "files" to set a path for the output:
These options will be to set the destination drizzle database when "database" is used:
I would like the destination stuff to use a separate thread using some form of job buffer so the reader is not hanging around waiting for the writer.
There are several possible things we could do after this:
1. After phase 2 the framework is already there to talk directly to a drizzle database server as an output. We could use this tool to read from a file instead of another database.
2. Multi-threaded support. We could add a pool of thread to grab from multiple tables simultaneously from multiple connections.
3. As an extension of 1, import dump files from other databases (ie. Postgres).
* Blueprints in grey have been implemented.