Bulk Unload

Registered by khaled Bouaziz on 2014-10-13

Bulk Unload is a tool to unload data from trafodion tables into an hdfs location specified by the user. Extracted data can be either compressed or uncompressed based on what the user chooses.

Blueprint information

Status:
Started
Approver:
Suresh Subbiah
Priority:
Medium
Drafter:
khaled Bouaziz
Direction:
Approved
Assignee:
khaled Bouaziz
Definition:
Approved
Series goal:
None
Implementation:
Beta Available
Milestone target:
None
Started by
Suresh Subbiah on 2014-11-19

Related branches

Sprints

Whiteboard

Unload commands can be issued from any existing SQL client such as sqlci or TRAFCI. Data will be extracted to a HDFS location.

UNLOAD [WITH <options-list>] INTO <target-location> <query>;

<target-location> : HDFS folder to which extracted data is written. The name of folder must be enclosed in single quotes. The folder name must be given as a full pathname and not as a relative path. The Trafodion user should have write permissions on this folder. If run in parallel multiple files will be produced under <target-location>. The number of files created will equal the number of ESPs.
<options-list> : Unload options. The WITH clause is optional. If an option is specified more than once, an error with SQLCODE -4489 will be raised.
<options-list> : <option> <options-list>
<option> : DELIMITER ‘<delimiter-literal>’ | <delimiter-ascii-value>
                    RECORD_SEPARATOR ‘<separator-literal>’ | <separator-ascii-value>
                    NULL_STRING ‘<string-literal>’
                    PURGEDATA FROM TARGET
                    COMPRESSION GZIP
                    MERGE FILE <merged_file-path> [OVERWRITE]
                    NO OUTPUT

- The DELIMITER option can be used to specify one character that separates consecutive fields in the same row. If this option is not specified the character '|' will be used. The delimiter can also be specified as an ASCII value. Valid values range from 1 to 255. The value must be specified in decimal notation. In later releases we will support specifying the delimiter value in hexadecimal or octal notation. When the ASCII value is specified, the delimiter can only be one character wide. No quotes should be used when the delimiter is specified through an ASCII value. Escape sequences such as '\a', '\b', '\f', '\n', '\r', '\t', or '\v' are also allowed.
- The RECORD_SEPARATOR option can be used to specify the character that will be used to separate consecutive records or rows in the output file. The default value is a newline character. The record_separator can also be specified as an ascii value. Valid values range from 1 to 255. The value must be specified in decimal notation. In later releases we will support specifying the record_separator value in hexadecimal or octal notation. When the ASCII value is specified, the record_separator can only be one character wide. No quotes should be used when the record_separator is specified through an ASCII value. Escape sequences such as '\a', '\b', '\f', '\n', '\r', '\t', or '\v' are also allowed.
- The NULL_STRING option can be used to specify the string that will be used to indicate a NULL value. The default value is the empty string ''.
- PURGEDATA FROM TARGET When this option specified the files under the <hdfs-folder> folder are deleted
- COMPRESSION GZIP: when this option is specified the Gzip compression is used. the compression takes place in the extract node and data is written to disk in compressed format (Gzip i the only supported compression for now)
- MERGE FILE <merged-file-path> [OVERWRITE]: When this option is specified the files unloaded are merged into one single file <merged-file-path>. if compression is specified the data unloaded in compressed format and the merged file will be in compressed format also. It he optional OVERWRITE keyword is specified the file is overwritten other an error is raised if the file already exists
- NO OUTPUT: If this option is specified then no status message is displayed
- <query>:a query that can be a simple query or complex one that contains GROUP BY, JOIN or UNION clauses. The ORDER BY Clause is not currently supported and using will produce a syntax error.

Example
>>UNLOAD
+>WITH PURGEDATA FROM TARGET
+>MERGE FILE 'merged_customer_demogs.gz' OVERWRITE
+>COMPRESSION GZIP
+>INTO '/bulkload/customer_demographics'
+>select * from trafodion.hbase.customer_demographics
+><<+ cardinality 10e10 >>;
Task: UNLOAD Status: Started
Task: EMPTY TARGET Status: Started
Task: EMPTY TARGET Status: Ended ET: 00:00:00.014
Task: EXTRACT Status: Started
       Rows Processed: 200000
Task: EXTRACT Status: Ended ET: 00:00:04.743
Task: MERGE FILES Status: Started
Task: MERGE FILES Status: Ended ET: 00:00:00.063

--- 200000 row(s) unloaded.

(?)

Work Items

This blueprint contains Public information 
Everyone can see this information.

Subscribers

No subscribers.