Add SQL syntax to pre-split unsalted tables into regions

Registered by Hans Zeller

Currently, Trafodion creates tables as a single-region HBase table, unless the table is salted. Salted tables have one region per salt bucket. We would like to add SQL syntax to allow pre-splitting of unsalted tables as well. We would like to offer two new table options, SPLIT BY and PARTITION BY. Both allow specification of split keys. SPLIT BY will simply pre-split the table, with no special split policy. PARTITION BY will add a prefix split policy that ensures that all rows with common PARTITION BY column values will remain in the same table.

Example:

create table lineitem(
   l_orderkey int not null,
   l_linenum int not null,
   ...
   primary key (l_orderkey, l_linenum))
PARTITION BY (l_orderkey)
(add first key (10000),
 add first key (20000)
);

This will create three regions containing for values [<min>, 10000), [10000, 20000) and [20000, <max>) it will add a prefix split policy for a 4 byte prefix, the length of a non-nullable INT column. HBase will still be able to split the regions further, but we will have a guarantee that all rows for a given l_orderkey are located in the same region.

If SPLIT BY is used instead of PARTITION BY, the same regions will be created, but no custom split policy will be added. Note that the PARTITION BY / SPLIT BY column(s) have to be a prefix of the clustering key of the table. PARTITION BY and SPLIT BY are not allowed for salted tables. We may or may not support pre-splitting of divisioned tables in the first release.

Blueprint information

Status:
Not started
Approver:
Suresh Subbiah
Priority:
High
Drafter:
Hans Zeller
Direction:
Approved
Assignee:
Hans Zeller
Definition:
Review
Series goal:
None
Implementation:
Unknown
Milestone target:
milestone icon r2.0

Related branches

Sprints

Whiteboard

(?)

Work Items

This blueprint contains Public information 
Everyone can see this information.