Trafodion

Table group for data co-location and bridge the gap between objects and relational tables

Registered by Haifeng Li on 2015-05-06

Data co-location is an efficient way to speed up joins in MPP relational database. If the schema is designed properly, joins could be mostly local. With HBase column family, it may be possible to support this feature effectively. If (a big if) hbase can support a large number of column families efficiently, this feature can also bridge the gap between objects and relational tables.

Suppose we have two tables:

CREATE TABLE person (
  person_id char(16) NOT NULL,
  name varchar(64),
  PRIMARY KEY (id)
);

CREATE TABLE phone (
  person_id char(16) NOT NULL REFERENCE person(person_id),
  phone_id char(4) NOT NULL,
  number char(16),
  phone_type char(8),
  PRIMARY KEY (person_id, phone_id)
);

If we have a table group person_object including table person and phone, we may map these two tables to one hbase table of two column families, where row key is person(id).

The mapping of person to one column family is as usual. On the other hand, in the mapping of phone, we use phone_id-1 as the column qualifier for column "number" and phone_id-2 for column "phone_type". Note that multiple rows of relational table "phone" will resident in the same row of hbase column family as long as they share the same person_id. Queries on each table is as efficient as before. Queries of joining these two tables will be much more efficient:

1. Data are always co-located on the same region, which avoids moving data around.
2. No traditional join operations are needed at all.
3. We could add a hidden counter column in the column family of person, which counts the number of phones for that person. This makes COUNT(*) operator fast.

HBase currently does not do well with anything above two or three column families (http://hbase.apache.org/book.html#number.of.cfs). The reason is that flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o.

To make the idea of table group work well in practice, we have to take an initiative to change flushing and compaction to work on a per column family basis.

Note that this proposal is to improve the join between two large tables. In case of join between large and small tables, hash join will always enjoy a fast interconnection by broadcasting the small table.

The idea of multiple column families is also helpful for separate hot/cold columns in the same table. Suppose we have a table with hot but small columns and also a cold LOB column. Put the LOB column into a separate column family will make the query of hot columns much more efficient.