Multi-tenancy

Registered by Haifeng Li

In the era of SaaS and cloud computing, multi-tenancy will be the new normal. One simple solution is to add a tenant_id column to every table. However, this is troublesome, especially for a large schema with thousands of tables.

It is even worse semantically. Because tenant_id is the natural choice for sharding/partitioning tables, it has to be part of the primary key (combined with "SALT BY", "PARTITION BY", or "DIVISION BY"). Logically, however, tenant_id should not be part of the primary key: it exists only for sharding and security, not for business logic. It also causes problems in DML. For example, because tenant_id is part of the primary key, we have to add a UNIQUE constraint on the true logical primary key (e.g. person_id). In the current design, an UPSERT then fails the UNIQUE check whenever a row with the same person_id already exists.

One possible approach is to declare a tenant_id in "CREATE SCHEMA" (or in "CREATE TABLE"; given thousands of tables, we would like both "CREATE SCHEMA" and "CREATE TABLE" to support it). This hidden column (similar to ROWNUM in Oracle) would be prepended to the row key of every table, serving both sharding and security. For every DML statement, the SQL compiler would automatically add a predicate on tenant_id. The tenant_id could be set per connection or per transaction. Querying tenant_id explicitly should also be allowed.
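The following is a minimal Python sketch of the proposed behaviour. It models an HBase-style table as a sorted key/value map and rests on several hypothetical assumptions:

- The hidden tenant_id is a space-padded CHAR(16) prepended to every row key.
- The tenant is bound when the connection is opened.
- The compiler's injected tenant_id predicate reduces to a range scan over that tenant's key prefix.

The names `Table`, `TenantConnection`, `upsert` and `scan` are illustrative only and are not Trafodion APIs.

```python
from bisect import bisect_left, insort

TENANT_WIDTH = 16  # assumed: hidden tenant_id is CHAR(16), space-padded


def tenant_prefix(tenant_id: str) -> bytes:
    """Encode tenant_id as a fixed-width row-key prefix."""
    raw = tenant_id.encode("utf-8")
    if len(raw) > TENANT_WIDTH:
        raise ValueError("tenant_id does not fit in CHAR(16)")
    return raw.ljust(TENANT_WIDTH, b" ")


class Table:
    """Toy stand-in for an HBase table: sorted row keys plus a row map."""

    def __init__(self) -> None:
        self.keys: list[bytes] = []
        self.rows: dict[bytes, dict] = {}


class TenantConnection:
    """Hypothetical connection with the tenant bound at connect time."""

    def __init__(self, table: Table, tenant_id: str) -> None:
        self.table = table
        self.tenant_id = tenant_id
        self.prefix = tenant_prefix(tenant_id)

    def _row_key(self, pk: str) -> bytes:
        # The hidden tenant_id is prepended to the logical primary key.
        return self.prefix + pk.encode("utf-8")

    def upsert(self, pk: str, values: dict) -> None:
        # Row keys are unique per (tenant, pk), so in this model no extra
        # UNIQUE constraint on the logical key is needed for UPSERT.
        key = self._row_key(pk)
        if key not in self.table.rows:
            insort(self.table.keys, key)
        self.table.rows[key] = dict(values)

    def scan(self) -> list[dict]:
        # The injected "tenant_id = <current tenant>" predicate becomes a
        # range scan over keys that start with this tenant's prefix.
        start = bisect_left(self.table.keys, self.prefix)
        result = []
        for key in self.table.keys[start:]:
            if not key.startswith(self.prefix):
                break  # keys are sorted, so a tenant's rows are contiguous
            row = dict(self.table.rows[key])
            row["tenant_id"] = self.tenant_id  # hidden column stays queryable
            result.append(row)
        return result


# Usage: two tenants share one physical table and the same logical key "p1".
table = Table()
acme = TenantConnection(table, "acme")
globex = TenantConnection(table, "globex")
acme.upsert("p1", {"person_id": "p1", "name": "Ann"})
globex.upsert("p1", {"person_id": "p1", "name": "Bob"})
acme.upsert("p1", {"person_id": "p1", "name": "Ann B."})  # replaces acme's row only
print(acme.scan())  # [{'person_id': 'p1', 'name': 'Ann B.', 'tenant_id': 'acme'}]
```

The fixed width of the prefix matters here. With variable-length tenant ids, the key for tenant "acme" could be a prefix of the key for tenant "acme2", and a prefix scan would then leak rows across tenants.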

Along with this request, we would also like "CREATE SCHEMA" to support the clauses "SALT BY", "PARTITION BY", "DIVISION BY", and "HBASE_OPTIONS". They would act as defaults for all tables in the schema and could be overridden for individual tables.
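The "schema default, table override" rule can be sketched as a simple resolution step, again in Python as a non-authoritative model. The `PhysicalClauses` type and `effective_clauses` function are hypothetical. One design choice is an assumption, not taken from the proposal: "HBASE_OPTIONS" are merged key by key, so a table that sets only COMPRESSION still inherits the schema's DATA_BLOCK_ENCODING. The other clauses are replaced wholesale.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhysicalClauses:
    """Physical-design clauses; None / missing keys mean 'not specified'."""
    salt_by: Optional[str] = None
    partition_by: Optional[str] = None
    division_by: Optional[str] = None
    hbase_options: dict[str, str] = field(default_factory=dict)


def effective_clauses(schema: PhysicalClauses, table: PhysicalClauses) -> PhysicalClauses:
    """Resolve a table's clauses: explicit table values win, else schema defaults."""

    def pick(table_value: Optional[str], schema_value: Optional[str]) -> Optional[str]:
        return table_value if table_value is not None else schema_value

    return PhysicalClauses(
        salt_by=pick(table.salt_by, schema.salt_by),
        partition_by=pick(table.partition_by, schema.partition_by),
        division_by=pick(table.division_by, schema.division_by),
        # Assumed: HBASE_OPTIONS merge per key, table keys overriding schema keys.
        hbase_options={**schema.hbase_options, **table.hbase_options},
    )


# Usage: a table inherits salting and encoding but overrides compression.
schema_defaults = PhysicalClauses(
    salt_by="(tenant_id)",
    hbase_options={"DATA_BLOCK_ENCODING": "FAST_DIFF", "COMPRESSION": "SNAPPY"},
)
person_table = PhysicalClauses(hbase_options={"COMPRESSION": "GZ"})
resolved = effective_clauses(schema_defaults, person_table)
print(resolved.salt_by, resolved.hbase_options)
# (tenant_id) {'DATA_BLOCK_ENCODING': 'FAST_DIFF', 'COMPRESSION': 'GZ'}
```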

For example,

CREATE SCHEMA myschema (
  tenant_id char(16)
)
  MULTI_TENANT BY (tenant_id)
  HBASE_OPTIONS
  (
    DATA_BLOCK_ENCODING = 'FAST_DIFF',
    COMPRESSION = 'SNAPPY',
    SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',
    PREFIX_LENGTH_KEY = '16'
  );
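The SPLIT_POLICY and PREFIX_LENGTH_KEY options above appear chosen so that region splits never cut through one tenant's rows. HBase's KeyPrefixRegionSplitPolicy truncates the split point chosen by its parent policy to the configured prefix length, and 16 matches the CHAR(16) tenant_id. The Python sketch below models only that truncation step. It assumes the tenant_id bytes are the first 16 bytes of every row key (no salt prefix in front). The helper names and sample keys are hypothetical.

```python
PREFIX_LENGTH = 16  # matches PREFIX_LENGTH_KEY = '16' and the CHAR(16) tenant_id


def row_key(tenant_id: str, pk: str) -> bytes:
    # Assumed layout: space-padded CHAR(16) tenant_id followed by the primary key.
    return tenant_id.encode("utf-8").ljust(PREFIX_LENGTH, b" ") + pk.encode("utf-8")


def truncate_split_point(split_point: bytes, prefix_length: int) -> bytes:
    # The key step of KeyPrefixRegionSplitPolicy: keep only the key prefix.
    return split_point[:prefix_length]


def daughter(key: bytes, split_point: bytes) -> str:
    # After a split, rows with key >= split point go to the upper daughter region.
    return "upper" if key >= split_point else "lower"


rows = sorted(
    [row_key("acme", f"p{i:03d}") for i in range(5)]
    + [row_key("globex", f"p{i:03d}") for i in range(5)]
)
mid_key = rows[7]  # a split point the parent policy might pick, inside globex's rows

naive = {daughter(k, mid_key) for k in rows if k.startswith(b"globex")}
prefix_split = truncate_split_point(mid_key, PREFIX_LENGTH)
grouped = {daughter(k, prefix_split) for k in rows if k.startswith(b"globex")}
print(sorted(naive), sorted(grouped))  # ['lower', 'upper'] ['upper']
```

Without truncation, tenant "globex" would straddle two regions. With the prefix-length split point, all of its rows land in the same daughter region.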

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: Haifeng Li
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None