Trafodion and snapshot scan integration

Registered by khaled Bouaziz

This blueprint describes the integration of Trafodion with the HBase snapshot scan (the TableSnapshotScanner class), which performs a scan over snapshot files. Using this class requires a temporary space into which the snapshot references are copied.
The snapshot scan was first implemented in Trafodion to work with bulk unload, and further changes were then made so that it works independently of bulk unload. The current implementation integrates the snapshot scan with Trafodion and also sets up the temporary space and folders before running the query. Once the query completes, the temporary space and folders are cleaned up.

To use snapshot scan with Trafodion, issue the following CQDs (control query defaults):

*TRAF_TABLE_SNAPSHOT_SCAN:
This CQD can be set to:
**NONE: (default) Snapshot scan is disabled and the regular scan is used.
**SUFFIX: Snapshot scan is enabled for bulk unload.
**LATEST: Snapshot scan is enabled independently of bulk unload, and the latest snapshot is used if one exists. If no snapshot exists, the regular scan is used and a warning is issued. In this phase of the project the user needs to create the snapshots using the HBase shell or other tools; in the next phase, new commands to create, delete and manage snapshots will be added.
Snapshots are cached in NATable to optimize compilation time. When the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST, the metadata is flushed and caching is then turned back on so that the metadata gets cached again. Snapshots created after the CQD is set will not be seen if the snapshot information is already cached, unless the user issues a command or CQD that invalidates or flushes the cache. One way to do that is to issue "cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again.
There are some cases where snapshot scan is not yet supported even though the TRAF_TABLE_SNAPSHOT_SCAN CQD is set to 'LATEST'. These cases are:
***There is no snapshot associated with the table being scanned.
***The optimizer chooses to use an index table instead of the base table.
***The table is smaller than the threshold defined by the TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD CQD.

*TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX:
  This CQD is used with bulk unload. Its value is used to build the snapshot name, which is the table name followed by the suffix string.
*TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD:
  When the estimated table size is below the threshold (in MB) defined by this CQD, the regular scan is used instead of snapshot scan and a warning is issued. This CQD does not apply to bulk unload.
*TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT:
  The timeout after which Trafodion gives up trying to create the snapshot scanner.
*TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION:
  Location for temporary links and files produced by snapshot scan. Its default value is currently set to '/bulkload/'.
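
As an illustration of how the CQDs above might be combined in a session that uses snapshot scan independently of bulk unload, here is a minimal configuration sketch. The CQD names and the 'LATEST' and '/bulkload/' values come from this blueprint; the threshold value of 100 and the table name are hypothetical and chosen only for the example. It also assumes a snapshot of the table was already created outside Trafodion (for example with the HBase shell).

```sql
-- Configuration sketch only: the threshold value and the table name
-- below are illustrative assumptions, not recommended settings.

-- Use the latest existing snapshot of each scanned table,
-- independently of bulk unload.
cqd TRAF_TABLE_SNAPSHOT_SCAN 'LATEST';

-- Fall back to the regular scan (with a warning) for tables whose
-- estimated size is below this many MB. 100 is a hypothetical value.
cqd TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD '100';

-- Where the temporary links and files are placed; this is the default
-- value named in this blueprint, shown explicitly for clarity.
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/';

-- Hypothetical table. If it has no snapshot, the regular scan is used
-- and a warning is issued.
select count(*) from trafodion.sales.orders;

-- After a newer snapshot is created outside Trafodion, re-issue the
-- CQD so that the cached metadata is flushed and the new snapshot
-- can be picked up.
cqd TRAF_TABLE_SNAPSHOT_SCAN 'LATEST';
```

Re-issuing the first CQD at the end mirrors the cache-invalidation approach described for LATEST; any other command that flushes the metadata cache would serve the same purpose.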

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: khaled Bouaziz
Direction: Needs approval
Assignee: khaled Bouaziz
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None
