Monasca Cassandra Time Series DB

Registered by Deklan Dieterly

Blueprint information

Status: Started
Approver: Roland Hochmuth
Priority: High
Drafter: Deklan Dieterly
Direction: Approved
Assignee: Shinya Kawabata
Definition: Approved
Series goal: Accepted for trunk
Implementation: Needs Code Review
Milestone target: None
Started by: Roland Hochmuth

Related branches

Sprints

Whiteboard

This blueprint has been implemented - see 20337572e309439061ff8dc0bdc341d8656858fa

-----------

Initial proposed data model for the Monasca Cassandra Time Series DB.

Premise:

Each unique set of dimensions plus metric name is hashed with the SHA-1 hash function to derive a unique identifier. This identifies a unique “resource” that is being measured.

There will be three tables.

I. A lookup table comprised of the following:

    Tenant ID
    Region
    Dimensions/metric name SHA-1 hash
    Set of dimension names/values, metric name

The partition key is (tenant ID, region), with the dimensions/metric-name SHA-1 hash as a clustering column (see the example schema below).

Inserts into this table are upserts: Cassandra inserts or updates automatically depending on whether the row already exists. So there will be one row for each unique set of dimensions/metric name per tenant per region.

There will be an index on the map entries. The index allows queries of the form: where metric_map['dimension_name_1'] = 'value_1' and metric_map['dimension_name_2'] = 'value_2' … and metric_map contains key 'dimension_name_3'.

The identifier 'name' is reserved for the metric name, so no dimension will be able to use 'name' as its identifier. We could instead reserve a very unusual identifier, such as 'xxx_name_xxx', to allow a dimension to have the identifier 'name'.

The set will be formed by URL encoding the name/value pairs and joining each name to its value with an equals sign ("=").

This allows fast look ups of unique resources given a partial map of dimension name/values and a metric name. The results from this query will be used to query the measurements table.
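
For illustration, here is a minimal Python sketch of the hashing scheme described above. The sorted key order and the '&' separator between pairs are assumptions; the blueprint only fixes the URL encoding and the '=' between each name and value.

import hashlib
import urllib.parse

def metric_hash(metric_name, dimensions):
    # The reserved key 'name' carries the metric name, as described above.
    pairs = dict(dimensions)
    pairs['name'] = metric_name
    # Assumed canonical form: pairs sorted by name, joined with '&'.
    encoded = '&'.join(urllib.parse.quote(k) + '=' + urllib.parse.quote(v)
                       for k, v in sorted(pairs.items()))
    # 20-byte SHA-1 digest, suitable for the metric_hash blob column.
    return hashlib.sha1(encoded.encode('utf-8')).digest()

print(metric_hash('cpu', {'host': 'host1'}).hex())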

II. A measurements table comprised of the following:

    Tenant ID
    Region
    Dimensions/metric name hash
    Time_stamp
    Value
    value_meta

The partition key is (tenant ID, region, dimensions/metric-name SHA-1 hash).
The primary key is (tenant ID, region, dimensions/metric-name hash, time_stamp).

Another approach would be to leave the metric name out of the hash; the metric name would then be a column in the measurements table. However, this could lead to too much data in one partition.

III. An alarm state history table comprised of the following:

    Tenant ID
    Region
    Alarm ID
    Metric
    New state
    Old state
    Reason
    Reason data
    Sub alarms
    Time_stamp

Example Schema

drop table if exists monasca.metric_map;

drop table if exists monasca.measurements;

drop table if exists monasca.alarm_state_history;

drop schema if exists monasca;

create schema monasca
with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

use monasca;

create table monasca.metric_map (
    tenant_id text,
    region text,
    metric_hash blob,
    metric_map map<text, text>,
    primary key ((tenant_id, region), metric_hash)
);

create index on monasca.metric_map (entries(metric_map));

create table monasca.measurements (
    tenant_id text,
    region text,
    metric_hash blob,
    time_stamp timestamp,
    value float,
    value_meta text,
    primary key ((tenant_id, region, metric_hash), time_stamp)
);

create table monasca.alarm_state_history (
    tenant_id text,
    region text,
    alarm_id text,
    metrics text,
    new_state text,
    old_state text,
    reason text,
    reason_data text,
    sub_alarms text,
    time_stamp timestamp,
    primary key ((tenant_id, region), alarm_id, time_stamp)
);

Example query to get a list of metric_hash.

select * from monasca.metric_map where tenant_id = 'hpe' and region = 'uswest' and metric_map['host'] = 'host1';

Example with data.

 insert into monasca.metric_map(tenant_id, region, metric_hash, metric_map) values ('hpe', 'uswest', 0xaa, {'name':'cpu', 'host':'host1'});

 insert into monasca.metric_map(tenant_id, region, metric_hash, metric_map) values ('hpe', 'uswest', 0xbb, {'name':'cpu', 'host':'host2'});

select * from monasca.metric_map;

 tenant_id | region | metric_hash | metric_map
-----------+--------+-------------+----------------------------------
       hpe | uswest |        0xaa | {'host': 'host1', 'name': 'cpu'}
       hpe | uswest |        0xbb | {'host': 'host2', 'name': 'cpu'}

(2 rows)

select * from monasca.metric_map where tenant_id = 'hpe' and region = 'uswest' and metric_map['host'] = 'host1';

 tenant_id | region | metric_hash | metric_map
-----------+--------+-------------+----------------------------------
       hpe | uswest |        0xaa | {'host': 'host1', 'name': 'cpu'}

(1 rows)

 select * from monasca.metric_map where tenant_id = 'hpe' and region = 'uswest' and metric_map['host'] = 'host1' and metric_map['name'] = 'cpu' allow filtering;

 tenant_id | region | metric_hash | metric_map
-----------+--------+-------------+----------------------------------
       hpe | uswest |        0xaa | {'host': 'host1', 'name': 'cpu'}

(1 rows)

"allow filtering" is needed on the query if there are more than one contains clauses.

See http://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_using_multiple_indexes.html for more info.

cqlsh:monasca> select * from monasca.measurements where tenant_id = 'hpe' and metric_hash in (0xaa);

 tenant_id | metric_hash | time_stamp               | value | value_meta
-----------+-------------+--------------------------+-------+------------
       hpe |        0xaa | 1970-01-17 18:42:41+0000 |  0.99 |         {}
       hpe |        0xaa | 1970-01-17 18:42:41+0000 |  0.99 |         {}
       hpe |        0xaa | 1970-01-17 18:42:41+0000 |  0.79 |         {}
       hpe |        0xaa | 1970-01-17 18:42:41+0000 |  0.69 |         {}

(4 rows)

cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730, 1.01, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730, 1.01, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730, 1.02, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730, 1.03, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730, 1.04, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730, 1.11, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730, 1.12, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730, 1.13, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730, 1.14, '{}');
cqlsh> select * from monasca.measurements;

 tenant_id | region | metric_hash | time_stamp               | value | value_meta
-----------+--------+-------------+--------------------------+-------+------------
       hpe | uswest |        0xaa | 1970-01-17 18:42:41+0000 |  1.04 |         {}
       hpe | uswest |        0xbb | 1970-01-17 18:42:41+0000 |  1.14 |         {}

(2 rows)

Only 2 rows are in the table because the same timestamp was used for each metric_hash. If a row with that primary key already exists, Cassandra updates its values. (Note that a bare integer such as 1449761730 is interpreted as milliseconds since the epoch, which is why these rows display a 1970 date.)

cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730001, 1.01, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730002, 1.02, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730003, 1.03, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xaa, 1449761730004, 1.04, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730011, 1.11, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730012, 1.12, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730013, 1.13, '{}');
cqlsh> insert into monasca.measurements (tenant_id, region, metric_hash, time_stamp, value, value_meta) values ('hpe', 'uswest', 0xbb, 1449761730014, 1.14, '{}');
cqlsh> select * from monasca.measurements;

 tenant_id | region | metric_hash | time_stamp               | value | value_meta
-----------+--------+-------------+--------------------------+-------+------------
       hpe | uswest |        0xaa | 1970-01-17 18:42:41+0000 |  1.04 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.01 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.02 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.03 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.04 |         {}
       hpe | uswest |        0xbb | 1970-01-17 18:42:41+0000 |  1.14 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.11 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.12 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.13 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.14 |         {}

(10 rows)

Now, using distinct millisecond-precision timestamps, we see that all the expected rows are there.

cqlsh> select * from monasca.measurements where tenant_id = 'hpe' and region = 'uswest' and metric_hash in (0xaa);

 tenant_id | region | metric_hash | time_stamp               | value | value_meta
-----------+--------+-------------+--------------------------+-------+------------
       hpe | uswest |        0xaa | 1970-01-17 18:42:41+0000 |  1.04 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.01 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.02 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.03 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.04 |         {}

(5 rows)

cqlsh> select * from monasca.measurements where tenant_id = 'hpe' and region = 'uswest' and metric_hash in (0xaa, 0xbb);

 tenant_id | region | metric_hash | time_stamp               | value | value_meta
-----------+--------+-------------+--------------------------+-------+------------
       hpe | uswest |        0xaa | 1970-01-17 18:42:41+0000 |  1.04 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.01 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.02 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.03 |         {}
       hpe | uswest |        0xaa | 2015-12-10 15:35:30+0000 |  1.04 |         {}
       hpe | uswest |        0xbb | 1970-01-17 18:42:41+0000 |  1.14 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.11 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.12 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.13 |         {}
       hpe | uswest |        0xbb | 2015-12-10 15:35:30+0000 |  1.14 |         {}

(10 rows)

If we turn on tracing ('tracing on'), we can see that the index is used for lookups on the dimensions/metric-name map.

cqlsh> select * from monasca.metric_map where tenant_id = 'hpe' and region = 'uswest' and metric_map['name'] = 'cpu' and metric_map['host'] = 'host1' allow filtering;

 tenant_id | region | metric_hash | metric_map
-----------+--------+-------------+----------------------------------
       hpe | uswest |        0xaa | {'host': 'host1', 'name': 'cpu'}

(1 rows)

Tracing session: f12650e0-a27d-11e5-aa35-9f90a57422e0
 activity | timestamp | source | source_elapsed
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                                                                                            Execute CQL3 query | 2015-12-14 09:15:50.510000 | 127.0.0.1 | 0
 Parsing select * from monasca.metric_map where tenant_id = 'hpe' and region = 'uswest' and metric_map['name'] = 'cpu' and metric_map['host'] = 'host1' allow filtering; [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 166
                                                                                                                                                     Preparing statement [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 320
                                                                                                                                               Computing ranges to query [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 344
                                                                      Index mean cardinalities are metric_map_metric_map_idx:1. Scanning with metric_map_metric_map_idx. [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 439
                                                                             Submitting range requests on 1 ranges with a concurrency of 1 (0.0 rows per range expected) [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 484
                                                                                                                                   Submitted 1 concurrent range requests [SharedPool-Worker-1] | 2015-12-14 09:15:50.511000 | 127.0.0.1 | 571
                                                                                              Executing read on monasca.metric_map using index metric_map_metric_map_idx [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 674
                                                                                                Executing single-partition query on metric_map.metric_map_metric_map_idx [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 729
                                                                                                                                            Acquiring sstable references [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 751
                                                                                               Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 812
                                                                                                                              Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 839
                                                                                                                          Executing single-partition query on metric_map [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 975
                                                                                                                                            Acquiring sstable references [SharedPool-Worker-2] | 2015-12-14 09:15:50.511001 | 127.0.0.1 | 1002
                                                                                               Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-12-14 09:15:50.512000 | 127.0.0.1 | 1047
                                                                                                                              Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-12-14 09:15:50.512000 | 127.0.0.1 | 1071
                                                                                                                                       Read 2 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-12-14 09:15:50.512000 | 127.0.0.1 | 1271
                                                                                                                                                                              Request complete | 2015-12-14 09:15:50.516535 | 127.0.0.1 | 6535

cqlsh>

All aggregation functions will need to be implemented and performed by the API.
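
For example, an average over a time range would be computed client-side in the API. A hypothetical sketch using the DataStax Python driver (the host, keyspace connection, and the 0xaa hash are illustrative assumptions):

from datetime import datetime
from cassandra.cluster import Cluster  # DataStax Python driver

session = Cluster(['127.0.0.1']).connect('monasca')
start, end = datetime(2015, 12, 10), datetime(2015, 12, 11)
rows = session.execute(
    "select value from measurements"
    " where tenant_id = %s and region = %s and metric_hash = %s"
    " and time_stamp >= %s and time_stamp < %s",
    ('hpe', 'uswest', bytes.fromhex('aa'), start, end))
values = [row.value for row in rows]
# Cassandra returns only the raw points; the API computes the aggregate.
print(sum(values) / len(values) if values else None)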

It appears that GROUP BY functionality was recently added to Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-10707

We will need to do inserts efficiently using individual asynchronous writes rather than batches. https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.d3lxvmq08
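
A minimal sketch of that loading pattern with the Python driver: individual asynchronous writes through a prepared statement, with no CQL BATCH (the host, keyspace, and sample data are assumptions):

from datetime import datetime, timedelta
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('monasca')
insert = session.prepare(
    "insert into measurements"
    " (tenant_id, region, metric_hash, time_stamp, value, value_meta)"
    " values (?, ?, ?, ?, ?, ?)")

base = datetime(2015, 12, 10, 15, 35, 30)
samples = [(base + timedelta(milliseconds=i), 1.01 + i / 100.0)
           for i in range(4)]

# Fire the writes concurrently, then wait on all of them.
futures = [session.execute_async(insert,
                                 ('hpe', 'uswest', bytes.fromhex('aa'),
                                  ts, val, '{}'))
           for ts, val in samples]
for future in futures:
    future.result()  # raises if the corresponding write failed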

Questions:
1. It wasn't clear what the schema was in I.

See the monasca.metric_map table shown above.

2. What is the possibility of a collision with SHA-1 hashes based on the number of metrics?

Please see this webpage for the calculation of the probability. http://stackoverflow.com/questions/1867191/probability-of-sha1-collisions

For a more concrete number: we would need to have seen about 2^80 (roughly 10^24) unique sets of dimensions/metric name before we had a 50% chance of a collision.
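
That figure comes from the standard birthday approximation for a 160-bit digest:

p(n) \approx 1 - e^{-n^2 / 2^{161}}, \qquad p(n) = \tfrac{1}{2} \;\Rightarrow\; n \approx 1.18 \cdot 2^{80} \approx 1.4 \times 10^{24}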

To put that into perspective, if we saw 1,000,000 new unique sets per second per tenant for 10 years, we would have seen only about 3 x 10^14 unique sets, still less than 10^16 and well below 10^24.

So, in practical terms, the chances that there will be a collision are so low as to not need to be considered. You are more likely to be struck by lightning on your birthday.

QED

3. Is the date field used for partitioning?

Yes, the date field is for partitioning. We need to do a back-of-the-envelope calculation to determine if we need this, but I put it in for now so that we could consider it.

Doing the calculation shows that we do not need to partition using date.

If we assume 1 measurement per second for each metric, it would take about 63 years (2 x 10^9 seconds) before we reached the 2 billion maximum columns allowed per partition in Cassandra.

4. Why isn't the metric name just a column in the metric lookup table?

Good question. There are two ways to include the metric name as a separate field (column) in the metric_map table (column family). In either case, the metric name would need to be part of the primary key for fast lookups.

1. Make the metric name part of the partition key. This would require too many partitions, and searching across partitions is expensive. Queries that look for a particular set of dimensions would have to search across all partitions.

2. Make the metric name part of the primary key, but not part of the partition key. This would make the metric name a clustering key. Again, clustering on the metric name does not help a search unless the search supplies the metric name. In many cases a search will give only a set of dimensions, not a metric name, so the extra overhead of maintaining the clustering does not buy anything.

Including the metric name in the metric_map.metric_map column seems to strike the right balance: it indexes on name as well as dimensions while giving the correct lookup guarantees.

5. I don't see anything that would handle Brad Klein's change, https://review.openstack.org/241626 "Add optional start_time and end_time for metrics list". Would that require a separate table?

We will implement that as we do currently, i.e., get the list of metrics and then query for date ranges, as sketched below. The alternative is to keep track of first-seen and last-seen dates in another table, but that reduces to the way it is currently implemented.
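
A hypothetical sketch of that two-step flow with the Python driver (the host, keyspace, and the 'host1' filter are assumptions): resolve the metric hashes from metric_map, then keep only the series that have at least one measurement in the requested range.

from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('monasca')
start, end = datetime(2015, 12, 10), datetime(2015, 12, 11)

# Step 1: resolve matching series from the lookup table.
hashes = [row.metric_hash for row in session.execute(
    "select metric_hash from metric_map"
    " where tenant_id = %s and region = %s and metric_map['host'] = %s",
    ('hpe', 'uswest', 'host1'))]

# Step 2: keep only series with data in [start, end).
for h in hashes:
    row = session.execute(
        "select time_stamp from measurements"
        " where tenant_id = %s and region = %s and metric_hash = %s"
        " and time_stamp >= %s and time_stamp < %s limit 1",
        ('hpe', 'uswest', h, start, end)).one()
    if row is not None:
        print(h.hex())  # this metric has measurements in the range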

Gerrit topic: https://review.openstack.org/#q,topic:bp/monasca-cassandra,n,z

Addressed by: https://review.openstack.org/349751
    Add Cassandra support to devstack

Addressed by: https://review.openstack.org/273845
    Enable Cassandra Database

Addressed by: https://review.openstack.org/345045
    Revival of Cassandra repository

Addressed by: https://review.openstack.org/344059
    Add java implementation of Cassandra support

Work Items
