Cassandra Tasks

Registered by Witek Bedyk

Based on the email discussion, here is a list of possible tasks and points to consider or investigate, as well as a list of resources.

Blueprint information

Status:
Complete
Approver:
None
Priority:
Medium
Drafter:
Witek Bedyk
Direction:
Needs approval
Assignee:
None
Definition:
Obsolete
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
Roland Hochmuth

Related branches

Sprints

Whiteboard

Tasks

1. NetworkTopologyStrategy: The keyspace for the metrics should use NetworkTopologyStrategy instead of SimpleStrategy. SimpleStrategy is only intended for single-data-center deployments and is not recommended for production.
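
A minimal sketch of the keyspace change, assuming the DataStax Python driver and hypothetical data center names (dc1, dc2), contact point, and replication factors:

    from cassandra.cluster import Cluster

    cluster = Cluster(['cassandra-node1'])   # contact point is an assumption
    session = cluster.connect()

    # NetworkTopologyStrategy lets each data center carry its own replica count,
    # unlike SimpleStrategy, which ignores the cluster topology.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS monasca
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc1': 3,
            'dc2': 3
        }
    """)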

2. DataCenter aware connection policy: As a consequence of adopting NetworkTopologyStrategy, the Cassandra connection code in the application client should use the driver's existing data-center-aware policy when connecting to the database.
This ensures the client connects to its local data center first and only tries the remote data center if the local one has been down for some time (the remote fallback has to be implemented in the client code).
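
A sketch of how the data-center-aware policy could be configured with the DataStax Python driver; the contact point and local data center name are assumptions, and the remote fallback still has to be handled in client code as noted above:

    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # Prefer replicas in the local data center; the token-aware wrapper routes
    # each query directly to a replica that owns the partition.
    policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1'))

    cluster = Cluster(contact_points=['cassandra-dc1-node1'],
                      load_balancing_policy=policy)
    session = cluster.connect('monasca')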

3. Time in partition key: We should consider including time in the partition key of the metric table to guarantee query efficiency and table maintenance efficiency (deleting old data by partition becomes easy and fast). Although a single partition can hold up to two billion cells, it is recommended to keep the number of cells below 10 million. Smaller partitions make queries more efficient, keyspace repair fast (which should be done daily), and bootstrapping a new node fast (more partitions mean more nodes can stream data to the new node), with other benefits such as faster node replacement. I have seen people partitioning tables by month, three weeks, or day. However, this will require changes to the query statements.
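
A sketch of what a time-bucketed measurements table could look like; the table and column names and the daily bucket size are illustrative, not the current Monasca schema. The time bucket becomes part of the partition key, so old data can be dropped one partition at a time:

    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')

    session.execute("""
        CREATE TABLE IF NOT EXISTS measurements (
            metric_id   blob,       -- hash of metric name + dimensions
            time_bucket date,       -- e.g. one partition per metric per day
            time_stamp  timestamp,
            value       double,
            value_meta  text,
            PRIMARY KEY ((metric_id, time_bucket), time_stamp)
        ) WITH CLUSTERING ORDER BY (time_stamp DESC)
    """)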

4. Metric map in measurement table: Consider also storing the metric map data in the measurement table. I am not sure how often the metric table needs to be joined with the measurement table to show where a measurement came from. If we have that use case, it is better to denormalize the measurement table so the join can be avoided.

5. Async: Investigate use of the async API in the Cassandra client, so that new queries are issued while results from previous ones are still in flight.
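
A sketch of pipelining writes with execute_async in the DataStax Python driver; the host, table, and sample rows are assumptions:

    import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')
    insert = session.prepare(
        "INSERT INTO measurements (metric_id, time_bucket, time_stamp, value) "
        "VALUES (?, ?, ?, ?)")

    rows_to_insert = [
        (b'metric-1', datetime.date.today(), datetime.datetime.utcnow(), 42.0),
        (b'metric-2', datetime.date.today(), datetime.datetime.utcnow(), 7.5),
    ]

    # Issue all writes without blocking, then resolve the futures afterwards,
    # so many requests are in flight at once.
    futures = [session.execute_async(insert, row) for row in rows_to_insert]
    for future in futures:
        future.result()   # raises if the corresponding write failed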

6. UDF and UDA: To support the statistics functions in Monasca, the measurements are currently queried and the statistics are calculated client side. See https://review.openstack.org/#/c/273845/22/monasca_api/common/repositories/cassandra/metrics_repository.py. Investigate whether User-Defined Functions (UDF), https://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDF.html, and User-Defined Aggregate Functions (UDA), https://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDA.html, would be more efficient.
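
As a sketch of what a server-side aggregate could look like, here is the standard UDF/UDA average example from the DataStax documentation, issued through the Python driver. The function names are illustrative, and enable_user_defined_functions must be set to true in cassandra.yaml:

    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')

    # Accumulator: counts samples and sums values in a (count, sum) tuple.
    session.execute("""
        CREATE FUNCTION IF NOT EXISTS avg_state(state tuple<int, double>, val double)
        CALLED ON NULL INPUT RETURNS tuple<int, double> LANGUAGE java AS
        'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setDouble(1, state.getDouble(1) + val); } return state;'
    """)

    # Finalizer: turns the (count, sum) tuple into the average.
    session.execute("""
        CREATE FUNCTION IF NOT EXISTS avg_final(state tuple<int, double>)
        CALLED ON NULL INPUT RETURNS double LANGUAGE java AS
        'if (state.getInt(0) == 0) return null; return state.getDouble(1) / state.getInt(0);'
    """)

    session.execute("""
        CREATE AGGREGATE IF NOT EXISTS average(double)
        SFUNC avg_state STYPE tuple<int, double>
        FINALFUNC avg_final INITCOND (0, 0)
    """)

    # The aggregate then runs server side, e.g.:
    # SELECT average(value) FROM measurements WHERE metric_id = ? AND time_bucket = ?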

7. Compression: Add support for compression. Compression is extremely important. The method described in the Facebook Gorilla paper, http://www.vldb.org/pvldb/vol8/p1816-teller.pdf, has become something of a standard for several databases over the past year, including Prometheus and InfluxDB. Investigate adding the Gorilla/Facebook compression to Cassandra.

8. Group_by: Add support for group_by query parameter in measurements and statistics.

9. dimensions/names resources: Add support for the dimensions names resource.

10. Prepared statements: Add use of prepared statements. This will greatly reduce the per-query overhead for a single partition key in a wide row, and in general for any Cassandra query.
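
A sketch of preparing a statement once and binding values per call, assuming the time-bucketed schema sketched under task 3:

    import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')

    # Parsed and planned once on the server; only the bound values travel per call.
    select = session.prepare("""
        SELECT time_stamp, value FROM measurements
        WHERE metric_id = ? AND time_bucket = ?
          AND time_stamp >= ? AND time_stamp < ?
    """)

    rows = session.execute(
        select,
        (b'metric-1', datetime.date(2016, 6, 1),
         datetime.datetime(2016, 6, 1, 0, 0), datetime.datetime(2016, 6, 1, 1, 0)))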

11. BATCH operations: Add possible use of BATCH operations to group together queries going for the same partition key (but different clustering keys or columns in that wide row). This will pack together queries in the same request. Note that prepared queries can also be batched.
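
A sketch of an unlogged batch of prepared inserts that all target the same partition; the host, table, and sample data are assumptions:

    import datetime
    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType

    session = Cluster(['cassandra-node1']).connect('monasca')
    insert = session.prepare(
        "INSERT INTO measurements (metric_id, time_bucket, time_stamp, value) "
        "VALUES (?, ?, ?, ?)")

    bucket = datetime.date(2016, 6, 1)
    samples = [(datetime.datetime(2016, 6, 1, 0, i), float(i)) for i in range(10)]

    # UNLOGGED batch: all rows share the partition key (metric_id, time_bucket),
    # so the batch is applied on one replica set without the batch-log overhead.
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    for ts, value in samples:
        batch.add(insert, (b'metric-1', bucket, ts, value))
    session.execute(batch)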

12. Multiple-threads: Possibly add support for multiple threads client-side.

13. Wide-rows: In DT's experience, Cassandra scales quite well with respect to wide rows, although it does depend on what we mean by "extremely wide rows" and whether or not deletions are also performed. Ultimately the "wide row" (partition vs. clustering key) mechanism gives us a way to trade off per-node overhead against the number of nodes visited by a query, which is useful if we have many servers in the distributed database. Not having wide rows is great if we only retrieve data by primary key; on the other hand, wide rows can help tremendously if we need to constrain secondary index queries (or indeed aggregation operations) to fewer nodes. In our experience, it is best to decouple the "maximum wideness" of the row from the data size by making use of arbitrarily sized "time shards" as partition keys. For instance, we can use some arbitrary time interval (1 day, 5 days, 37 hours) as a "partition key ID" (rather than, say, the year), and include all entries in that time interval as clustering keys under that partition. This way we can put a bound on the expected size of the wide row. The only downside is that, while writes are simple, read (or aggregation) queries first need to perform some computation to decide which time shards / partition keys they need to visit.
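
A sketch of the read-side computation this implies: given a query time range and an assumed shard width, derive the set of time-shard partition keys the query has to visit (the daily shard width is an example):

    import datetime

    BUCKET = datetime.timedelta(days=1)     # assumed shard width

    def buckets_for_range(start, end, bucket=BUCKET):
        """Return the partition-key time shards overlapping [start, end]."""
        epoch = datetime.datetime(1970, 1, 1)
        step = int(bucket.total_seconds())
        first = int((start - epoch).total_seconds()) // step
        last = int((end - epoch).total_seconds()) // step
        return [epoch + datetime.timedelta(seconds=n * step)
                for n in range(first, last + 1)]

    # A one-hour query spanning midnight touches two daily shards:
    shards = buckets_for_range(datetime.datetime(2016, 6, 1, 23, 30),
                               datetime.datetime(2016, 6, 2, 0, 30))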

14. Investigate and potentially implement moving all the config DB information, which is currently in MySQL/PostgreSQL, to Cassandra. See the presentation at Strata + Hadoop by Netflix, where they moved 99% of their Oracle relational data to Cassandra.

15. Add "WITH CLUSTERING ORDER BY ... " clause to cql schema.

16. Should we consider use of SparkSQL or will the UDFs and UDAs be sufficient? Note, the Monasca Transform Engine uses Spark Streaming, so there is a precedent for using Spark within Monasca already.

Questions:

1. How does each write of each metric get stored? For example, if you are inserting a single metric, does the coordinator require that each write of each metric for each replica be acknowledged prior to responding to the client? If so, that would take a lot of round-trip processing time per metric. Is there a way that multiple metrics can be inserted without having to wait for the round-trip acknowledgment for each metric? For example, is it possible to do inserts in batches to amortize the round-trip latency over multiple metrics?
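
For reference, a sketch of the knobs involved: the consistency level on the statement controls how many replicas must acknowledge before the coordinator responds, and execute_async overlaps the remaining round trips (names and values below are examples, not a recommendation):

    import datetime
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')
    insert = session.prepare(
        "INSERT INTO measurements (metric_id, time_bucket, time_stamp, value) "
        "VALUES (?, ?, ?, ?)")

    # LOCAL_ONE: the coordinator replies once a single replica in the local data
    # center has acknowledged; the other replicas receive the write asynchronously.
    insert.consistency_level = ConsistencyLevel.LOCAL_ONE

    rows = [(b'metric-1', datetime.date.today(), datetime.datetime.utcnow(), 1.0),
            (b'metric-2', datetime.date.today(), datetime.datetime.utcnow(), 2.0)]
    futures = [session.execute_async(insert, row) for row in rows]
    for f in futures:
        f.result()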

2. How do we ensure that queries we have already optimized for in Vertica using multiple projections, such as by ID, metric name, or dimension name/values, run fast in Cassandra? Depending on the type of query, the way the data is grouped and ordered on disk can significantly impact performance. In Vertica this was handled by using multiple projections, which are like indices in MySQL, that get targeted for queries. It seems that if we need to do something similar in Cassandra we need to store the data on disk in multiple ways. Unfortunately, Cassandra doesn't support projections/indices of this kind. The Cassandra way would be to write the data multiple times, in effect building up something similar to what projections provide in terms of data layout on disk.
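
As a sketch of the "write it multiple times" approach, a second lookup table whose partition key matches an alternate query path, analogous to maintaining an extra projection (the table and column names are illustrative):

    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')

    # Queries such as "all metrics with hostname=abc" then hit a single partition
    # instead of scanning or relying on a secondary index.
    session.execute("""
        CREATE TABLE IF NOT EXISTS metrics_by_dimension (
            dimension_name  text,
            dimension_value text,
            metric_name     text,
            metric_id       blob,
            PRIMARY KEY ((dimension_name, dimension_value), metric_name, metric_id)
        )
    """)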

3. How does Cassandra search within a row for data? For example, if a row stores 30 days of data, how does Cassandra grab the data for just the last hour, if that was all that was specified in the query as the start/end timestamp? Does Cassandra need to scan the entire row of data, or is it possible for Cassandra to do a binary search for the data?

4. Cassandra has the concept of a TTL. My assumption is that there is a TTL stored for each column in a row. Is that the case? If so, that does sound like a bit of overhead in terms of on-disk storage.
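
For reference, a sketch of setting a per-write TTL (the 30-day value and the table are examples):

    import datetime
    from cassandra.cluster import Cluster

    session = Cluster(['cassandra-node1']).connect('monasca')

    # The TTL is tracked per written cell; expired cells are dropped at read time
    # and purged during compaction.
    session.execute(
        "INSERT INTO measurements (metric_id, time_bucket, time_stamp, value) "
        "VALUES (%s, %s, %s, %s) USING TTL 2592000",   # 30 days in seconds
        (b'metric-1', datetime.date.today(), datetime.datetime.utcnow(), 42.0))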

5. What are all the types of queries that need to be supported? We probably need to enumerate all the queries that we've optimized for.

6. How is the data stored on disk? Just wondering about the low-level internals of how Cassandra does its job and how that is or is not optimal for our type of problem.

Resources:

1. http://www.rubyscale.com/post/143067470585/basic-time-series-with-cassandra

2. http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

3. Cassandra Summit 2016: http://myeventagenda.com/sessions/1CBFC920-807D-41C1-942C-8D1A7C10F4FA/5/5#


Work Items
