Add an "uptime" measurement.

Registered by Cyril Roelandt

It would be great to be able to determine the uptime of a machine in a given period of time.

Currently, an "instance" measurement is available. It doesn't exactly address this issue, since it does not give information about the uptime/downtime of a VM. It is more of an "instance existence time".

Possible solution:
-------------------
Nova sends notifications when a VM is started, stopped, suspended, or resumed (this patch is needed: https://review.openstack.org/#/c/38485/). Ceilometer could store this information and, upon a user request, compute the result expected by the user.

Currently, a user may compute the uptime of a VM during a [t1;t2] period of time using something like this pseudo-code:

    # Assumes `samples` is sorted by timestamp and that the timestamps
    # and t2 are numbers of seconds.
    active = False
    latest_start_timestamp = None  # Last time the VM was brought up.
    total_uptime = 0.0  # In seconds.

    for sample in samples:
        state = sample['resource_metadata']['state']
        if state == 'active':
            if not active:
                active = True
                latest_start_timestamp = sample['timestamp']
        elif state in ('stopped', 'suspended'):
            if active:
                active = False
                total_uptime += sample['timestamp'] - latest_start_timestamp
        else:
            raise ValueError(state)  # This state is not handled yet.

    if active:
        # The VM is still up at t2: count the time since it was last started.
        total_uptime += t2 - latest_start_timestamp

Note that the time spent between t1 and samples[0]['timestamp'] is not taken into account here. We have to find a way to know the state of the VM in [t1;samples[0]['timestamp']]. This can probably be done by looking at the 'event_type' field and/or by retrieving one older sample.
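
One way to cover the start of the interval is to seed the loop with the latest sample recorded before t1. The following is only a sketch under assumptions: the function name and the `previous_sample` argument are hypothetical, timestamps are taken to be numbers of seconds, and any state other than 'active' is counted as downtime instead of raising an error.

```python
def uptime_in_window(samples, t1, t2, previous_sample=None):
    """Return the seconds spent in the 'active' state during [t1, t2].

    samples: state samples with t1 <= timestamp <= t2, sorted by timestamp.
    previous_sample: hypothetical argument, the latest sample recorded
        before t1 (None if it is unknown).
    """
    up_states = ('active',)
    # Seed the state at t1 from the older sample, if there is one.
    active = (previous_sample is not None and
              previous_sample['resource_metadata']['state'] in up_states)
    latest_start = t1 if active else None
    total_uptime = 0.0

    for sample in samples:
        is_up = sample['resource_metadata']['state'] in up_states
        if is_up and not active:
            active = True
            latest_start = sample['timestamp']
        elif not is_up and active:
            active = False
            total_uptime += sample['timestamp'] - latest_start

    if active:
        # Still up at the end of the window.
        total_uptime += t2 - latest_start
    return total_uptime
```

Without `previous_sample`, the result is the same as before: the time before the first sample is ignored.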

This could probably be nicely implemented in Ceilometer. The downtime could be computed the same way if necessary.

Another way of solving this issue would be to add a more generic meter (such as 'durations') that would return the time spent in each state:

    {
        total: 42,
        active: 30,
        building: 4,
        scheduling: 2,
        stopped: 4,
        suspended: 2
    }

The computation of each state duration could be done as previously described.
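
As a rough illustration of such a 'durations' result, the sketch below sums the time between consecutive samples per state. It assumes that `samples` is a list sorted by timestamp, that each state holds until the next sample (or until t2 for the last one), and that timestamps are numbers of seconds. The function name is hypothetical.

```python
from collections import defaultdict


def state_durations(samples, t2):
    """Return {state: seconds} plus a 'total' key, for samples up to t2."""
    durations = defaultdict(float)
    # Pair each sample with the one that follows it (None for the last).
    for current, following in zip(samples, samples[1:] + [None]):
        end = t2 if following is None else following['timestamp']
        state = current['resource_metadata']['state']
        durations[state] += end - current['timestamp']
    result = dict(durations)
    result['total'] = sum(durations.values())
    return result
```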

Technical details
---------------------
* Add q.metadata_field, to be able to run queries that filter on a resource_metadata[X] field:
        q.metadata_field=state&q.op=eq&q.value=active

* The stats should return a "duration with resource_metadata[X] == Y" column when q.metadata_field is specified.

* Caching the results.
It seems hard to cache the current value, since it depends on the time interval given in each query. We should probably not worry too much about caching right away: this would only be a problem when querying statistics over a very long period with many notifications to process.

* Other use cases
For how long has a VM been using a given kernel?
    q.metadata_field=kernel_id&q.op=eq&q.value=HASH
In order to do this, we would need Nova to send us a notification when a new kernel is installed.

For how long has a VM been using N cpus?
    q.metadata_field=vcpus&q.op=eq&q.value=N
In order to do this, we would need Nova to send us a notification when the number of vcpus changes.
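
The "duration with resource_metadata[X] == Y" column could be computed along these lines. This is a sketch only: the function name is hypothetical, `samples` is assumed to be a list sorted by timestamp with numeric timestamps in seconds, and each sample's metadata is assumed to stay valid until the next sample (or until t2 for the last one).

```python
def duration_where(samples, field, value, t2):
    """Return the seconds during which resource_metadata[field] == value."""
    total = 0.0
    for index, sample in enumerate(samples):
        if sample['resource_metadata'].get(field) != value:
            continue
        is_last = index == len(samples) - 1
        end = t2 if is_last else samples[index + 1]['timestamp']
        total += end - sample['timestamp']
    return total
```

A value taken from a query string arrives as text, so a real implementation would probably need to convert q.value to the type of the metadata field before comparing.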

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: Obsolete
Series goal: None
Implementation: Unknown
Milestone target: None
Completed by: gordon chung

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/uptime-measurement,n,z

Addressed by: https://review.openstack.org/41899
    Statistics: group results by a value

Addressed by: https://review.openstack.org/42355
    Implement 'reset_on' for mongodb

see https://bugs.launchpad.net/ceilometer/+bug/1425322 - gordc(2017-12)
