Send Sensor Data to Ceilometer

Registered by devananda on 2013-11-13

This blueprint will define the sensor data collection interface and implement an IPMI-based driver to collect sensor data and send it to Ceilometer.

Blueprint information

Status:
Complete
Approver:
devananda
Priority:
High
Drafter:
Haomeng,Wang
Direction:
Approved
Assignee:
Haomeng,Wang
Definition:
Approved
Series goal:
Accepted for juno
Implementation:
Implemented
Milestone target:
2014.2
Started by
Haomeng,Wang on 2014-02-13
Completed by
Haomeng,Wang on 2014-08-01

Whiteboard

We discussed this with the Ceilometer team. They suggest that we follow the Nova event notification message format and send IPMI data notifications directly to the Ceilometer Collector.

We will design a driver-based framework to collect hardware sensor data such as temperature, fan speed, and voltage, and send it to the Ceilometer Collector. The framework can later be extended with more drivers for hardware systems that do not have IPMI enabled.

Define these base classes as the abstraction layer:

ironic.drivers.base.BaseDriver:sensor
ironic.drivers.base.SensorInterface

Implement this interface as follows for IPMI sensor data collection:

ironic.drivers.modules.ipmitool:IPMISensor

However, we have to verify this solution at the code level to confirm that it works; if it does not, we will need another round of discussion with the Ceilometer team to work out a new solution. Based on the test code, we can refine the data model and interface with Ceilometer and finalize the design before coding.

@Devananda, one question: I remember you mentioned that this blueprint is not in scope for our Icehouse release. Can you confirm? Thank you.

--Haomeng, 2013-12-09

-------------------

@Haomeng, it is not an essential feature for coming out of Incubation, which is our priority for the Icehouse time frame. However, it is still an important feature to several other teams, and it is fine to work on it.

Your proposed code changes above look good. There is one additional specification that I would like to see: what does the ironic.drivers.base.SensorInterface API look like, and what type of data does it return?

Thanks!
Devananda, 2013-12-10

--------------------

@Devananda, I propose defining SensorInterface with the three abstract methods below, to support the common fan speed, voltage, and temperature sensors. The methods return a dict because sensor names are dynamic: different hardware systems report different names. The value types, however, are fixed to three units (rpm, volts, degrees), so we can support these three common sensor types in our first version and send them to the Ceilometer Collector via the notification bus. Ceilometer will then recognize these three sensor types, process them, and store them in the Ceilometer database for later retrieval.

The Ceilometer data model is resource->meters->samples, and the meter name is required. So we treat fan speed, voltage, and temperature as three common meters in Ceilometer, and each sensor instance gets its own value under a different resource_id.

For example, if a physical node has 4 fan sensors (FAN 1/2/3/4), then after the IPMI data is sent to Ceilometer we will see one meter and 4 resource instances:

1 Meter Name: fan
4 Resource IDs: <node-id>-FAN 1, <node-id>-FAN 2, <node-id>-FAN 3 and <node-id>-FAN 4
4 Samples (one sampling):

+-----------------+--------+
| Resource ID     | Volume |
+-----------------+--------+
| <node-id>-FAN 1 | 4652   |
| <node-id>-FAN 2 | 4860   |
| <node-id>-FAN 3 | 4995   |
| <node-id>-FAN 4 | None   |
+-----------------+--------+

The <node-id> can be the MAC address of our Ironic node, serving as the node's identifier within Ceilometer.
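
The mapping described above can be sketched as follows. This is only an illustration: `build_fan_samples` and the dict keys it emits are hypothetical names, not part of Ironic or Ceilometer, and `node_id` stands for whatever node identifier is finally chosen.

```python
def build_fan_samples(node_id, fan_readings):
    """Map {"FAN 1": 4652, ...} to one sample per sensor under the meter 'fan'."""
    samples = []
    for sensor_name in sorted(fan_readings):
        samples.append({
            "meter": "fan",
            "resource_id": "%s-%s" % (node_id, sensor_name),
            "volume": fan_readings[sensor_name],  # may be None when no reading
        })
    return samples


samples = build_fan_samples(
    "node-1", {"FAN 1": 4652, "FAN 2": 4860, "FAN 3": 4995, "FAN 4": None})
for sample in samples:
    print(sample["meter"], sample["resource_id"], sample["volume"])
```

Each sensor becomes its own resource, while the meter name stays the same for all of them.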

ironic.drivers.base.SensorInterface

@abc.abstractmethod
def get_fanspeed_sensor_data(self, node):
    """Return the Fan Speed sensor data(in rpm) dict for a node.

       The return data example:
        {"FAN 1":4652,
         "FAN 2":4860,
         "FAN 3":4995,
         "FAN 4":None,
         "FAN A":4725
        }

    TODO
    """

@abc.abstractmethod
def get_voltage_sensor_data(self, node):
    """Return the System Voltage sensor data(in volts) dict for a node.

       The return data example:
        {"Vcore":0.81,
         "3.3VCC":3.36,
         "12V":11.98,
         "VDIMM":1.51,
         "5VCC":5.06,
         "-12V":-11.87,
         "VBAT":3.12,
         "VSB":3.34,
         "AVCC":3.36
        }

    TODO
    """

@abc.abstractmethod
def get_temperature_sensor_data(self, node):
    """Return the System Temperature sensor data(in degrees C) dict for a node.

       The return data example:
        {"System Temp":31,
         "CPU Temp":3.36
        }

    TODO
    """

Another idea: we have to match the Ceilometer Sample data model, so project_id, tenant_id and user_id are required for the sensor data. If our node has an associated instance_uuid from Nova, we can retrieve the Nova instance data and copy it to our Node object, where it can be reused during sensor data collection to populate these context fields. If the node has no instance_uuid, it has no Nova instance, so we will set project_id, tenant_id and user_id to None and send them to the Ceilometer Collector.
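
A minimal sketch of that rule, assuming the node is represented as a plain dict. The `cached_context` key and the `sample_context` helper are hypothetical and only stand in for wherever the copied Nova data ends up being stored.

```python
CONTEXT_KEYS = ("project_id", "tenant_id", "user_id")


def sample_context(node):
    """Return the three context fields for a node's sensor samples.

    `node` is a plain dict; 'cached_context' is a hypothetical key holding
    the values copied from the Nova instance.
    """
    if node.get("instance_uuid") is None:
        return dict.fromkeys(CONTEXT_KEYS)  # all three set to None
    cached = node.get("cached_context", {})
    return {key: cached.get(key) for key in CONTEXT_KEYS}


free_node = {"uuid": "node-1", "instance_uuid": None}
used_node = {
    "uuid": "node-2",
    "instance_uuid": "96e11f69-f12a-485e-abfa-526cd04169c4",
    "cached_context": {"project_id": "p1", "tenant_id": "t1", "user_id": "admin"},
}
print(sample_context(free_node))
print(sample_context(used_node))
```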

One more thing: we will define a periodic task (inheriting from ironic.common.service.PeriodicService) to poll IPMI data at a fixed interval, 30 minutes by default. For our first version, I think we can support only nodes whose driver is PXEAndIPMIToolDriver; later we can support more drivers such as PXEAndIPMINativeDriver and others.

These ideas are my draft design for this bp. What do you think? More ideas are welcome.
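
The polling behaviour could look roughly like the sketch below. It does not use PeriodicService; `poll_sensors`, its parameters, and the representation of nodes as dicts with a `driver` string are all assumptions made for illustration.

```python
import time

DEFAULT_INTERVAL = 30 * 60  # seconds, the 30-minute default proposed above


def poll_sensors(nodes, collect, publish, interval=DEFAULT_INTERVAL,
                 supported_drivers=("pxe_ipmitool",), rounds=None,
                 sleep=time.sleep):
    """Collect and publish sensor data for supported nodes every `interval`.

    `rounds=None` polls forever; a number limits the loop (useful in tests).
    """
    completed = 0
    while rounds is None or completed < rounds:
        for node in nodes:
            if node["driver"] in supported_drivers:
                publish(node["uuid"], collect(node))
        completed += 1
        if rounds is None or completed < rounds:
            sleep(interval)


published, naps = [], []
poll_sensors(
    nodes=[{"uuid": "n1", "driver": "pxe_ipmitool"},
           {"uuid": "n2", "driver": "other"}],
    collect=lambda node: {"FAN 1": "4652"},
    publish=lambda uuid, data: published.append((uuid, data)),
    rounds=2,
    sleep=naps.append,  # record the requested sleeps instead of waiting
)
```

Passing the collector and publisher in as callables keeps the loop independent of any one driver, so supporting another driver later only means extending `supported_drivers`.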

Thanks!
Haomeng, 2013-12-26

--------------------

Haomeng,

You should use node.uuid as the identifier, not the Node's MAC address because a Node may have many MAC addresses.

I would like to get comments from members of the Ceilometer team regarding the resource->meter->sample and resource name aspects of the proposal. It looks good to me.

The SensorInterface proposal looks good, and I think caching project_id, tenant_id, user_id within the ironic.node.driver_info dict is reasonable. This is where Nova will cache information such as glance image id's, so there is already precedent for it.

Cheers,
Devananda, 2013-12-26

--------------------

@Lianhao,

Would you please help review my proposed solution? Your comments from the Ceilometer point of view are welcome. Thank you.

I am not sure who will work on the Ceilometer part of this bp. As I understand it, Ceilometer has to add a new notification handler to consume the IPMI data sent from Ironic.

Thanks,
Haomeng, 2013-12-31

--------------------
Hi Haomeng,

I have a few questions and comments.

It seems to me that SensorInterface is flexible enough to allow vendor-specific BMC methods to collect sensor data. Is that correct?

In terms of sensor types, I think fan, voltage and temperature sensors are a good start, but more sensor data, such as hardware health status, min/max power and temperature values, and upper critical thresholds for temperature and fan speed, would be very useful. Is there a plan to add such sensor types, or is there a mechanism for vendors to plug in more sensor types?

For periodic polling, will it work with vendor-specific drivers, e.g., non-PXE-based deploy drivers and non-IPMI-based power drivers?

Lastly, is there a plan to support asynchronous events, or to allow vendors to plug in asynchronous events?

Thanks!
Wanyen 01/24/14

--------------------

Hi Wanyen

Yes, SensorInterface is an abstract interface for all vendor drivers. However, for the first version we plan to implement only the IPMI data collector.

As for the supported sensors, we want to support all sensor data that can be retrieved from an IPMI node. I have another proposal to cover the additional sensors you mentioned, such as the min/max and threshold values. Maybe a more general solution is needed that supports all sensors with simple key-value pairs; a common interface method could be defined as below:

def get_sensor_data(self, node):
    """Return all the sensor data retrieved from node.

       The return data example:
        {"Vcore":"0.81",
         "3.3VCC":"3.36",
         "12V":"11.98",
         "VDIMM":"1.51",
         "5VCC":"5.06",
         "-12V":"-11.87",
         "VBAT":"3.12",
         "VSB":"3.34",
         "AVCC":"3.36",
         "FAN 1":"4652",
         "FAN 2":"4860",
         "FAN 3":"4995",
         "FAN 4":"None",
         "FAN A":"4725",
         "...":"..."
        }

    """

From the Ceilometer side, this would be a common data model for all sensors, so I have to check with the Ceilometer team whether this solution is acceptable for the Ceilometer Collector and how we would store the data in the Ceilometer resource->meters->samples structure. I will try to discuss this common solution with the Ceilometer team.
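
As a rough sketch of the single-method interface proposed above, written in Python 3 syntax: `FakeSensor` is a hypothetical driver with canned values, used only to show that a concrete driver must implement `get_sensor_data` before it can be instantiated.

```python
import abc


class SensorInterface(abc.ABC):
    """Sketch of a vendor-neutral sensor interface (Python 3 syntax)."""

    @abc.abstractmethod
    def get_sensor_data(self, node):
        """Return a dict mapping sensor name to its reading as a string."""


class FakeSensor(SensorInterface):
    """Hypothetical driver that returns canned readings, for illustration only."""

    def get_sensor_data(self, node):
        return {"Vcore": "0.81", "FAN 1": "4652", "FAN 4": "None"}


readings = FakeSensor().get_sensor_data(node={"uuid": "node-1"})

try:
    SensorInterface()  # abstract classes cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```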

Thanks
Haomeng 2014-1-25

--------

I summarize the two proposed solutions above as follows:

Solution 1 - send the IPMI data to Ceilometer grouped by specific IPMI sensor category. The meter names are clear to Ceilometer because they are predefined:

    Common field:
        timestamp
        publisher_id
        message_id

    Category:
        FanSpeed
        Voltage
        Temperature

    Meter Names:
        fanspeed, fanspeed.min, fanspeed.max, fanspeed.status
        voltage, voltage.min, voltage.max, voltage.status
        temperature, temperature.min, temperature.max, temperature.status

    resource-id: node-uuid

    A message example with sensor data from one IPMI node:

    message = {
        'event_type': 'ipmidata',
        'timestamp': '2013-12-17 06:12:11.554607',
        'user_id': 'admin',
        'publisher_id': 'ipmidata-os26-control01.localdomain',
        'message_id': '3eca2746-9d81-42cd-b0b3-4bdec52e109x',
        'tenant_id': 'c1921aa2216846919269a17978408476',
        'instance_uuid': '96e11f69-f12a-485e-abfa-526cd04169c4',  # nova instance uuid
        'id': '1329998e8183419794507cd6f0cc121a',  # node's uuid
        'payload': {
            'fanspeed': {
                'FAN 1': {
                    'current_value': '4652',
                    'min_value': '4200',
                    'max_value': '4693',
                    'status': 'ok'
                },
                'FAN 2': {
                    'current_value': '4322',
                    'min_value': '4210',
                    'max_value': '4593',
                    'status': 'ok'
                }
            },
            'voltage': {
                'Vcore': {
                    'current_value': '0.81',
                    'min_value': '0.80',
                    'max_value': '0.85',
                    'status': 'ok'
                },
                '3.3VCC': {
                    'current_value': '3.36',
                    'min_value': '3.20',
                    'max_value': '3.56',
                    'status': 'ok'
                }
            },
            ...
        }
    }
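
To illustrate how a Solution 1 message lines up with the predefined meter names, here is a hedged sketch of how a consumer might expand the payload. `to_samples` and `SUFFIX_BY_FIELD` are hypothetical helpers, not Ceilometer code; the resource ID is the node UUID, as stated above.

```python
SUFFIX_BY_FIELD = {
    "current_value": "",
    "min_value": ".min",
    "max_value": ".max",
    "status": ".status",
}


def to_samples(message):
    """Expand a Solution 1 message into (meter, resource_id, sensor, value) tuples."""
    samples = []
    for category, sensors in message["payload"].items():
        for sensor_name, fields in sensors.items():
            for field, suffix in SUFFIX_BY_FIELD.items():
                if field in fields:
                    samples.append(
                        (category + suffix, message["id"], sensor_name, fields[field]))
    return samples


message = {
    "id": "1329998e8183419794507cd6f0cc121a",
    "payload": {
        "fanspeed": {
            "FAN 1": {"current_value": "4652", "min_value": "4200",
                      "max_value": "4693", "status": "ok"},
        },
    },
}
samples = to_samples(message)
for sample in samples:
    print(sample)
```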

Solution 2 - send the IPMI data to Ceilometer at the level of a single common meter named 'sensor'. Each sensor entry then carries more detail itself: the sensor name and its attributes (current/min/max/status values):

    Common field:
        timestamp
        publisher_id
        message_id

    Common sensor meter name:
        sensor

    A message example with sensor data from one IPMI node:

    message = {
        'event_type': 'ipmidata',
        'timestamp': '2013-12-17 06:12:11.554607',
        'user_id': 'admin',
        'publisher_id': 'ipmidata-os26-control01.localdomain',
        'message_id': '3eca2746-9d81-42cd-b0b3-4bdec52e109x',
        'tenant_id': 'c1921aa2216846919269a17978408476',
        'instance_uuid': '96e11f69-f12a-485e-abfa-526cd04169c4',  # nova instance uuid
        'id': '1329998e8183419794507cd6f0cc121a',  # node's uuid
        'payload': {
            'FAN 1': {
                'current_value': '4652',
                'min_value': '4200',
                'max_value': '4693',
                'status': 'ok'
            },
            'FAN 2': {
                'current_value': '4322',
                'min_value': '4210',
                'max_value': '4593',
                'status': 'ok'
            },
            'Vcore': {
                'current_value': '0.81',
                'min_value': '0.80',
                'max_value': '0.85',
                'status': 'ok'
            },
            '3.3VCC': {
                'current_value': '3.36',
                'min_value': '3.20',
                'max_value': '3.56',
                'status': 'ok'
            },
            ...
        }
    }
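
The structural difference between the two payloads is only the category level. A small sketch, using a hypothetical `flatten_payload` helper, shows how a Solution 1 payload reduces to the Solution 2 shape, and that sensor names must then be unique across categories.

```python
def flatten_payload(categorized):
    """Drop the category level: {category: {sensor: fields}} -> {sensor: fields}."""
    flat = {}
    for category, sensors in categorized.items():
        for sensor_name, fields in sensors.items():
            if sensor_name in flat:
                raise ValueError("duplicate sensor name: %s" % sensor_name)
            flat[sensor_name] = dict(fields)
    return flat


solution1_payload = {
    "fanspeed": {"FAN 1": {"current_value": "4652", "status": "ok"}},
    "voltage": {"Vcore": {"current_value": "0.81", "status": "ok"}},
}
solution2_payload = flatten_payload(solution1_payload)
print(solution2_payload)
```

Going the other way is not possible without extra information, because the flat form no longer records which category a sensor belongs to.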

I am not sure which is more acceptable to Ceilometer. We need more input and comments from the Ceilometer team members who will own the Ceilometer bp that handles the output of this bp.

Thanks
Haomeng 1/27/2014

-----------------------------------------------------------------------------------------------------------------
Hi Haomeng,

Thank you for taking my input into consideration. I think the common sensor solution gives more flexibility to add new sensors in general and also allows vendors to provide vendor-specific sensors. We would need a mechanism (e.g., extra_sensors) for vendors to extend the sensor types if the specific-sensor approach (solution 1) is taken. Thanks!

Wanyen 1/29/14

-----------------------------------------------------------------------------------------------------------------
Hi Wanyen,

Thank you for your input and comments; they are valuable for designing this bp.
Yes, the common sensor solution would be a flexible framework for us. However, it is difficult for Ceilometer to accept, because Ceilometer accepts predefined meters only. With the common sensor solution we would predefine a single 'huge' meter named 'sensor', and the more detailed data, such as sensor names and values, is hard to map to the Ceilometer resource-meter-sample data model. The sensor names also vary across IPMI hardware implementations. So the Ceilometer team will choose the first solution, which matches the Ceilometer data model easily. We therefore propose the first solution for the first version; we can extend the sensors by category in the future and support them in both Ironic and Ceilometer. I have started a discussion about these solutions on the openstack-dev mailing list under the title "[openstack-dev][Ironic][Ceilometer]bp:send-data-to-ceilometer". You are welcome to reply to that thread directly with your comments. Thanks.

Haomeng 1/30/14

--------------------------------------------------------------------------------------------------------------------
Hi Haomeng,

I posted the following message on the mailing list. I hope you have seen it and will consider my input. If there is a desire to go with solution 1, please add an extra_sensors structure with key-value pairs to allow hardware to report additional sensors. Thanks!

Hi,

I sent this message on 01/31 but did not see it posted on the mailing list, so I am sending it again...
Given that different hardware will expose different sensors, I am hoping that we will have a flexible and extensible interface and data structures to accommodate different hardware. For instance, some hardware can report more power and thermal information (such as average power wattage, critical upper temperature threshold, etc.) than the basic current/min/max wattage and temperature. Some hardware exposes NIC and storage sensors as well. IMO, solution 2 gives more flexibility to accommodate more sensors. If there is a desire to define a set of common sensors such as power, fan, and thermal, as proposed by solution 1, then I think we will need an additional data structure such as extra_sensors with key-value pairs to allow hardware to report additional sensors. Thanks!
Regards,
Wanyen

Hi Wanyen,

Thanks for your input. I have worked out an enhanced version of solution 1 that is a very flexible framework on the Ironic side: it does not change any data returned by the 'ipmitool' command, but just formats it as a JSON string and sends it to Ceilometer. I think this should be the final solution, and I will implement it in Python in Ironic.

We run the ipmitool command with the 'sdr -v' options, so we get details for each sensor. See the command line and output at the link below:

http://paste.openstack.org/show/63267/

Ironic will parse this output into a JSON string grouped by 'Sensor Type'. The JSON string that will be sent to Ceilometer is here:

http://paste.openstack.org/show/68808/

So on the Ironic side, we will support all sensors returned by the 'ipmitool sdr -v' command, which I think is a flexible framework. In my test case, we get many sensor types, listed below, including the three common ones: 'Fan', 'Voltage', and 'Temperature':

['Cable / Interconnect', 'Physical Security', 'System Firmwares', 'Temperature', 'Drive Slot / Bay', 'Battery', 'Unknown (0xC1)', 'Memory', 'Power Supply', 'System Event', 'Module / Board', 'Version Change', 'Fan', 'Voltage', 'Event Logging Disabled', 'Critical Interrupt', 'Watchdog', 'Processor', 'Entity Presence']
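
The grouping step can be sketched as below. This is not the Ironic implementation: `parse_sensor_blocks` is a hypothetical helper, and `SAMPLE` is a hand-written stand-in for 'Key : Value' blocks separated by blank lines, not the real output from the paste links above.

```python
import json

# Hand-written stand-in for verbose sensor output; not copied from a real BMC.
SAMPLE = """\
Sensor ID              : FAN 1 (0x41)
 Sensor Type (Threshold) : Fan
 Sensor Reading        : 4652 (+/- 0) RPM
 Status                : ok

Sensor ID              : CPU Temp (0x1)
 Sensor Type (Threshold) : Temperature
 Sensor Reading        : 31 (+/- 1) degrees C
 Status                : ok
"""


def parse_sensor_blocks(text):
    """Group 'Key : Value' blocks as {sensor type: {sensor id: fields}}."""
    grouped = {}
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a key/value separator
                fields[key.strip()] = value.strip()
        sensor_type = next(
            (v for k, v in fields.items() if k.startswith("Sensor Type")), None)
        sensor_id = fields.get("Sensor ID")
        if sensor_type is None or sensor_id is None:
            continue  # not a sensor record
        grouped.setdefault(sensor_type, {})[sensor_id] = fields
    return grouped


grouped = parse_sensor_blocks(SAMPLE)
print(json.dumps(grouped, indent=2, sort_keys=True))
```

Because every field is kept as a string key-value pair, sensor types that the parser has never seen before pass through unchanged.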

However, on the Ceilometer side, the 'Meter' data model has to be defined for this JSON input from Ironic. So for the first version, I think Ceilometer will support 'Fan', 'Voltage', and 'Temperature' first. I will check with the Ceilometer team on how to model/map this IPMI sensor data as Ceilometer resource->meter->samples, and how to gain the flexibility to accommodate more sensors, as Ironic does :)

Thanks
Haomeng 2/8/2014
-------------------------------------------------------------------------------------------------------------------
Hi Haomeng,

I actually prefer the Ironic sensor data format to be ipmitool-agnostic (i.e., not tied to the ipmitool output format). I understand that the Ironic reference implementation will be IPMI-based. However, Ironic should allow vendors to use their own BMC interface (not ipmitool) to collect sensor data from bare-metal nodes. Ideally, the Ironic driver interface and data structures should be neutral to ipmitool or any specific BMC interface, so that the same interface can be used by the Ironic reference implementation as well as vendor-specific implementations. That would be good for Ironic adopters as well. Even though ipmitool collects a good set of sensors, a vendor's hardware may report more than what ipmitool currently supports. That is why I am proposing an extra-sensors structure with key-value pairs to allow additional vendor-specific sensor data to be reported. I am fine with your original solution 1 + extra-sensors, or any other solution that is not tied to the ipmitool output format and not limited to what ipmitool can report. Thanks!

Regards,
Wanyen

-------------------------------------------------------------------------------------------------------------------
Hi Wanyen,

Yes, I agree with you: we should not depend on the output format of the ipmitool command. We define a common data model for sensor data as below:

{
    'Sensor Type 1': {
      'Sensor ID 1': {
        'key1': 'value1',
        'key2': 'value2'
      },
      'Sensor ID 2': {
        'key1': 'value1',
        'key2': 'value2'
      }
    },
    'Sensor Type 2': {
      'Sensor ID 3': {
        'key1': 'value1',
        'key2': 'value2'
      },
      'Sensor ID 4': {
        'key1': 'value1',
        'key2': 'value2'
      }
    }
}

So we are not tied to ipmitool output; I think this common data structure should work for any sensor data.
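
One way to see why this structure is not tied to a single tool: data from different collectors can be merged without any special field. The sketch below is only an illustration; `merge_sensor_data`, the 'NIC' sensor type, and all sample values are hypothetical.

```python
def merge_sensor_data(*sources):
    """Merge {sensor type: {sensor id: fields}} dicts; later sources win on clashes."""
    merged = {}
    for source in sources:
        for sensor_type, sensors in source.items():
            merged.setdefault(sensor_type, {}).update(sensors)
    return merged


ipmi_data = {"Fan": {"FAN 1": {"Sensor Reading": "4652 RPM"}}}
vendor_data = {
    "Fan": {"FAN A": {"Sensor Reading": "4725 RPM"}},
    "NIC": {"NIC 1": {"Link Status": "up"}},  # hypothetical vendor-only type
}
merged = merge_sensor_data(ipmi_data, vendor_data)
print(merged)
```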

For now, we use ipmitool to get the sensor data for our 'pxe_ipmitool' driver. Once our ipminative lib [1] is ready to return sensor data (in a new release, version > 0.5.8), we will have a 'pxe_ipminative' driver implementation, and the above sensor data structure will also be used for the new driver.

[1] https://github.com/stackforge/pyghmi/blob/master/pyghmi/ipmi/command.py#L313

BTW, you can discuss this bp on our mailing list directly, so that more people get involved in the discussion and we get more feedback and comments. Thank you.

Haomeng 2/11/14

-------------------------------------------------------------------------------------------------------------------

Gerrit topic: https://review.openstack.org/#q,topic:bp/for,n,z

Addressed by: https://review.openstack.org/72538
    Implements Send Sensor Data to Ceilometer

Gerrit topic: https://review.openstack.org/#q,topic:bp/-,n,z

Gerrit topic: https://review.openstack.org/#q,topic:bp/send-data-to-ceilometer,n,z

Addressed by: https://review.openstack.org/73971
    sync oslo rpc to ironic

-------------------------------------------------------------------------------------------------------------------

Is it possible to consider a pluggable model for this integration? In my case, I would like to use a tool other than ceilometer to monitor my Ironic installation. If we made the collection interface common, and then the distribution interface pluggable, it would permit me to write a driver that could ship the interface to statsd, or graphite, or any other monitoring configuration as I saw fit.

Even if out of scope for this blueprint specifically, I'd be curious as to how open you'd be to making it pluggable in the future.

Thanks,
Jay Faulkner

-------------------------------------------------------------------------------------------------------------------

Hi Jay,

We have a common message structure grouped by sensor type, as below:

{
    'Sensor Type 1': {
          'Sensor ID 1': {
            'key1': 'value1',
            'key2': 'value2'
          },
          'Sensor ID 2': {
            'key1': 'value1',
            'key2': 'value2'
          }
    },
    'Sensor Type 2': {
          'Sensor ID 3': {
            'key1': 'value1',
            'key2': 'value2'
          },
          'Sensor ID 4': {
            'key1': 'value1',
            'key2': 'value2'
          }
    }
}

Please check this link for sample data that will be sent to Ceilometer via the message bus.

http://paste.openstack.org/show/68808/

So this is just a data source; you can also create a notification receiver to consume the message. I think it makes sense to have more drivers consuming this IPMI sensor data.

We also have the get_sensors_data interface method, which can be called to retrieve the IPMI sensor data directly, without the notification bus.

https://review.openstack.org/#/c/72538/9/ironic/drivers/base.py#L352

Thanks
Haomeng

Addressed by: https://review.openstack.org/102435
    Send sensor data to Ceilometer

---------------------------------------------------------------

Ironic spec - https://review.openstack.org/102435
Ceilometer spec - https://review.openstack.org/#/c/100657/

The Ironic spec has changed: the notification [1] sent to Ceilometer will not cover sensors that have no 'Sensor Reading' field, because that is non-varying data and will not be recorded by Ceilometer.

[1] http://paste.openstack.org/show/85053/
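
The filtering rule can be sketched as follows. `drop_static_sensors` is a hypothetical helper operating on the common {sensor type: {sensor id: fields}} structure; the sample values are made up.

```python
def drop_static_sensors(sensors_data):
    """Keep only sensors that report a 'Sensor Reading' field."""
    filtered = {}
    for sensor_type, sensors in sensors_data.items():
        kept = {sensor_id: fields for sensor_id, fields in sensors.items()
                if "Sensor Reading" in fields}
        if kept:  # omit sensor types left with no sensors
            filtered[sensor_type] = kept
    return filtered


data = {
    "Fan": {
        "FAN 1": {"Sensor Reading": "4652 RPM", "Status": "ok"},
        "FAN 4": {"Status": "ok"},
    },
    "Version Change": {"BIOS": {"Status": "ok"}},
}
filtered = drop_static_sensors(data)
print(filtered)
```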

Haomeng 6/27/14

Addressed by: https://review.openstack.org/105076
    Sync Oslo notifier module to Ironic

Addressed by: https://review.openstack.org/106522
    Sync Oslo notifier module to Ironic

Addressed by: https://review.openstack.org/106552
    Sync Oslo rpc module to Ironic

Gerrit topic: https://review.openstack.org/#q,topic:bp/will,n,z

Addressed by: https://review.openstack.org/112486
    WIP: Add send-data-to-ceilometer support for pxe_ipminative driver

Gerrit topic: https://review.openstack.org/#q,topic:send-data-to-ceilometer,n,z
