Opt-In Stats Tracking of OpenStack Deployments

Registered by Joshua McKenty

The OpenStack Foundation has agreed to operate a confidential stats tracking service, which will report only in the aggregate. This blueprint is for a common client that can be used within each OpenStack project to report install and usage data on an opt-in basis. The aggregated data is shown by <tag>, and only when there are at least 6 unique reporters for class <tag>.
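The reporting rule above (publish per <tag>, but only once at least 6 unique reporters contribute) could be sketched on the service side as follows. This is a minimal illustration, not a defined implementation: the report fields `reporter_id`, `tag` and `value` and the function name `aggregate_by_tag` are hypothetical.

```python
from collections import defaultdict

# Threshold from the blueprint: a tag's aggregate is only shown once at
# least this many unique reporters have contributed to it.
MIN_REPORTERS = 6


def aggregate_by_tag(reports, min_reporters=MIN_REPORTERS):
    """Sum a numeric value per tag, suppressing tags with too few reporters.

    Each report is a dict with the hypothetical keys 'reporter_id', 'tag'
    and 'value'. Every report's value is added to its tag's total.
    """
    totals = defaultdict(int)
    reporters = defaultdict(set)
    for report in reports:
        tag = report["tag"]
        totals[tag] += report["value"]
        reporters[tag].add(report["reporter_id"])
    # Release only aggregates backed by enough distinct reporters.
    return {
        tag: total
        for tag, total in totals.items()
        if len(reporters[tag]) >= min_reporters
    }
```

With six distinct reporters under one tag and a single reporter under another, only the first tag's total is returned; the second stays hidden until five more reporters join it.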

Blueprint information

Status:
Not started
Approver:
Mark Collier
Priority:
Undefined
Drafter:
Joshua McKenty
Direction:
Needs approval
Assignee:
Monty Taylor
Definition:
Drafting
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Whiteboard

TB:
This would be great. A few thoughts
- Not all users would need anonymisation. We should make it optional. This would allow a 'Welcome New Users' section of the newsletter for example.
- Some users would have multiple instances. We should have a mechanism to identify both the user and each instance (for example, with an instance ID).
- Some form of expiration would also be needed. If you do not refresh the instance statistics once a year, for example, you would no longer be included. This would be more of a backend implementation than a client change.
- We should be able to post the data without needing a direct connection from the OpenStack instance to the internet. Private clouds may not have this connectivity.
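The instance-ID and expiration points above could be combined in a backend store along these lines. This is a sketch under stated assumptions: the class name `ReportStore`, its methods, and the 365-day `RETENTION` window (taken from the "once a year" example) are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window: instances that have not refreshed their
# statistics within this period are excluded from the active set.
RETENTION = timedelta(days=365)


class ReportStore:
    """Backend-side store keyed by (reporter_id, instance_id).

    Keying on both fields lets one reporter run several instances, and a
    newer posting for the same key overwrites the older one.
    """

    def __init__(self):
        # (reporter_id, instance_id) -> (timestamp, data)
        self._latest = {}

    def submit(self, reporter_id, instance_id, data, when):
        self._latest[(reporter_id, instance_id)] = (when, data)

    def active(self, now):
        """Return data for instances refreshed within RETENTION of `now`."""
        cutoff = now - RETENTION
        return {
            key: data
            for key, (when, data) in self._latest.items()
            if when >= cutoff
        }
```

Expiry here is applied at read time rather than by deleting old rows, so a stale instance reappears as soon as it posts again.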

JMC:
That last one seems ideal, but probably too complicated for an initial implementation.
We're going to need to get very crisp about our terminology, and probably NOT use the term 'user':

 - Vendor: The provider of either an OpenStack-powered service (a service provider vendor) or a product that exposes OpenStack APIs (a software product or distribution vendor).
 - Operator: The individual or group with administrator credentials and privileges. (In the case of a service provider, the same as the vendor.)
 - Consumer: An individual or group with API credentials and/or Horizon dashboard credentials, as well as a non-zero quota for one or more OpenStack resource pools.

So regarding anonymous 'users', I agree that Consumers and Operators may be happy to be counted individually and by name, but anecdotally, I believe most Vendors would be concerned about enabling that feature, at least initially. In order to get the most buy-in from Vendors (who will, after all, need to include and enable this feature), I would propose strongly anonymous stats gathering as the initial implementation, for the protection of the Vendors as well as the Operators and Consumers.

TB:
For the terminology, I completely agree. 'User' is too overloaded. Ideally, the same terms would also be used for the user survey and profiling.

Support for non-internet-connected deployments is not necessary for the first round, but if we keep the input data to a single structured text submission, it should be possible to add it at a later stage.
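Keeping each report as one structured text document is what makes offline submission possible later: the same string can be sent in a single HTTP POST or saved to a file and carried to a connected machine. A minimal sketch, assuming JSON as the format; the field names, `SCHEMA_VERSION`, and both function names are hypothetical.

```python
import json

# Hypothetical schema version so a collector can reject formats it
# does not understand.
SCHEMA_VERSION = 1


def serialize_report(instance_id, stats):
    """Encode one report as a single self-contained JSON text document."""
    document = {
        "schema_version": SCHEMA_VERSION,
        "instance_id": instance_id,
        "stats": stats,
    }
    # sort_keys gives identical output for identical input, which makes
    # saved files easy to compare or deduplicate.
    return json.dumps(document, sort_keys=True)


def deserialize_report(text):
    """Parse a report produced by serialize_report, checking its version."""
    document = json.loads(text)
    if document.get("schema_version") != SCHEMA_VERSION:
        raise ValueError("unsupported schema version")
    return document
```

An air-gapped cloud could write the output of `serialize_report` to a file; whichever machine later uploads it needs no knowledge of the report's contents.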

No problem making anonymous posting the default. Many research sites would be willing to declare themselves; however, Vendors would clearly choose anonymous as the default.
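An anonymous-by-default client with an explicit opt-in for sites that want to be named could look roughly like this. It is only a sketch: `ReportSettings`, `declare_identity`, `organization` and `public_fields` are hypothetical names, not part of any agreed design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportSettings:
    """Hypothetical client settings; anonymous unless explicitly changed."""
    declare_identity: bool = False
    organization: Optional[str] = None


def public_fields(settings, stats):
    """Build the fields to submit, including identity only when opted in."""
    fields = {"stats": dict(stats)}
    if settings.declare_identity:
        if not settings.organization:
            raise ValueError("declare_identity requires an organization name")
        fields["organization"] = settings.organization
    return fields
```

Because the default `ReportSettings()` never adds an `organization` field, a Vendor shipping the client unchanged gets anonymous reporting, while a research site can opt in with one setting.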

Would there need to be some unique ID for each instance so that updated postings would replace previous data?
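One way to get such an ID without revealing who the reporter is would be to generate a random UUID once on the client and persist it locally, so every later posting carries the same value. A sketch, assuming a local file for storage; the function name and file location are hypothetical.

```python
import os
import uuid


def load_or_create_instance_id(path):
    """Return a stable random ID for this deployment, creating it once.

    A random UUID4 is used instead of anything derived from hostnames or
    addresses, so it can link repeat postings without identifying the site.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = f.read().strip()
        if existing:
            return existing
    new_id = str(uuid.uuid4())
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_id + "\n")
    return new_id
```

The collector can then treat a new posting with a known ID as a replacement for the previous one.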

TB:
Ceph has started implementing a tool for anonymous reporting of capacity, etc. See http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag. This could serve as a model for the implementation.

Loic Dachary: Ceph-Brag is still in the blueprint stage, and I think it would be great to have a common tool usable by both Ceph & OpenStack.

Leseb: The first version of the client/server is available: https://github.com/enovance/ceph-brag
