Implement a Userbase Counter

Registered by Otto Robba on 2011-09-21

It should be optional (but checked by default at install time). The counter sends a unique hash (a hash based on the MAC address, but not the MAC address itself) along with the version of the OS at the beginning of every month. From the server end, we can see the IP, which would let us approximate the origin.
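A minimal sketch of the client-side hash described above, not a settled design. The salt value, the function names, and the choice of SHA-256 are all assumptions made for illustration; the blueprint only says the hash is based on the MAC address and is not the MAC address itself.

```python
import hashlib
import uuid
from typing import Optional

# Hypothetical application-specific salt; the blueprint does not define one.
SALT = b"elementary-userbase-counter"


def machine_hash(mac: Optional[int] = None, salt: bytes = SALT) -> str:
    """Return a hex digest derived from the MAC address, never the MAC itself."""
    if mac is None:
        # uuid.getnode() returns the hardware address as a 48-bit integer
        # (or a random 48-bit number if no address can be found).
        mac = uuid.getnode()
    mac_bytes = mac.to_bytes(6, "big")
    return hashlib.sha256(salt + mac_bytes).hexdigest()


def build_report(os_version: str, mac: Optional[int] = None) -> dict:
    """Bundle the two fields the counter would send each month."""
    return {"hash": machine_hash(mac), "os_version": os_version}
```

Calling build_report("Luna") would produce a dictionary with a 64-character hex hash and the version string. Note that a hash of a MAC address with a fixed, public salt can still be reversed by enumerating addresses, so a real implementation would need to think harder about this step.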

What we'd use the info for:
- Anonymous hash: to keep stats accurate/unique
- IP: We'd grab the approximated region data, then toss the IP. This lets us see regional interest.
- OS Version: so we can track what versions are popular and whatnot.
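The three uses listed above could be sketched on the server side roughly as follows. This is an in-memory illustration under stated assumptions: the MonthlyStats class and its method names are hypothetical, and region_of stands in for whatever GeoIP lookup the server would actually use.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class MonthlyStats:
    """Count unique hashes per month, grouped by region and OS version."""

    def __init__(self, region_of: Callable[[str], str]):
        # region_of is a hypothetical IP -> region lookup supplied by the caller.
        self._region_of = region_of
        self._seen: Dict[str, Set[str]] = defaultdict(set)  # month -> hashes
        self._counts = defaultdict(int)  # (month, region, version) -> count

    def record(self, month: str, client_hash: str, ip: str, os_version: str) -> bool:
        """Record one ping; return False if this hash was already counted this month."""
        if client_hash in self._seen[month]:
            return False
        self._seen[month].add(client_hash)
        # The IP is used only for this lookup and is never stored.
        region = self._region_of(ip)
        self._counts[(month, region, os_version)] += 1
        return True

    def unique_users(self, month: str) -> int:
        return len(self._seen.get(month, set()))

    def by_region(self, month: str) -> Dict[str, int]:
        totals: Dict[str, int] = defaultdict(int)
        for (m, region, _version), n in self._counts.items():
            if m == month:
                totals[region] += n
        return dict(totals)

    def by_version(self, month: str) -> Dict[str, int]:
        totals: Dict[str, int] = defaultdict(int)
        for (m, _region, version), n in self._counts.items():
            if m == month:
                totals[version] += n
        return dict(totals)
```

The hash is what keeps a machine from being counted twice in a month; the region and version are only ever stored as aggregate counts.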

Blueprint information

Status: Not started
Approver: None
Priority: Medium
Drafter: None
Direction: Approved
Assignee: None
Definition: Discussion
Series goal: None
Implementation: Not started
Milestone target: None

Whiteboard

# Original Blueprint Text

It might be a good idea, to gauge interest, trends and growth, to have a way to measure, even if roughly, the number of users of elementary. This kind of information should be completely anonymous - user information must at all times be secure. Statistics derived from this could help if, in the future, eOS is to be sold by OEMs and the like. Or maybe if eOS ever wants to, I don't know, run some ads.

Now, the way to go about this is not necessarily easy. One might want to count the monthly unique hits to the update repos, or maybe use a simple elementary package that pings the update server saying "Hey, I'm a unique user. Count me!". Maybe even tell the origin (country) of the ping. I do think there is an application that does the counting in Ubuntu, but it is a non-default package and a native solution is always best.

This website discusses the issue: http://mrooney.blogspot.com/2009/05/counting-number-of-ubuntu-users.html
Fedora does the unique hits statistics thing: http://fedoraproject.org/wiki/Statistics#Yum_Data

Anyway. Thoughts?

# Discussion

The OS will poll our servers to check for distribution updates. That gives an estimate of user activity rather than the number of installs, but the number of installs by itself means absolutely nothing (I had Sabayon installed and booted it once a month; the installation was there but I was in no way a Sabayon user).

Canonical has already developed an open-source unique ping utility, called "canonical-census". It's available from the partner repository. It should be very easy to adapt it for our needs. ~shnatsel

Oh, and Ubuntu used to pre-install a system called "Popularity Contest" (popcon) and it was even enabled by default until Lucid. It collected package installation statistics and submitted them to Ubuntu servers. We can revive it if we want to.

Regarding that article - look at Michael's estimate of the number of Ubuntu users in 2009: he says it's about 24 million. I hear Mark announcing 400 million users at UDS-N. Which of these statistical approaches do you prefer? ~shnatsel

If canonical-census can be adjusted for eOS that would be great. The idea of polling the update servers is precisely what I meant in the second paragraph. I agree that it is easy to have an OS installed that is not really used: I have a Mac partition but I haven't booted it in ages. :)

If both installs and update-pings are accounted for, one could even see how many people tried eOS and how many of those stick with it. The path Download >> Install >> "Actual Use" could, more or less, be mapped. One could see the efficiency of BitTorrent and physical media too. Meaning, if the number of downloads from the eOS website is smaller than the number of installs, it means people are sharing it over BT and/or CDs. Just spitballing here but this kind of information might be useful down the road.
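To make the Download >> Install >> "Actual Use" arithmetic concrete, here is a small sketch. The function name, field names, and the numbers in the usage note are made up for illustration and are not real eOS figures.

```python
from typing import Dict, Optional, Union

Number = Union[int, float]


def funnel(downloads: int, installs: int, active: int) -> Dict[str, Optional[Number]]:
    """Summarise the Download >> Install >> Actual Use path for one period."""
    return {
        # Above 1.0 means more installs than website downloads.
        "installs_per_download": installs / downloads if downloads else None,
        # Share of installs that still ping the update server.
        "active_per_install": active / installs if installs else None,
        # Lower bound on installs that did not come from a website download
        # (shared over BitTorrent, CDs, and so on).
        "offsite_installs": max(installs - downloads, 0),
    }
```

With invented numbers, funnel(1000, 1200, 600) would report that at least 200 installs came from somewhere other than the website and that half of all installs are still active.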

As for 'popcon', does it measure all kinds of package installation stats? If it does, it might be interesting to see, for instance, if people are using Postler/Dexter/Maya/something or replacing it with something else. That could help with a targeted study to improve any application that is being replaced.

As for which statistics approach, frankly, I prefer whichever gives the most accurate picture ;)
400 million users? Didn't he just set out to reach 200 million in 4 years? Would that be 400 million installs?
--ottorobba

We've discussed this a bit at a previous Council meeting. Here's what we discussed: Optional (but checked by default), sends a unique hash (perhaps based on the MAC address, but not the MAC address itself) at the beginning of every month, and sends the version of the OS. From the server end, we can see the IP, which would let us approximate the origin. ~cassidyjames

How about using the user's timezone to guess the location, as GeoIP is not very accurate (proxy/ISP/...)? We could make a small script (or program) to send data like:
{
    "hash": "ThisIsAUniqueHash",
    "timez": "userTimeZone",
    "osData": "Luna,!@#$%",
    "lastUp": "Last Updated"
}
----voldyman
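A rough sketch of how the payload proposed above might be assembled. The field names come from the proposal; representing the timezone as a UTC offset string and the helper names are assumptions made here, and the example values are placeholders.

```python
import json
from datetime import datetime
from typing import Optional


def utc_offset_label(now: Optional[datetime] = None) -> str:
    """Format a UTC offset as e.g. '+05:30'. `now` must be timezone-aware."""
    if now is None:
        # A naive local time converted with astimezone() picks up the system zone.
        now = datetime.now().astimezone()
    offset = now.utcoffset()
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


def build_payload(client_hash: str, os_data: str, last_updated: str,
                  now: Optional[datetime] = None) -> str:
    """Serialise the proposed fields as a JSON string."""
    payload = {
        "hash": client_hash,
        "timez": utc_offset_label(now),
        "osData": os_data,
        "lastUp": last_updated,
    }
    return json.dumps(payload, sort_keys=True)
```

A UTC offset only narrows the location to a band of longitude, so it is coarser than a country; sending the zone name instead would be another option, at the cost of revealing more.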

On the local machine it's quite easy to set up: it's sufficient to make a small script or program that simply sends the info to the server, and then schedule it via anacron for the first day of every month or so. ~spinatelli
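One way the scheduled script could guard against reporting twice in a month is sketched below. It assumes the script is run regularly (for example as a daily anacron job, since anacron catches up on jobs missed while the machine was off) and keeps a small stamp file; the stamp file and the function names are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def month_key(today: date) -> str:
    """Identify a calendar month, e.g. '2011-10'."""
    return today.strftime("%Y-%m")


def should_report(stamp: Path, today: Optional[date] = None) -> bool:
    """True if no report has been recorded for the current month."""
    today = today or date.today()
    try:
        return stamp.read_text().strip() != month_key(today)
    except FileNotFoundError:
        return True  # never reported before


def mark_reported(stamp: Path, today: Optional[date] = None) -> None:
    """Remember that this month's report has been sent."""
    today = today or date.today()
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(month_key(today))
```

The calling script would check should_report(), send the data, and call mark_reported() only after the server accepted it, so a failed attempt is retried on the next run.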

Work Items

Work items:
Set up the web infrastructure: TODO
Code/adapt the desktop utility: TODO
Test test test: TODO
