Get state of User Activity

Registered by Seif Lotfy on 2010-10-27

Imagine the scenario of the following activity stream:
( A(a) == Access subject a ; A(b) == Access subject b ; L(a) == Leave subject a ; etc. )
(Timestamps are in minutes)

Timestamp - Activity
1) 10 - A(a)
2) 20 - A(b)
3) 30 - L(a)
4) 40 - A(c)
5) 50 - A(a)
6) 60 - L(b)
7) 70 - L(c)
8) 80 - A(d)
9) 90 - L(a)
10) 100 - L(d)

Now let's say I want to know what the state of my activity was at timestamp 55 (keep in mind I am not asking what happened around 55, but specifically what the state was).

By state we mean "What was active, What was open?"

Currently, if the user wants to know what he was doing, he has to use FindEvents with a given timeframe, let's say the 10 minutes before 55, as in (45, 55), and then map all ACCESS events to LEAVE events. All subjects of events that are left open (without a LEAVE event) will be considered "still active".
The result in our case would be event 5 with "50 - A(a)".
Yet this is wrong, because we can clearly see that event 4 and event 2 were also open at the time and that their LEAVE events occurred later. Since the query did not include the timeframe in which they occurred, they were left out.
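
To make the gap concrete, here is a minimal Python sketch that replays the example stream in memory. It does not call Zeitgeist at all: the function names are made up for illustration, and the windowed variant only imitates what a FindEvents query over (45, 55) would see.

```python
# Example stream from above: (timestamp in minutes, kind, subject).
# "A" is an Access event, "L" is a Leave event.
STREAM = [
    (10, "A", "a"), (20, "A", "b"), (30, "L", "a"), (40, "A", "c"),
    (50, "A", "a"), (60, "L", "b"), (70, "L", "c"), (80, "A", "d"),
    (90, "L", "a"), (100, "L", "d"),
]


def open_subjects(events, at):
    """Replay events up to `at`; return subjects whose Access has no Leave yet."""
    open_now = set()
    for timestamp, kind, subject in sorted(events):
        if timestamp > at:
            break
        if kind == "A":
            open_now.add(subject)
        else:
            open_now.discard(subject)
    return open_now


def windowed_open_subjects(events, start, end):
    """Imitate the timeframe approach: only events inside [start, end] are seen."""
    window = [event for event in events if start <= event[0] <= end]
    return open_subjects(window, end)


print(sorted(windowed_open_subjects(STREAM, 45, 55)))  # ['a']
print(sorted(open_subjects(STREAM, 55)))               # ['a', 'b', 'c']
```

The windowed call only finds subject a, while replaying the whole history shows that b and c were open at 55 as well.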

So my solution was to have a new DB called state.sql which has one table called "state" that looks like the following:

access_timestamp || access_event_id || leave_timestamp || leave_event_id || actor_id || subject_id

Every Access event gets registered as a new row, and every Leave event updates the last Access row with the same subject_id and actor_id.

When a dataprovider does not push Leave events, it looks as if it is pushing ACCESS events over and over.
In that case, if we get an Access event for a subject and actor that already has an orphaned Access row (no Leave event arrived before another Access was pushed), we simply mark the orphaned row with close_timestamp = -1 and close_event_id = -1.
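
The bookkeeping described above could be sketched roughly like this with Python's sqlite3 module. This is not the code from the extension: the function names, the column types and the in-memory database are assumptions, and the sketch uses the leave_* column names from the table layout where the text says close_*.

```python
import sqlite3


def create_state_db(path=":memory:"):
    """Open the state DB and create the table if needed (types are assumed)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS state (
               access_timestamp INTEGER NOT NULL,
               access_event_id  INTEGER NOT NULL,
               leave_timestamp  INTEGER,
               leave_event_id   INTEGER,
               actor_id         INTEGER NOT NULL,
               subject_id       INTEGER NOT NULL
           )"""
    )
    return conn


def register_access(conn, timestamp, event_id, actor_id, subject_id):
    """Record an Access event, orphaning any still-open row for the same pair."""
    conn.execute(
        "UPDATE state SET leave_timestamp = -1, leave_event_id = -1 "
        "WHERE actor_id = ? AND subject_id = ? AND leave_timestamp IS NULL",
        (actor_id, subject_id),
    )
    conn.execute(
        "INSERT INTO state (access_timestamp, access_event_id, actor_id, subject_id) "
        "VALUES (?, ?, ?, ?)",
        (timestamp, event_id, actor_id, subject_id),
    )


def register_leave(conn, timestamp, event_id, actor_id, subject_id):
    """Close the most recent open row for the same actor and subject, if any."""
    conn.execute(
        "UPDATE state SET leave_timestamp = ?, leave_event_id = ? "
        "WHERE rowid = (SELECT rowid FROM state "
        "               WHERE actor_id = ? AND subject_id = ? "
        "                 AND leave_timestamp IS NULL "
        "               ORDER BY access_timestamp DESC LIMIT 1)",
        (timestamp, event_id, actor_id, subject_id),
    )


conn = create_state_db()
register_access(conn, 10, 1, actor_id=1, subject_id=100)
register_access(conn, 20, 2, actor_id=1, subject_id=100)  # orphans event 1
register_leave(conn, 30, 3, actor_id=1, subject_id=100)   # closes event 2
print(conn.execute(
    "SELECT access_event_id, leave_timestamp, leave_event_id "
    "FROM state ORDER BY access_event_id"
).fetchall())  # [(1, -1, -1), (2, 30, 3)]
```

A Leave event that has no open row to match is silently ignored here; whether that is the right behaviour is an open question.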

This allows us to estimate which subjects were open at any given timepoint by just querying the DB

SELECT open_event_id FROM state WHERE open_timestamp <= ? AND (close_timestamp >= ? OR close_timestamp IS NULL)
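
As a rough check of the query against the example stream, the following Python sketch loads the ten events into an in-memory table and asks for the state at timestamp 55. It is a sketch under assumptions: the column names are the access_*/leave_* ones from the table layout rather than the open_*/close_* ones in the query above, subject_id is stored as text for readability, and a single actor with id 1 is assumed.

```python
import sqlite3

# (event_id, timestamp, kind, subject) from the example stream.
STREAM = [
    (1, 10, "A", "a"), (2, 20, "A", "b"), (3, 30, "L", "a"), (4, 40, "A", "c"),
    (5, 50, "A", "a"), (6, 60, "L", "b"), (7, 70, "L", "c"), (8, 80, "A", "d"),
    (9, 90, "L", "a"), (10, 100, "L", "d"),
]


def build_state(stream):
    """Fill an in-memory state table from the stream (one assumed actor, id 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state (access_timestamp INTEGER, access_event_id INTEGER, "
        "leave_timestamp INTEGER, leave_event_id INTEGER, "
        "actor_id INTEGER, subject_id TEXT)"
    )
    for event_id, timestamp, kind, subject in stream:
        if kind == "A":
            conn.execute(
                "INSERT INTO state VALUES (?, ?, NULL, NULL, 1, ?)",
                (timestamp, event_id, subject),
            )
        else:
            conn.execute(
                "UPDATE state SET leave_timestamp = ?, leave_event_id = ? "
                "WHERE actor_id = 1 AND subject_id = ? AND leave_timestamp IS NULL",
                (timestamp, event_id, subject),
            )
    return conn


def state_at(conn, timestamp):
    """Return the ids of Access events whose subjects were open at `timestamp`."""
    rows = conn.execute(
        "SELECT access_event_id FROM state "
        "WHERE access_timestamp <= ? "
        "AND (leave_timestamp >= ? OR leave_timestamp IS NULL) "
        "ORDER BY access_event_id",
        (timestamp, timestamp),
    )
    return [row[0] for row in rows]


print(state_at(build_state(STREAM), 55))  # [2, 4, 5]
```

Events 2, 4 and 5 come back, which matches the expectation that b, c and a were all open at 55. Rows marked with -1 never match the query, so orphaned Access events are treated as "unknown" rather than "open".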

How can we use this?
It would be nice for Unity and GNOME Do to take a "state" every 5 minutes of what is actually open, then get the "related_uris" over Zeitgeist for everything that is currently open and keep them in a cache. When searching, results can then be sorted by relevancy to the current context of work.
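
A client-side cache along these lines might look like the Python sketch below. Everything in it is hypothetical: ContextCache is not part of Zeitgeist, and get_open_uris and find_related are plain callables standing in for "ask the state extension what is open" and "ask Zeitgeist for related URIs".

```python
import time


class ContextCache:
    """Caches the URIs related to what is currently open (hypothetical helper)."""

    def __init__(self, get_open_uris, find_related, max_age=300,
                 clock=time.monotonic):
        self._get_open_uris = get_open_uris  # callable returning open URIs
        self._find_related = find_related    # callable: list of URIs -> related URIs
        self._max_age = max_age              # seconds; 300 is the 5 minutes above
        self._clock = clock
        self._related = set()
        self._refreshed_at = None

    def related(self):
        """Return the cached context, refreshing it once it is too old."""
        now = self._clock()
        if self._refreshed_at is None or now - self._refreshed_at >= self._max_age:
            open_uris = list(self._get_open_uris())
            self._related = set(open_uris) | set(self._find_related(open_uris))
            self._refreshed_at = now
        return self._related

    def sort_results(self, results):
        """Put results from the current context first, keeping the original order otherwise."""
        related = self.related()
        return sorted(results, key=lambda uri: uri not in related)


cache = ContextCache(
    get_open_uris=lambda: ["file:///a.txt"],
    find_related=lambda uris: ["file:///b.txt"],
)
print(cache.sort_results(["file:///z.txt", "file:///b.txt", "file:///a.txt"]))
# ['file:///b.txt', 'file:///a.txt', 'file:///z.txt']
```

This only moves in-context results to the front; it does not rank them against each other, which would need more than a set lookup.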

I hope you like the idea. Please take time to read it through. The implementation is not hard and I already started an extension for it that works just fine.

Blueprint information

Status:
Not started
Approver:
Zeitgeist Framework Team
Priority:
Undefined
Drafter:
Zeitgeist Framework Team
Direction:
Needs approval
Assignee:
Seif Lotfy
Definition:
New
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Related branches

Sprints

Whiteboard

--- seif 2010-10-27 10:28pm ---
Let's discuss this idea.
Do we want it as a supported extension, do we want to have it in the engine, or do we want it as a community extension? How relevant is it to our current work? How could it help Unity and others?

--- thekorn 2010-10-28 09:09 am ---
I like the overall idea of making it easier to query for all open/active subjects at a given timestamp. Without commenting on the code you already have in the branch which is linked to this blueprint, I think we should develop the functionality as an external/community extension first. And once everything is polished, and working well enough we should decide whether we would like to have it as an (in-zeitgeist) extension or as part of the engine API.

--- alexlauni 2010-11-09 10:23 am ---
Being able to get relevancy info for what is happening *now* would be fantastic. It seems like this allows an amount of projection into what the user will be doing in the near future, which is obviously immensely powerful.

--- seif 2010-11-16 11:12 ---
I would very much like to have the opinions of kamstrup and rainct as well as mhr3 and manish

--- kamstrup 2010-11-17 9:37 ---
I'm sorry, but I think this proposal is far from thought through. There are *so* many things that can go wrong when you try to match up Access and Leave events. We have been over this countless times and I've never seen a robust solution. We can try and make up for these shortcomings with all sorts of elaborate heuristics (most of which will have horrible execution times) but I don't like that too much. This is the path of black magic.

Also how do you propose "So when searching stuff can be sorted by relevancy to the current context of work." would work technically? This strikes me as something that is expensive either in CPU or memory unless we are extraordinarily clever about it. - Not saying it's impossible, but that I think it will take more than just some clever SQL (read: lowlevel bit fiddling to implement some auxiliary structures to adapt Xapian/sqlite's sorting routines).

A more realistic, and robust, approach is probably to make a real snapshot of the whole environment and put it on a timeline. Then having some clever routines to look up "similar environment snapshots" and do some time shifting analysis etc.

--- seif 2010-11-17 10:05 ---
While I know this might go wrong a bit, I still think it's our best solution if we assume that we get open/close events from everything.
The problem I am trying to solve here is "what was open at a specific timestamp". This can be solved in one SQL call with the extension I am working on.
Now as for sorting by relevancy to the current context, all I do is call find_related_uris for all the URIs I get from the extension, which makes it another SQL call. I know it's a bit expensive, but I think it's worth testing and playing with, at least as an extension.

--- kamstrup 2010-11-17 10:41 ---
There is some access/open and close/leave confusion in your draft that took me a bit to figure out. I think what you need to do on startup (or lazily when the first query arrives, to not hose login time) is to run UPDATE state SET leave_timestamp = -1 WHERE leave_timestamp IS NULL.

This still doesn't fix the case of long-running sessions where you don't get leave events. I know people who have uptimes of ~1 year. Your scheme can potentially drift very far off in that case. I'm still challenging your basic assumption "assume that we get open/close events from everything". You can prove me wrong of course :-)

WRT sorting, you have a misleading formulation in the draft. You don't mean "So when searching stuff can be sorted by relevancy to the current context of work.". You just mean that you can look stuff up which is relevant in the current context.

There is also the unsolved problem of where you actually get the Access/Leave events from. And libwnck is not the answer here if you want good results IMHO. You need support from the apps or toolkits or something.

But all that said - nothing is stopping you from implementing this as an extension and putting my scepticism to shame ;-)

--- seif 2010-11-17 12:34 ---
Sorry for the confusion.
Long-running sessions might be an issue. However, the fact is that if you had something open at a specific point in time, you can't change that. Whether that thing is relevant is another issue.

As for the misleading formulation, there is a difference between what I am suggesting and what you are. I am talking about sorting results based on what you have open. You are talking about a dashboard, if I understood correctly.

As for where I get the data from, let's start with "no wnck". I am relying on our current dataproviders, which all deliver access/leave events. I know there are not many yet, but I hope to have this solved in the near future, when we will have all default Ubuntu as well as GNOME apps covered with dataproviders.

I will continue working on this as an extension. Maybe if I get it working nicely I can convince you to add it as a main extension.


Work Items
