Extend group methods to optionally allow a group that doesn't need to broadcast

Registered by kwgoodman

The larry group methods (group_mean, etc.) currently take one input: a 1d larry called group, which assigns each row to a group. When calculating the group mean of a 2d larry, for example, the group membership of a row is therefore the same in every column. Add the option to pass a 2d group larry so that group membership can vary from column to column. Where group membership is missing for a column (or for a single element), fill with NaN. (Can this be generalized to arbitrary dimension?)
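To make the NaN fill concrete, here is a minimal sketch for a single column. It uses plain lists rather than larrys so that it runs without la installed. The name column_group_mean is hypothetical, and treating both None and a float NaN as "missing group" is an assumption; the blueprint does not say how a missing membership is encoded.

```python
import math
from collections import defaultdict


def is_missing(g):
    # Assumption: a missing group label is encoded as None or as a float NaN.
    return g is None or (isinstance(g, float) and math.isnan(g))


def column_group_mean(values, groups):
    """Group mean of one column; elements without a group label become NaN."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        if not is_missing(g):
            totals[g] += v
            counts[g] += 1
    # Each element is replaced by the mean of its group, or NaN if it has none.
    return [math.nan if is_missing(g) else totals[g] / counts[g]
            for g in groups]


# The third element has no group, so its output is NaN.
print(column_group_mean([0, 4, 8, 12, 16], [1, 1, None, 2, 2]))
# [2.0, 2.0, nan, 14.0, 14.0]
```

The sketch only covers missing group labels; NaN handling in the data values themselves is left out.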

Let's say we have a 2d larry, y, and a 2d larry of groups, g. The extended group method would basically do:

for i in range(y.shape[1]):
    y[:,i] = y[:,i].group_mean(g[:,i])

where, for simplicity, I have assumed that the columns are already aligned. If g is 1d we would just do the normal broadcasting that group_mean already does.

Should we add axis as input?
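One possible shape for an axis argument, sketched with plain lists of rows so that it runs without la installed. Everything here is an assumption rather than the la API: the helper group_mean_1d, the signature groups_mean(y, group, axis=0), and the convention that axis=0 means group membership runs down each column (the current behavior) while axis=1 means it runs along each row. A 1d group is broadcast across the other axis.

```python
from collections import defaultdict


def group_mean_1d(values, groups):
    """Replace each value by the mean of the values sharing its group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [totals[g] / counts[g] for g in groups]


def groups_mean(y, group, axis=0):
    """y is a list of rows; group is 1d (broadcast) or shaped like y."""
    if axis not in (0, 1):
        raise ValueError('axis must be 0 or 1.')
    group_is_2d = isinstance(group[0], (list, tuple))
    if axis == 0:
        # Transpose so that every column of y (and of a 2d group) is a row.
        y = [list(col) for col in zip(*y)]
        if group_is_2d:
            group = [list(col) for col in zip(*group)]
    out = [group_mean_1d(line, group[i] if group_is_2d else group)
           for i, line in enumerate(y)]
    if axis == 0:
        # Transpose back to the original orientation.
        out = [list(row) for row in zip(*out)]
    return out


y = [[0, 1], [4, 5], [8, 9]]
print(groups_mean(y, [1, 1, 2]))                             # 1d group, broadcast over columns
print(groups_mean(y, [[1, 1], [1, 2], [2, 2]]))              # 2d group, per column
print(groups_mean([[1, 3, 5]], [[0, 0, 1]], axis=1))         # grouping along each row
```

With axis=0 and a 1d group this reduces to the broadcasting that group_mean already does, so the argument would only add behavior, not change it.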

Proof of concept:

import numpy as np

from la import larry

def getdata():
    y = larry(np.arange(20).reshape(5,4))
    group = larry([[1, 1, 3, 2, 2],
                   [2, 2, 1, 3, 3],
                   [1, 2, 3, 4, 5],
                   [1, 2, 1, 2, 1]]).T
    return y, group

def groups_mean(y, group):
    if group.ndim == 1:
        return y.group_mean(group)
    elif group.ndim == 2:
        if y.ndim == 2:
            z = y.copy() # NOTE: because of this line, int input will become int output
            for i, label in enumerate(y.getlabel(1)):
                idx = group.labelindex(label, axis=1)
                z[:,i] = y[:,i].group_mean(group[:,idx]).x
            return z
        else:
            raise ValueError('If group is 2d, then y must be 2d.')
    else:
        raise ValueError('group must be 1d or 2d.')
    raise RuntimeError('Dropped off the end of the function')

Output:

>> from group2d import getdata, groups_mean
>> y, group = getdata()
>> y

label_0
    0
    1
    2
    3
    4
label_1
    0
    1
    2
    3
x
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
>>
>> group

label_0
    0
    1
    2
    3
    4
label_1
    0
    1
    2
    3
x
array([[1, 2, 1, 1],
       [1, 2, 2, 2],
       [3, 1, 3, 1],
       [2, 3, 4, 2],
       [2, 3, 5, 1]])
>>
>> groups_mean(y, group)

label_0
    0
    1
    2
    3
    4
label_1
    0
    1
    2
    3
x
array([[ 2, 3, 2, 11],
       [ 2, 3, 6, 11],
       [ 8, 9, 10, 11],
       [14, 15, 14, 11],
       [14, 15, 18, 11]])

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: 0.2
