Should group be an optional input to e.g. mean?

Registered by kwgoodman

There are currently three group methods: group_ranking, group_mean, and group_median. But there are many more we could add: group_zscore, group_sum, group_std, etc.

Instead of filling up the namespace, would it be better to get rid of the group methods and instead add group as an optional input to the corresponding non-group methods? For example, we could get rid of group_mean and the signature for mean could become:

larry.mean(self, axis=None, group=None)

One implementation detail is that the group methods do not currently allow an axis argument. That would have to be fixed.

Is this design better or worse?

Another possibility is to add group_reduce as an optional input. For example, if y is an N by T larry and there are G groups, then

larry.mean(self, axis=None, group=group, group_reduce=True)

would be G by T.
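
To make the two output shapes concrete, here is a minimal sketch of the idea. It is not larry's implementation: plain Python lists stand in for the larry data so it runs without la installed, the function name group_mean is only borrowed for illustration, and it assumes groups are formed over the rows (axis 0) and that group_reduce=False keeps the N by T shape with every row replaced by its group's mean.

```python
from statistics import mean


def group_mean(y, group, group_reduce=False):
    """Column-wise mean of the rows of y within each group.

    y: list of N rows, each a list of T numbers (stand-in for larry data).
    group: list of N group labels, one per row.
    """
    # Map each group label to the indices of its rows.
    members = {}
    for i, label in enumerate(group):
        members.setdefault(label, []).append(i)

    # One length-T row of means per group.
    n_cols = len(y[0])
    means = {
        label: [mean(y[i][t] for i in rows) for t in range(n_cols)]
        for label, rows in members.items()
    }

    if group_reduce:
        # G by T: one row per group, ordered by sorted label.
        labels = sorted(means)
        return labels, [means[label] for label in labels]
    # N by T: every row is replaced by the mean of its group.
    return [list(means[label]) for label in group]


y = [[1.0, 2.0], [10.0, 20.0], [3.0, 6.0]]
group = ["a", "b", "a"]
labels, reduced = group_mean(y, group, group_reduce=True)  # 2 by 2
full = group_mean(y, group)                                # 3 by 2
```

In the reduced case the sketch also returns the group labels, since the rows of a G by T result would presumably be labeled by group rather than by the original row labels.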

Groups are not a central feature of larry. So I don't know how I feel about cluttering up the signatures of a lot of larry methods with group options.

Another design alternative: Remove all group methods from larry. Convert the group methods to functions and place them in a module called group. So

la.group.mean(lar, group, axis=None, group_reduce=False)

see also: extend-group-methods

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

The signatures of mean, demean, etc. are not so complicated that an additional keyword would be confusing.

Some statistical packages (which?) have a "by" keyword, or groupby.

For example:

lar.demean(axis=1, groupby=None)
lar.demean(axis=1, groupby=industrylar)

The default groupby=None means no groups.
The groupby larry is checked for dimension: it must be 1d or have the same shape as the larry (see the other blueprint on an enhancement for changing group membership).
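
A rough sketch of how the groupby check and the three cases (no groups, 1d groups, groups with the same shape as the data) could fit together. Everything here is an assumption for illustration: plain lists stand in for larrys, the axis argument is left out and means are always taken down each column, and the standalone demean function is hypothetical rather than larry's method.

```python
from statistics import mean


def demean(data, groupby=None):
    """Subtract column means from a list of N rows of length T.

    groupby=None: one mean per column (no groups).
    groupby as a flat list of N labels: a fixed group for each row.
    groupby as N rows of T labels: membership may change per column.
    """
    n, t = len(data), len(data[0])
    if groupby is None:
        # A single dummy label puts every cell in the same group.
        labels = [[None] * t for _ in range(n)]
    elif all(isinstance(g, (list, tuple)) for g in groupby):
        if len(groupby) != n or any(len(row) != t for row in groupby):
            raise ValueError("2d groupby must have the same shape as data")
        labels = [list(row) for row in groupby]
    else:
        if len(groupby) != n:
            raise ValueError("1d groupby must have one label per row")
        # Repeat each row's label across all T columns.
        labels = [[g] * t for g in groupby]

    out = [[0.0] * t for _ in range(n)]
    for j in range(t):
        # Collect the values of column j by group, then center each cell.
        cells = {}
        for i in range(n):
            cells.setdefault(labels[i][j], []).append(data[i][j])
        centers = {label: mean(vals) for label, vals in cells.items()}
        for i in range(n):
            out[i][j] = data[i][j] - centers[labels[i][j]]
    return out


data = [[1.0, 10.0], [3.0, 20.0], [5.0, 60.0]]
no_groups = demean(data)
fixed = demean(data, groupby=["x", "x", "y"])
changing = demean(data, groupby=[["x", "x"], ["x", "y"], ["y", "y"]])
```

Expanding the 1d case to the full shape first means a single code path handles both fixed and changing group membership, at the cost of some extra memory.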

--------

A generic function like

apply_along_groups(lar, attr, axis, group)

might help. The attr in your example would be 'demean'. Or

apply_along_groups(lar, lar.method, axis, group)

We could then use apply_along_groups for all methods that take only an axis. We could also add a **kwargs input that is passed on to lar.method, for methods like movingsum that take more input than just axis.

Would that work?
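
One way this could look, as a sketch only. Rows is a hypothetical stand-in for a 2d larry (just enough to have methods that take an axis), its movingsum is a toy that sums partial windows at the start and is not la's implementation, and only axis=0 and shape-preserving methods are handled. The attr argument may be either the method name or a bound method, covering both call forms above.

```python
class Rows:
    """Hypothetical stand-in for a 2d larry: a list of equal-length rows."""

    def __init__(self, x):
        self.x = [list(row) for row in x]

    def demean(self, axis=0):
        # Subtract each column's mean (this sketch supports axis=0 only).
        if axis != 0:
            raise NotImplementedError("sketch supports axis=0 only")
        means = [sum(col) / len(col) for col in zip(*self.x)]
        return Rows([[v - m for v, m in zip(row, means)] for row in self.x])

    def movingsum(self, window, axis=0):
        # Toy trailing sum down the columns; early rows use partial windows.
        if axis != 0:
            raise NotImplementedError("sketch supports axis=0 only")
        out = []
        for i in range(len(self.x)):
            block = self.x[max(0, i - window + 1): i + 1]
            out.append([sum(col) for col in zip(*block)])
        return Rows(out)


def apply_along_groups(lar, attr, axis, group, **kwargs):
    """Apply a shape-preserving method separately to each group's rows."""
    if axis != 0:
        raise NotImplementedError("sketch supports axis=0 only")
    # Accept either a method name or a bound method such as lar.demean.
    name = attr if isinstance(attr, str) else attr.__name__
    members = {}
    for i, label in enumerate(group):
        members.setdefault(label, []).append(i)
    out = [None] * len(lar.x)
    for rows in members.values():
        sub = Rows([lar.x[i] for i in rows])
        # Extra keywords (e.g. window) are passed through to the method.
        result = getattr(sub, name)(axis=axis, **kwargs)
        # Put the group's rows back in their original positions.
        for i, row in zip(rows, result.x):
            out[i] = row
    return Rows(out)


lar = Rows([[1.0, 10.0], [3.0, 20.0], [5.0, 60.0], [7.0, 40.0]])
group = ["x", "y", "x", "y"]
demeaned = apply_along_groups(lar, "demean", 0, group)
summed = apply_along_groups(lar, lar.movingsum, 0, group, window=2)
```

Reducing methods such as mean would need a different reassembly step, since their per-group result no longer has one row per original row.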
---
I think this should work and would be very flexible. Do you want it as a substitute for a groupby keyword, or as the generic implementation?
I would like the most common groupby methods to be available directly, e.g. mean, demean, maybe sum, count.
I expect that some of the implementations for specific groupby methods can be made faster than a generic function.
