class for Factor or CatData

Registered by joep

Create a class that makes working with group labels and dummy variables easier. It should produce an exog matrix that can be used in regression, descriptive statistics, and ANOVA.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

What I wanted to do with the class for catdata or factor:

Two representations: a label array (either n by 1 or 1d) and a dummy matrix (n by k).
Labels are converted to integers starting from 0, and the "original" labels (e.g. names of states) are kept around.

allow conversion between both representations
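A minimal sketch of the two representations and the conversion between them, using plain NumPy. The example data and variable names (`labels`, `codes`, `dummies`) are assumptions for illustration, not part of any existing class:

```python
import numpy as np

# Hypothetical example data: original labels, e.g. names of states.
labels = np.array(["NY", "CA", "NY", "TX", "CA"])

# Integer codes starting from 0; `uniques` keeps the original labels
# (sorted), so codes can always be mapped back.
uniques, codes = np.unique(labels, return_inverse=True)

# Label array -> dummy matrix (n by k): one column per distinct label.
dummies = (codes[:, None] == np.arange(len(uniques))).astype(int)

# Dummy matrix -> label array: the position of the single 1 in each row.
codes_back = dummies.argmax(axis=1)
labels_back = uniques[codes_back]
```

Keeping `uniques` next to the integer codes is what lets the round trip recover the original labels.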

Define two simple operations, add (+) and multiply (*), and maybe power (**). I am not sure about subtract (-), but it would not be included at first.

x1 = Factor(x1labelarray)
x2 = Factor(x2labelarray)

x = x1 + x2 + x1*x2 (similar to a formula, but I never looked at those carefully enough)
or x = (1 + x1)*(1 + x2)
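One way the `+` and `*` operations could behave is sketched below. This is only an illustration under assumptions: `+` is taken to mean stacking dummy columns side by side, `*` to mean all pairwise column products (the interaction), and `1` to mean a constant column. The attribute names (`dummy`, `colnames`) and the helpers `_from_dummy` and `_coerce` are hypothetical. No reference category is dropped, so the resulting matrices are generally rank deficient.

```python
import numpy as np


class Factor:
    """Hypothetical sketch: a label array together with its dummy matrix."""

    def __init__(self, labels, name="x"):
        labels = np.asarray(labels).ravel()
        # Integer codes from 0; `levels` keeps the original labels.
        self.levels, self.codes = np.unique(labels, return_inverse=True)
        self.dummy = (
            self.codes[:, None] == np.arange(len(self.levels))
        ).astype(float)
        self.colnames = [f"{name}[{lev}]" for lev in self.levels]

    @classmethod
    def _from_dummy(cls, dummy, colnames):
        # Build a result object that only carries a design matrix.
        obj = cls.__new__(cls)
        obj.levels = obj.codes = None
        obj.dummy = dummy
        obj.colnames = list(colnames)
        return obj

    def _coerce(self, other):
        if isinstance(other, Factor):
            return other
        if isinstance(other, (int, float)) and other == 1:
            # Assumption: the scalar 1 stands for a constant column.
            n = self.dummy.shape[0]
            return Factor._from_dummy(np.ones((n, 1)), ["const"])
        raise TypeError("can only combine a Factor with a Factor or 1")

    def __add__(self, other):
        other = self._coerce(other)
        return Factor._from_dummy(
            np.column_stack([self.dummy, other.dummy]),
            self.colnames + other.colnames,
        )

    def __radd__(self, other):
        # Handles `1 + x1`: the constant column comes first.
        return self._coerce(other) + self

    def __mul__(self, other):
        other = self._coerce(other)
        n = self.dummy.shape[0]
        # All pairwise products of columns, shape (n, k1 * k2).
        prod = (self.dummy[:, :, None] * other.dummy[:, None, :]).reshape(n, -1)
        names = [f"{a}:{b}" for a in self.colnames for b in other.colnames]
        return Factor._from_dummy(prod, names)


x1 = Factor(["a", "b", "a", "b"], name="x1")
x2 = Factor(["u", "u", "v", "v"], name="x2")

x = x1 + x2 + x1 * x2        # 2 + 2 + 4 columns
y = (1 + x1) * (1 + x2)      # (1 + 2) * (1 + 2) columns
```

In this sketch `(1 + x1)*(1 + x2)` expands to the constant, both main effects and the interaction, which is why it has one more column than `x1 + x2 + x1*x2`.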

There are more operations in SAS, e.g. for nesting, but this would be the basic start.
No fancy dicts, namespaces and so on, just a simple class with datadummy and datalabel and some meta-information.

That's roughly what I wanted to do to make working with dummy variables or categorical data a bit easier.

In pivot table ptable_1, groupstats and anova, I wanted to do the basic summary statistics and statistical analysis.

On the mailing list, I also recently wrote up some recipes for creating the label array from histogram and sorted data indices.
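The mailing-list recipes themselves are not reproduced here. The following is only a guess at what such recipes might look like, using `np.digitize` with histogram bin edges and `np.argsort` for the sorted-index route; the data and bin edges are made up for illustration:

```python
import numpy as np

# (a) Labels from histogram bins: a continuous variable and assumed bin edges.
x = np.array([0.5, 2.5, 1.5, 3.5, 0.1, 2.9])
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# Using only the interior edges makes the codes start at 0.
codes_hist = np.digitize(x, edges[1:-1])

# (b) Labels from sorted data indices: group identical keys.
keys = np.array([30, 10, 30, 20, 10])
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
# A new group starts wherever the sorted value changes.
starts = np.r_[True, sorted_keys[1:] != sorted_keys[:-1]]
codes_sorted = np.cumsum(starts) - 1
# Scatter the codes back into the original observation order.
codes_keys = np.empty_like(codes_sorted)
codes_keys[order] = codes_sorted
```

Both routes end with an integer label array starting from 0 that can be passed to `np.bincount`.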

With label arrays and np.bincount, it is very fast and easy to, for example, subtract the group means from each observation to remove the fixed effects.
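A small sketch of that demeaning step, with made-up data. It assumes the integer labels start at 0 and that every label occurs at least once (otherwise the division below would hit a zero count):

```python
import numpy as np

# Hypothetical data: integer group labels and one observation per row.
codes = np.array([0, 1, 0, 2, 1, 0])
y = np.array([1.0, 4.0, 3.0, 10.0, 6.0, 2.0])

# Per-group sums and counts in one pass each.
group_sums = np.bincount(codes, weights=y)
group_counts = np.bincount(codes)
group_means = group_sums / group_counts

# Index the group means by label to broadcast them back to observations.
y_demeaned = y - group_means[codes]
```

After this step each group of `y_demeaned` has mean zero, which is the within transformation used to remove group fixed effects.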
