statsmodels

Class for Factor or CatData

Registered by joep on 2009-08-28

Create a class to make working with labels for groups and dummy variables easier. It should produce an exog matrix to be used in regression, descriptive statistics and ANOVA.


Blueprint information

Status:: Not started

Approver:: None

Priority:: Undefined

Drafter:: None

Direction:: Needs approval

Assignee:: None

Definition:: New

Series goal:: None

Implementation:: Unknown

Milestone target:: None


Whiteboard

What I wanted to do with the class for CatData or Factor:

Two representations: a label array (either n by 1 or 1-d) and a dummy matrix (n by k).
Labels are converted to integers starting from 0, and the "original" labels, e.g. names of states, are kept around.
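A minimal sketch of the label encoding, assuming NumPy. It uses `np.unique` with `return_inverse=True`, which is one possible way to do this and not necessarily what the Factor class would use; the state names are made-up example data.

```python
import numpy as np

# Made-up raw labels, e.g. names of states
raw = np.array(["CA", "NY", "CA", "TX", "NY", "CA"])

# levels: sorted distinct original labels (kept as meta-information)
# labels: integer code 0..k-1 of each observation into levels
levels, labels = np.unique(raw, return_inverse=True)

print(levels)  # ['CA' 'NY' 'TX']
print(labels)  # [0 1 0 2 1 0]

# The original labels can always be recovered from the integer codes
assert (levels[labels] == raw).all()
```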

Allow conversion between the two representations.
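Converting between the two representations takes only a few lines of NumPy. The helper names `labels_to_dummy` and `dummy_to_labels` are hypothetical, chosen for this sketch, which assumes integer labels 0..k-1 as described above.

```python
import numpy as np

def labels_to_dummy(labels, k=None):
    """Hypothetical helper: integer label array (1-d or n by 1) -> n by k dummy matrix."""
    labels = np.asarray(labels).ravel()
    if k is None:
        k = labels.max() + 1
    # Broadcasting (n, 1) against (k,) gives an (n, k) boolean array
    return (labels[:, None] == np.arange(k)).astype(float)

def dummy_to_labels(dummy):
    """Hypothetical helper: n by k dummy matrix -> 1-d integer label array."""
    dummy = np.asarray(dummy)
    # Each row has exactly one 1, so its column index is the label
    return dummy.argmax(axis=1)

labels = np.array([0, 1, 0, 2, 1, 0])
dummy = labels_to_dummy(labels)
print(dummy.shape)  # (6, 3)
assert (dummy_to_labels(dummy) == labels).all()
```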

Define two simple operations, add (+) and multiply (*), and maybe power (**) (not sure about subtract; in any case not at first).

x1 = Factor(x1labelarray)
x2 = Factor(x2labelarray)

x = x1 + x2 + x1*x2  (similar to formula, although I never looked at it carefully enough)
or x = (1 + x1)*(1 + x2)
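A sketch of what the two operations could mean on plain label arrays, assuming that `*` builds one level per observed combination of the two factors and that `+` stacks the dummy blocks side by side. The helpers `dummy` and `interact` are hypothetical names, not part of statsmodels.

```python
import numpy as np

def dummy(labels):
    """Hypothetical helper: full n by k dummy matrix from integer labels."""
    labels = np.asarray(labels).ravel()
    return (labels[:, None] == np.arange(labels.max() + 1)).astype(float)

def interact(labels1, labels2):
    """Hypothetical x1*x2: one label per observed combination of two factors."""
    # Encode each pair (a, b) as a single integer a * k2 + b ...
    k2 = labels2.max() + 1
    combined = labels1 * k2 + labels2
    # ... then re-code to consecutive integers 0..m-1
    _, codes = np.unique(combined, return_inverse=True)
    return codes

x1 = np.array([0, 0, 1, 1, 0, 1])
x2 = np.array([0, 1, 0, 1, 1, 0])

x12 = interact(x1, x2)
# x1 + x2 + x1*x2: "+" concatenates the dummy blocks column-wise
exog = np.column_stack([dummy(x1), dummy(x2), dummy(x12)])
print(exog.shape)  # (6, 8)
```

Because every block contains a full set of dummies, the stacked matrix is rank deficient (its rank here is 4, the number of cells), so an exog for regression would need to drop a reference level from each block. Expanding (1 + x1)*(1 + x2) gives 1 + x1 + x2 + x1*x2, i.e. the same terms plus a constant.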

SAS has more operations, e.g. for nested effects, but this would be the basic start.
But no fancy dicts, namespaces and so on; just a simple class with datadummy, datalabel and some meta-information.
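One way such a simple class could look, assuming NumPy: a hypothetical `Factor` that stores the integer labels (`datalabel`), builds the dummy matrix (`datadummy`) on demand, and keeps the sorted original labels and a name as meta-information. Only `*` is sketched here; `+` and power are left out, and joining level names with ':' is an arbitrary choice for this example.

```python
import numpy as np

class Factor:
    """Hypothetical minimal factor: integer labels, dummy matrix, level names."""

    def __init__(self, data, name=None):
        data = np.asarray(data).ravel()
        # levels: sorted original labels; datalabel: integer codes 0..k-1
        self.levels, self.datalabel = np.unique(data, return_inverse=True)
        self.name = name

    @property
    def k(self):
        return len(self.levels)

    @property
    def datadummy(self):
        # n by k dummy matrix, computed on demand from the label array
        return (self.datalabel[:, None] == np.arange(self.k)).astype(float)

    def __mul__(self, other):
        # Interaction: a new Factor whose levels are the observed label pairs
        pairs = [f"{a}:{b}" for a, b in zip(self.levels[self.datalabel],
                                            other.levels[other.datalabel])]
        return Factor(pairs, name=f"{self.name}*{other.name}")

    def __repr__(self):
        return f"Factor(name={self.name!r}, levels={self.levels.tolist()})"

state = Factor(["CA", "NY", "CA", "TX"], name="state")
sex = Factor(["f", "m", "m", "f"], name="sex")
print(state)                   # Factor(name='state', levels=['CA', 'NY', 'TX'])
print(state.datadummy.shape)   # (4, 3)
print((state * sex).levels.tolist())  # ['CA:f', 'CA:m', 'NY:m', 'TX:f']
```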

That's roughly what I wanted to do to make working with dummy variables or categorical data a bit easier.

In the pivot table ptable_1 and in groupstats and anova, I wanted to do the basic summary statistics and statistical analysis.
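For group statistics and a one-way ANOVA, a minimal sketch with `np.bincount`, assuming integer labels 0..k-1 and at least two observations per group. `groupstats` and `oneway_anova` here are hypothetical helpers whose names only echo the text; this is not the code referred to above.

```python
import numpy as np

def groupstats(y, labels):
    """Hypothetical helper: count, mean and variance (ddof=1) of y per group."""
    labels = np.asarray(labels).ravel()
    y = np.asarray(y, dtype=float)
    n = np.bincount(labels)
    mean = np.bincount(labels, weights=y) / n
    # Sum of squared deviations from each observation's own group mean
    ss = np.bincount(labels, weights=(y - mean[labels]) ** 2)
    return n, mean, ss / (n - 1)

def oneway_anova(y, labels):
    """Hypothetical helper: F statistic for the hypothesis of equal group means."""
    y = np.asarray(y, dtype=float)
    n, mean, var = groupstats(y, labels)
    k, n_total = len(n), n.sum()
    ss_within = ((n - 1) * var).sum()
    ss_between = (n * (mean - y.mean()) ** 2).sum()
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
n, mean, var = groupstats(y, labels)
F = oneway_anova(y, labels)
print(F)  # 27.0
```

The same F statistic can be cross-checked with `scipy.stats.f_oneway` by passing the groups as separate arrays.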

I also recently posted some recipes on the mailing list showing how the label array can be created from a histogram and from sorted data indices.
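The mailing-list recipes themselves are not reproduced here; the following is a hedged reconstruction of the two ideas, assuming NumPy. Continuous data are binned with the edges returned by `np.histogram` and `np.digitize`; discrete values are labelled by sorting, flagging where the value changes, and scattering the cumulative sum back with the argsort index. Variable names and data are illustrative.

```python
import numpy as np

# Recipe 1 (continuous data): bin using np.histogram's edges
x = np.array([0.5, 1.5, 2.5, 3.0, 0.1, 2.9])
counts, edges = np.histogram(x, bins=3)
# Using only the interior edges maps values to 0..nbins-1; the maximum
# lands in the last bin, matching np.histogram's closed last interval
bin_labels = np.digitize(x, edges[1:-1])

# Recipe 2 (discrete values): sort, flag changes, cumsum, scatter back
v = np.array([3.2, 1.0, 3.2, 7.5, 1.0])
idx = np.argsort(v, kind="stable")
vs = v[idx]
# 1 where a new distinct value starts in the sorted array, else 0
is_new = np.concatenate(([0], (vs[1:] != vs[:-1]).astype(int)))
labels = np.empty(len(v), dtype=int)
labels[idx] = np.cumsum(is_new)  # put the codes back in the original order
print(labels)  # [1 0 1 2 0]
```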

With label arrays and np.bincount it is very fast and easy, for example, to subtract the group means from each observation to remove the fixed effects.
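The fixed-effects demeaning can be sketched as follows, assuming integer labels 0..k-1 with no empty groups (as produced by `np.unique`). `demean_by_group` is a hypothetical helper name.

```python
import numpy as np

def demean_by_group(y, labels):
    """Hypothetical helper: subtract each observation's group mean."""
    labels = np.asarray(labels).ravel()
    y = np.asarray(y, dtype=float)
    group_mean = np.bincount(labels, weights=y) / np.bincount(labels)
    # Indexing with the label array expands the k group means back to n rows
    return y - group_mean[labels]

y = np.array([1.0, 2.0, 10.0, 12.0, 3.0])
labels = np.array([0, 0, 1, 1, 0])
resid = demean_by_group(y, labels)
print(resid)  # [-1.  0. -1.  1.  1.]
```

Both steps are vectorized: one `bincount` pass per group sum or count, and one fancy-indexing step, with no Python loop over groups.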
